Typed unknown values: a step towards solving the problem of representing missing information in relational databases
https://doi.org/10.15514/ISPRAS-2023-35(2)-6
Abstract
The state of affairs in the area of missing information management in relational databases leaves much to be desired. The SQL standard uses the universal null value to represent missing data, and the control is based on three-valued logic, in which the null value is identified with a third boolean value. This solution is conceptually inconsistent and often results in DBMS behavior that is not intuitive. An alternative approach using typed special values leaves all handling of missing data to users. In this article, we analyze the long history of research and development that led to this situation. We come to the conclusion that no other solution could have appeared in the SQL standard due to the choice of the mechanism of the universal null value more than 50 years ago, and the alternative mechanism cannot provide system support for special values due to the use of two-valued logic. We propose a combined approach using typed special values based on three-valued logic. This approach allows you to use the semantics of data types when processing queries with conditions that include unknown data. In addition, our approach allows us to define a full-fledged three-valued logic in which a special value of a Boolean type is the third boolean value.
About the Author
Sergey Dmitrievich KUZNETSOVRussian Federation
Doctor of Technical Sciences, Professor, Chief Researcher at ISP RAS, Professor at the Departments of System Programming of MSU, MIPT, and HSE
References
1. Codd E.F. Derivability, Redundancy and Consistency of Relations Stored in Large Data Banks. IBM Research Report RJ599 (# 12343), 1969. Reprinted at ACM SIGMOD Record, 2009, Vol. 38, No. 1, 2009, pp. 17-36. Имеется перевод на русский язык: Э.Ф. Кодд. Выводимость, избыточность и согласованность отношений, хранимых в крупных банках данных. URL: http://citforum.ru/database/classics/first_rel_paper/.
2. Codd E.F. A Relational Model of Data for Large Shared Data Banks. Communications of the ACM, Volume 13, Number 6, 1970, pp. 377-387. Имеется перевод на русский язык: Е.Ф. Кодд. Реляционная модель данных для больших совместно используемых банков данных. URL: http://citforum.ru/database/classics/codd/.
3. Codd E.F. Understanding Relations (Installment #7). Bulletin of ACM-SIGMOD: The Special Interest Group on Management of Data, vol. 7, no. 3-4, 1975, pp. 23-28.
4. Codd E.F. Implementation of Relational Data Base Management Systems (NCC 1975 Panel). Bulletin of ACM-SIGMOD: The Special Interest Group on Management of Data, vol. 7, no. 3-4, 1975, pp. 3-22.
5. Chamberlin D.D., Astrahan M.M. et al. SEQUEL 2: A Unified Approach to Data Definition, Manipulation, and Control. IBM Journal of Research and Developoment, V. 20, No. 6, 1976, pp. 560-575. Имеется перевод на русский язык: Д.Д. Чамберлин, М.М. Астрахан, К.П. Эсваран, П.П. Грифитс, Р.А. Лори, Д.В. Мел, П. Райшер, Б.В. Вейд. SEQUEL 2: унифицированный подход к определению, манипулированию и контролю данных. URL: http://citforum.ru/database/classics/sequel_2/.
6. Codd E.F. Extending the Database Relational Model to Capture More Meaning. ACM Transactions on Database Systems, vol. 4, issue 4, 1979, pp. 397-434. Имеется перевод на русский язык: Э.Ф. Кодд. Расширение реляционной модели для лучшего отражения семантики. URL: http://citforum.ru/database/classics/codd_2/.
7. ISO/IEC 9075-2:1999. Information technology — Database languages — SQL — Part 2: Foundation (SQL/Foundation).
8. Codd E.F. The Relational Model for Database Management: Version 2. Addison-Wesley, 1990, 538 p.
9. Date C.J. The Default Values Approach to Missing Information. In C.J. Date (with Hugh Darven). Relational Database: Selected Writings 1989-1991. Addison-Wesley, 1992, pp. 343-354.
10. Date C.J. Faults and Defaults (in five parts). In C. J. Date (with Hugh Darven and David McGoveran). Relational Database: Selected Writings 1994-1997. Addison-Wesley, 1998, 608 p.
11. Date C.J. Hugh Darwen. Databases, Types and the Relational Model: The Third Manifesto. 3rd Edition. Addison-Wesley, 2006, 556 p.
12. Codd E.F. Missing information (applicable and inapplicable) in relational databases. ACM SIGMOD Record, vol. 15, issue 4, 1986, pp. 53-78.
13. Codd E.F. More commentary on missing information in relational databases (applicable and inapplicable information). ACM SIGMOD Record, vol. 16, issue 1, 1987, pp. 42-50.
14. ANSI X3.135-1986. Information Technology – Database Languages – SQL.
15. ISO 9075:1987. Information processing systems — Database language — SQL.
16. ISO/IEC 9075:1989. Information processing systems — Database Language SQL with integrity enhancement.
17. ISO/IEC 9075:1992. Information technology — Database languages — SQL.
18. ISO/IEC 9075-1:1999. Information technology — Database languages — SQL — Part 1: Framework (SQL/Framework).
19. ISO/IEC 9075-2:1999. Information technology — Database languages — SQL — Part 2: Foundation (SQL/Foundation).
20. ISO/IEC 9075-2:2003. Information technology — Database languages — SQL — Part 2: Foundation (SQL/Foundation).
21. ISO/IEC 9075-2:2008. Information technology — Database languages — SQL — Part 2: Foundation (SQL/Foundation)
22. ISO/IEC 9075-2:2011. Information technology — Database languages — SQL — Part 2: Foundation (SQL/Foundation)
23. ISO/IEC 9075-2:2016. Information technology — Database languages — SQL — Part 2: Foundation (SQL/Foundation)
24. Date C.J. A critique of the SQL database language. ACM SIGMOD Record, vol. 14, issue 30, 1984, pp 8-54.
25. Date C.J. Introduction to Database Systems, volume 2. Addison-Wesley, 1983, 383 p.
26. Date C.J. Null Values in Database Management. In C.J. Date. Relational Database: Selected Writings. Addison-Wesley, 1986, pp. 313-334.
27. Date C.J. NOT is Not “Not”! (Notes on Three-Valued Logic and Related Matters). In C.J. Date with a Special Contribution by Andrew Warden. Relational Database Writings 1985-1989. Addison-Wesley, 1990, pp. 217-248.
28. Date C.J., Darwen H. Foundation for Future Database Systems: The Third Manifesto. 2nd Edition. Addison-Wesley, 2000, 608 p.
29. Date C.J., Darwen H. Database Explorations: Essays on The Third Manifesto and Related Topics. Trafford Publishing, 2010, 548 p.
30. Darwen H. How to Handle Missing Information without Using NULL. Presentation Slides. Available at: https://www.dcs.warwick.ac.uk/~hugh/TTM/Missing-info-without-nulls.pdf, accessed 11.05.2023.
31. Darwen H., Smout E. How to Handle Missing Information Using S-by-C. Available at: https://www.dcs.warwick.ac.uk/~hugh/TTM/HTHMIUS-by-C-review-draft.pdf, accessed 11.05.2023.
32. Gessert G.H. Four Valued Logic for Relational Database Systems. ACM SIGMOD Record, vol. 19, issue 10, 1990, pp 29-35
33. Date C.J. Why Three- and Four-Valued Logic Don’t Work. In C.J. Date. Date on Database. Writings 2000–2006, Apress, 2012, pp. 329-342.
Review
For citations:
KUZNETSOV S.D. Typed unknown values: a step towards solving the problem of representing missing information in relational databases. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2023;35(2):73-100. (In Russ.) https://doi.org/10.15514/ISPRAS-2023-35(2)-6