Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Possible and certain SQL keys

Published: 01 July 2015 Publication History

Abstract

Driven by the dominance of the relational model, the requirements of modern applications, and the veracity of data, we revisit the fundamental notion of a key in relational databases with NULLs. In SQL database systems primary key columns are NOT NULL by default. NULL columns may occur in unique constraints which only guarantee uniqueness for tuples which do not feature null markers in any of the columns involved, and therefore serve a different function than primary keys. We investigate the notions of possible and certain keys, which are keys that hold in some or all possible worlds that can originate from an SQL table, respectively. Possible keys coincide with the unique constraint of SQL, and thus provide a semantics for their syntactic definition in the SQL standard. Certain keys extend primary keys to include NULL columns, and thus form a sufficient and necessary condition to identify tuples uniquely, while primary keys are only sufficient for that purpose. In addition to basic characterization, axiomatization, and simple discovery approaches for possible and certain keys, we investigate the existence and construction of Armstrong tables, and describe an indexing scheme for enforcing certain keys. Our experiments show that certain keys with NULLs do occur in real-world databases, and that related computational problems can be solved efficiently. Certain keys are therefore semantically well-founded and able to maintain data quality in the form of Codd's entity integrity rule while handling the requirements of modern applications, that is, higher volumes of incomplete data from different formats.

References

[1]
S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995.
[2]
S. Abiteboul and V. Vianu. Transactions and integrity constraints. In PODS, pages 193--204, 1985.
[3]
M. Arenas, L. E. Bertossi, and J. Chomicki. Consistent query answers in inconsistent databases. In SIGMOD, pages 68--79, 1999.
[4]
P. Bailis, A. Ghodsi, J. M. Hellerstein, and I. Stoica. Bolt-on causal consistency. In SIGMOD, pages 761--772, 2013.
[5]
C. Beeri, M. Dowd, R. Fagin, and R. Statman. On the structure of Armstrong relations for functional dependencies. J. ACM, 31(1):30--46, 1984.
[6]
L. E. Bertossi. Database Repairing and Consistent Query Answering. Morgan & Claypool Publishers, 2011.
[7]
J. Biskup. Security in Computing Systems - Challenges, Approaches and Solutions. Springer, 2009.
[8]
P. Brown and S. Link. Probabilistic keys for data quality management. In CAiSE, pages 118--132, 2015.
[9]
A. Calì, D. Calvanese, and M. Lenzerini. Data integration under integrity constraints. In Seminal Contributions to Information Systems Engineering, pages 335--352. 2013.
[10]
D. Calvanese, W. Fischl, R. Pichler, E. Sallinger, and M. Simkus. Capturing relational schemas and functional dependencies in RDFS. In AAAI, pages 1003--1011, 2014.
[11]
E. F. Codd. The Relational Model for Database Management, Version 2. Addison-Wesley, 1990.
[12]
S. Doherty. The future of enterprise data. http://insights.wired.com/profiles/blogs/the-future-of-enterprise-data#axzz2owCB8FFn, 2013.
[13]
T. Eiter and G. Gottlob. Identifying the minimal transversals of a hypergraph and related problems. SIAM J. Comput., 24(6):1278--1304, 1995.
[14]
R. Fagin. A normal form for relational databases that is based on domains and keys. ACM Trans. Database Syst., 6(3):387--415, 1981.
[15]
R. Fagin. Horn clauses and database dependencies. J. ACM, 29(4):952--985, 1982.
[16]
R. Fagin, P. G. Kolaitis, R. J. Miller, and L. Popa. Data exchange: semantics and query answering. Theor. Comput. Sci., 336(1):89--124, 2005.
[17]
W. Fan, F. Geerts, and X. Jia. A revival of constraints for data cleaning. PVLDB, 1(2):1522--1523, 2008.
[18]
S. Hartmann, M. Kirchberg, and S. Link. Design by example for SQL table definitions with functional dependencies. VLDB J., 21(1):121--144, 2012.
[19]
S. Hartmann, U. Leck, and S. Link. On Codd families of keys over incomplete relations. Comput. J., 54(7):1166--1180, 2011.
[20]
S. Hartmann and S. Link. Efficient reasoning about a robust XML key fragment. ACM Trans. Database Syst., 34(2), 2009.
[21]
S. Hartmann and S. Link. The implication problem of data dependencies over SQL table definitions. ACM Trans. Database Syst., 37(2):13, 2012.
[22]
A. Heise, Jorge-Arnulfo, Quiane-Ruiz, Z. Abedjan, A. Jentzsch, and F. Naumann. Scalable discovery of unique column combinations. PVLDB, 7(4):301--312, 2013.
[23]
I. Ileana, B. Cautis, A. Deutsch, and Y. Katsis. Complete yet practical search for minimal query reformulations under constraints. In SIGMOD, pages 1015--1026, 2014.
[24]
I. F. Ilyas, V. Markl, P. J. Haas, P. Brown, and A. Aboulnaga. CORDS: automatic discovery of correlations and soft functional dependencies. In SIGMOD, pages 647--658, 2004.
[25]
A. K. Jha, V. Rastogi, and D. Suciu. Query evaluation with soft keys. In PODS, pages 119--128, 2008.
[26]
H. Köhler, U. Leck, S. Link, and H. Prade. Logical foundations of possibilistic keys. In JELIA, pages 181--195, 2014.
[27]
H. Köhler, U. Leck, S. Link, and X. Zhou. Possible and certain SQL keys. CDMTCS-452, The University of Auckland, 2013.
[28]
P. Koutris and J. Wijsen. The data complexity of consistent query answering for self-join-free conjunctive queries under primary key constraints. In PODS, pages 17--29, 2015.
[29]
M. Levene and G. Loizou. Axiomatisation of functional dependencies in incomplete relations. Theor. Comput. Sci., 206(1-2):283--300, 1998.
[30]
M. Levene and G. Loizou. A generalisation of entity and referential integrity. ITA, 35(2):113--127, 2001.
[31]
Y. E. Lien. On the equivalence of database models. J. ACM, 29(2):333--362, 1982.
[32]
J. Liu, J. Li, C. Liu, and Y. Chen. Discover dependencies from data - A review. IEEE TKDE, 24(2):251--264, 2012.
[33]
H. Mannila and K.-J. Räihä. Design of Relational Databases. Addison-Wesley, 1992.
[34]
J. Melton. ISO/IEC 9075-2: 2003 (SQL/foundation). ISO standard, 2003.
[35]
F. Naumann. Data profiling revisited. SIGMOD Record, 42(4):40--49, 2013.
[36]
R. Pochampally, A. D. Sarma, X. L. Dong, A. Meliou, and D. Srivastava. Fusing data with correlations. In SIGMOD, pages 433--444, 2014.
[37]
K. A. Ross, D. Srivastava, and S. Sudarshan. Materialized view maintenance and integrity constraint checking: Trading space for time. In SIGMOD, pages 447--458, 1996.
[38]
B. Saha and D. Srivastava. Data quality: The other face of big data. In ICDE, pages 1294--1297, 2014.
[39]
T. J. Schaefer. The complexity of satisfiability problems. In STOC, pages 216--226, 1978.
[40]
Y. Sismanis, P. Brown, P. J. Haas, and B. Reinwald. GORDIAN: Efficient and scalable discovery of composite keys. In VLDB, pages 691--702, 2006.
[41]
B. Thalheim. On semantic issues connected with keys in relational databases permitting null values. Elektr. Informationsverarb. Kybern., 25(1/2):11--20, 1989.
[42]
C. Zaniolo. Database relations with null values. J. Comput. System Sci., 28(1):142--166, 1984.
[43]
K. Zellag and B. Kemme. Consad: a real-time consistency anomalies detector. In SIGMOD, pages 641--644, 2012.

Cited By

View all
  • (2022)Data dependencies for query optimization: a surveyThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-021-00676-331:1(1-22)Online publication date: 1-Jan-2022
  • (2022)Approximate Keys and Functional Dependencies in Incomplete Databases with Limited DomainsFoundations of Information and Knowledge Systems10.1007/978-3-031-11321-5_9(147-167)Online publication date: 20-Jun-2022
  • (2021)PatchIndex: exploiting approximate constraints in distributed databasesDistributed and Parallel Databases10.1007/s10619-021-07326-139:3(833-853)Online publication date: 1-Sep-2021
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment  Volume 8, Issue 11
July 2015
264 pages
ISSN:2150-8097
Issue’s Table of Contents

Publisher

VLDB Endowment

Publication History

Published: 01 July 2015
Published in PVLDB Volume 8, Issue 11

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)3
  • Downloads (Last 6 weeks)0
Reflects downloads up to 01 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2022)Data dependencies for query optimization: a surveyThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-021-00676-331:1(1-22)Online publication date: 1-Jan-2022
  • (2022)Approximate Keys and Functional Dependencies in Incomplete Databases with Limited DomainsFoundations of Information and Knowledge Systems10.1007/978-3-031-11321-5_9(147-167)Online publication date: 20-Jun-2022
  • (2021)PatchIndex: exploiting approximate constraints in distributed databasesDistributed and Parallel Databases10.1007/s10619-021-07326-139:3(833-853)Online publication date: 1-Sep-2021
  • (2020)A Statistical Perspective on Discovering Functional Dependencies in Noisy DataProceedings of the 2020 ACM SIGMOD International Conference on Management of Data10.1145/3318464.3389749(861-876)Online publication date: 11-Jun-2020
  • (2019)Discovery and ranking of embedded uniqueness constraintsProceedings of the VLDB Endowment10.14778/3358701.335870312:13(2339-2352)Online publication date: 1-Sep-2019
  • (2018)Efficient discovery of approximate dependenciesProceedings of the VLDB Endowment10.14778/3192965.319296811:7(759-772)Online publication date: 1-Mar-2018
  • (2017)Probabilistic KeysIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2016.263334229:3(670-682)Online publication date: 1-Mar-2017
  • (2017)Inclusion dependencies and their interaction with functional dependencies in SQLJournal of Computer and System Sciences10.1016/j.jcss.2016.11.00485:C(104-131)Online publication date: 1-May-2017
  • (2017)Open dataInternational Journal of Information Management: The Journal for Information Professionals10.1016/j.ijinfomgt.2017.01.00337:3(150-154)Online publication date: 1-Jun-2017
  • (2016)The Information Systems Group at HPIACM SIGMOD Record10.1145/3003665.300367845:2(63-68)Online publication date: 28-Sep-2016
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media