Analyzing Query Execution for Integrity Constraint Violation Detection
Abstract
Data consistency means that the data representing real-world entities remains valid and free of contradictions. Denial constraints (DCs) generalize many types of integrity constraints, providing a powerful way to define the rules that consistent data must satisfy. This work analyzes how well relational database management systems (RDBMSs) can detect DC violations under different violation metrics. We explore various SQL patterns for measuring DC violations and evaluate the performance of multiple RDBMSs in extensive experiments, highlighting potential performance improvements, choke points, and limitations of using RDBMSs for this task.
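To make the idea of measuring DC violations in SQL concrete, the following is a minimal sketch rather than one of the patterns evaluated in this work. It assumes a hypothetical table `emp` and a hypothetical DC stating that no two employees in the same state may have one earning more while paying a lower tax rate. SQLite is used only because it ships with Python; it is not claimed to be among the systems studied. The two queries illustrate two possible metrics: the number of violating tuple pairs and the number of tuples involved in at least one violation.

```python
import sqlite3

# Hypothetical schema and data, chosen only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp (id INTEGER PRIMARY KEY, state TEXT, salary REAL, tax_rate REAL)"
)
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?)",
    [
        (1, "NY", 50000, 0.20),
        (2, "NY", 60000, 0.15),  # earns more than id 1 but pays a lower rate
        (3, "NY", 70000, 0.25),
        (4, "CA", 80000, 0.10),
    ],
)

# Assumed DC: NOT (t.state = s.state AND t.salary > s.salary
#                  AND t.tax_rate < s.tax_rate)

# Metric 1: number of ordered tuple pairs (t, s) that violate the DC,
# expressed as a self-join on the DC predicates.
violating_pairs = conn.execute(
    """
    SELECT COUNT(*)
    FROM emp t JOIN emp s
      ON t.state = s.state
     AND t.salary > s.salary
     AND t.tax_rate < s.tax_rate
    """
).fetchone()[0]

# Metric 2: number of tuples that take part in at least one violation,
# in either role, expressed with a correlated EXISTS subquery.
violating_tuples = conn.execute(
    """
    SELECT COUNT(*)
    FROM emp t
    WHERE EXISTS (
        SELECT 1 FROM emp s
        WHERE t.state = s.state
          AND ((t.salary > s.salary AND t.tax_rate < s.tax_rate)
            OR (s.salary > t.salary AND s.tax_rate < t.tax_rate))
    )
    """
).fetchone()[0]

print(violating_pairs, violating_tuples)
```

The inequality join is what tends to make such queries expensive, since equality on `state` is the only predicate a hash join can exploit directly; how each RDBMS handles the remaining inequality predicates is the kind of behavior the experiments examine.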