Mind your dependencies for semantic query optimization
DOI:
https://doi.org/10.5753/jidm.2018.1633Keywords:
Data profiling, Functional dependencies, Integrity constraints, Query optimizationAbstract
Semantic query optimization uses dependencies between attributes to formulate query transformations and revise the number of processed rows, with direct impact on performance. Commercial databases present facilities to define dependencies as not enforced constraints. The goal is to help the query optimizer in cases where the database is denormalized or simply lost dependencies in the design. However, feeding these facilities is a manual task which is tedious and error-prone. An attractive alternative is the automatic discovery of dependencies, but the cost of finding dependencies increases with the number of rows and attributes in the dataset. In this paper, we stick to the automatic discovery approach, but to reduce the cost we focus on dependencies matching the current queries in the pipe (ie., workload). Initially, we rely on a large set of functional dependencies computed in batch with state of the art algorithms in the literature. Over time our focused dependency selector (FDSel) chooses exemplars to feed the query optimizer. Therewith we eliminate further manual interactions. The automatically selected exemplars exhibit statistical properties that resemble those of the initial dependency set. This demonstrates the effectiveness of our proposed approach. In the best case scenario, by applying the FDSel for join elimination on a real-world database, we reduce query response time by more than one order of magnitude.
Downloads
References
Abadi, D. J., Madden, S., and Hachem, N. Column-stores vs. row-stores: how different are they really? In SIGMOD 2008, Vancouver, BC, Canada. pp. 967–980, 2008.
Abedjan, Z., Akcora, C. G., Ouzzani, M., Papotti, P., and Stonebraker, M. Temporal rules discovery for web data cleaning. Proc. VLDB Endow. 9 (4): 336–347, Dec., 2015.
Abedjan, Z., Golab, L., and Naumann, F. Profiling relational data: A survey. The VLDB Journal 24 (4): 557–581, Aug., 2015.
Amavi, J. and Halfeld Ferrari, M. pp. 83–113. In A. Hameurlain, J. Küng, and R. Wagner (Eds.), Maximal Set of XML Functional Dependencies for the Integration of Multiple Systems. Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 83–113, 2014.
Barahmand, S. and Ghandeharizadeh, S. D-zipfian: A decentralized implementation of zipfian. In Proceedings of the Sixth International Workshop on Testing Database Systems. DBTest ’13. ACM, New York, NY, USA, pp. 6:1–6:6, 2013.
Basak, J., Wadhwani, K., and Voruganti, K. Storage workload identification. Trans. Storage 12 (3): 14:1–14:30, May, 2016.
Beeri, C., Dowd, M., Fagin, R., and Statman, R. On the structure of armstrong relations for functional dependencies. J. ACM 31 (1): 30–46, Jan., 1984.
Chakravarthy, U. S., Grant, J., and Minker, J. Logic-based approach to semantic query optimization. ACM Trans. Database Syst. 15 (2): 162–207, June, 1990.
Cheng, Q., Gryz, J., Koo, F., Leung, T. Y. C., Liu, L., Qian, X., and Schiefer, K. B. Implementation of two semantic query optimization techniques in db2 universal database. In 25th VLDB. VLDB ’99. pp. 687–698, 1999.
Chiang, F. and Miller, R. J. Discovering data quality rules. Proc. VLDB Endow. 1 (1): 1166–1177, Aug., 2008.
Chu, X., Ilyas, I. F., and Papotti, P. Discovering denial constraints. Proc. VLDB Endow. 6 (13): 1498–1509, Aug., 2013.
Demetrovics, J., Katona, G. O. H., Miklós, D., and Thalheim, B. On the number of independent functional dependencies. In 4th FoIKS. FoIKS’06. pringer-Verlag, Berlin, Heidelberg, pp. 83–91, 2006.
Frey, B. J. and Dueck, D. Clustering by passing messages between data points. Science vol. 315, pp. 972–976, 2007.
Hammer, M. and Zdonik, S. B. Knowledge-based query processing. In Proceedings of the Sixth International Conference on Very Large Data Bases - Volume 6. VLDB ’80. VLDB Endowment, pp. 137–147, 1980.
Hazewinkel, M. Encyclopaedia of Mathematics (1). Springer, 1987.
Ilyas, I. F., Markl, V., Haas, P., Brown, P., and Aboulnaga, A. Cords: Automatic discovery of correlations and soft functional dependencies. In SIGMOD. SIGMOD ’04. ACM, New York, NY, USA, pp. 647–658, 2004.
Kimura, H., Huo, G., Rasin, A., Madden, S., and Zdonik, S. B. Correlation maps: A compressed access method for exploiting soft functional dependencies. Proc. VLDB Endow. 2 (1): 1222–1233, Aug., 2009.
Kivinen, J. and Mannila, H. Approximate inference of functional dependencies from relations. Theoretical Computer Science 149 (1): 129 – 149, 1995.
Laurent, D. and Spyratos, N. Rewriting aggregate queries using functional dependencies. In International Conference on Management of Emergent Digital EcoSystems. ACM, New York, NY, USA, pp. 40–47, 2011.
Liu, J., Li, J., Liu, C., and Chen, Y. Discover dependencies from data - a review. IEEE Trans. on Knowl. and Data Eng. 24 (2): 251–264, Feb., 2012.
Liu, Y., Liu, H., Xiao, D., and Eltabakh, M. Y. Adaptive correlation exploitation in big data query optimization. The VLDB Journal 27 (6): 873–898, Dec, 2018.
Mazuran, M., Quintarelli, E., Tanca, L., and Ugolini, S. Semi-automatic support for evolving functional dependencies. In Proceedings of the 19th EDBT, Bordeaux, France, 2016. pp. 293–304, 2016.
Papenbrock, T., Ehrlich, J., Marten, J., Neubert, T., Rudolph, J.-P., Schönberg, M., Zwiener, J., and Naumann, F. Functional dependency discovery: An experimental evaluation of seven algorithms. Proc. VLDB Endow. 8 (10): 1082–1093, June, 2015.
Papenbrock, T. and Naumann, F. A hybrid approach to functional dependency discovery. In SIGMOD. SIGMOD ’16. ACM, New York, NY, USA, pp. 821–833, 2016.
Papenbrock, T. and Naumann, F. Data-driven schema normalization. In Proceedings of the 20th International Conference on Extending Database Technology, EDBT 2017, Venice, Italy, March 21-24, 2017. pp. 342–353, 2017.
Paulley, G. N. and Larson, P.-A. Exploiting uniqueness in query optimization. In CASCON First Decade High Impact Papers. CASCON ’10. IBM Corp., Riverton, NJ, USA, pp. 127–145, 2010.
Possamai, C. L. B., Pasqualin, D., Weingaertner, D., Todt, E., Castilho, M. A., de Bona, L. C. E., and de Almeida, E. C. Proinfodata: Monitoring a large park of computational laboratories. In Open Source Software: Mobile Open Source Technologies, L. Corral, A. Sillitti, G. Succi, J. Vlasenko, and A. I. Wasserman (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 226–229, 2014.
Romero, O., Calvanese, D., Abelló, A., and Rodríguez-Muro, M. Discovering functional dependencies for multidimensional design. In Proceedings of the ACM Twelfth International Workshop on Data Warehousing and OLAP. DOLAP ’09. ACM, New York, NY, USA, pp. 1–8, 2009.
Shekhar, S., Srivastava, J., and Dutta, S. A formal model of trade-off between optimization and execution costs in semantic query optimization. In 14th VLDB. VLDB ’88. San Francisco, CA, USA, pp. 457–467, 1988.
Simmen, D., Shekita, E., and Malkemus, T. Fundamental techniques for order optimization. SIGMOD Rec. 25 (2): 57–67, June, 1996.
Szlichta, J., Godfrey, P., Gryz, J., and Zuzarte, C. Expressiveness and complexity of order dependencies. PVLDB vol. 6, pp. 1858–1869, 2013.
Zaharioudakis, M., Cochrane, R., Lapis, G., Pirahesh, H., and Urata, M. Answering complex sql queries using automatic summary tables. SIGMOD Rec. 29 (2): 105–116, May, 2000.