Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Magic mirror in my hand, which is the best in the land?: an experimental evaluation of index selection algorithms

Published: 01 July 2020 Publication History

Abstract

Indexes are essential for the efficient processing of database workloads. Proposed solutions for the relevant and challenging index selection problem range from metadata-based simple heuristics, over sophisticated multi-step algorithms, to approaches that yield optimal results. The main challenges are (i) to accurately determine the effect of an index on the workload cost while considering the interaction of indexes and (ii) a large number of possible combinations resulting from workloads containing many queries and massive schemata with possibly thousands of attributes.
In this work, we describe and analyze eight index selection algorithms that are based on different concepts and compare them along different dimensions, such as solution quality, runtime, multi-column support, solution granularity, and complexity. In particular, we analyze the solutions of the algorithms for the challenging analytical Join Order, TPC-H, and TPC-DS benchmarks. Afterward, we assess strengths and weaknesses, infer insights for index selection in general and each approach individually, before we give recommendations on when to use which approach.

References

[1]
S. Agrawal, S. Chaudhuri, L. Kollár, A. P. Marathe, V. R. Narasayya, and M. Syamala. Database tuning advisor for microsoft SQL server 2005. In Proceedings of the International Conference on Very Large Databases (VLDB), pages 1110--1121, 2004.
[2]
D. Basu, Q. Lin, W. Chen, H. T. Vo, Z. Yuan, P. Senellart, and S. Bressan. Cost-model oblivious database tuning with reinforcement learning. In Proceedings of the International Conference on Database and Expert Systems Applications (DEXA), pages 253--268, 2015.
[3]
R. Borovica, I. Alagiannis, and A. Ailamaki. Automated physical designers: what you see is (not) what you get. In Proceedings of the International Workshop on Testing Database Systems (DBTEST), 2012.
[4]
N. Bruno and S. Chaudhuri. Automatic physical database tuning: A relaxation-based approach. In Proceedings of the International Conference on Management of Data (SIGMOD), pages 227--238, 2005.
[5]
N. Bruno and S. Chaudhuri. An online approach to physical design tuning. In Proceedings of the International Conference on Data Engineering (ICDE), pages 826--835, 2007.
[6]
N. Bruno and R. V. Nehme. Configuration-parametric query optimization for physical design tuning. In Proceedings of the International Conference on Management of Data (SIGMOD), pages 941--952, 2008.
[7]
A. Caprara, M. Fischetti, and D. Maio. Exact and approximate algorithms for the index selection problem in physical database design. IEEE Transactions on Knowledge and Data Engineering (TKDE), 7(6):955--967, 1995.
[8]
A. Caprara and J. J. Salazar González. A branch-and-cut algorithm for a generalization of the uncapacitated facility location problem. TOP, 4:135--163, 1996.
[9]
S. Chaudhuri and V. Narasayya. Anytime Algorithm of Database Tuning Advisor for Microsoft SQL Server. https://www.microsoft.com/en-us/research/publication/anytime-algorithm-of-database-tuning-advisor-for-microsoft-sql-server, visited 2020-06-04, June 2020.
[10]
S. Chaudhuri and V. R. Narasayya. An efficient cost-driven index selection tool for microsoft SQL server. In Proceedings of the International Conference on Very Large Databases (VLDB), pages 146--155, 1997.
[11]
S. Chaudhuri and V. R. Narasayya. Index merging. In Proceedings of the International Conference on Data Engineering (ICDE), pages 296--303, 1999.
[12]
S. Chaudhuri and V. R. Narasayya. Self-tuning database systems: A decade of progress. In Proceedings of the International Conference on Very Large Databases (VLDB), pages 3--14, 2007.
[13]
S. Das, M. Grbic, I. Ilic, I. Jovandic, A. Jovanovic, V. R. Narasayya, M. Radulovic, M. Stikic, G. Xu, and S. Chaudhuri. Automatically indexing millions of databases in microsoft azure SQL database. In Proceedings of the International Conference on Management of Data (SIGMOD), pages 666--679, 2019.
[14]
D. Dash, N. Polyzotis, and A. Ailamaki. Cophy: A scalable, portable, and interactive index advisor for large workloads. PVLDB, 4(6):362--372, 2011.
[15]
B. Ding, S. Das, R. Marcus, W. Wu, S. Chaudhuri, and V. R. Narasayya. AI meets AI: leveraging query executions to improve index recommendations. In Proceedings of the International Conference on Management of Data (SIGMOD), pages 1241--1258, 2019.
[16]
G. Farley and S. A. Schuster. Query execution and index selection for relational data bases. In Proceedings of the International Conference on Very Large Databases (VLDB), page 519, 1975.
[17]
M. Faust, M. Boissier, M. Keller, D. Schwalb, H. Bischoff, K. Eisenreich, F. Färber, and H. Plattner. Footprint reduction and uniqueness enforcement with hash indices in SAP HANA. In Proceedings of the International Conference on Database and Expert Systems Applications (DEXA), pages 137--151, 2016.
[18]
S. J. Finkelstein, M. Schkolnick, and P. Tiberio. Physical database design for relational databases. ACM Transactions on Database Systems (TODS), 13(1):91--128, 1988.
[19]
F. Fotouhi and C. E. Galarce. Genetic algorithms and the search for optimal database index selection. In Proceedings of the First Great Lakes Computer Science Conference, pages 249--255, 1989.
[20]
R. Fourer, D. Gay, and B. Kernighan. AMPL: A Modeling Language for Mathematical Programming. Thomson/Brooks/Cole, 2003.
[21]
M. R. Frank, E. Omiecinski, and S. B. Navathe. Adaptive and automated index selection in RDBMS. In Proceedings of the International Conference on Extending Database Technology (EDBT), pages 277--292, 1992.
[22]
G. Graefe. B-tree indexes for high update rates. SIGMOD Record, 35(1):39--44, 2006.
[23]
M. Hammer and A. Chan. Index selection in a self-adaptive data base management system. In Proceedings of the International Conference on Management of Data (SIGMOD), pages 1--8, 1976.
[24]
M. Y. L. Ip, L. V. Saxton, and V. V. Raghavan. On the selection of an optimal set of indexes. IEEE Transactions on Software Engineering, 9(2):135--143, 1983.
[25]
A. Kane. Dexter - The automatic indexer for Postgres, June 2017. https://github.com/ankane/dexter, visited 2020-06-04.
[26]
A. Kane. Introducing Dexter, the Automatic Indexer for Postgres, June 2017. https://medium.com/@ankane/introducing-dexter-the-automatic-indexer-for-postgres-5f8fa8b28f27, visited 2020-06-04.
[27]
M. S. Kester, M. Athanassoulis, and S. Idreos. Access path selection in main-memory optimized data systems: Should I scan or should I probe? In Proceedings of the International Conference on Management of Data (SIGMOD), pages 715--730, 2017.
[28]
V. Leis, A. Gubichev, A. Mirchev, P. A. Boncz, A. Kemper, and T. Neumann. How good are query optimizers, really? PVLDB, 9(3):204--215, 2015.
[29]
V. Y. Lum and H. Ling. An optimization problem on the selection of secondary keys. In Proceedings of the 1971 26th Annual Conference (ACM '71), pages 349--356, 1971.
[30]
R. C. Marcus and O. Papaemmanouil. Plan-structured deep neural network models for query performance prediction. PVLDB, 12(11):1733--1746, 2019.
[31]
Microsoft. SQL Server 2019 - Database Engine Tuning Advisor, January 2017. https://docs.microsoft.com/en-us/sql/relational-databases/performance/database-engine-tuning-advisor?view=sql-server-ver15, visited 2020-06-04.
[32]
R. O. Nambiar and M. Poess. The making of TPC-DS. In Proceedings of the International Conference on Very Large Databases (VLDB), pages 1049--1058, 2006.
[33]
F. P. Palermo. A quantitative approach to the selection of secondary indexes. In IBM Research RJ 730, 1970.
[34]
S. Papadomanolakis and A. Ailamaki. An integer linear programming approach to database design. In Proceedings of the International Conference on Data Engineering (ICDE), pages 442--449, 2007.
[35]
S. Papadomanolakis, D. Dash, and A. Ailamaki. Efficient use of the query optimizer for automated database design. In Proceedings of the International Conference on Very Large Databases (VLDB), pages 1093--1104, 2007.
[36]
G. Piatetsky-Shapiro. The optimal selection of secondary indices is NP-complete. SIGMOD Record, 13(2):72--75, 1983.
[37]
M. Pöss and C. Floyd. New TPC benchmarks for decision support and web commerce. SIGMOD Record, 29(4):64--71, 2000.
[38]
J. Rouhaud. HypoPG - Hypothetical Indexes for PostgreSQL, March 2015. https://github.com/HypoPG/hypopg, visited 2020-06-04.
[39]
Z. Sadri, L. Gruenwald, and E. Leal. Online index selection using deep reinforcement learning for a cluster database. In Proceedings of the International Conference on Data Engineering Workshops (ICDEW), pages 158--161, 2020.
[40]
K. Sattler, I. Geist, and E. Schallehn. QUIET: continuous query-driven index tuning. In Proceedings of the International Conference on Very Large Databases (VLDB), pages 1129--1132, 2003.
[41]
M. Schkolnick. The optimal selection of secondary indices for files. Information Systems (IS), 1(4):141--146, 1975.
[42]
R. Schlosser and S. Halfpap. A decomposition approach for risk-averse index selection. In Proceedings of the 32th International Conference on Scientific and Statistical Database Management (SSDBM), forthcoming, 2020.
[43]
R. Schlosser, J. Kossmann, and M. Boissier. Efficient scalable multi-attribute index selection using recursive strategies. In Proceedings of the International Conference on Data Engineering (ICDE), pages 1238--1249, 2019.
[44]
K. Schnaitter and N. Polyzotis. A benchmark for online index selection. In Proceedings of the International Conference on Data Engineering (ICDE), pages 1701--1708, 2009.
[45]
K. Schnaitter, N. Polyzotis, and L. Getoor. Index interactions in physical design tuning: Modeling, analysis, and applications. PVLDB, 2(1):1234--1245, 2009.
[46]
P. G. Selinger, M. M. Astrahan, D. D. Chamberlin, R. A. Lorie, and T. G. Price. Access path selection in a relational database management system. In Proceedings of the International Conference on Management of Data (SIGMOD), pages 23--34, 1979.
[47]
A. Sharma, F. M. Schuhknecht, and J. Dittrich. The case for automatic database administration using deep reinforcement learning. CoRR, abs/1801.05643, 2018.
[48]
G. Valentin, M. Zuliani, D. C. Zilio, G. M. Lohman, and A. Skelley. DB2 advisor: An optimizer smart enough to recommend its own indexes. In Proceedings of the International Conference on Data Engineering (ICDE), pages 101--110, 2000.
[49]
K. Whang. Index selection in relational databases. In Proceedings of the International Conference on Foundations of Data Organization (FoDO), pages 487--500, 1985.
[50]
W. Wu, Y. Chi, S. Zhu, J. Tatemura, H. Hacigümüs, and J. F. Naughton. Predicting query execution time: Are optimizer cost models really unusable? In Proceedings of the International Conference on Data Engineering (ICDE), pages 1081--1092, 2013.
[51]
H. Zhang, D. G. Andersen, A. Pavlo, M. Kaminsky, L. Ma, and R. Shen. Reducing the storage overhead of main-memory OLTP databases with hybrid indexes. In Proceedings of the International Conference on Management of Data (SIGMOD), pages 1567--1581, 2016.
[52]
D. C. Zilio, J. Rao, S. Lightstone, G. M. Lohman, A. J. Storm, C. Garcia-Arellano, and S. Fadden. DB2 design advisor: Integrated automatic physical database design. In Proceedings of the International Conference on Very Large Databases (VLDB), pages 1087--1097, 2004.

Cited By

View all
  • (2024)The Holon Approach for Simultaneously Tuning Multiple Components in a Self-Driving Database Management System with Machine Learning via Synthesized Proto-ActionsProceedings of the VLDB Endowment10.14778/3681954.368200717:11(3373-3387)Online publication date: 30-Aug-2024
  • (2024)Leveraging Dynamic and Heterogeneous Workload Knowledge to Boost the Performance of Index AdvisorsProceedings of the VLDB Endowment10.14778/3654621.365463117:7(1642-1654)Online publication date: 1-Mar-2024
  • (2024)Refactoring Index Tuning Process with Benefit EstimationProceedings of the VLDB Endowment10.14778/3654621.365462217:7(1528-1541)Online publication date: 1-Mar-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment  Volume 13, Issue 12
August 2020
1710 pages
ISSN:2150-8097
Issue’s Table of Contents

Publisher

VLDB Endowment

Publication History

Published: 01 July 2020
Published in PVLDB Volume 13, Issue 12

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)108
  • Downloads (Last 6 weeks)13
Reflects downloads up to 19 Sep 2024

Other Metrics

Citations

Cited By

View all
  • (2024)The Holon Approach for Simultaneously Tuning Multiple Components in a Self-Driving Database Management System with Machine Learning via Synthesized Proto-ActionsProceedings of the VLDB Endowment10.14778/3681954.368200717:11(3373-3387)Online publication date: 30-Aug-2024
  • (2024)Leveraging Dynamic and Heterogeneous Workload Knowledge to Boost the Performance of Index AdvisorsProceedings of the VLDB Endowment10.14778/3654621.365463117:7(1642-1654)Online publication date: 1-Mar-2024
  • (2024)Refactoring Index Tuning Process with Benefit EstimationProceedings of the VLDB Endowment10.14778/3654621.365462217:7(1528-1541)Online publication date: 1-Mar-2024
  • (2024)PilotScope: Steering Databases with Machine Learning DriversProceedings of the VLDB Endowment10.14778/3641204.364120917:5(980-993)Online publication date: 1-Jan-2024
  • (2024)Self-tuning Database Systems: A Systematic Literature Review of Automatic Database Schema Design and TuningACM Computing Surveys10.1145/366532356:11(1-37)Online publication date: 17-May-2024
  • (2024)Wii: Dynamic Budget Reallocation In Index TuningProceedings of the ACM on Management of Data10.1145/36549852:3(1-26)Online publication date: 30-May-2024
  • (2024)IA2Proceedings of the 4th Workshop on Machine Learning and Systems10.1145/3642970.3655839(10-17)Online publication date: 22-Apr-2024
  • (2024)ML-Powered Index Tuning: An Overview of Recent Progress and Open ChallengesACM SIGMOD Record10.1145/3641832.364183652:4(19-30)Online publication date: 19-Jan-2024
  • (2024)Wred: Workload Reduction for Scalable Index TuningProceedings of the ACM on Management of Data10.1145/36393052:1(1-26)Online publication date: 26-Mar-2024
  • (2024)Robustness of Updatable Learning-based Index Advisors against Poisoning AttackProceedings of the ACM on Management of Data10.1145/36392652:1(1-26)Online publication date: 26-Mar-2024
  • Show More Cited By

View Options

Get Access

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media