Towards instance-optimized data systems

Published: 01 July 2021

Abstract

In recent years, we have seen increased interest in applying machine learning to system problems. For example, there has been work on applying machine learning to improve query optimization, indexing, storage layouts, scheduling, log-structured merge trees, sorting, compression, and sketches, among many other data management tasks. Arguably, the ideas behind these techniques are similar: machine learning is used to model the data and/or workload in order to derive a more efficient algorithm or data structure. Ultimately, these techniques will allow us to build "instance-optimized" systems: that is, systems that self-adjust to a given workload and data distribution to provide unprecedented performance without the need for tuning by an administrator. While many of these techniques promise orders-of-magnitude better performance in lab settings, there is still general skepticism about how practical the current techniques really are.
The following is intended as a progress report on ML for Systems and its readiness for real-world deployments, with a focus on our projects done as part of the Data Systems and AI Lab (DSAIL) at MIT. By no means is it a comprehensive overview of all existing work, which has been growing steadily over the past several years, not only in the database community but also in the systems, networking, theory, PL, and many other adjacent communities.
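
To make the central idea of the abstract concrete (a model of the data is used to derive a more efficient data structure), here is a minimal sketch in the spirit of a learned index. It is an illustration under stated assumptions, not the implementation of any system discussed in the paper: the function names `build_learned_index` and `lookup` are hypothetical, the keys are assumed to be a non-empty sorted list of distinct numbers, and a single least-squares line stands in for whatever model a real system would use.

```python
import bisect


def build_learned_index(keys):
    """Fit a line key -> position over sorted, distinct keys (assumed non-empty).

    Returns (slope, intercept, max_err), where max_err is the largest
    distance between a key's true position and its predicted position.
    """
    n = len(keys)
    mean_key = sum(keys) / n
    mean_pos = (n - 1) / 2
    var = sum((k - mean_key) ** 2 for k in keys)
    cov = sum((k - mean_key) * (i - mean_pos) for i, k in enumerate(keys))
    slope = cov / var if var else 0.0
    intercept = mean_pos - slope * mean_key
    max_err = max(abs(i - round(slope * k + intercept)) for i, k in enumerate(keys))
    return slope, intercept, max_err


def lookup(keys, model, key):
    """Return the position of key in keys, or None if it is absent."""
    slope, intercept, max_err = model
    n = len(keys)
    guess = round(slope * key + intercept)
    # Only the window [guess - max_err, guess + max_err] can contain the key.
    lo = min(n, max(0, guess - max_err))
    hi = max(lo, min(n, guess + max_err + 1))
    i = bisect.bisect_left(keys, key, lo, hi)
    return i if i < n and keys[i] == key else None


keys = [i * i for i in range(200)]   # a skewed, non-uniform key distribution
model = build_learned_index(keys)
print(lookup(keys, model, 49))       # position of the key 49
print(lookup(keys, model, 50))       # absent key
```

The search cost depends on `max_err`, that is, on how well the model fits this particular data set, rather than on the size of the data alone. This is the sense in which such a structure adapts to a given instance; a poorly fitting model simply degrades toward an ordinary binary search over a wider window.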

References

[1]
Hussam Abu-Libdeh, Deniz Altinbüken, Alex Beutel, Ed H. Chi, Lyric Doshi, Tim Kraska, Xiaozhou Li, Andy Ly, and Christopher Olston. 2020. Learned Indexes for a Google-scale Disk-based Database. In Proceedings of the Workshop on ML for Systems at NeurIPS.
[2]
Sanjay Agrawal, Surajit Chaudhuri, and Vivek R. Narasayya. 2000. Automated Selection of Materialized Views and Indexes in SQL Databases. In VLDB 2000, Proceedings of 26th International Conference on Very Large Data Bases, September 10--14, 2000, Cairo, Egypt. 496--505.
[3]
Sanjay Agrawal, Vivek R. Narasayya, and Beverly Yang. 2004. Integrating Vertical and Horizontal Partitioning Into Automated Physical Database Design. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Paris, France, June 13--18, 2004. ACM, 359--370.
[4]
Dana Van Aken, Andrew Pavlo, Geoffrey J. Gordon, and Bohan Zhang. 2017. Automatic Database Management System Tuning Through Large-scale Machine Learning. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD Conference 2017, Chicago, IL, USA, May 14--19, 2017. ACM, 1009--1024.
[5]
Dana Van Aken, Dongsheng Yang, Sebastien Brillard, Ari Fiorino, Bohan Zhang, Christian Billian, and Andrew Pavlo. 2021. An Inquiry into Machine Learning-based Automatic Configuration Tuning Services on Real-World Database Management Systems. Proc. VLDB Endow. 14, 7 (2021), 1241--1253.
[6]
Abdullah Al-Mamun, Hao Wu, and Walid G. Aref. 2020. A Tutorial on Learned Multi-dimensional Indexes. In SIGSPATIAL '20: 28th International Conference on Advances in Geographic Information Systems, Seattle, WA, USA, November 3--6, 2020. ACM, 1--4.
[7]
Christos Anagnostopoulos and Peter Triantafillou. 2015. Learning Set Cardinality in Distance Nearest Neighbours. In Proceedings of the 2015 IEEE International Conference on Data Mining (ICDM) (ICDM '15). IEEE Computer Society, USA, 691--696.
[8]
C. Anagnostopoulos and P. Triantafillou. 2015. Learning to Accurately COUNT with Query-Driven Predictive Analytics. In 2015 IEEE International Conference on Big Data (Big Data) (Big Data '15). 14--23.
[9]
Christos Anagnostopoulos and Peter Triantafillou. 2017. Query-Driven Learning for Predictive Analytics of Data Subspace Cardinality. ACM Trans. Knowl. Discov. Data 11, 4 (June 2017), 47:1--47:46.
[10]
Laurent Bindschaedler, Andreas Kipf, Tim Kraska, Ryan Marcus, and Umar Farooq Minhas. 2021. Towards a Benchmark for Learned Systems. In 37th IEEE International Conference on Data Engineering Workshops, ICDE Workshops 2021, Chania, Greece, April 19--22, 2021. IEEE, 127--133.
[11]
Matthias Brantner, Daniela Florescu, David A. Graf, Donald Kossmann, and Tim Kraska. 2008. Building a database on S3. In Proceedings of the SIGMOD. ACM, 251--264.
[12]
Lujing Cen, Andreas Kipf, Ryan Marcus, and Tim Kraska. 2021. LEA: A Learned Encoding Advisor for Column Stores. In aiDM '21: Fourth Workshop in Exploiting AI Techniques for Data Management, Virtual Event, China, 25 June, 2021. ACM, 32--35.
[13]
Surajit Chaudhuri and Vivek R. Narasayya. 1997. An Efficient Cost-Driven Index Selection Tool for Microsoft SQL Server. In VLDB'97, Proceedings of 23rd International Conference on Very Large Data Bases, August 25--29, 1997, Athens, Greece. Morgan Kaufmann, 146--155. http://www.vldb.org/conf/1997/P146.PDF
[14]
Surajit Chaudhuri and Vivek R. Narasayya. 1998. AutoAdmin 'What-if' Index Analysis Utility. In SIGMOD 1998, Proceedings ACM SIGMOD International Conference on Management of Data, June 2--4, 1998, Seattle, Washington, USA.ACM Press, 367--378.
[15]
Surajit Chaudhuri and Vivek R. Narasayya. 2007. Self-Tuning Database Systems: A Decade of Progress. In Proceedings of the 33rd International Conference on Very Large Data Bases, University of Vienna, Austria, September 23--27, 2007. ACM, 3--14. http://www.vldb.org/conf/2007/papers/special/p3-chaudhuri.pdf
[16]
Andrew Crotty. 2021. Hist-Tree: Those Who Ignore It Are Doomed to Learn. In 11th Conference on Innovative Data Systems Research, CIDR 2021, Virtual Event, January 11--15, 2021, Online Proceedings. www.cidrdb.org. http://cidrdb.org/cidr2021/papers/cidr2021_paper20.pdf
[17]
Carlo Curino, Evan Jones, Yang Zhang, and Sam Madden. 2010. Schism:A Workload-Driven Approach to Database Replication and Partitioning. PVLDB 3, 1 (2010), 48--57.
[18]
Yifan Dai, Yien Xu, Aishwarya Ganesan, Ramnatthan Alagappan, Brian Kroth, Andrea Arpaci-Dusseau, and Remzi Arpaci-Dusseau. 2020. From WiscKey to Bourbon: A Learned Index for Log-Structured Merge Trees. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20). USENIX Association, 155--171. https://www.usenix.org/conference/osdi20/presentation/dai
[19]
Zhenwei Dai and Anshumali Shrivastava. 2020. Adaptive Learned Bloom Filter (Ada-BF): Efficient Utilization of the Classifier with Application to Real-Time Information Filtering on the Web. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6--12, 2020, virtual. https://proceedings.neurips.cc/paper/2020/hash/86b94dae7c6517ec1ac767fd2c136580-Abstract.html
[20]
Biplob K. Debnath, David J. Lilja, and Mohamed F. Mokbel. 2008. SARD: A statistical approach for ranking database tuning parameters. In Proceedings of the 24th International Conference on Data Engineering Workshops, ICDE 2008, April 7--12, 2008, Cancún, Mexico. IEEE Computer Society, 11--18.
[21]
Karl Dias, Mark Ramacher, Uri Shaft, Venkateshwaran Venkataramani, and Graham Wood. 2005. Automatic Performance Diagnosis and Tuning in Oracle. In Second Biennial Conference on Innovative Data Systems Research, CIDR 2005, Asilomar, CA, USA, January 4--7, 2005, Online Proceedings. www.cidrdb.org, 84--94. http://cidrdb.org/cidr2005/papers/P07.pdf
[22]
Bailu Ding, Sudipto Das, Ryan Marcus, Wentao Wu, Surajit Chaudhuri, and Vivek R. Narasayya. 2019. AI Meets AI: Leveraging Query Executions to Improve Index Recommendations. In Proceedings of the 2019 International Conference on Management of Data, SIGMOD Conference 2019, Amsterdam, The Netherlands, June 30-July 5, 2019. ACM, 1241--1258.
[23]
Jialin Ding, Umar Farooq Minhas, Badrish Chandramouli, Chi Wang, Yinan Li, Ying Li, Donald Kossmann, Johannes Gehrke, and Tim Kraska. 2021. Instance-Optimized Data Layouts for Cloud Analytics Workloads. In SIGMOD '21: International Conference on Management of Data, Virtual Event, China, June 20--25, 2021. ACM, 418--431.
[24]
Jialin Ding, Umar Farooq Minhas, Jia Yu, Chi Wang, Jaeyoung Do, Yinan Li, Hantian Zhang, Badrish Chandramouli, Johannes Gehrke, Donald Kossmann, David B. Lomet, and Tim Kraska. 2020. ALEX: An Updatable Adaptive Learned Index. In Proceedings of the 2020 International Conference on Management of Data, SIGMOD Conference 2020, online conference [Portland, OR, USA], June 14--19, 2020. ACM, 969--984.
[25]
Jialin Ding, Vikram Nathan, Mohammad Alizadeh, and Tim Kraska. 2020. Tsunami: A Learned Multi-dimensional Index for Correlated Data and Skewed Workloads. Proc. VLDB Endow. 14, 2 (2020), 74--86.
[26]
DSAIL. 2021. Data System and AI Lab. http://dsail.csail.mit.edu/.
[27]
DSAIL. 2021. (Learned Index Leaderboard. https://learnedsystems.github.io/SOSDLeaderboard/leaderboard/. [Online; accessed 7-July-2021].
[28]
DSAIL. 2021. ML for Systems Papers. http://dsg.csail.mit.edu/mlforsystems/papers/.
[29]
Elbert Du, Franklyn Wang, and Michael Mitzenmacher. 2021. Putting the "Learning" into Learning-Augmented Algorithms for Frequency Estimation. In Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18--24 July 2021, Virtual Event (Proceedings of Machine Learning Research, Vol. 139). PMLR, 2860--2869. http://proceedings.mlr.press/v139/du21d.html
[30]
Songyun Duan, Vamsidhar Thummala, and Shivnath Babu. 2009. Tuning Database Configuration Parameters with iTuned. Proc. VLDB Endow. 2, 1 (2009), 1246--1257.
[31]
Martin Eppert, Philipp Fent, and Thomas Neumann. 2021. A Tailored Regression for Learned Indexes: Logarithmic Error Regression. In aiDM '21: Fourth Workshop in Exploiting AI Techniques for Data Management, Virtual Event, China, 25 June, 2021. ACM, 9--15.
[32]
Ronald Fagin, Amnon Lotem, and Moni Naor. 2003. Optimal aggregation algorithms for middleware. J. Comput. Syst. Sci. 66, 4 (2003), 614--656.
[33]
Paolo Ferragina, Fabrizio Lillo, and Giorgio Vinciguerra. 2020. Why Are Learned Indexes So Effective?. In Proceedings of the 37th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 119). PMLR, 3123--3132. http://proceedings.mlr.press/v119/ferragina20a.html
[34]
Paolo Ferragina, Fabrizio Lillo, and Giorgio Vinciguerra. 2021. On the performance of learned data structures. Theor. Comput. Sci. 871 (2021), 107--120.
[35]
Paolo Ferragina and Giorgio Vinciguerra. 2020. Learned Data Structures. In Recent Trends in Learning From Data. Springer International Publishing, 5--41.
[36]
Paolo Ferragina and Giorgio Vinciguerra. 2020. The PGM-Index: A Fully-Dynamic Compressed Learned Index with Provable Worst-Case Bounds. Proceedings of the VLDB Endowment 13, 8 (April 2020), 1162--1175.
[37]
Alex Galakatos, Michael Markovitch, Carsten Binnig, Rodrigo Fonseca, and Tim Kraska. 2019. FITing-Tree: A Data-aware Index Structure. In Proceedings of the 2019 International Conference on Management of Data, SIGMOD Conference 2019, Amsterdam, The Netherlands, June 30 - July 5, 2019. ACM, 1189--1206.
[38]
Vahid Ghadakchi, Mian Xie, and Arash Termehchy. 2020. Bandit join: preliminary results. In Proceedings of the Third International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, aiDM@SIGMOD 2020, Portland, Oregon, USA, June 19, 2020. ACM, 1:1--1:4.
[39]
Justin Gottschlich, Armando Solar-Lezama, Nesime Tatbul, Michael Carbin, Martin Rinard, Regina Barzilay, Saman P. Amarasinghe, Joshua B. Tenenbaum, and Tim Mattson. 2018. The three pillars of machine programming. In Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages, MAPL@PLDI 2018, Philadelphia, PA, USA, June 18--22, 2018. ACM, 69--80.
[40]
Himanshu Gupta, Venky Harinarayan, Anand Rajaraman, and Jeffrey D. Ullman. 1997. Index Selection for OLAP. In Proceedings of the Thirteenth International Conference on Data Engineering, April 7--11, 1997, Birmingham, UK. IEEE Computer Society, 208--219.
[41]
Yaniv Gur, Dongsheng Yang, Frederik Stalschus, and Berthold Reinwald. 2021. Adaptive Multi-Model Reinforcement Learning for Online Database Tuning. In Proceedings of the 24th International Conference on Extending Database Technology, EDBT 2021, Nicosia, Cyprus, March 23 - 26, 2021. OpenProceedings.org, 439--444.
[42]
Ali Hadian and Thomas Heinis. 2019. Considerations for handling updates in learned index structures. In Proceedings of the Second International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, aiDM@SIGMOD 2019, Amsterdam, The Netherlands, July 5, 2019. ACM, 3:1--3:4.
[43]
Rojeh Hayek and Oded Shmueli. 2019. Improved Cardinality Estimation by Learning Queries Containment Rates. arXiv:1908.07723 [cs] (Aug. 2019). arXiv:1908.07723 [cs]
[44]
Shunsuke Higuchi, Junji Takemasa, Yuki Koizumi, Atsushi Tagami, and Toru Hasegawa. 2021. Feasibility of Longest Prefix Matching Using Learned Index Structures. SIGMETRICS Perform. Eval. Rev. 48, 4 (May 2021), 45--48.
[45]
Benjamin Hilprecht and Carsten Binnig. 2021. One Model to Rule them All: Towards Zero-Shot Learning for Databases. CoRR abs/2105.00642 (2021). arXiv:2105.00642 https://arxiv.org/abs/2105.00642
[46]
Darryl Ho, Jialin Ding, Sanchit Misra, Nesime Tatbul, Vikram Nathan, Vasimuddin Md, and Tim Kraska. 2019. LISA: Towards Learned DNA Sequence Search. CoRR abs/1910.04728 (2019). arXiv:1910.04728 http://arxiv.org/abs/1910.04728
[47]
Chen-Yu Hsu, Piotr Indyk, Dina Katabi, and Ali Vakilian. 2019. Learning-Based Frequency Estimation Algorithms. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6--9, 2019. OpenReview.net. https://openreview.net/forum?id=r1lohoCqY7
[48]
Stratos Idreos, Niv Dayan, Wilson Qin, Mali Akmanalp, Sophie Hilgard, Andrew Ross, James Lennon, Varun Jain, Harshita Gupta, David Li, and Zichen Zhu. 2019. Design Continuums and the Path Toward Self-Designing Key-Value Stores that Know and Learn. In 9th Biennial Conference on Innovative Data Systems Research, CIDR 2019, Asilomar, CA, USA, January 13--16, 2019, Online Proceedings. www.cidrdb.org. http://cidrdb.org/cidr2019/papers/p143-idreos-cidr19.pdf
[49]
Stratos Idreos and Tim Kraska. 2019. From Auto-tuning One Size Fits All to Self-designed and Learned Data-intensive Systems. In Proceedings of the 2019 International Conference on Management of Data, SIGMOD Conference 2019, Amsterdam, The Netherlands, June 30 - July 5, 2019. ACM, 2054--2059.
[50]
Stratos Idreos, Kostas Zoumpatianos, Subarna Chatterjee, Wilson Qin, Abdul Wasay, Brian Hentschel, Mike S. Kester, Niv Dayan, Demi Guo, Minseo Kang, and Yiyou Sun. 2019. Learning Data Structure Alchemy. IEEE Data Eng. Bull. 42, 2 (2019), 47--58. http://sites.computer.org/debull/A19june/p47.pdf
[51]
Amir Ilkhechi, Andrew Crotty, Alex Galakatos, Yicong Mao, Grace Fan, Xiran Shi, and Ugur Çetintemel. 2020. DeepSqueeze: Deep Semantic Compression for Tabular Data. In Proceedings of the 2020 International Conference on Management of Data, SIGMOD Conference 2020, online conference [Portland, OR, USA], June 14--19, 2020. ACM, 1733--1746.
[52]
Nicholas Jacek and J. Eliot B. Moss. 2019. Learning When to Garbage Collect with Random Forests. In Proceedings of the 2019 ACM SIGPLAN International Symposium on Memory Management (ISMM 2019). Association for Computing Machinery, Phoenix, AZ, USA, 53--63.
[53]
Andreas Kipf, Thomas Kipf, Bernhard Radke, Viktor Leis, Peter Boncz, and Alfons Kemper. 2019. Learned Cardinalities: Estimating Correlated Joins with Deep Learning. In 9th Biennial Conference on Innovative Data Systems Research (CIDR '19).
[54]
Tim Kraska, Mohammad Alizadeh, Alex Beutel, Ed H. Chi, Ani Kristo, Guillaume Leclerc, Samuel Madden, Hongzi Mao, and Vikram Nathan. 2019. SageDB: A Learned Database System. In 9th Biennial Conference on Innovative Data Systems Research, CIDR 2019, Asilomar, CA, USA, January 13--16, 2019, Online Proceedings. www.cidrdb.org. http://cidrdb.org/cidr2019/papers/p117-kraska-cidr19.pdf
[55]
Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, and Neoklis Polyzotis. 2017. The Case for Learned Index Structures. CoRR abs/1712.01208 (2017). arXiv:1712.01208 http://arxiv.org/abs/1712.01208
[56]
Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, and Neoklis Polyzotis. 2018. The Case for Learned Index Structures. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD Conference 2018, Houston, TX, USA, June 10--15, 2018. ACM, 489--504.
[57]
Sanjay Krishnan, Zongheng Yang, Ken Goldberg, Joseph Hellerstein, and Ion Stoica. 2018. Learning to Optimize Join Queries With Deep Reinforcement Learning. arXiv:1808.03196 [cs] (Aug. 2018). arXiv:1808.03196 [cs]
[58]
Ani Kristo, Kapil Vaidya, Ugur Çetintemel, Sanchit Misra, and Tim Kraska. 2020. The Case for a Learned Sorting Algorithm. In Proceedings of the 2020 International Conference on Management of Data, SIGMOD Conference 2020, online conference [Portland, OR, USA], June 14--19, 2020. ACM, 1001--1016.
[59]
Ani Kristo, Kapil Vaidya, and Tim Kraska. 2021. Defeating duplicates: A redesign of the LearnedSort algorithm. In International Workshop on Applied AI for Database Systems and Applications (AIDB@VLDB) (AIDB '21).
[60]
Mayuresh Kunjir and Shivnath Babu. 2020. Black or White? How to Develop an AutoTuner for Memory-based Analytics. In Proceedings of the 2020 International Conference on Management of Data, SIGMOD Conference 2020, online conference [Portland, OR, USA], June 14--19, 2020. ACM, 1667--1683.
[61]
Hai Lan, Zhifeng Bao, and Yuwei Peng. 2020. An Index Advisor Using Deep Reinforcement Learning. In CIKM '20: The 29th ACM International Conference on Information and Knowledge Management, Virtual Event, Ireland, October 19--23, 2020. ACM, 2105--2108.
[62]
Guoliang Li, Xuanhe Zhou, Shifu Li, and Bo Gao. 2019. QTune: A Query-Aware Database Tuning System with Deep Reinforcement Learning. Proc. VLDB Endow. 12, 12 (2019), 2118--2130.
[63]
Henry Liu, Mingbin Xu, Ziting Yu, Vincent Corvinelli, and Calisto Zuzarte. 2015. Cardinality Estimation Using Neural Networks. In Proceedings of the 25th Annual International Conference on Computer Science and Software Engineering (CASCON '15). IBM Corp., Riverton, NJ, USA, 53--59.
[64]
Qiyu Liu, Libin Zheng, Yanyan Shen, and Lei Chen. 2020. Stable Learned Bloom Filters for Data Streams. Proc. VLDB Endow. 13, 11 (2020), 2355--2367. http://www.vldb.org/pvldb/vol13/p2355-liu.pdf
[65]
Konstantinos Lolos, Ioannis Konstantinou, Verena Kantere, and Nectarios Koziris. 2017. Elastic Management of Cloud Applications Using Adaptive Reinforcement Learning. In IEEE International Conference on Big Data (Big Data '17). IEEE, 203--212.
[66]
Baotong Lu, Jialin Ding, Eric Lo, Umar Farooq Minhas, and Tianzheng Wang. 2021. APEX: A High-Performance Learned Index on Persistent Memory. CoRR abs/2105.00683 (2021). arXiv:2105.00683 https://arxiv.org/abs/2105.00683
[67]
Jiaheng Lu, Yuxing Chen, Herodotos Herodotou, and Shivnath Babu. 2019. Speedup Your Analytics: Automatic Parameter Tuning for Databases and Big Data Systems. Proc. VLDB Endow. 12, 12 (2019), 1970--1973.
[68]
Thodoris Lykouris and Sergei Vassilvitskii. 2018. Competitive Caching with Machine Learned Advice. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10--15, 2018 (Proceedings of Machine Learning Research, Vol. 80). PMLR, 3302--3311. http://proceedings.mlr.press/v80/lykouris18a.html
[69]
Marcel Maltry and Jens Dittrich. 2021. A Critical Analysis of Recursive Model Indexes. CoRR abs/2106.16166 (2021). arXiv:2106.16166 https://arxiv.org/abs/2106.16166
[70]
Abdullah Al Mamun, Hao Wu, and Walid G. Aref. 2020. A Tutorial on Learned Multidimensional Indexes. https://www.cs.purdue.edu/homes/aref/learned-indexes-tutorial.html.
[71]
Hongzi Mao, Parimarjan Negi, Akshay Narayan, Hanrui Wang, Jiacheng Yang, Haonan Wang, Ryan Marcus, Ravichandra Addanki, Mehrdad Khani Shirkoohi, Songtao He, Vikram Nathan, Frank Cangialosi, Shaileshh Bojja Venkatakrishnan, Wei-Hung Weng, Song Han, Tim Kraska, and Mohammad Alizadeh. 2019. Park: An Open Platform for Learning-Augmented Computer Systems. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8--14, 2019, Vancouver, BC, Canada. 2490--2502. https://proceedings.neurips.cc/paper/2019/hash/f69e505b08403ad2298b9f262659929a-Abstract.html
[72]
Hongzi Mao, Malte Schwarzkopf, Shaileshh Bojja Venkatakrishnan, Zili Meng, and Mohammad Alizadeh. 2018. Learning Scheduling Algorithms for Data Processing Clusters. arXiv:1810.01963 [cs, stat] (2018). arXiv:1810.01963 [cs, stat]
[73]
Hongzi Mao, Malte Schwarzkopf, Shaileshh Bojja Venkatakrishnan, Zili Meng, and Mohammad Alizadeh. 2018. Learning Scheduling Algorithms for Data Processing Clusters. arXiv:1810.01963 [cs, stat] (Oct. 2018). arXiv:1810.01963 [cs, stat]
[74]
Ryan Marcus. 2021. More Bao Results: Learned Distributed Query Optimization on Vertica, Redshift, and Azure Synapse. https://learnedsystems.mit.edu/bao-distributed/.
[75]
Ryan Marcus, Andreas Kipf, Alexander van Renen, Mihail Stoian, Sanchit Misra, Alfons Kemper, Thomas Neumann, and Tim Kraska. 2020. Benchmarking Learned Indexes. Proc. VLDB Endow. 14, 1 (2020), 1--13.
[76]
Ryan Marcus, Parimarjan Negi, Hongzi Mao, Nesime Tatbul, Mohammad Alizadeh, and Tim Kraska. 2021. Bao: Making Learned Query Optimization Practical. In SIGMOD 21: International Conference on Management of Data, Virtual Event, China, June 20--25, 2021. ACM, 1275--1288.
[77]
Ryan Marcus and Olga Papaemmanouil. 2018. Deep Reinforcement Learning for Join Order Enumeration. In First International Workshop on Exploiting Artificial Intelligence Techniques for Data Management (aiDM @ SIGMOD '18). Houston, TX.
[78]
Ryan Marcus, Emily Zhang, and Tim Kraska. 2020. CDFShop: Exploring and Optimizing Learned Index Structures. In Proceedings of the 2020 International Conference on Management of Data, SIGMOD Conference 2020, online conference [Portland, OR, USA], June 14--19, 2020. ACM, 2789--2792.
[79]
Ryan C. Marcus, Parimarjan Negi, Hongzi Mao, Chi Zhang, Mohammad Alizadeh, Tim Kraska, Olga Papaemmanouil, and Nesime Tatbul. 2019. Neo: A Learned Query Optimizer. Proc. VLDB Endow. 12, 11 (2019), 1705--1718.
[80]
Michael Mitzenmacher. 2018. A Model for Learned Bloom Filters and Optimizing by Sandwiching. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3--8, 2018, Montréal, Canada. 462--471. https://proceedings.neurips.cc/paper/2018/hash/0f49c89d1e7298bb9930789c8ed59d48-Abstract.html
[81]
Michael Mitzenmacher. 2020. Scheduling with Predictions and the Price of Misprediction. In 11th Innovations in Theoretical Computer Science Conference, ITCS 2020, January 12--14, 2020, Seattle, Washington, USA (LIPIcs, Vol. 151). Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 14:1--14:18.
[82]
Michael Mitzenmacher and Sergei Vassilvitskii. 2020. Algorithms with Predictions. In Beyond the Worst-Case Analysis of Algorithms,.Cambridge University Press, 646--662.
[83]
C. Mohan and Frank E. Levine. 1992. ARIES/IM: An Efficient and High Concurrency Index Management Method Using Write-Ahead Logging. In Proceedings of the 1992 ACM SIGMOD International Conference on Management of Data, San Diego, California, USA, June 2--5, 1992. ACM Press, 371--380.
[84]
Vikram Nathan, Jialin Ding, Mohammad Alizadeh, and Tim Kraska. 2020. Learning Multi-Dimensional Indexes. In Proceedings of the 2020 International Conference on Management of Data, SIGMOD Conference 2020, online conference [Portland, OR, USA], June 14--19, 2020. ACM, 985--1000.
[85]
Parimarjan Negi, Matteo Interlandi, Ryan Marcus, Mohammad Alizadeh, Tim Kraska, Marc Friedman, and Alekh Jindal. 2021. Steering Query Optimizers: A Practical Take on Big Data Workloads. In SIGMOD '21: International Conference on Management of Data, Virtual Event, China, June 20--25, 2021. ACM, 2557--2569.
[86]
Parimarjan Negi, Ryan Marcus, Andreas Kipf, Hongzi Mao, Nesime Tatbul, Tim Kraska, and Mohammad Alizadeh. to appear. Flow-Loss: Learning Cardinality Estimates That Matter. Proc. VLDB Endow. (to appear).
[87]
Parimarjan Negi, Ryan Marcus, Hongzi Mao, Nesime Tatbul, Tim Kraska, and Mohammad Alizadeh. 2020. Cost-Guided Cardinality Estimation: Focus Where it Matters. In 36th IEEE International Conference on Data Engineering Workshops, ICDE Workshops 2020, Dallas, TX, USA, April 20--24, 2020. IEEE, 154--157.
[88]
Parimarjan Negi, Ryan Marcus, Hongzi Mao, Nesime Tatbul, Tim Kraska, and Mohammad Alizadeh. 2020. Cost-Guided Cardinality Estimation: Focus Where It Matters. In Workshop on Self-Managing Databases (SMDB @ ICDE '20).
[89]
Thomas Neumann and Sebastian Michel. 2008. Smooth Interpolating Histograms with Error Guarantees. In Sharing Data, Information and Knowledge, 25th British National Conference on Databases (BNCOD '08). 126--138.
[90]
Harrie Oosterhuis, J. Shane Culpepper, and Maarten de Rijke. 2018. The Potential of Learned Index Structures for Index Compression. In Proceedings of the 23rd Australasian Document Computing Symposium, ADCS 2018, Dunedin, New Zealand, December 11--12, 2018. ACM, 7:1--7:4.
[91]
Jennifer Ortiz, Magdalena Balazinska, Johannes Gehrke, and S. Sathiya Keerthi. 2019. An Empirical Analysis of Deep Learning for Cardinality Estimation. arXiv:1905.06425 [cs] (Sept. 2019). arXiv:1905.06425 [cs]
[92]
Jennifer Ortiz, Brendan Lee, Magdalena Balazinska, and Joseph L. Hellerstein. 2016. PerfEnforce: A Dynamic Scaling Engine for Analytics with Performance Guarantees. arXiv:1605.09753 [cs] (May 2016). arXiv:1605.09753 [cs]
[93]
Yongjoo Park, Shucheng Zhong, and Barzan Mozafari. 2018. QuickSel: Quick Selectivity Learning with Mixture Models. arXiv:1812.10568 [cs] (Dec. 2018). arXiv:1812.10568 [cs]
[94]
Andrew Pavlo, Gustavo Angulo, Joy Arulraj, Haibin Lin, Jiexi Lin, Lin Ma, Prashanth Menon, Todd C. Mowry, Matthew Perron, Ian Quah, Siddharth Santurkar, Anthony Tomasic, Skye Toor, Dana Van Aken, Ziqi Wang, Yingjun Wu, Ran Xian, and Tieying Zhang. 2017. Self-Driving Database Management Systems. In 8th Biennial Conference on Innovative Data Systems Research, CIDR 2017, Chaminade, CA, USA, January 8--11, 2017, Online Proceedings. www.cidrdb.org. http://cidrdb.org/cidr2017/papers/p42-pavlo-cidr17.pdf
[95]
Andrew Pavlo, Evan P. C. Jones, and Stan Zdonik. 2011. On Predictive Modeling for Optimizing Transaction Execution in Parallel OLTP Systems. PVLDB 5, 2 (2011), 86--96.
[96]
Mark Raasveldt and Hannes Mühleisen. 2019. DuckDB: an Embeddable Analytical Database. In Proceedings of the 2019 International Conference on Management of Data, SIGMOD Conference 2019, Amsterdam, The Netherlands, June 30 - July 5, 2019. ACM, 1981--1984.
[97]
Jack W. Rae, Sergey Bartunov, and Timothy P. Lillicrap. 2019. Meta-Learning Neural Bloom Filters. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9--15 June 2019, Long Beach, California, USA (Proceedings of Machine Learning Research, Vol. 97). PMLR, 5271--5280. http://proceedings.mlr.press/v97/rae19a.html
[98]
Jun Rao, Chun Zhang, Nimrod Megiddo, and Guy M. Lohman. 2002. Automating physical database design in a parallel database. In Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, Madison, Wisconsin, USA, June 3--6, 2002. ACM, 558--569.
[99]
Alon Rashelbach, Ori Rottenstreich, and Mark Silberstein. 2020. A Computational Approach to Packet Classification. In Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication on the Applications, Technologies, Architectures, and Protocols for Computer Communication (Virtual Event, USA) (SIGCOMM '20). Association for Computing Machinery, New York, NY, USA, 542--556.
[100]
Ibrahim Sabek, Kapil Vaidya, Dominik Horn, Andreas Kipf, and Tim Kraska. 2021. When Are Learned Models Better Than Hash Functions?. In International Workshop on Applied AI for Database Systems and Applications (AIDB@VLDB) (AIDB'21).
[101]
Zahra Sadri, Le Gruenwald, and Eleazar Leal. 2020. DRLindex: deep reinforcement learning index advisor for a cluster database. In IDEAS 2020:24th International Database Engineering & Applications Symposium, Seoul, Republic of Korea, August 12--14, 2020. ACM, 11:1--11:8.
[102]
Michael Schaarschmidt, Alexander Kuhnle, Ben Ellis, Kai Fricke, Felix Gessert, and Eiko Yoneki. 2018. LIFT: Reinforcement Learning in Computer Systems by Learning From Demonstrations. arXiv:1808.07903 [cs, stat] (Aug. 2018). arXiv:1808.07903 [cs, stat]
[103]
Yangjun Sheng, Anthony Tomasic, Tieying Zhang, and Andrew Pavlo. 2019. Scheduling OLTP transactions via learned abort prediction. In Proceedings of the Second International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, aiDM@SIGMOD 2019, Amsterdam, The Netherlands, July 5, 2019. ACM, 1:1--1:8.
[104]
Benjamin Spector, Andreas Kipf, Kapil Vaidya, Chi Wang, Umar Farooq Minhas, and Tim Kraska. 2021. Bounding the Last Mile: Efficient Learned String Indexing. In International Workshop on Applied AI for Database Systems and Applications (AIDB@VLDB) (AIDB'21).
[105]
Michael Stillger, Guy M. Lohman, Volker Markl, and Mokhtar Kandil. 2001. LEO - DB2's LEarning Optimizer. In VLDB (VLDB '01). 19--28.
[106]
Ji Sun and Guoliang Li. 2019. An End-to-End Learning-Based Cost Estimator. Proceedings of the VLDB Endowment 13, 3 (Nov. 2019), 307--319.
[107]
Chuzhe Tang, Youyun Wang, Zhiyuan Dong, Gansen Hu, Zhaoguo Wang, Minjie Wang, and Haibo Chen. 2020. XIndex: A Scalable Learned Index for Multicore Data Storage. In Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (San Diego, California) (PPoPP '20). Association for Computing Machinery, New York, NY, USA, 308--320.
[108]
Kapil Vaidya, Eric Knorr, Michael Mitzenmacher, and Tim Kraska. 2021. Partitioned Learned Bloom Filters. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3--7, 2021. OpenReview.net. https://openreview.net/forum?id=6BRLOfrMhW
[109]
Gary Valentin, Michael Zuliani, Daniel C. Zilio, Guy M. Lohman, and Alan Skelley. 2000. DB2 Advisor: An Optimizer Smart Enough to Recommend Its Own Indexes. In Proceedings of the 16th International Conference on Data Engineering, San Diego, California, USA, February 28 - March 3, 2000. IEEE Computer Society, 101--110.
[110]
Youyun Wang, Chuzhe Tang, Zhaoguo Wang, and Haibo Chen. 2020. SIndex: a scalable learned index for string keys. In APSys '20: 11th ACM SIGOPS Asia-Pacific Workshop on Systems, Tsukuba, Japan, August 24--25, 2020. ACM, 17--24.
[111]
Xingda Wei, Rong Chen, and Haibo Chen. 2020. Fast RDMA-based Ordered Key-Value Store using Remote Learned Cache. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20). USENIX Association, 117--135. https://www.usenix.org/conference/osdi20/presentation/wei
[112]
Lucas Woltmann, Claudio Hartmann, Maik Thiele, Dirk Habich, and Wolfgang Lehner. 2019. Cardinality Estimation with Local Deep Learning Models. In Proceedings of the Second International Workshop on Exploiting Artificial Intelligence Techniques for Data Management (aiDM'19). Association for Computing Machinery, Amsterdam, Netherlands, 1--8.
[113]
Jiacheng Wu, Yong Zhang, Shimin Chen, Yu Chen, Jin Wang, and Chunxiao Xing. 2021. Updatable Learned Index with Precise Positions. Proc. VLDB Endow. 14, 8 (2021), 1276--1288. http://www.vldb.org/pvldb/vol14/p1276-wu.pdf
[114]
Wenkun Xiang, Hao Zhang, Rui Cui, Xing Chu, Keqin Li, and Wei Zhou. 2019. Pavo: A RNN-Based Learned Inverted Index, Supervised or Unsupervised? IEEE Access 7 (2019), 293--303.
[115]
Zongheng Yang, Badrish Chandramouli, Chi Wang, Johannes Gehrke, Yinan Li, Umar Farooq Minhas, Per-Åke Larson, Donald Kossmann, and Rajeev Acharya. 2020. Qd-tree: Learning Data Layouts for Big Data Analytics. In Proceedings of the 2020 International Conference on Management of Data, SIGMOD Conference 2020, online conference [Portland, OR, USA], June 14--19, 2020. ACM, 193--208.
[116]
Zongheng Yang, Amog Kamsetty, Sifei Luan, Eric Liang, Yan Duan, Xi Chen, and Ion Stoica. 2020. NeuroCard: One Cardinality Estimator for All Tables. arXiv:2006.08109 [cs] (June 2020).
[117]
Zongheng Yang, Eric Liang, Amog Kamsetty, Chenggang Wu, Yan Duan, Xi Chen, Pieter Abbeel, Joseph M. Hellerstein, Sanjay Krishnan, and Ion Stoica. 2019. Deep Unsupervised Cardinality Estimation. Proceedings of the VLDB Endowment 13, 3 (Nov. 2019), 279--292.

Published In

Proceedings of the VLDB Endowment, Volume 14, Issue 12
July 2021
587 pages
ISSN: 2150-8097

Publisher

VLDB Endowment

Publication History

Published: 01 July 2021 in PVLDB Volume 14, Issue 12

Qualifiers

• Research-article
