Towards instance-optimized data systems

Author:

Tim Kraska

Proceedings of the VLDB Endowment, Volume 14, Issue 12

Pages 3222 - 3232

https://doi.org/10.14778/3476311.3476392

Published: 01 July 2021

Abstract

In recent years, we have seen increased interest in applying machine learning to system problems. For example, there has been work on applying machine learning to improve query optimization, indexing, storage layouts, scheduling, log-structured merge trees, sorting, compression, and sketches, among many other data management tasks. Arguably, the ideas behind these techniques are similar: machine learning is used to model the data and/or workload in order to derive a more efficient algorithm or data structure. Ultimately, these techniques will allow us to build "instance-optimized" systems: that is, systems that self-adjust to a given workload and data distribution to provide unprecedented performance without the need for tuning by an administrator. While many of these techniques promise orders-of-magnitude better performance in lab settings, there is still general skepticism about how practical the current techniques really are.

The following is intended as a progress report on ML for Systems and its readiness for real-world deployments, with a focus on our projects done as part of the Data Systems and AI Lab (DSAIL) at MIT. By no means is it a comprehensive overview of all existing work, which has been steadily growing over the past several years, not only in the database community but also in the systems, networking, theory, programming languages (PL), and many other adjacent communities.
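To make the idea of "using machine learning to model the data in order to derive a more efficient data structure" concrete, here is a minimal sketch of a learned index. It assumes sorted integer keys and uses a single least-squares line as the model of the key distribution (its CDF); the class `LinearLearnedIndex` and its methods are hypothetical illustrations, not code from the DSAIL projects. The model predicts where a key sits in the sorted array, and the recorded maximum prediction error bounds a binary search so that lookups remain exact.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LinearLearnedIndex:
    """Hypothetical toy learned index: one linear model approximates the keys' CDF."""

    keys: List[int]   # sorted keys
    slope: float      # model: position ~ slope * key + intercept
    intercept: float
    max_err: int      # worst-case |predicted - true| position over all stored keys

    @classmethod
    def build(cls, keys: List[int]) -> "LinearLearnedIndex":
        if not keys:
            raise ValueError("need at least one key")
        keys = sorted(keys)
        n = len(keys)
        # Ordinary least-squares fit of array position against key value.
        mean_k = sum(keys) / n
        mean_p = (n - 1) / 2
        var_k = sum((k - mean_k) ** 2 for k in keys)
        cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(keys))
        slope = cov / var_k if var_k else 0.0
        intercept = mean_p - slope * mean_k
        index = cls(keys, slope, intercept, 0)
        # Record the largest prediction error so every lookup stays exact.
        index.max_err = max(abs(index.predict(k) - p) for p, k in enumerate(keys))
        return index

    def predict(self, key: int) -> int:
        """Model-predicted position of `key` in the sorted array."""
        return round(self.slope * key + self.intercept)

    def lookup(self, key: int) -> Optional[int]:
        """Return the position of the first occurrence of `key`, or None if absent."""
        n = len(self.keys)
        pred = self.predict(key)
        # Binary search only inside the error window around the prediction,
        # clamped to [0, n] so bisect never sees a negative bound.
        lo = min(max(pred - self.max_err, 0), n)
        hi = min(max(pred + self.max_err + 1, lo), n)
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < n and self.keys[i] == key:
            return i
        return None


squares = [i * i for i in range(1000)]
idx = LinearLearnedIndex.build(squares)
print(idx.lookup(144), idx.lookup(145), idx.max_err)
```

The design choice this illustrates is the trade at the heart of the abstract: the better the model fits this particular data distribution, the smaller `max_err` becomes and the narrower the search window, whereas a generic binary search or B-tree ignores the distribution entirely. Published learned-index designs replace the single line with a hierarchy or a sequence of models and must also handle inserts; this sketch covers only static lookups.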

References

[1]

Hussam Abu-Libdeh, Deniz Altinbüken, Alex Beutel, Ed H. Chi, Lyric Doshi, Tim Kraska, Xiaozhou Li, Andy Ly, and Christopher Olston. 2020. Learned Indexes for a Google-scale Disk-based Database. In Proceedings of the Workshop on ML for Systems at NeurIPS.

[2]

Sanjay Agrawal, Surajit Chaudhuri, and Vivek R. Narasayya. 2000. Automated Selection of Materialized Views and Indexes in SQL Databases. In VLDB 2000, Proceedings of 26th International Conference on Very Large Data Bases, September 10--14, 2000, Cairo, Egypt. 496--505.

Digital Library

[3]

Sanjay Agrawal, Vivek R. Narasayya, and Beverly Yang. 2004. Integrating Vertical and Horizontal Partitioning Into Automated Physical Database Design. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Paris, France, June 13--18, 2004. ACM, 359--370.

Digital Library

[4]

Dana Van Aken, Andrew Pavlo, Geoffrey J. Gordon, and Bohan Zhang. 2017. Automatic Database Management System Tuning Through Large-scale Machine Learning. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD Conference 2017, Chicago, IL, USA, May 14--19, 2017. ACM, 1009--1024.

Digital Library

[5]

Dana Van Aken, Dongsheng Yang, Sebastien Brillard, Ari Fiorino, Bohan Zhang, Christian Billian, and Andrew Pavlo. 2021. An Inquiry into Machine Learning-based Automatic Configuration Tuning Services on Real-World Database Management Systems. Proc. VLDB Endow. 14, 7 (2021), 1241--1253.

Digital Library

[6]

Abdullah Al-Mamun, Hao Wu, and Walid G. Aref. 2020. A Tutorial on Learned Multi-dimensional Indexes. In SIGSPATIAL '20: 28th International Conference on Advances in Geographic Information Systems, Seattle, WA, USA, November 3--6, 2020. ACM, 1--4.

Digital Library

[7]

Christos Anagnostopoulos and Peter Triantafillou. 2015. Learning Set Cardinality in Distance Nearest Neighbours. In Proceedings of the 2015 IEEE International Conference on Data Mining (ICDM) (ICDM '15). IEEE Computer Society, USA, 691--696.

Digital Library

[8]

C. Anagnostopoulos and P. Triantafillou. 2015. Learning to Accurately COUNT with Query-Driven Predictive Analytics. In 2015 IEEE International Conference on Big Data (Big Data) (Big Data '15). 14--23.

Digital Library

[9]

Christos Anagnostopoulos and Peter Triantafillou. 2017. Query-Driven Learning for Predictive Analytics of Data Subspace Cardinality. ACM Trans. Knowl. Discov. Data 11, 4 (June 2017), 47:1--47:46.

Digital Library

[10]

Laurent Bindschaedler, Andreas Kipf, Tim Kraska, Ryan Marcus, and Umar Farooq Minhas. 2021. Towards a Benchmark for Learned Systems. In 37th IEEE International Conference on Data Engineering Workshops, ICDE Workshops 2021, Chania, Greece, April 19--22, 2021. IEEE, 127--133.

[11]

Matthias Brantner, Daniela Florescu, David A. Graf, Donald Kossmann, and Tim Kraska. 2008. Building a database on S3. In Proceedings of the SIGMOD. ACM, 251--264.

Digital Library

[12]

Lujing Cen, Andreas Kipf, Ryan Marcus, and Tim Kraska. 2021. LEA: A Learned Encoding Advisor for Column Stores. In aiDM '21: Fourth Workshop in Exploiting AI Techniques for Data Management, Virtual Event, China, 25 June, 2021. ACM, 32--35.

Digital Library

[13]

Surajit Chaudhuri and Vivek R. Narasayya. 1997. An Efficient Cost-Driven Index Selection Tool for Microsoft SQL Server. In VLDB'97, Proceedings of 23rd International Conference on Very Large Data Bases, August 25--29, 1997, Athens, Greece. Morgan Kaufmann, 146--155. http://www.vldb.org/conf/1997/P146.PDF

Digital Library

[14]

Surajit Chaudhuri and Vivek R. Narasayya. 1998. AutoAdmin 'What-if' Index Analysis Utility. In SIGMOD 1998, Proceedings ACM SIGMOD International Conference on Management of Data, June 2--4, 1998, Seattle, Washington, USA.ACM Press, 367--378.

Digital Library

[15]

Surajit Chaudhuri and Vivek R. Narasayya. 2007. Self-Tuning Database Systems: A Decade of Progress. In Proceedings of the 33rd International Conference on Very Large Data Bases, University of Vienna, Austria, September 23--27, 2007. ACM, 3--14. http://www.vldb.org/conf/2007/papers/special/p3-chaudhuri.pdf

Digital Library

[16]

Andrew Crotty. 2021. Hist-Tree: Those Who Ignore It Are Doomed to Learn. In 11th Conference on Innovative Data Systems Research, CIDR 2021, Virtual Event, January 11--15, 2021, Online Proceedings. www.cidrdb.org. http://cidrdb.org/cidr2021/papers/cidr2021_paper20.pdf

[17]

Carlo Curino, Evan Jones, Yang Zhang, and Sam Madden. 2010. Schism:A Workload-Driven Approach to Database Replication and Partitioning. PVLDB 3, 1 (2010), 48--57.

Digital Library

[18]

Yifan Dai, Yien Xu, Aishwarya Ganesan, Ramnatthan Alagappan, Brian Kroth, Andrea Arpaci-Dusseau, and Remzi Arpaci-Dusseau. 2020. From WiscKey to Bourbon: A Learned Index for Log-Structured Merge Trees. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20). USENIX Association, 155--171. https://www.usenix.org/conference/osdi20/presentation/dai

Digital Library

[19]

Zhenwei Dai and Anshumali Shrivastava. 2020. Adaptive Learned Bloom Filter (Ada-BF): Efficient Utilization of the Classifier with Application to Real-Time Information Filtering on the Web. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6--12, 2020, virtual. https://proceedings.neurips.cc/paper/2020/hash/86b94dae7c6517ec1ac767fd2c136580-Abstract.html

[20]

Biplob K. Debnath, David J. Lilja, and Mohamed F. Mokbel. 2008. SARD: A statistical approach for ranking database tuning parameters. In Proceedings of the 24th International Conference on Data Engineering Workshops, ICDE 2008, April 7--12, 2008, Cancún, Mexico. IEEE Computer Society, 11--18.

Digital Library

[21]

Karl Dias, Mark Ramacher, Uri Shaft, Venkateshwaran Venkataramani, and Graham Wood. 2005. Automatic Performance Diagnosis and Tuning in Oracle. In Second Biennial Conference on Innovative Data Systems Research, CIDR 2005, Asilomar, CA, USA, January 4--7, 2005, Online Proceedings. www.cidrdb.org, 84--94. http://cidrdb.org/cidr2005/papers/P07.pdf

[22]

Bailu Ding, Sudipto Das, Ryan Marcus, Wentao Wu, Surajit Chaudhuri, and Vivek R. Narasayya. 2019. AI Meets AI: Leveraging Query Executions to Improve Index Recommendations. In Proceedings of the 2019 International Conference on Management of Data, SIGMOD Conference 2019, Amsterdam, The Netherlands, June 30-July 5, 2019. ACM, 1241--1258.

Digital Library

[23]

Jialin Ding, Umar Farooq Minhas, Badrish Chandramouli, Chi Wang, Yinan Li, Ying Li, Donald Kossmann, Johannes Gehrke, and Tim Kraska. 2021. Instance-Optimized Data Layouts for Cloud Analytics Workloads. In SIGMOD '21: International Conference on Management of Data, Virtual Event, China, June 20--25, 2021. ACM, 418--431.

Digital Library

[24]

Jialin Ding, Umar Farooq Minhas, Jia Yu, Chi Wang, Jaeyoung Do, Yinan Li, Hantian Zhang, Badrish Chandramouli, Johannes Gehrke, Donald Kossmann, David B. Lomet, and Tim Kraska. 2020. ALEX: An Updatable Adaptive Learned Index. In Proceedings of the 2020 International Conference on Management of Data, SIGMOD Conference 2020, online conference [Portland, OR, USA], June 14--19, 2020. ACM, 969--984.

Digital Library

[25]

Jialin Ding, Vikram Nathan, Mohammad Alizadeh, and Tim Kraska. 2020. Tsunami: A Learned Multi-dimensional Index for Correlated Data and Skewed Workloads. Proc. VLDB Endow. 14, 2 (2020), 74--86.

Digital Library

[26]

DSAIL. 2021. Data System and AI Lab. http://dsail.csail.mit.edu/.

[27]

DSAIL. 2021. (Learned Index Leaderboard. https://learnedsystems.github.io/SOSDLeaderboard/leaderboard/. [Online; accessed 7-July-2021].

[28]

DSAIL. 2021. ML for Systems Papers. http://dsg.csail.mit.edu/mlforsystems/papers/.

[29]

Elbert Du, Franklyn Wang, and Michael Mitzenmacher. 2021. Putting the "Learning" into Learning-Augmented Algorithms for Frequency Estimation. In Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18--24 July 2021, Virtual Event (Proceedings of Machine Learning Research, Vol. 139). PMLR, 2860--2869. http://proceedings.mlr.press/v139/du21d.html

[30]

Songyun Duan, Vamsidhar Thummala, and Shivnath Babu. 2009. Tuning Database Configuration Parameters with iTuned. Proc. VLDB Endow. 2, 1 (2009), 1246--1257.

Digital Library

[31]

Martin Eppert, Philipp Fent, and Thomas Neumann. 2021. A Tailored Regression for Learned Indexes: Logarithmic Error Regression. In aiDM '21: Fourth Workshop in Exploiting AI Techniques for Data Management, Virtual Event, China, 25 June, 2021. ACM, 9--15.

Digital Library

[32]

Ronald Fagin, Amnon Lotem, and Moni Naor. 2003. Optimal aggregation algorithms for middleware. J. Comput. Syst. Sci. 66, 4 (2003), 614--656.

Digital Library

[33]

Paolo Ferragina, Fabrizio Lillo, and Giorgio Vinciguerra. 2020. Why Are Learned Indexes So Effective?. In Proceedings of the 37th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 119). PMLR, 3123--3132. http://proceedings.mlr.press/v119/ferragina20a.html

[34]

Paolo Ferragina, Fabrizio Lillo, and Giorgio Vinciguerra. 2021. On the performance of learned data structures. Theor. Comput. Sci. 871 (2021), 107--120.

[35]

Paolo Ferragina and Giorgio Vinciguerra. 2020. Learned Data Structures. In Recent Trends in Learning From Data. Springer International Publishing, 5--41.

[36]

Paolo Ferragina and Giorgio Vinciguerra. 2020. The PGM-Index: A Fully-Dynamic Compressed Learned Index with Provable Worst-Case Bounds. Proceedings of the VLDB Endowment 13, 8 (April 2020), 1162--1175.

Digital Library

[37]

Alex Galakatos, Michael Markovitch, Carsten Binnig, Rodrigo Fonseca, and Tim Kraska. 2019. FITing-Tree: A Data-aware Index Structure. In Proceedings of the 2019 International Conference on Management of Data, SIGMOD Conference 2019, Amsterdam, The Netherlands, June 30 - July 5, 2019. ACM, 1189--1206.

Digital Library

[38]

Vahid Ghadakchi, Mian Xie, and Arash Termehchy. 2020. Bandit join: preliminary results. In Proceedings of the Third International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, aiDM@SIGMOD 2020, Portland, Oregon, USA, June 19, 2020. ACM, 1:1--1:4.

Digital Library

[39]

Justin Gottschlich, Armando Solar-Lezama, Nesime Tatbul, Michael Carbin, Martin Rinard, Regina Barzilay, Saman P. Amarasinghe, Joshua B. Tenenbaum, and Tim Mattson. 2018. The three pillars of machine programming. In Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages, MAPL@PLDI 2018, Philadelphia, PA, USA, June 18--22, 2018. ACM, 69--80.

Digital Library

[40]

Himanshu Gupta, Venky Harinarayan, Anand Rajaraman, and Jeffrey D. Ullman. 1997. Index Selection for OLAP. In Proceedings of the Thirteenth International Conference on Data Engineering, April 7--11, 1997, Birmingham, UK. IEEE Computer Society, 208--219.

Digital Library

[41]

Yaniv Gur, Dongsheng Yang, Frederik Stalschus, and Berthold Reinwald. 2021. Adaptive Multi-Model Reinforcement Learning for Online Database Tuning. In Proceedings of the 24th International Conference on Extending Database Technology, EDBT 2021, Nicosia, Cyprus, March 23 - 26, 2021. OpenProceedings.org, 439--444.

[42]

Ali Hadian and Thomas Heinis. 2019. Considerations for handling updates in learned index structures. In Proceedings of the Second International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, aiDM@SIGMOD 2019, Amsterdam, The Netherlands, July 5, 2019. ACM, 3:1--3:4.

Digital Library

[43]

Rojeh Hayek and Oded Shmueli. 2019. Improved Cardinality Estimation by Learning Queries Containment Rates. arXiv:1908.07723 [cs] (Aug. 2019). arXiv:1908.07723 [cs]

[44]

Shunsuke Higuchi, Junji Takemasa, Yuki Koizumi, Atsushi Tagami, and Toru Hasegawa. 2021. Feasibility of Longest Prefix Matching Using Learned Index Structures. SIGMETRICS Perform. Eval. Rev. 48, 4 (May 2021), 45--48.

Digital Library

[45]

Benjamin Hilprecht and Carsten Binnig. 2021. One Model to Rule them All: Towards Zero-Shot Learning for Databases. CoRR abs/2105.00642 (2021). arXiv:2105.00642 https://arxiv.org/abs/2105.00642

[46]

Darryl Ho, Jialin Ding, Sanchit Misra, Nesime Tatbul, Vikram Nathan, Vasimuddin Md, and Tim Kraska. 2019. LISA: Towards Learned DNA Sequence Search. CoRR abs/1910.04728 (2019). arXiv:1910.04728 http://arxiv.org/abs/1910.04728

[47]

Chen-Yu Hsu, Piotr Indyk, Dina Katabi, and Ali Vakilian. 2019. Learning-Based Frequency Estimation Algorithms. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6--9, 2019. OpenReview.net. https://openreview.net/forum?id=r1lohoCqY7

[48]

Stratos Idreos, Niv Dayan, Wilson Qin, Mali Akmanalp, Sophie Hilgard, Andrew Ross, James Lennon, Varun Jain, Harshita Gupta, David Li, and Zichen Zhu. 2019. Design Continuums and the Path Toward Self-Designing Key-Value Stores that Know and Learn. In 9th Biennial Conference on Innovative Data Systems Research, CIDR 2019, Asilomar, CA, USA, January 13--16, 2019, Online Proceedings. www.cidrdb.org. http://cidrdb.org/cidr2019/papers/p143-idreos-cidr19.pdf

[49]

Stratos Idreos and Tim Kraska. 2019. From Auto-tuning One Size Fits All to Self-designed and Learned Data-intensive Systems. In Proceedings of the 2019 International Conference on Management of Data, SIGMOD Conference 2019, Amsterdam, The Netherlands, June 30 - July 5, 2019. ACM, 2054--2059.

Digital Library

[50]

Stratos Idreos, Kostas Zoumpatianos, Subarna Chatterjee, Wilson Qin, Abdul Wasay, Brian Hentschel, Mike S. Kester, Niv Dayan, Demi Guo, Minseo Kang, and Yiyou Sun. 2019. Learning Data Structure Alchemy. IEEE Data Eng. Bull. 42, 2 (2019), 47--58. http://sites.computer.org/debull/A19june/p47.pdf

[51]

Amir Ilkhechi, Andrew Crotty, Alex Galakatos, Yicong Mao, Grace Fan, Xiran Shi, and Ugur Çetintemel. 2020. DeepSqueeze: Deep Semantic Compression for Tabular Data. In Proceedings of the 2020 International Conference on Management of Data, SIGMOD Conference 2020, online conference [Portland, OR, USA], June 14--19, 2020. ACM, 1733--1746.

Digital Library

[52]

Nicholas Jacek and J. Eliot B. Moss. 2019. Learning When to Garbage Collect with Random Forests. In Proceedings of the 2019 ACM SIGPLAN International Symposium on Memory Management (ISMM 2019). Association for Computing Machinery, Phoenix, AZ, USA, 53--63.

Digital Library

[53]

Andreas Kipf, Thomas Kipf, Bernhard Radke, Viktor Leis, Peter Boncz, and Alfons Kemper. 2019. Learned Cardinalities: Estimating Correlated Joins with Deep Learning. In 9th Biennial Conference on Innovative Data Systems Research (CIDR '19).

[54]

Tim Kraska, Mohammad Alizadeh, Alex Beutel, Ed H. Chi, Ani Kristo, Guillaume Leclerc, Samuel Madden, Hongzi Mao, and Vikram Nathan. 2019. SageDB: A Learned Database System. In 9th Biennial Conference on Innovative Data Systems Research, CIDR 2019, Asilomar, CA, USA, January 13--16, 2019, Online Proceedings. www.cidrdb.org. http://cidrdb.org/cidr2019/papers/p117-kraska-cidr19.pdf

[55]

Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, and Neoklis Polyzotis. 2017. The Case for Learned Index Structures. CoRR abs/1712.01208 (2017). arXiv:1712.01208 http://arxiv.org/abs/1712.01208

Digital Library

[56]

Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, and Neoklis Polyzotis. 2018. The Case for Learned Index Structures. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD Conference 2018, Houston, TX, USA, June 10--15, 2018. ACM, 489--504.

Digital Library

[57]

Sanjay Krishnan, Zongheng Yang, Ken Goldberg, Joseph Hellerstein, and Ion Stoica. 2018. Learning to Optimize Join Queries With Deep Reinforcement Learning. arXiv:1808.03196 [cs] (Aug. 2018). arXiv:1808.03196 [cs]

[58]

Ani Kristo, Kapil Vaidya, Ugur Çetintemel, Sanchit Misra, and Tim Kraska. 2020. The Case for a Learned Sorting Algorithm. In Proceedings of the 2020 International Conference on Management of Data, SIGMOD Conference 2020, online conference [Portland, OR, USA], June 14--19, 2020. ACM, 1001--1016.

Digital Library

[59]

Ani Kristo, Kapil Vaidya, and Tim Kraska. 2021. Defeating duplicates: A redesign of the LearnedSort algorithm. In International Workshop on Applied AI for Database Systems and Applications (AIDB@VLDB) (AIDB '21).

[60]

Mayuresh Kunjir and Shivnath Babu. 2020. Black or White? How to Develop an AutoTuner for Memory-based Analytics. In Proceedings of the 2020 International Conference on Management of Data, SIGMOD Conference 2020, online conference [Portland, OR, USA], June 14--19, 2020. ACM, 1667--1683.

Digital Library

[61]

Hai Lan, Zhifeng Bao, and Yuwei Peng. 2020. An Index Advisor Using Deep Reinforcement Learning. In CIKM '20: The 29th ACM International Conference on Information and Knowledge Management, Virtual Event, Ireland, October 19--23, 2020. ACM, 2105--2108.

Digital Library

[62]

Guoliang Li, Xuanhe Zhou, Shifu Li, and Bo Gao. 2019. QTune: A Query-Aware Database Tuning System with Deep Reinforcement Learning. Proc. VLDB Endow. 12, 12 (2019), 2118--2130.

Digital Library

[63]

Henry Liu, Mingbin Xu, Ziting Yu, Vincent Corvinelli, and Calisto Zuzarte. 2015. Cardinality Estimation Using Neural Networks. In Proceedings of the 25th Annual International Conference on Computer Science and Software Engineering (CASCON '15). IBM Corp., Riverton, NJ, USA, 53--59.

Digital Library

[64]

Qiyu Liu, Libin Zheng, Yanyan Shen, and Lei Chen. 2020. Stable Learned Bloom Filters for Data Streams. Proc. VLDB Endow. 13, 11 (2020), 2355--2367. http://www.vldb.org/pvldb/vol13/p2355-liu.pdf

Digital Library

[65]

Konstantinos Lolos, Ioannis Konstantinou, Verena Kantere, and Nectarios Koziris. 2017. Elastic Management of Cloud Applications Using Adaptive Reinforcement Learning. In IEEE International Conference on Big Data (Big Data '17). IEEE, 203--212.

[66]

Baotong Lu, Jialin Ding, Eric Lo, Umar Farooq Minhas, and Tianzheng Wang. 2021. APEX: A High-Performance Learned Index on Persistent Memory. CoRR abs/2105.00683 (2021). arXiv:2105.00683 https://arxiv.org/abs/2105.00683

[67]

Jiaheng Lu, Yuxing Chen, Herodotos Herodotou, and Shivnath Babu. 2019. Speedup Your Analytics: Automatic Parameter Tuning for Databases and Big Data Systems. Proc. VLDB Endow. 12, 12 (2019), 1970--1973.

Digital Library

[68]

Thodoris Lykouris and Sergei Vassilvitskii. 2018. Competitive Caching with Machine Learned Advice. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10--15, 2018 (Proceedings of Machine Learning Research, Vol. 80). PMLR, 3302--3311. http://proceedings.mlr.press/v80/lykouris18a.html

[69]

Marcel Maltry and Jens Dittrich. 2021. A Critical Analysis of Recursive Model Indexes. CoRR abs/2106.16166 (2021). arXiv:2106.16166 https://arxiv.org/abs/2106.16166

[70]

Abdullah Al Mamun, Hao Wu, and Walid G. Aref. 2020. A Tutorial on Learned Multidimensional Indexes. https://www.cs.purdue.edu/homes/aref/learned-indexes-tutorial.html.

[71]

Hongzi Mao, Parimarjan Negi, Akshay Narayan, Hanrui Wang, Jiacheng Yang, Haonan Wang, Ryan Marcus, Ravichandra Addanki, Mehrdad Khani Shirkoohi, Songtao He, Vikram Nathan, Frank Cangialosi, Shaileshh Bojja Venkatakrishnan, Wei-Hung Weng, Song Han, Tim Kraska, and Mohammad Alizadeh. 2019. Park: An Open Platform for Learning-Augmented Computer Systems. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8--14, 2019, Vancouver, BC, Canada. 2490--2502. https://proceedings.neurips.cc/paper/2019/hash/f69e505b08403ad2298b9f262659929a-Abstract.html

Digital Library

[72]

Hongzi Mao, Malte Schwarzkopf, Shaileshh Bojja Venkatakrishnan, Zili Meng, and Mohammad Alizadeh. 2018. Learning Scheduling Algorithms for Data Processing Clusters. arXiv:1810.01963 [cs, stat] (2018). arXiv:1810.01963 [cs, stat]

Digital Library

[73]

Hongzi Mao, Malte Schwarzkopf, Shaileshh Bojja Venkatakrishnan, Zili Meng, and Mohammad Alizadeh. 2018. Learning Scheduling Algorithms for Data Processing Clusters. arXiv:1810.01963 [cs, stat] (Oct. 2018). arXiv:1810.01963 [cs, stat]

Digital Library

[74]

Ryan Marcus. 2021. More Bao Results: Learned Distributed Query Optimization on Vertica, Redshift, and Azure Synapse. https://learnedsystems.mit.edu/bao-distributed/.

[75]

Ryan Marcus, Andreas Kipf, Alexander van Renen, Mihail Stoian, Sanchit Misra, Alfons Kemper, Thomas Neumann, and Tim Kraska. 2020. Benchmarking Learned Indexes. Proc. VLDB Endow. 14, 1 (2020), 1--13.

Digital Library

[76]

Ryan Marcus, Parimarjan Negi, Hongzi Mao, Nesime Tatbul, Mohammad Alizadeh, and Tim Kraska. 2021. Bao: Making Learned Query Optimization Practical. In SIGMOD 21: International Conference on Management of Data, Virtual Event, China, June 20--25, 2021. ACM, 1275--1288.

Digital Library

[77]

Ryan Marcus and Olga Papaemmanouil. 2018. Deep Reinforcement Learning for Join Order Enumeration. In First International Workshop on Exploiting Artificial Intelligence Techniques for Data Management (aiDM @ SIGMOD '18). Houston, TX.

Digital Library

[78]

Ryan Marcus, Emily Zhang, and Tim Kraska. 2020. CDFShop: Exploring and Optimizing Learned Index Structures. In Proceedings of the 2020 International Conference on Management of Data, SIGMOD Conference 2020, online conference [Portland, OR, USA], June 14--19, 2020. ACM, 2789--2792.

Digital Library

[79]

Ryan C. Marcus, Parimarjan Negi, Hongzi Mao, Chi Zhang, Mohammad Alizadeh, Tim Kraska, Olga Papaemmanouil, and Nesime Tatbul. 2019. Neo: A Learned Query Optimizer. Proc. VLDB Endow. 12, 11 (2019), 1705--1718.

Digital Library

[80]

Michael Mitzenmacher. 2018. A Model for Learned Bloom Filters and Optimizing by Sandwiching. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3--8, 2018, Montréal, Canada. 462--471. https://proceedings.neurips.cc/paper/2018/hash/0f49c89d1e7298bb9930789c8ed59d48-Abstract.html

Digital Library

[81]

Michael Mitzenmacher. 2020. Scheduling with Predictions and the Price of Misprediction. In 11th Innovations in Theoretical Computer Science Conference, ITCS 2020, January 12--14, 2020, Seattle, Washington, USA (LIPIcs, Vol. 151). Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 14:1--14:18.

[82]

Michael Mitzenmacher and Sergei Vassilvitskii. 2020. Algorithms with Predictions. In Beyond the Worst-Case Analysis of Algorithms,.Cambridge University Press, 646--662.

[83]

C. Mohan and Frank E. Levine. 1992. ARIES/IM: An Efficient and High Concurrency Index Management Method Using Write-Ahead Logging. In Proceedings of the 1992 ACM SIGMOD International Conference on Management of Data, San Diego, California, USA, June 2--5, 1992. ACM Press, 371--380.

Digital Library

[84]

Vikram Nathan, Jialin Ding, Mohammad Alizadeh, and Tim Kraska. 2020. Learning Multi-Dimensional Indexes. In Proceedings of the 2020 International Conference on Management of Data, SIGMOD Conference 2020, online conference [Portland, OR, USA], June 14--19, 2020. ACM, 985--1000.

Digital Library

[85]

Parimarjan Negi, Matteo Interlandi, Ryan Marcus, Mohammad Alizadeh, Tim Kraska, Marc Friedman, and Alekh Jindal. 2021. Steering Query Optimizers: A Practical Take on Big Data Workloads. In SIGMOD '21: International Conference on Management of Data, Virtual Event, China, June 20--25, 2021. ACM, 2557--2569.

Digital Library

[86]

Parimarjan Negi, Ryan Marcus, Andreas Kipf, Hongzi Mao, Nesime Tatbul, Tim Kraska, and Mohammad Alizadeh. to appear. Flow-Loss: Learning Cardinality Estimates That Matter. Proc. VLDB Endow. (to appear).

Digital Library

[87]

Parimarjan Negi, Ryan Marcus, Hongzi Mao, Nesime Tatbul, Tim Kraska, and Mohammad Alizadeh. 2020. Cost-Guided Cardinality Estimation: Focus Where it Matters. In 36th IEEE International Conference on Data Engineering Workshops, ICDE Workshops 2020, Dallas, TX, USA, April 20--24, 2020. IEEE, 154--157.

[88]

Parimarjan Negi, Ryan Marcus, Hongzi Mao, Nesime Tatbul, Tim Kraska, and Mohammad Alizadeh. 2020. Cost-Guided Cardinality Estimation: Focus Where It Matters. In Workshop on Self-Managing Databases (SMDB @ ICDE '20).

[89]

Thomas Neumann and Sebastian Michel. 2008. Smooth Interpolating Histograms with Error Guarantees. In Sharing Data, Information and Knowledge, 25th British National Conference on Databases (BNCOD '08). 126--138.

Digital Library

[90]

Harrie Oosterhuis, J. Shane Culpepper, and Maarten de Rijke. 2018. The Potential of Learned Index Structures for Index Compression. In Proceedings of the 23rd Australasian Document Computing Symposium, ADCS 2018, Dunedin, New Zealand, December 11--12, 2018. ACM, 7:1--7:4.

Digital Library

[91]

Jennifer Ortiz, Magdalena Balazinska, Johannes Gehrke, and S. Sathiya Keerthi. 2019. An Empirical Analysis of Deep Learning for Cardinality Estimation. arXiv:1905.06425 [cs] (Sept. 2019). arXiv:1905.06425 [cs]

[92]

Jennifer Ortiz, Brendan Lee, Magdalena Balazinska, and Joseph L. Hellerstein. 2016. PerfEnforce: A Dynamic Scaling Engine for Analytics with Performance Guarantees. arXiv:1605.09753 [cs] (May 2016). arXiv:1605.09753 [cs]

Digital Library

[93]

Yongjoo Park, Shucheng Zhong, and Barzan Mozafari. 2018. QuickSel: Quick Selectivity Learning with Mixture Models. arXiv:1812.10568 [cs] (Dec. 2018). arXiv:1812.10568 [cs]

Digital Library

[94]

Andrew Pavlo, Gustavo Angulo, Joy Arulraj, Haibin Lin, Jiexi Lin, Lin Ma, Prashanth Menon, Todd C. Mowry, Matthew Perron, Ian Quah, Siddharth Santurkar, Anthony Tomasic, Skye Toor, Dana Van Aken, Ziqi Wang, Yingjun Wu, Ran Xian, and Tieying Zhang. 2017. Self-Driving Database Management Systems. In 8th Biennial Conference on Innovative Data Systems Research, CIDR 2017, Chaminade, CA, USA, January 8--11, 2017, Online Proceedings. www.cidrdb.org. http://cidrdb.org/cidr2017/papers/p42-pavlo-cidr17.pdf

[95]

Andrew Pavlo, Evan P. C. Jones, and Stan Zdonik. 2011. On Predictive Modeling for Optimizing Transaction Execution in Parallel OLTP Systems. PVLDB 5, 2 (2011), 86--96.

Digital Library

[96]

Mark Raasveldt and Hannes Mühleisen. 2019. DuckDB: an Embeddable Analytical Database. In Proceedings of the 2019 International Conference on Management of Data, SIGMOD Conference 2019, Amsterdam, The Netherlands, June 30 - July 5, 2019. ACM, 1981--1984.

Digital Library

[97]

Jack W. Rae, Sergey Bartunov, and Timothy P. Lillicrap. 2019. Meta-Learning Neural Bloom Filters. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9--15 June 2019, Long Beach, California, USA (Proceedings of Machine Learning Research, Vol. 97). PMLR, 5271--5280. http://proceedings.mlr.press/v97/rae19a.html

[98]

Jun Rao, Chun Zhang, Nimrod Megiddo, and Guy M. Lohman. 2002. Automating physical database design in a parallel database. In Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, Madison, Wisconsin, USA, June 3--6, 2002. ACM, 558--569.

Digital Library

[99]

Alon Rashelbach, Ori Rottenstreich, and Mark Silberstein. 2020. A Computational Approach to Packet Classification. In Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication on the Applications, Technologies, Architectures, and Protocols for Computer Communication (Virtual Event, USA) (SIGCOMM '20). Association for Computing Machinery, New York, NY, USA, 542--556.

Digital Library

[100]

Ibrahim Sabek, Kapil Vaidya, Dominik Horn, Andreas Kipf, and Tim Kraska. 2021. When Are Learned Models Better Than Hash Functions?. In International Workshop on Applied AI for Database Systems and Applications (AIDB@VLDB) (AIDB'21).

[101]

Zahra Sadri, Le Gruenwald, and Eleazar Leal. 2020. DRLindex: deep reinforcement learning index advisor for a cluster database. In IDEAS 2020:24th International Database Engineering & Applications Symposium, Seoul, Republic of Korea, August 12--14, 2020. ACM, 11:1--11:8.

Digital Library

[102]

Michael Schaarschmidt, Alexander Kuhnle, Ben Ellis, Kai Fricke, Felix Gessert, and Eiko Yoneki. 2018. LIFT: Reinforcement Learning in Computer Systems by Learning From Demonstrations. arXiv:1808.07903 [cs, stat] (Aug. 2018). arXiv:1808.07903 [cs, stat]

[103]

Yangjun Sheng, Anthony Tomasic, Tieying Zhang, and Andrew Pavlo. 2019. Scheduling OLTP transactions via learned abort prediction. In Proceedings of the Second International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, aiDM@SIGMOD 2019, Amsterdam, The Netherlands, July 5, 2019. ACM, 1:1--1:8.

Digital Library

[104]

Benjamin Spector, Andreas Kipf, Kapil Vaidya, Chi Wang, Umar Farooq Minhas, and Tim Kraska. 2021. Bounding the Last Mile: Efficient Learned String Indexing. In International Workshop on Applied AI for Database Systems and Applications (AIDB@VLDB) (AIDB'21).

[105]

Michael Stillger, Guy M. Lohman, Volker Markl, and Mokhtar Kandil. 2001. LEO - DB2's LEarning Optimizer. In VLDB (VLDB '01). 19--28.

Digital Library

[106]

Ji Sun and Guoliang Li. 2019. An End-to-End Learning-Based Cost Estimator. Proceedings of the VLDB Endowment 13, 3 (Nov. 2019), 307--319.

Digital Library

[107]

Chuzhe Tang, Youyun Wang, Zhiyuan Dong, Gansen Hu, Zhaoguo Wang, Minjie Wang, and Haibo Chen. 2020. XIndex: A Scalable Learned Index for Multicore Data Storage. In Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (San Diego, California) (PPoPP '20). Association for Computing Machinery, New York, NY, USA, 308--320.

Digital Library

[108]

Kapil Vaidya, Eric Knorr, Michael Mitzenmacher, and Tim Kraska. 2021. Partitioned Learned Bloom Filters. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3--7, 2021. OpenReview.net. https://openreview.net/forum?id=6BRLOfrMhW

[109]

Gary Valentin, Michael Zuliani, Daniel C. Zilio, Guy M. Lohman, and Alan Skelley. 2000. DB2 Advisor: An Optimizer Smart Enough to Recommend Its Own Indexes. In Proceedings of the 16th International Conference on Data Engineering, San Diego, California, USA, February 28 - March 3, 2000. IEEE Computer Society, 101--110.

Digital Library

[110]

Youyun Wang, Chuzhe Tang, Zhaoguo Wang, and Haibo Chen. 2020. SIndex: a scalable learned index for string keys. In APSys '20: 11th ACM SIGOPS Asia-Pacific Workshop on Systems, Tsukuba, Japan, August 24--25, 2020. ACM, 17--24.

[111]

Xingda Wei, Rong Chen, and Haibo Chen. 2020. Fast RDMA-based Ordered Key-Value Store using Remote Learned Cache. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20). USENIX Association, 117--135. https://www.usenix.org/conference/osdi20/presentation/wei

[112]

Lucas Woltmann, Claudio Hartmann, Maik Thiele, Dirk Habich, and Wolfgang Lehner. 2019. Cardinality Estimation with Local Deep Learning Models. In Proceedings of the Second International Workshop on Exploiting Artificial Intelligence Techniques for Data Management (aiDM'19). Association for Computing Machinery, Amsterdam, Netherlands, 1--8.

[113]

Jiacheng Wu, Yong Zhang, Shimin Chen, Yu Chen, Jin Wang, and Chunxiao Xing. 2021. Updatable Learned Index with Precise Positions. Proc. VLDB Endow. 14, 8 (2021), 1276--1288. http://www.vldb.org/pvldb/vol14/p1276-wu.pdf

[114]

Wenkun Xiang, Hao Zhang, Rui Cui, Xing Chu, Keqin Li, and Wei Zhou. 2019. Pavo: A RNN-Based Learned Inverted Index, Supervised or Unsupervised? IEEE Access 7 (2019), 293--303.

[115]

Zongheng Yang, Badrish Chandramouli, Chi Wang, Johannes Gehrke, Yinan Li, Umar Farooq Minhas, Per-Åke Larson, Donald Kossmann, and Rajeev Acharya. 2020. Qd-tree: Learning Data Layouts for Big Data Analytics. In Proceedings of the 2020 International Conference on Management of Data, SIGMOD Conference 2020, online conference [Portland, OR, USA], June 14--19, 2020. ACM, 193--208.

[116]

Zongheng Yang, Amog Kamsetty, Sifei Luan, Eric Liang, Yan Duan, Xi Chen, and Ion Stoica. 2020. NeuroCard: One Cardinality Estimator for All Tables. arXiv:2006.08109 [cs] (June 2020).

[117]

Zongheng Yang, Eric Liang, Amog Kamsetty, Chenggang Wu, Yan Duan, Xi Chen, Pieter Abbeel, Joseph M. Hellerstein, Sanjay Krishnan, and Ion Stoica. 2019. Deep Unsupervised Cardinality Estimation. Proceedings of the VLDB Endowment 13, 3 (Nov. 2019), 279--292.

Cited By

Choi M, Yoo S, Choi J (2024). Can Learned Indexes be Built Efficiently? A Deep Dive into Sampling Trade-offs. Proceedings of the ACM on Management of Data 2:3 (1-25). DOI: 10.1145/3654919. Online publication date: 30-May-2024.
https://dl.acm.org/doi/10.1145/3654919
Sirin U, Idreos S (2024). The Image Calculator: 10x Faster Image-AI Inference by Replacing JPEG with Self-designing Storage Format. Proceedings of the ACM on Management of Data 2:1 (1-31). DOI: 10.1145/3639307. Online publication date: 26-Mar-2024.
https://dl.acm.org/doi/10.1145/3639307
Huynh A, Chaudhari H, Terzi E, Athanassoulis M (2024). Towards flexibility and robustness of LSM trees. The VLDB Journal (The International Journal on Very Large Data Bases) 33:4 (1105-1128). DOI: 10.1007/s00778-023-00826-9. Online publication date: 1-Jul-2024.
https://dl.acm.org/doi/10.1007/s00778-023-00826-9

Index Terms

Theory of computation

Index terms have been assigned to the content through auto-classification.

Information

Published In

Proceedings of the VLDB Endowment Volume 14, Issue 12

July 2021

587 pages

ISSN:2150-8097

Editors: Xin Luna Dong (Amazon), Felix Naumann (HPI, University of Potsdam)

Publisher

VLDB Endowment

Publication History

Published: 01 July 2021

Published in PVLDB Volume 14, Issue 12

Qualifiers

Research-article

Bibliometrics & Citations

Article Metrics

Total Citations: 11
Total Downloads: 158
Downloads (Last 12 months): 41
Downloads (Last 6 weeks): 3

Reflects downloads up to 19 Nov 2024

Citations

Sabek I, Kraska T (2023). The Case for Learned In-Memory Joins. Proceedings of the VLDB Endowment 16:7 (1749-1762). DOI: 10.14778/3587136.3587148. Online publication date: 8-May-2023.
https://dl.acm.org/doi/10.14778/3587136.3587148
Yang J, Cong G (2023). PLATON: Top-down R-tree Packing with Learned Partition Policy. Proceedings of the ACM on Management of Data 1:4 (1-26). DOI: 10.1145/3626742. Online publication date: 12-Dec-2023.
https://dl.acm.org/doi/10.1145/3626742
Mohoney J, Pacaci A, Chowdhury S, Mousavi A, Ilyas I, Minhas U, Pound J, Rekatsinas T (2023). High-Throughput Vector Similarity Search in Knowledge Graphs. Proceedings of the ACM on Management of Data 1:2 (1-25). DOI: 10.1145/3589777. Online publication date: 20-Jun-2023.
https://dl.acm.org/doi/10.1145/3589777
Hilgendorf M, Gulisano V, Papatriantafilou M, Engström J, Mishra B, Kemme B, Riviere E, Schiavoni V, Pasin M (2023). FORTE: an extensible framework for robustness and efficiency in data transfer pipelines. Proceedings of the 17th ACM International Conference on Distributed and Event-based Systems (139-150). DOI: 10.1145/3583678.3596892. Online publication date: 27-Jun-2023.
https://dl.acm.org/doi/10.1145/3583678.3596892
Zhang Z, Glova A, Sherwood T, Balkind J, Aamodt T, Jerger N, Swift M (2023). A Prediction System Service. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2 (48-60). DOI: 10.1145/3575693.3575714. Online publication date: 27-Jan-2023.
https://dl.acm.org/doi/10.1145/3575693.3575714
Amato D, Lo Bosco G, Giancarlo R (2023). Neural networks as building blocks for the design of efficient learned indexes. Neural Computing and Applications 35:29 (21399-21414). DOI: 10.1007/s00521-023-08841-1. Online publication date: 21-Jul-2023.
https://dl.acm.org/doi/10.1007/s00521-023-08841-1
Sabek I, Ukyab T, Kraska T, Ives Z, Bonifati A, El Abbadi A (2022). LSched: A Workload-Aware Learned Query Scheduler for Analytical Database Systems. Proceedings of the 2022 International Conference on Management of Data (1228-1242). DOI: 10.1145/3514221.3526158. Online publication date: 10-Jun-2022.
https://dl.acm.org/doi/10.1145/3514221.3526158