Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article
Open access

Rethink Query Optimization in HTAP Databases

Published: 12 December 2023 Publication History

Abstract

The advent of data-intensive applications has fueled the evolution of hybrid transactional and analytical processing (HTAP). To support mixed workloads, distributed HTAP databases typically maintain two data copies that are specially tailored for data freshness and performance isolation. In particular, a copy in a row-oriented format is well-suited for OLTP workloads, and a second copy in a column-oriented format is optimized for OLAP workloads. Such a hybrid design opens up a new design space for query optimization: plans can be optimized over different data formats and can be executed over isolated resources, which we term hybrid plans. In this paper, we demonstrate that hybrid plans can largely benefit query execution (e.g., up to 11x speedups in our evaluation). However, we also found these benefits will potentially be at the cost of sacrificing data freshness or performance isolation since traditional optimizers may not precisely model and schedule the execution of hybrid plans on real-time updated HTAP databases. Therefore, we propose Metis, an HTAP-aware optimizer. We show, both theoretically and experimentally, that using the proposed optimizations, a system can largely benefit from hybrid plans while preserving isolated performance for OLTP and OLAP, and these optimizations are robust to the changes in workloads.

References

[1]
Daniel J Abadi, Samuel R Madden, and Nabil Hachem. 2008. Column-stores vs. row-stores: how different are they really?. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data. 967--980.
[2]
Michael Abebe, Horatiu Lazu, and Khuzaima Daudjee. 2022. Proteus: Autonomous Adaptive Storage for Mixed Workloads. Technical Report. Technical Report. University of Waterloo. https://cs. uwaterloo. ca . . . .
[3]
Sameer Agarwal, Srikanth Kandula, Nicolas Bruno, Ming-ChuanWu, Ion Stoica, and Jingren Zhou. 2012. Reoptimizing Data Parallel Computing. In NSDI, Vol. 12. 281--294.
[4]
Nitin Agrawal and Ashish Vulimiri. 2017. Low-latency analytics on colossal data streams with summarystore. In Proceedings of the 26th Symposium on Operating Systems Principles. 647--664.
[5]
Rafi Ahmed, Allison Lee, Andrew Witkowski, Dinesh Das, Hong Su, Mohamed Zait, and Thierry Cruanes. 2006. Costbased query transformation in Oracle. In VLDB, Vol. 6. 1026--1036.
[6]
Gennady Antoshenkov. 1993. Dynamic query optimization in Rdb/VMS. In Proceedings of IEEE 9th International Conference on Data Engineering. IEEE, 538--547.
[7]
Vaibhav Arora, Faisal Nawab, Divyakant Agrawal, and Amr El Abbadi. 2017. Janus: A hybrid scalable multirepresentation cloud datastore. IEEE Transactions on Knowledge and Data Engineering 30, 4 (2017), 689--702.
[8]
Ron Avnur and Joseph M Hellerstein. 2000. Eddies: Continuously adaptive query processing. In Proceedings of the 2000 ACM SIGMOD international conference on Management of data. 261--272.
[9]
AWS. 2023. AWS Latency Monitoring. https://www.cloudping.co/grid.
[10]
Shivnath Babu, Pedro Bizarro, and David DeWitt. 2005. Proactive re-optimization. In Proceedings of the 2005 ACM SIGMOD international conference on Management of data. 107--118.
[11]
Renata Borovica-Gajic, Stratos Idreos, Anastasia Ailamaki, Marcin Zukowski, and Campbell Fraser. 2015. Smooth scan: Statistics-oblivious access paths. In 2015 IEEE 31st International Conference on Data Engineering. IEEE, 315--326.
[12]
Dennis Butterstein, Daniel Martin, Knut Stolze, Felix Beier, Jia Zhong, and Lingyun Wang. 2020. Replication at the Speed of Change: A Fast, Scalable Replication Solution for near Real-Time HTAP Processing. Proc. VLDB Endow. 13, 12 (aug 2020), 3245--3257. https://doi.org/10.14778/3415478.3415548
[13]
Shaosheng Cao, XinXing Yang, Cen Chen, Jun Zhou, Xiaolong Li, and Yuan Qi. 2019. TitAnt: Online Real-Time Transaction Fraud Detection in Ant Financial. Proc. VLDB Endow. 12, 12 (aug 2019), 2082--2093. https://doi.org/10. 14778/3352063.3352126
[14]
Paris Carbone, Asterios Katsifodimos, Stephan Ewen, Volker Markl, Seif Haridi, and Kostas Tzoumas. 2015. Apache flink: Stream and batch processing in a single engine. The Bulletin of the Technical Committee on Data Engineering 38, 4 (2015).
[15]
Surajit Chaudhuri. 2009. Query optimizers: time to rethink the contract?. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of data. 961--968.
[16]
Jianjun Chen, Yonghua Ding, Ye Liu, Fangshi Li, Li Zhang, Mingyi Zhang, Kui Wei, Lixun Cao, Dan Zou, Yang Liu, et al. 2022. ByteHTAP: bytedance's HTAP system with high data freshness and strong data consistency. Proceedings of the VLDB Endowment 15, 12 (2022), 3411--3424.
[17]
Xusheng Chen, Haoze Song, Jianyu Jiang, Chaoyi Ruan, Cheng Li, Sen Wang, Gong Zhang, Reynold Cheng, and Heming Cui. 2021. Achieving low tail-latency and high scalability for serializable transactions in edge computing. In Proceedings of the Sixteenth European Conference on Computer Systems. 210--227.
[18]
Inc. ClickHouse. 2023. ClickHouse - open source distributed column-oriented DBMS. https://github.com/ClickHouse/ ClickHouse/tree/22.6.
[19]
Richard Cole, Florian Funke, Leo Giakoumakis, Wey Guy, Alfons Kemper, Stefan Krompass, Harumi Kuno, Raghunath Nambiar, Thomas Neumann, Meikel Poess, et al. 2011. The mixed workload CH-benCHmark. In Proceedings of the Fourth International Workshop on Testing Database Systems. 1--6.
[20]
Brian Cooper. 2010. Yahoo! Cloud Serving Benchmark. https://github.com/brianfrankcooper/YCSB.
[21]
The Transaction Processing Council. 1992. TPC-H. http://www.tpc.org/tpch/.
[22]
The Transaction Processing Council. 2014. TPC-C. http://www.tpc.org/tpcc/.
[23]
The Transaction Processing Council. 2015. TPC-DS. http://www.tpc.org/tpcds/.
[24]
Akon Dey, Alan Fekete, Raghunath Nambiar, and Uwe Röhm. 2014. YCSB T: Benchmarking web-scale transactional databases. In 2014 IEEE 30th International Conference on Data Engineering Workshops. IEEE, 223--230.
[25]
Bailu Ding, Sudipto Das, Wentao Wu, Surajit Chaudhuri, and Vivek Narasayya. 2018. Plan stitch: harnessing the best of many plans. Proceedings of the VLDB Endowment 11, 10 (2018), 1123--1136.
[26]
Science Direct. 2004. Real-Time Pricing. https://www.sciencedirect.com/topics/engineering/real-time-pricing.
[27]
Anshuman Dutt and Jayant R Haritsa. 2014. Plan bouquets: query processing without selectivity estimation. In Proceedings of the 2014 ACM SIGMOD international conference on Management of data. 1039--1050.
[28]
Adam Dziedzic, Jingjing Wang, Sudipto Das, Bolin Ding, Vivek R Narasayya, and Manoj Syamala. 2018. Columnstore and B tree-Are Hybrid Physical Designs Important?. In Proceedings of the 2018 International Conference on Management of Data. 177--190.
[29]
Kaushik Ghosh. 1995. Speculative execution in real-time systems. Ph.D. Dissertation. Citeseer.
[30]
Matteo Golfarelli and Stefano Rizzi. 2017. From Star Schemas to Big Data: 20 Years of Data Warehouse Research. A comprehensive guide through the Italian database research over the last 25 years (2017), 93--107.
[31]
Google. 2022. AlloyDB for PostgreSQL under the hood: Columnar engine. https://cloud.google.com/blog/products/ databases/alloydb-for-postgresql-columnar-engine.
[32]
Hui-I Hsiao, Ming-Syan Chen, and Philip S Yu. 1994. On parallel execution of multiple pipelined hash joins. In Proceedings of the 1994 ACM SIGMOD international conference on Management of data. 185--196.
[33]
Dongxu Huang, Qi Liu, Qiu Cui, Zhuhe Fang, Xiaoyu Ma, Fei Xu, Li Shen, Liu Tang, Yuxing Zhou, Menglong Huang, et al. 2020. TiDB: a Raft-based HTAP database. Proceedings of the VLDB Endowment 13, 12 (2020), 3072--3084.
[34]
Bert Hubert. 2023. tc(8), Linux manual page. https://man7.org/linux/man-pages/man8/tc.8.html.
[35]
SnowFlake Inc. 2023. Unistore: A modern approach to working with transactional and analytical data together in a single platform. https://www.snowflake.com/workloads/unistore/.
[36]
Alekh Jindal, Lalitha Viswanathan, and Konstantinos Karanasos. 2019. Query and Resource Optimizations: A Case for Breaking the Wall in Big Data Systems. arXiv preprint arXiv:1906.06590 (2019).
[37]
Navin Kabra and David J DeWitt. 1998. Efficient mid-query re-optimization of sub-optimal query execution plans. In Proceedings of the 1998 ACM SIGMOD international conference on Management of data. 106--117.
[38]
The kernel development community. 2023. Control Groups. https://docs.kernel.org/admin-guide/cgroup-v1/cgroups.html.
[39]
Michael S Kester, Manos Athanassoulis, and Stratos Idreos. 2017. Access path selection in main-memory optimized data systems: Should I scan or should I probe?. In Proceedings of the 2017 ACM International Conference on Management of Data. 715--730.
[40]
Michael S Kester, Manos Athanassoulis, and Stratos Idreos. 2017. Access path selection in main-memory optimized data systems: Should I scan or should I probe?. In Proceedings of the 2017 ACM International Conference on Management of Data. 715--730.
[41]
Tirthankar Lahiri, Shasank Chavan, Maria Colgan, Dinesh Das, Amit Ganesh, Mike Gleeson, Sanket Hase, Allison Holloway, Jesse Kamp, Teck-Hua Lee, et al. 2015. Oracle database in-memory: A dual format in-memory database. In 2015 IEEE 31st International Conference on Data Engineering. IEEE, 1253--1258.
[42]
Per-Åke Larson, Adrian Birka, Eric N Hanson, Weiyun Huang, Michal Nowakiewicz, and Vassilis Papadimos. 2015. Real-time analytical processing with SQL server. Proceedings of the VLDB Endowment 8, 12 (2015), 1740--1751.
[43]
Juchang Lee, SeungHyun Moon, Kyu Hwan Kim, Deok Hoe Kim, Sang Kyun Cha, and Wook-Shin Han. 2017. Parallel replication across formats in SAP HANA for scaling out mixed OLTP/OLAP workloads. Proceedings of the VLDB Endowment 10, 12 (2017), 1598--1609.
[44]
Mark Levene and George Loizou. 2003. Why is the snowflake schema a good data warehouse design? Information Systems 28, 3 (2003), 225--240.
[45]
Guoliang Li and Chao Zhang. 2022. HTAP Databases: What is New and What is Next. In Proceedings of the 2022 International Conference on Management of Data. 2483--2488.
[46]
Quanzhong Li, Minglong Shao, Volker Markl, Kevin Beyer, Latha Colby, and Guy Lohman. 2006. Adaptively reordering joins during query execution. In 2007 IEEE 23rd International Conference on Data Engineering. IEEE, 26--35.
[47]
Yan Li, Liwei Wang, Sheng Wang, Yuan Sun, and Zhiyong Peng. 2022. A Resource-Aware Deep Cost Model for Big Data Query Processing. In 2022 IEEE 38th International Conference on Data Engineering (ICDE). IEEE, 885--897.
[48]
Chen Luo and Michael J Carey. 2019. On performance stability in LSM-based storage systems. Proceedings of the VLDB Endowment 13, 4 (2019).
[49]
Zhenghua Lyu, Huan Hubert Zhang, Gang Xiong, Gang Guo, Haozhou Wang, Jinbao Chen, Asim Praveen, Yu Yang, Xiaoming Gao, AlexandraWang, et al. 2021. Greenplum: A Hybrid Database for Transactional and AnalyticalWorkloads. In Proceedings of the 2021 International Conference on Management of Data. 2530--2542.
[50]
Elena Milkai, Yannis Chronis, Kevin P Gaffney, Zhihan Guo, Jignesh M Patel, and Xiangyao Yu. 2022. How Good is My HTAP System?. In Proceedings of the 2022 International Conference on Management of Data. 1810--1824.
[51]
MySQL. 2022. MySQL Heatwave. https://dev.mysql.com/doc/heatwave/en/heatwave-introduction.html.
[52]
Edmund B Nightingale, Peter M Chen, and Jason Flinn. 2005. Speculative execution in a distributed file system. ACM SIGOPS operating systems review 39, 5 (2005), 191--205.
[53]
Fatma Özcan, Yuanyuan Tian, and Pinar Tözün. 2017. Hybrid transactional/analytical processing: A survey. In Proceedings of the 2017 ACM International Conference on Management of Data. 1771--1775.
[54]
Patrick O'Neil, Elizabeth O'Neil, Xuedong Chen, and Stephen Revilak. 2009. The star schema benchmark and augmented fact table indexing. In Performance Evaluation and Benchmarking: First TPC Technology Conference, TPCTC 2009, Lyon, France, August 24--28, 2009, Revised Selected Papers 1. Springer, 237--252.
[55]
Vijayshankar Raman, Gopi Attaluri, Ronald Barber, Naresh Chainani, David Kalmuk, Vincent KulandaiSamy, Jens Leenstra, Sam Lightstone, Shaorong Liu, Guy M Lohman, et al. 2013. DB2 with BLU acceleration: So much more than just a column store. Proceedings of the VLDB Endowment 6, 11 (2013), 1080--1091.
[56]
Aunn Raza, Periklis Chrysogelos, Angelos Christos Anadiotis, and Anastasia Ailamaki. 2020. Adaptive HTAP through elastic resource scheduling. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. 2043--2054.
[57]
Mohammad Sadoghi, Souvik Bhattacherjee, Bishwaranjan Bhattacharjee, and Mustafa Canim. 2016. L-store: A realtime OLTP and OLAP system. arXiv preprint arXiv:1601.04084 (2016).
[58]
Subhadeep Sarkar, Tarikul Islam Papon, Dimitris Staratzis, and Manos Athanassoulis. 2020. Lethe: A tunable deleteaware LSM engine. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. 893--908.
[59]
Hemant Saxena, Lukasz Golab, Stratos Idreos, and Ihab F Ilyas. 2021. Real-time LSM-trees for HTAP workloads. arXiv preprint arXiv:2101.06801 (2021).
[60]
P Griffiths Selinger, Morton M Astrahan, Donald D Chamberlin, Raymond A Lorie, and Thomas G Price. 1979. Access path selection in a relational database management system. In Proceedings of the 1979 ACM SIGMOD international conference on Management of data. 23--34.
[61]
Sijie Shen, Rong Chen, Haibo Chen, and Binyu Zang. 2021. Retrofitting High Availability Mechanism to Tame Hybrid Transaction/Analytical Processing. In 15th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 21). 219--238.
[62]
Inc. SingleStore. 2023. SingleStore: Real-Time Distributed SQL. https://www.singlestore.com/.
[63]
T Spenser and T Loukas. 1999. From Star to Snowflake to ERD: Comparing Data Warehouse Design Approaches. Enterprise Systems Journal 14 (1999), 62--69.
[64]
Khin Me Me Thein. 2014. Apache kafka: Next generation distributed messaging system. International Journal of Scientific Engineering and Technology Research 3, 47 (2014), 9478--9483.
[65]
Panos Vassiliadis. 2009. A survey of extract--transform--load technology. International Journal of Data Warehousing and Mining (IJDWM) 5, 3 (2009), 1--27.
[66]
Lalitha Viswanathan, Alekh Jindal, and Konstantinos Karanasos. 2018. Query and resource optimization: Bridging the gap. In 2018 IEEE 34th International Conference on Data Engineering (ICDE). IEEE, 1384--1387.
[67]
Jianying Wang, Tongliang Li, Haoze Song, Xinjun Yang, Wenchao Zhou, Feifei Li, Baoyue Yan, Qianqian Wu, Yukun Liang, ChengJun Ying, Yujie Wang, Baokai Chen, Chang Cai, Yubin Ruan, Xiaoyi Weng, Shibin Chen, Liang Yin, Chengzhong Yang, Xin Cai, Hongyan Xing, Nanlong Yu, Xiaofei Chen, Dapeng Huang, and Jianling Sun. 2023. PolarDBIMCI: A Cloud-Native HTAP Database System at Alibaba. Proc. ACM Manag. Data 1, 2, Article 199 (jun 2023), 25 pages. https://doi.org/10.1145/3589785
[68]
Jiacheng Yang, Ian Rae, Jun Xu, Jeff Shute, Zhan Yuan, Kelvin Lau, Qiang Zeng, Xi Zhao, Jun Ma, Ziyang Chen, et al. 2020. F1 Lightning: HTAP as a Service. Proceedings of the VLDB Endowment 13, 12 (2020), 3313--3325.
[69]
Ting Yao, Yiwen Zhang, JiguangWan, Qiu Cui, gLiu Tang, Hong Jiang, Changsheng Xie, and Xubin He. 2020. MatrixKV: reducing write stalls and write amplification in LSM-tree based KV stores with a matrix container in NVM. In Proceedings of the 2020 USENIX Conference on Usenix Annual Technical Conference. 17--31.
[70]
Shaoyi Yin, Abdelkader Hameurlain, and Franck Morvan. 2015. Robust query optimization methods with respect to estimation errors: A survey. ACM Sigmod Record 44, 3 (2015), 25--36.

Cited By

View all

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the ACM on Management of Data
Proceedings of the ACM on Management of Data  Volume 1, Issue 4
PACMMOD
December 2023
1317 pages
EISSN:2836-6573
DOI:10.1145/3637468
Issue’s Table of Contents
This work is licensed under a Creative Commons Attribution-NonCommercial International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 December 2023
Published in PACMMOD Volume 1, Issue 4

Author Tags

  1. HTAP database
  2. hybrid physical format
  3. query optimization

Qualifiers

  • Research-article

Funding Sources

  • HK GRF
  • HK RIF
  • HKU-SCF FinTech Academy R\&D Funding Scheme
  • HK ITF

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 974
    Total Downloads
  • Downloads (Last 12 months)974
  • Downloads (Last 6 weeks)99
Reflects downloads up to 12 Nov 2024

Other Metrics

Citations

Cited By

View all

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media