research-article

Cloud databases: new techniques, challenges, and opportunities

Published: 01 August 2022

Abstract

As database vendors increasingly move toward cloud data services, i.e., databases as a service (DBaaS), cloud databases have become prevalent. Compared with early cloud-hosted databases, the new generation of cloud databases, also known as cloud-native databases, seeks higher elasticity and lower cost through new techniques, e.g., compute-storage disaggregation and "the log is the database". To better harness the power of these cloud databases, it is important to study and compare the pros and cons of their key techniques. In this tutorial, we offer a comprehensive survey of cloud-native databases. Based on their system architectures, we introduce separate taxonomies for state-of-the-art cloud-native OLTP databases and OLAP databases. We then take a deep dive into their key techniques for storage management, transaction processing, analytical processing, data replication, serverless computing, database recovery, and security. Finally, we discuss research challenges and opportunities.


Published In

Proceedings of the VLDB Endowment  Volume 15, Issue 12
August 2022
551 pages
ISSN:2150-8097

Publisher

VLDB Endowment

Publication History

Published: 01 August 2022
Published in PVLDB Volume 15, Issue 12
