research-article

AsterixDB: a scalable, open source BDMS

Published: 01 October 2014

Abstract

AsterixDB is a new, full-function BDMS (Big Data Management System) with a feature set that distinguishes it from other platforms in today's open source Big Data ecosystem. Its features make it well-suited to applications like web data warehousing, social data storage and analysis, and other use cases related to Big Data. AsterixDB has a flexible NoSQL-style data model; a query language that supports a wide range of queries; a scalable runtime; partitioned, LSM-based data storage and indexing (including B+-tree, R-tree, and text indexes); support for external as well as natively stored data; a rich set of built-in types; support for fuzzy, spatial, and temporal types and queries; a built-in notion of data feeds for ingestion of data; and transaction support akin to that of a NoSQL store.
Development of AsterixDB began in 2009 and led to a mid-2013 initial open source release. This paper is the first complete description of the resulting open source AsterixDB system. Covered herein are the system's data model, its query language, and its software architecture. Also included are a summary of the current status of the project and a first glimpse into how AsterixDB performs when compared to alternative technologies, including a parallel relational DBMS, a popular NoSQL store, and a popular Hadoop-based SQL data analytics platform, for things that both technologies can do. Also included is a brief description of some initial trials that the system has undergone and the lessons learned (and plans laid) based on those early "customer" engagements.



Published In

Proceedings of the VLDB Endowment, Volume 7, Issue 14
October 2014
244 pages
ISSN: 2150-8097

Publisher

VLDB Endowment

