DOI: 10.1145/3481646.3481648
research-article

A Comparative Study of MongoDB, ArangoDB and CouchDB for Big Data Storage

Published: 26 November 2021

Abstract

A distinctive aspect of the current era is the enormous amount of data that is generated and processed on a daily basis. It is no wonder that this epoch is commonly characterized as the “Era of Big Data”. Many enterprises and research initiatives therefore strive to collect, store, and analyze Big Data effectively and efficiently in order to improve their services and make efficient decisions. These approaches span several domains, such as healthcare, transportation, governance, and insurance. In this direction, this paper contributes to the selection of the most appropriate database for efficiently storing and retrieving Big Data. More specifically, taking into account the nature of Big Data and the main categories of databases that currently exist, three NoSQL document-based databases were considered for this comparative study, namely ArangoDB, MongoDB, and CouchDB. The performance of these databases was measured against specific metrics and criteria, including the total execution time for the same CRUD operations and the corresponding resource demands, in order to identify the most suitable database for storing Big Data.

Published In

ICCBDC '21: Proceedings of the 2021 5th International Conference on Cloud and Big Data Computing
August 2021
122 pages
ISBN: 9781450390408
DOI: 10.1145/3481646

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. ArangoDB
2. Big Data
3. CouchDB
4. Document-based
5. MongoDB
6. Storage

Qualifiers

• Research-article
• Research
• Refereed limited

Funding Sources

• Operational Program Competitiveness, Entrepreneurship and Innovation

Conference

ICCBDC 2021
