Abstract
Over the past decade, digital transformation has driven the evolution of databases towards Big Data, and a need to collect and analyze data from different sources has emerged. At the same time, traditional decision support systems are unable to meet the growing needs of modern businesses to integrate and analyze the wide variety of data they generate. As a result, most organizations need to convert their data stored in relational systems to NoSQL ("Not only SQL") systems, which are based on flexible models and schemas. Our work is part of a medical application that must allow health professionals to analyze complex data for decision making. We propose mechanisms to extract data from a Data Lake and store them in a NoSQL Data Warehouse. This makes it possible to subsequently perform decision analyses facilitated by the features that NoSQL systems offer (richness of data structures, query language, access performance). In this article, we present a process for ingesting data from a Data Lake into a Data Warehouse. Ingestion consists of, first, transferring the relational and NoSQL databases extracted from the Data Lake into a single NoSQL database (the Data Warehouse); second, merging so-called "similar" classes; and third, converting the links into references between objects. To automate this process, we used the Model Driven Architecture (MDA), which provides a schema transformation environment. From the physical schemas describing a Data Lake, we propose transformation rules that make it possible to create a Data Warehouse stored in a document-oriented NoSQL system. An experiment was carried out for a medical application.
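The three ingestion steps named in the abstract can be illustrated with a minimal in-memory sketch. It is not the MDA-based transformation the article proposes: the function `ingest`, the `similar` mapping, the `links` tuples, the `_id`/`_source` fields and all sample data are hypothetical, and plain Python dictionaries stand in for the relational tables and NoSQL collections.

```python
from collections import defaultdict


def ingest(sources, similar, links):
    """Illustrative three-step ingestion on in-memory data (hypothetical API).

    sources: {db_name: {class_name: [record, ...]}}, tables or collections
    similar: {class_name: merged_class_name}, assumed to be given
    links:   [(class_name, link_field, target_class, target_key_field)]
    """
    warehouse = defaultdict(list)
    # Step 1: transfer every table/collection into a single document store.
    for db_name, classes in sources.items():
        for class_name, records in classes.items():
            # Step 2: classes declared "similar" share one collection.
            target = similar.get(class_name, class_name)
            for record in records:
                doc = dict(record)
                doc["_id"] = f"{target}:{len(warehouse[target])}"
                doc["_source"] = db_name
                warehouse[target].append(doc)
    # Step 3: convert key-based links into references between documents.
    for class_name, field, target_class, key in links:
        src = similar.get(class_name, class_name)
        dst = similar.get(target_class, target_class)
        index = {d[key]: d["_id"] for d in warehouse[dst] if key in d}
        for doc in warehouse[src]:
            if doc.get(field) in index:
                doc[field] = {"collection": dst, "id": index[doc[field]]}
    return dict(warehouse)


# Hypothetical Data Lake content: one relational and one NoSQL source.
sources = {
    "mysql_db": {"patients": [{"patient_no": 1, "name": "Ana"}]},
    "mongo_db": {
        "patient": [{"patient_no": 2, "name": "Bob"}],
        "visits": [{"patient_no": 1, "date": "2021-05-03"}],
    },
}
similar = {"patients": "Patient", "patient": "Patient"}
links = [("visits", "patient_no", "patients", "patient_no")]

dw = ingest(sources, similar, links)
```

In this sketch the two patient classes end up in one `Patient` collection, and the visit's `patient_no` value is replaced by a reference to the matching patient document. How "similar" classes are actually detected is outside the scope of the sketch and is simply supplied as input.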
Notes
OrientDB https://orientdb.org/.
Object Data Management Group http://www.odbms.org/odmg-standard/.
The Object Management Group (OMG) https://www.omg.org.
Query/View/Transform (QVT) https://www.omg.org/spec/QVT/1.2/PDF.
Unified Modeling Language (UML) https://www.omg.org/spec/UML/2.5.1/About-UML/.
The principles of exploitation of the ontology are not detailed in this article.
MySQL https://www.mysql.com.
PostgreSQL https://www.postgresql.org.
MongoDB https://www.mongodb.com/.
The Object Management Group (OMG) https://www.omg.org.
Eclipse Modeling Framework (EMF) https://www.eclipse.org/modeling/emf.
XML Metadata Interchange (XMI) https://www.omg.org/spec/XMI/2.5.1/About-XMI/.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Jemmali, R., Abdelhedi, F. & Zurfluh, G. DLToDW: Transferring Relational and NoSQL Databases from a Data Lake. SN COMPUT. SCI. 3, 381 (2022). https://doi.org/10.1007/s42979-022-01287-7
DOI: https://doi.org/10.1007/s42979-022-01287-7