Abstract
The variety of data is one of the most challenging issues for research and practice in data management. So-called multi-model data are naturally organized in different but mutually linked formats and models, including structured, semi-structured, and unstructured ones. In this position paper, we discuss evolution management of multi-model data, an aspect that has so far been neglected but is important for real-world applications. We provide a motivating scenario and discuss key related challenges, such as multi-model data modelling, intra- vs. inter-model changes, global and local evolution operations, eager vs. lazy migration, and schema inference.
Notes
1. Note that the UPDATE command should be done for all the key/value records.
2. For a concrete implementation, the definitions of identifier and value must still be specified.
Acknowledgements
This work was partly supported by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG), grant number 385808805 (M. Klettke, U. Störl), and by the Charles University project PROGRES Q48 (I. Holubová). We thank Stefanie Scherzinger and Mark Lukas Möller for numerous interesting and helpful discussions and for several comments on this work.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Holubová, I., Klettke, M., Störl, U. (2019). Evolution Management of Multi-model Data. In: Gadepally, V., et al. Heterogeneous Data Management, Polystores, and Analytics for Healthcare. DMAH Poly 2019. Lecture Notes in Computer Science, vol 11721. Springer, Cham. https://doi.org/10.1007/978-3-030-33752-0_10
DOI: https://doi.org/10.1007/978-3-030-33752-0_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33751-3
Online ISBN: 978-3-030-33752-0
eBook Packages: Computer Science, Computer Science (R0)