Computer Science and Information Systems 2024 Volume 21, Issue 4, Pages: 1913-1961
https://doi.org/10.2298/CSIS240229065B
Automatic conceptual database design based on heterogeneous source artifacts
Banjac Goran (Faculty of Electrical Engineering, University of Banja Luka, Banja Luka, Bosnia and Herzegovina), goran.banjac@etf.unibl.org
Brđanin Dražen (Faculty of Electrical Engineering, University of Banja Luka, Banja Luka, Bosnia and Herzegovina), drazen.brdjanin@etf.unibl.org
Banjac Danijela (Faculty of Electrical Engineering, University of Banja Luka, Banja Luka, Bosnia and Herzegovina), danijela.banjac@etf.unibl.org
This article presents an approach to the automatic derivation of conceptual database models from heterogeneous source artifacts. The approach integrates conceptual database models that existing tools derive from source artifacts of a single type. Because those models are of limited completeness and correctness, they carry limited certainty. The uncertainty of a model automatically derived from a specific type of source artifact is expressed and managed through the effectiveness with which specific concepts of the input conceptual database models are generated. The approach is implemented in DBomnia, the first online web-based tool enabling automatic derivation of conceptual database models from heterogeneous source artifacts (business process models and textual specifications). DBomnia employs pre-existing tools to derive conceptual models from sources of the same type and then integrates those models. The case-study-based evaluation shows that the implemented approach enables effective automatic derivation of the conceptual database model from a set of heterogeneous source artifacts, and that this derivation is more effective than any independent automatic derivation from sources of a single type only.
Keywords: AMADEOS, DBomnia, Schema integration, Schema matching, Schema merging, TexToData, UML class diagram, Uncertain schema