DOI: 10.1145/3297662.3365809
research-article

Matching disparate dimensions for analytical integration of heterogeneous data sources

Published: 10 January 2020

Abstract

The paper presents the first steps towards the authors' own methodology for integrating heterogeneous data. Exposing information from multiple heterogeneous data sources requires a global (mediated) schema: a model that copes with the mismatches between the schemata of different sources and provides uniform access to the data. A virtual global schema is more convenient for assembling big data sources because it avoids the time otherwise spent on materialization and synchronization. An integral analytical model is therefore proposed as the global schema of heterogeneous data sources. The model provides virtual integration of complex and diverse information for further analytical processing. It combines the original multidimensional design with a lattice structure built according to formal concept analysis (FCA). The main goal of the paper is to suggest an approach to automatic mapping, with human moderation, between the schemata of the disparate data sources and the virtual integral analytical model.
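
The abstract does not say how the automatic mapping is computed, so the following is only a minimal sketch of the general idea of automatic matching with human moderation. It assumes purely name-based matching using Jaccard similarity over character q-grams, which may differ from the paper's actual technique. The function names (qgrams, similarity, match_dimensions) and the two thresholds (accept, review) are hypothetical and chosen for illustration.

```python
def qgrams(text, q=3):
    """Return the set of character q-grams of a lower-cased, '#'-padded string."""
    padded = "#" * (q - 1) + text.lower() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def similarity(a, b, q=3):
    """Jaccard similarity between the q-gram sets of two names (0.0 to 1.0)."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    return len(ga & gb) / len(ga | gb)


def match_dimensions(source_attrs, model_dims, accept=0.8, review=0.4):
    """Map source attribute names to dimensions of the global model.

    Hypothetical policy (an assumption, not taken from the paper):
      score >= accept           -> mapped automatically
      review <= score < accept  -> queued for a human moderator
      score < review            -> left unmatched
    model_dims must be non-empty.
    """
    auto, to_review, unmatched = {}, {}, []
    for attr in source_attrs:
        # Pick the model dimension whose name is most similar to the attribute.
        best = max(model_dims, key=lambda dim: similarity(attr, dim))
        score = similarity(attr, best)
        if score >= accept:
            auto[attr] = best
        elif score >= review:
            to_review[attr] = (best, round(score, 2))
        else:
            unmatched.append(attr)
    return auto, to_review, unmatched


if __name__ == "__main__":
    auto, to_review, unmatched = match_dimensions(
        ["Region", "regions", "xyz"], ["region", "year"]
    )
    print("automatic:", auto)
    print("for moderation:", to_review)
    print("unmatched:", unmatched)
```

The middle band between the two thresholds is where human moderation enters: candidate pairs that are plausible but not certain are shown to a moderator instead of being accepted or discarded silently.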

    Published In

    MEDES '19: Proceedings of the 11th International Conference on Management of Digital EcoSystems
    November 2019
    350 pages
    ISBN:9781450362382
    DOI:10.1145/3297662
    Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Analytical Data Integration
    2. FCA
    3. Heterogeneous Data
    4. OLAP
    5. Semantic Analysis

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    MEDES '19

    Acceptance Rates

    MEDES '19 Paper Acceptance Rate 41 of 102 submissions, 40%;
    Overall Acceptance Rate 267 of 682 submissions, 39%

    Article Metrics

    • Total Citations: 0
    • Total Downloads: 91
    • Downloads (Last 12 months): 5
    • Downloads (Last 6 weeks): 2
    Reflects downloads up to 18 Dec 2024
