DOI: 10.1145/3366030.3366054
research-article

Uniform Access to Multiform Data Lakes using Semantic Technologies

Published: 22 February 2020

Abstract

Increasing data volumes have greatly expanded application possibilities. However, accessing this data in an ad hoc manner remains an unsolved problem due to the diversity of data management approaches, formats, and storage frameworks; this creates the need to access and process distributed, heterogeneous data effectively and at scale. For years, Semantic Web techniques have addressed data integration challenges with practical knowledge representation models and ontology-based mappings. Leveraging these techniques, we provide a solution that enables uniform access to large, heterogeneous data sources without enforcing centralization, thus realizing the vision of a Semantic Data Lake. In this paper, we define the core concepts underlying this vision and the architectural requirements that systems implementing it need to fulfill. Squerall, an example of such a system, is an extensible framework built on top of state-of-the-art Big Data technologies. We focus on Squerall's distributed query execution techniques and strategies, empirically evaluating its performance throughout its various sub-phases.
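
To make the idea of ontology-based mappings over uncentralized sources more concrete, the following is a minimal sketch in plain Python. It is not Squerall's implementation or API. The two in-memory sources, the ontology terms (`ex:id`, `ex:producer`, `ex:country`, and so on), and the helper names `lift`, `join`, and `products_with_producer_country` are all hypothetical and chosen only for illustration. Each source keeps its own shape; a mapping states how to read each ontology term from a record, and the query is written against the ontology terms only.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Mapping = Dict[str, Callable[[dict], object]]

# Hypothetical source 1: flat rows, as they might come from a CSV file.
products_csv = [
    {"pid": "p1", "label": "Laptop", "producer_id": "m1"},
    {"pid": "p2", "label": "Phone", "producer_id": "m2"},
]

# Hypothetical source 2: nested documents, as in a document store.
producers_docs = [
    {"_id": "m1", "info": {"name": "Acme", "country": "DE"}},
    {"_id": "m2", "info": {"name": "Globex", "country": "US"}},
]

# Mappings (assumed for this sketch): ontology term -> accessor on a source record.
product_mapping: Mapping = {
    "ex:id": lambda r: r["pid"],
    "rdfs:label": lambda r: r["label"],
    "ex:producer": lambda r: r["producer_id"],
}
producer_mapping: Mapping = {
    "ex:id": lambda r: r["_id"],
    "ex:name": lambda r: r["info"]["name"],
    "ex:country": lambda r: r["info"]["country"],
}


def lift(records: List[dict], mapping: Mapping) -> List[Row]:
    """Expose source records under uniform ontology terms."""
    return [{term: read(rec) for term, read in mapping.items()} for rec in records]


def join(left: List[Row], right: List[Row],
         left_term: str, right_term: str) -> List[Tuple[Row, Row]]:
    """Hash join of two lifted sources on a pair of ontology terms."""
    index: Dict[object, List[Row]] = {}
    for r in right:
        index.setdefault(r[right_term], []).append(r)
    return [(l, r) for l in left for r in index.get(l[left_term], [])]


def products_with_producer_country(country: str) -> List[Tuple[object, object]]:
    """Labels of products and names of their producers located in `country`."""
    products = lift(products_csv, product_mapping)
    producers = lift(producers_docs, producer_mapping)
    pairs = join(products, producers, "ex:producer", "ex:id")
    return [(p["rdfs:label"], m["ex:name"])
            for p, m in pairs if m["ex:country"] == country]
```

In this sketch the query function never touches `pid`, `_id`, or the nested `info` field; only the mappings know the source layouts. A distributed system would replace the in-memory lists and the hash join with engine-level connectors and joins, which this example does not attempt to model.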

Published In

iiWAS2019: Proceedings of the 21st International Conference on Information Integration and Web-based Applications & Services
December 2019
709 pages
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

  • JKU: Johannes Kepler Universität Linz
  • @WAS: International Organization of Information Integration and Web-based Applications and Services

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Big Data
  2. Data Variety
  3. NoSQL
  4. SPARQL
  5. Semantic Data Lake

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

iiWAS2019

Cited By

  • (2024) A systematic overview of data federation systems. Semantic Web 15:1 (107-165). DOI: 10.3233/SW-223201. Online publication date: 12-Jan-2024
  • (2024) Data Discovery as a Service for Data Lake. 2024 Second International Conference on Networks, Multimedia and Information Technology (NMITCON) (1-9). DOI: 10.1109/NMITCON62075.2024.10699255. Online publication date: 9-Aug-2024
  • (2024) Analytic Processing in Data Lakes: A Semantic Query-Driven Discovery Approach. Information Systems Frontiers. DOI: 10.1007/s10796-024-10471-4. Online publication date: 14-Feb-2024
  • (2023) Characteristic sets profile features: Estimation and application to SPARQL query planning. Semantic Web 14:3 (491-526). DOI: 10.3233/SW-222903. Online publication date: 5-Apr-2023
  • (2023) Discovery and Matching Numerical Attributes in Data Lakes. 2023 IEEE International Conference on Big Data (BigData) (423-432). DOI: 10.1109/BigData59044.2023.10386080. Online publication date: 15-Dec-2023
  • (2023) Toward Data Lake Technologies for Intelligent Societies and Cities. Sustainable, Innovative and Intelligent Societies and Cities (3-29). DOI: 10.1007/978-3-031-30514-6_1. Online publication date: 29-Mar-2023
  • (2022) Responsible Knowledge Management in Energy Data Ecosystems. Energies 15:11 (3973). DOI: 10.3390/en15113973. Online publication date: 27-May-2022
  • (2022) Policy-Based Access Control System for Delta Lake. 2022 Tenth International Conference on Advanced Cloud and Big Data (CBD) (60-65). DOI: 10.1109/CBD58033.2022.00020. Online publication date: Nov-2022
  • (2022) Serving Hybrid-Cloud SQL Interactive Queries at Twitter. Software Architecture (3-21). DOI: 10.1007/978-3-031-15116-3_1. Online publication date: 19-Aug-2022
  • (2021) Semantic Intelligence in Big Data Applications. Smart Connected World (71-89). DOI: 10.1007/978-3-030-76387-9_4. Online publication date: 28-Sep-2021
