DOI: 10.1145/3323878.3325802
research-article

Parallel RDF generation from heterogeneous big data

Published: 05 July 2019

Abstract

To unlock the value of increasingly available data in high volumes, we need flexible ways to integrate data across different sources. While semantic integration can be provided through RDF generation, current generators do not scale sufficiently in terms of volume: they are limited by memory constraints. Therefore, we developed the RMLStreamer, a generator that parallelizes the ingestion and mapping tasks of RDF generation across multiple instances. In this paper, we analyze which aspects of RDF generation are parallelizable and introduce an approach for parallel RDF generation. We describe how we implemented the proposed approach in the RMLStreamer and how its scaling behavior compares to that of other RDF generators. Through parallel ingestion, the RMLStreamer ingests data at a 50% faster rate than existing generators.
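The abstract describes parallelizing both the ingestion and the mapping tasks of RDF generation. The Python sketch below illustrates that idea under stated assumptions and is not the RMLStreamer's implementation (the RMLStreamer runs on Apache Flink). It assumes the input arrives as independent CSV partitions that each carry their own header row, and the mapping (`SUBJECT_TEMPLATE`, `PREDICATE_OBJECT_MAPS`) is a hypothetical, heavily simplified stand-in for an RML triples map. Each task both ingests (parses) and maps its own partition; because partitions are independent, they can be spread over as many workers as there are partitions.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple
from urllib.parse import quote

Triple = Tuple[str, str, str]

# Hypothetical mapping, a heavily simplified stand-in for an RML triples map:
# one subject IRI template plus (predicate IRI, source column) pairs.
SUBJECT_TEMPLATE = "http://example.org/person/{id}"
PREDICATE_OBJECT_MAPS = [
    ("http://xmlns.com/foaf/0.1/name", "name"),
    ("http://xmlns.com/foaf/0.1/age", "age"),
]


def ingest_and_map(partition: str) -> List[Triple]:
    """Parse one self-contained CSV partition and map each row to triples."""
    triples: List[Triple] = []
    for row in csv.DictReader(io.StringIO(partition)):
        # Percent-encode the value inserted into the IRI template.
        subject = SUBJECT_TEMPLATE.format(id=quote(row["id"], safe=""))
        for predicate, column in PREDICATE_OBJECT_MAPS:
            value = row.get(column)
            if value:  # skip empty or missing cells instead of emitting ""
                triples.append((subject, predicate, value))
    return triples


def generate(partitions: List[str], workers: int = 4) -> List[Triple]:
    """Run ingest_and_map on all partitions concurrently.

    Executor.map yields results in input order, so the output does not
    depend on how many workers are used.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [t for part in pool.map(ingest_and_map, partitions) for t in part]


def to_ntriples(triple: Triple) -> str:
    """Serialize a triple with a plain literal object as one N-Triples line."""
    s, p, o = triple
    o = o.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n").replace("\r", "\\r")
    return f'<{s}> <{p}> "{o}" .'


if __name__ == "__main__":
    parts = [
        "id,name,age\n1,Alice,30\n2,Bob,\n",
        "id,name,age\n3,Carol,41\n",
    ]
    for triple in generate(parts, workers=2):
        print(to_ntriples(triple))
```

Running the script prints five N-Triples lines; Bob's empty `age` cell produces no triple. Threads keep the sketch self-contained, but CSV parsing is CPU-bound and Python threads share the GIL, so an actual speed-up would need `ProcessPoolExecutor` (same `map` interface) or a distributed engine.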

Cited By

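  • (2024) RDF Stream Taxonomy: Systematizing RDF Stream Types in Research and Practice. Electronics 13:13 (2558). DOI: 10.3390/electronics13132558. Online publication date: 29-Jun-2024
  • (2024) Helio: A framework for implementing the life cycle of knowledge graphs. Semantic Web 15:1 (223-249). DOI: 10.3233/SW-233224. Online publication date: 12-Jan-2024
  • (2024) Morph-KGC: Scalable knowledge graph materialization with mapping partitions. Semantic Web 15:1 (1-20). DOI: 10.3233/SW-223135. Online publication date: 12-Jan-2024
  • (2024) Optimized continuous homecare provisioning through distributed data-driven semantic services and cross-organizational workflows. Journal of Biomedical Semantics 15:1. DOI: 10.1186/s13326-024-00303-4. Online publication date: 6-Jun-2024
  • (2023) Incremental schema integration for data wrangling via knowledge graphs. Semantic Web (1-38). DOI: 10.3233/SW-233347. Online publication date: 8-Jun-2023
  • (2023) Streaming linked data: A survey on life cycle compliance. Journal of Web Semantics 77 (100785). DOI: 10.1016/j.websem.2023.100785. Online publication date: Jul-2023
  • (2023) Declarative RDF graph generation from heterogeneous (semi-)structured data. Web Semantics: Science, Services and Agents on the World Wide Web 75:C. DOI: 10.1016/j.websem.2022.100753. Online publication date: 1-Jan-2023
  • (2023) Parallel Construction of Knowledge Graphs from Relational Databases. PRICAI 2023: Trends in Artificial Intelligence (467-479). DOI: 10.1007/978-981-99-7019-3_42. Online publication date: 10-Nov-2023
  • (2023) A Window into the Multiple Views of Linked Data. The Semantic Web: ESWC 2023 Satellite Events (331-340). DOI: 10.1007/978-3-031-43458-7_51. Online publication date: 21-Oct-2023
  • (2022) Semi-Automatic Generating Semantic Markup Webpage from Structured Data with Semantic Matching. 2022 14th International Conference on Information Technology and Electrical Engineering (ICITEE) (85-90). DOI: 10.1109/ICITEE56407.2022.9954103. Online publication date: 18-Oct-2022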

    Published In

    SBD '19: Proceedings of the International Workshop on Semantic Big Data
    July 2019
    57 pages
    ISBN:9781450367660
    DOI:10.1145/3323878
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. RDF generation
    2. big data
    3. linked data
    4. semantic web

    Funding Sources

    • Flanders Innovation & Entrepreneurship (VLAIO)
    • Ghent University
    • imec
    • European Union

    Conference

    SIGMOD/PODS '19

    Acceptance Rates

    SBD '19 Paper Acceptance Rate: 8 of 15 submissions, 53%
    Overall Acceptance Rate: 30 of 54 submissions, 56%

    Article Metrics

    • Downloads (last 12 months): 16
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 18 Nov 2024
