SJoin: A Semantic Join Operator to Integrate Heterogeneous RDF Graphs

Mikhail Galkin, Diego Collarana, Ignacio Traverso-Ribón, Maria-Esther Vidal & Sören Auer

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10438)

Included in the following conference series:

International Conference on Database and Expert Systems Applications

Abstract

Semi-structured data models like the Resource Description Framework (RDF) naturally allow the same real-world entity to be modeled in various ways. For example, different RDF vocabularies enable the definition of different RDF graphs representing the same drug in Bio2RDF or DrugBank. Although semantically equivalent, these RDF graphs may be syntactically different, e.g., they may differ in graph structure, entity identifiers, or properties. Existing data-driven integration approaches consider only syntactic matching criteria or similarity measures to solve the problem of integrating RDF graphs. However, syntactic approaches are unable to integrate heterogeneous RDF graphs semantically. We devise SJoin, a semantic similarity join operator that solves the problem of matching semantically equivalent RDF graphs, i.e., syntactically different graphs that correspond to the same real-world entity. We propose two physical implementations of SJoin, following blocking and non-blocking data processing strategies, respectively; that is, RDF graphs can be merged either in batch or incrementally. We empirically evaluate the effectiveness and efficiency of the SJoin physical operators against baseline similarity join algorithms. Experimental results suggest that SJoin outperforms the baseline approaches: the non-blocking SJoin produces results incrementally and faster, while the blocking SJoin accurately matches all semantically equivalent RDF graphs.
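The abstract contrasts a blocking and a non-blocking physical operator. The Python sketch below illustrates that contrast under stated assumptions; it is not the SJoin implementation from the paper. Entity descriptions are modelled as sets of (property, value) pairs. The similarity function `aligned_jaccard` and the property mapping `PROPERTY_ALIGNMENT` are hypothetical stand-ins for the semantic similarity measure SJoin relies on: they map vocabulary-specific properties to shared concepts before comparing, so that a DrugBank-style and a Bio2RDF-style description of the same drug can match. One-to-one matching is done greedily, and all identifiers and property names are invented for illustration.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Iterator, List, Tuple

# An entity description: identifier plus a set of (property, value) facts.
Fact = Tuple[str, str]
Entity = Tuple[str, FrozenSet[Fact]]
Match = Tuple[str, str, float]  # (left id, right id, similarity)
Similarity = Callable[[FrozenSet[Fact], FrozenSet[Fact]], float]

# Hypothetical alignment of vocabulary-specific properties to shared concepts.
PROPERTY_ALIGNMENT: Dict[str, str] = {
    "drugbank:name": "name",
    "bio2rdf:label": "name",
    "drugbank:formula": "formula",
    "bio2rdf:chemicalFormula": "formula",
}


def aligned_jaccard(a: FrozenSet[Fact], b: FrozenSet[Fact]) -> float:
    """Stand-in similarity: Jaccard overlap after mapping each property
    through PROPERTY_ALIGNMENT (not the measure used by SJoin)."""
    norm_a = {(PROPERTY_ALIGNMENT.get(p, p), v) for p, v in a}
    norm_b = {(PROPERTY_ALIGNMENT.get(p, p), v) for p, v in b}
    union = norm_a | norm_b
    return len(norm_a & norm_b) / len(union) if union else 0.0


def blocking_sjoin(left: List[Entity], right: List[Entity],
                   sim: Similarity, threshold: float) -> List[Match]:
    """Blocking variant: needs both inputs in full, scores every pair, then
    greedily keeps the highest-scoring one-to-one matches above threshold."""
    scored = [(sim(da, db), ia, ib) for ia, da in left for ib, db in right]
    scored.sort(key=lambda t: t[0], reverse=True)
    used_left, used_right, matches = set(), set(), []
    for score, ia, ib in scored:
        if score >= threshold and ia not in used_left and ib not in used_right:
            used_left.add(ia)
            used_right.add(ib)
            matches.append((ia, ib, score))
    return matches


def non_blocking_sjoin(stream: Iterable[Tuple[str, Entity]],
                       sim: Similarity, threshold: float) -> Iterator[Match]:
    """Non-blocking variant: `stream` interleaves ("L" or "R", entity) items.
    Each arrival is probed against the unmatched entities already seen from
    the other side, and the first partner above threshold is emitted at once."""
    seen: Dict[str, List[Entity]] = {"L": [], "R": []}
    for side, (ident, desc) in stream:
        other = "R" if side == "L" else "L"
        for k, (o_ident, o_desc) in enumerate(seen[other]):
            score = sim(desc, o_desc)
            if score >= threshold:
                del seen[other][k]  # matched entities leave the buffer
                yield (ident, o_ident, score) if side == "L" else (o_ident, ident, score)
                break
        else:
            seen[side].append((ident, desc))  # no partner yet; wait for later arrivals


# Toy data with hypothetical identifiers and property names.
drugbank = [("dbk:aspirin", frozenset({("drugbank:name", "Aspirin"),
                                       ("drugbank:formula", "C9H8O4")}))]
bio2rdf = [("b2r:aspirin", frozenset({("bio2rdf:label", "Aspirin"),
                                      ("bio2rdf:chemicalFormula", "C9H8O4")})),
           ("b2r:ibuprofen", frozenset({("bio2rdf:label", "Ibuprofen")}))]

print(blocking_sjoin(drugbank, bio2rdf, aligned_jaccard, 0.5))
# [('dbk:aspirin', 'b2r:aspirin', 1.0)]
stream = [("L", drugbank[0]), ("R", bio2rdf[0]), ("R", bio2rdf[1])]
for match in non_blocking_sjoin(stream, aligned_jaccard, 0.5):
    print(match)  # emitted as soon as b2r:aspirin arrives
```

In this sketch the trade-off mirrors the one stated in the abstract: the blocking variant sees every candidate pair before committing, so it can prefer the best partner, whereas the non-blocking variant commits to the first partner above the threshold and can therefore return results earlier, at the risk of missing a better match that arrives later.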

Acknowledgments

Mikhail Galkin is supported by the project Open Budgets (GA 645833). This work is also funded in part by the European Union under the Horizon 2020 Framework Program for the project BigDataEurope (GA 644564), and the German Ministry of Education and Research with grant no. 13N13627 (LiDaKra).

Author information

Authors and Affiliations

Enterprise Information Systems (EIS), University of Bonn, Bonn, Germany
Mikhail Galkin, Diego Collarana & Sören Auer
Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS), Sankt Augustin, Germany
Mikhail Galkin, Diego Collarana, Maria-Esther Vidal & Sören Auer
FZI Research Center for Information Technology, Karlsruhe, Germany
Ignacio Traverso-Ribón
Universidad Simón Bolívar, Caracas, Venezuela
Maria-Esther Vidal
ITMO University, Saint Petersburg, Russia
Mikhail Galkin

Corresponding author

Correspondence to Mikhail Galkin.

Editor information

Editors and Affiliations

University of Lyon, Villeurbanne, France
Djamal Benslimane
University of Milan, Milan, Italy
Ernesto Damiani
University of Michigan, Dearborn, Michigan, USA
William I. Grosky
Paul Sabatier University, Toulouse, France
Abdelkader Hameurlain
Wright State University, Dayton, Ohio, USA
Amit Sheth
Johannes Kepler University, Linz, Austria
Roland R. Wagner

About this paper

Cite this paper

Galkin, M., Collarana, D., Traverso-Ribón, I., Vidal, ME., Auer, S. (2017). SJoin: A Semantic Join Operator to Integrate Heterogeneous RDF Graphs. In: Benslimane, D., Damiani, E., Grosky, W., Hameurlain, A., Sheth, A., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2017. Lecture Notes in Computer Science, vol 10438. Springer, Cham. https://doi.org/10.1007/978-3-319-64468-4_16

DOI: https://doi.org/10.1007/978-3-319-64468-4_16
Published: 01 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64467-7
Online ISBN: 978-3-319-64468-4
eBook Packages: Computer Science, Computer Science (R0)