Abstract
Semi-structured data models such as the Resource Description Framework (RDF) naturally allow the same real-world entity to be modeled in various ways. For example, different RDF vocabularies enable the definition of different RDF graphs representing the same drug in Bio2RDF or DrugBank. Albeit semantically equivalent, these RDF graphs may be syntactically different, i.e., they may differ in graph structure, entity identifiers, or properties. Existing data-driven integration approaches consider only syntactic matching criteria or similarity measures to integrate RDF graphs; however, such syntax-based approaches are unable to semantically integrate heterogeneous RDF graphs. We devise SJoin, a semantic similarity join operator that solves the problem of matching semantically equivalent RDF graphs, i.e., syntactically different graphs corresponding to the same real-world entity. We propose two physical implementations of SJoin, which follow blocking and non-blocking data processing strategies, respectively, i.e., RDF graphs can be merged either in a batch or incrementally. We empirically evaluate the effectiveness and efficiency of the SJoin physical operators with respect to baseline similarity join algorithms. Experimental results suggest that SJoin outperforms the baseline approaches: the non-blocking SJoin incrementally produces results faster, while the blocking SJoin accurately matches all semantically equivalent RDF graphs.
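To make the difference between the blocking and non-blocking strategies concrete, here is a minimal Python sketch. It is not the authors' SJoin implementation, and several assumptions are made for illustration. Each RDF graph is simplified to a subject identifier plus a set of (property, value) pairs. Jaccard similarity serves as a placeholder for a semantic similarity measure, which a real semantic join would plug in instead. The blocking variant uses a greedy one-to-one matching as a stand-in for an optimal assignment method such as the Hungarian algorithm. The names `non_blocking_join`, `blocking_join`, and `Molecule` are hypothetical.

```python
from typing import Callable, FrozenSet, Iterable, Iterator, List, Tuple

# Hypothetical simplification: an RDF graph is a subject ID plus a set of
# (property, value) pairs. This is not the paper's data structure.
Props = FrozenSet[Tuple[str, str]]
Molecule = Tuple[str, Props]
Match = Tuple[str, str, float]  # (left_id, right_id, score)
Similarity = Callable[[Props, Props], float]


def jaccard(a: Props, b: Props) -> float:
    """Syntactic placeholder; a semantic measure would replace this."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def non_blocking_join(left: Iterable[Molecule], right: Iterable[Molecule],
                      sim: Similarity, threshold: float) -> Iterator[Match]:
    """Symmetric, pipelined join: inputs are read alternately, and each
    arriving molecule probes the molecules already seen on the other side,
    so matches are emitted before either input is fully consumed."""
    seen: Tuple[List[Molecule], List[Molecule]] = ([], [])
    iters = [iter(left), iter(right)]
    active = [True, True]
    while any(active):
        for side in (0, 1):
            if not active[side]:
                continue
            mol = next(iters[side], None)
            if mol is None:  # this input is exhausted
                active[side] = False
                continue
            for other in seen[1 - side]:
                score = sim(mol[1], other[1])
                if score >= threshold:
                    # Always report the left ID first, the right ID second.
                    if side == 0:
                        yield (mol[0], other[0], score)
                    else:
                        yield (other[0], mol[0], score)
            seen[side].append(mol)


def blocking_join(left: List[Molecule], right: List[Molecule],
                  sim: Similarity, threshold: float) -> List[Match]:
    """Consume both inputs, score all pairs, then select a one-to-one
    matching greedily (stand-in for an optimal assignment)."""
    scored = [(sim(lm[1], rm[1]), lm[0], rm[0]) for lm in left for rm in right]
    scored.sort(key=lambda t: t[0], reverse=True)  # stable for ties
    used_left, used_right = set(), set()
    result: List[Match] = []
    for score, lid, rid in scored:
        if score >= threshold and lid not in used_left and rid not in used_right:
            used_left.add(lid)
            used_right.add(rid)
            result.append((lid, rid, score))
    return result
```

In this sketch, the non-blocking operator can report a match as soon as both molecules have arrived, but it may report several candidates for the same molecule. The blocking operator must wait for all input, and in exchange it can enforce a one-to-one matching over the complete set of scored pairs.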
Notes
- 3. Prefixes are as specified on http://prefix.cc/.
Acknowledgments
Mikhail Galkin is supported by the project Open Budgets (GA 645833). This work is also funded in part by the European Union under the Horizon 2020 Framework Program for the project BigDataEurope (GA 644564), and the German Ministry of Education and Research with grant no. 13N13627 (LiDaKra).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Galkin, M., Collarana, D., Traverso-Ribón, I., Vidal, ME., Auer, S. (2017). SJoin: A Semantic Join Operator to Integrate Heterogeneous RDF Graphs. In: Benslimane, D., Damiani, E., Grosky, W., Hameurlain, A., Sheth, A., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2017. Lecture Notes in Computer Science, vol 10438. Springer, Cham. https://doi.org/10.1007/978-3-319-64468-4_16
DOI: https://doi.org/10.1007/978-3-319-64468-4_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64467-7
Online ISBN: 978-3-319-64468-4
eBook Packages: Computer Science, Computer Science (R0)