

SJoin: A Semantic Join Operator to Integrate Heterogeneous RDF Graphs

  • Conference paper
Database and Expert Systems Applications (DEXA 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10438)


Abstract

Semi-structured data models like the Resource Description Framework (RDF) naturally allow for modeling the same real-world entity in various ways. For example, different RDF vocabularies enable the definition of various RDF graphs representing the same drug in Bio2RDF or DrugBank. Albeit semantically equivalent, these RDF graphs may be syntactically different, i.e., they may differ in graph structure, entity identifiers, or properties. Existing data-driven integration approaches consider only syntactic matching criteria or similarity measures to solve the problem of integrating RDF graphs. However, syntax-based approaches are unable to semantically integrate heterogeneous RDF graphs. We devise SJoin, a semantic similarity join operator that solves the problem of matching semantically equivalent RDF graphs, i.e., syntactically different graphs corresponding to the same real-world entity. Two physical implementations of SJoin are proposed, following blocking and non-blocking data processing strategies, respectively, i.e., RDF graphs can be merged in a batch or incrementally. We empirically evaluate the effectiveness and efficiency of the SJoin physical operators with respect to baseline similarity join algorithms. Experimental results suggest that SJoin outperforms the baseline approaches: the non-blocking SJoin produces incremental results faster, while the blocking SJoin accurately matches all semantically equivalent RDF graphs.
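The abstract contrasts a blocking (batch) and a non-blocking (incremental) physical operator for joining semantically equivalent RDF graphs. The Python sketch below illustrates only that distinction, under loudly stated assumptions that do not come from the paper: each entity is modeled as a toy "molecule" (a subject IRI plus a set of (predicate, object) pairs); a hand-written predicate alignment (`PREDICATE_ALIGNMENT`) combined with Jaccard similarity stands in for a real semantic similarity measure; objects are simplified to plain strings; and the batch variant uses a greedy 1-1 matching rather than an optimal assignment. The names `blocking_join`, `nonblocking_join`, `normalize`, and `similarity` are hypothetical, and the example identifiers are chosen for illustration only.

```python
from itertools import product
from typing import Dict, FrozenSet, Iterable, Iterator, List, Set, Tuple

# A toy "RDF molecule": a subject IRI plus its (predicate, object) pairs.
Molecule = Tuple[str, FrozenSet[Tuple[str, str]]]
Match = Tuple[str, str, float]

# ASSUMPTION: a hand-written predicate alignment stands in for the semantic
# knowledge a real similarity measure would use; it is not from the paper.
PREDICATE_ALIGNMENT: Dict[str, str] = {
    "foaf:name": "schema:name",
    "rdfs:label": "schema:name",
    "dbo:birthPlace": "schema:birthPlace",
    "wdt:P19": "schema:birthPlace",
}


def normalize(pairs: Iterable[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Map each predicate to its aligned name, leaving unknown ones as-is."""
    return {(PREDICATE_ALIGNMENT.get(p, p), o) for p, o in pairs}


def similarity(a: Molecule, b: Molecule) -> float:
    """Jaccard similarity of the two molecules' aligned (predicate, object) pairs."""
    x, y = normalize(a[1]), normalize(b[1])
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def blocking_join(left: Iterable[Molecule], right: Iterable[Molecule],
                  threshold: float) -> List[Match]:
    """Batch variant: consume both inputs, score every pair, then pick a
    greedy 1-1 matching (highest score first) above the threshold."""
    scored = sorted(
        ((similarity(a, b), a[0], b[0]) for a, b in product(list(left), list(right))),
        reverse=True,
    )
    used_left: Set[str] = set()
    used_right: Set[str] = set()
    matches: List[Match] = []
    for score, l_iri, r_iri in scored:
        if score < threshold:
            break  # scores are sorted, so no later pair can qualify
        if l_iri not in used_left and r_iri not in used_right:
            used_left.add(l_iri)
            used_right.add(r_iri)
            matches.append((l_iri, r_iri, score))
    return matches


def nonblocking_join(stream: Iterable[Tuple[str, Molecule]],
                     threshold: float) -> Iterator[Match]:
    """Incremental variant: `stream` yields ("L" | "R", molecule) in arrival
    order; each arrival is compared with everything already seen on the
    other side, and qualifying pairs are emitted immediately."""
    seen: Dict[str, List[Molecule]] = {"L": [], "R": []}
    for side, mol in stream:
        other = "R" if side == "L" else "L"
        for cand in seen[other]:
            score = similarity(mol, cand)
            if score >= threshold:
                l_mol, r_mol = (mol, cand) if side == "L" else (cand, mol)
                yield (l_mol[0], r_mol[0], score)
        seen[side].append(mol)


# Illustrative molecules, shaped loosely after DBpedia and Wikidata.
turing = ("dbr:Alan_Turing", frozenset({("foaf:name", "Alan Turing"),
                                        ("dbo:birthPlace", "Maida Vale")}))
lovelace = ("dbr:Ada_Lovelace", frozenset({("foaf:name", "Ada Lovelace"),
                                           ("dbo:birthPlace", "London")}))
q7251 = ("wd:Q7251", frozenset({("rdfs:label", "Alan Turing"),
                                ("wdt:P19", "Maida Vale"), ("wdt:P31", "wd:Q5")}))
q7259 = ("wd:Q7259", frozenset({("rdfs:label", "Ada Lovelace"),
                                ("wdt:P19", "London")}))

batch = blocking_join([turing, lovelace], [q7251, q7259], threshold=0.5)
arrivals = [("L", turing), ("R", q7259), ("R", q7251), ("L", lovelace)]
incremental = list(nonblocking_join(arrivals, threshold=0.5))
print(batch)
print(incremental)
```

In this sketch the two variants find the same pairs but in different orders: the blocking variant sees every candidate before committing, so it can enforce a 1-1 matching, while the non-blocking variant emits a pair as soon as both molecules have arrived and cannot retract it later, so a single molecule may appear in several emitted pairs.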


Notes

  1. http://data.un.org/.
  2. http://stats.lod2.eu/.
  3. Prefixes are as specified on http://prefix.cc/.
  4. https://github.com/RDF-Molecules/Test-DataSets/tree/master/DBpedia-People/20160819.
  5. https://github.com/RDF-Molecules/Test-DataSets/tree/master/DBpedia-WikiData/operators_evaluation.
  6. https://github.com/RDF-Molecules/operators/tree/master/mFuhsion.
  7. https://github.com/RDF-Molecules/operators/tree/master/baseline_ops.



Acknowledgments

Mikhail Galkin is supported by the project Open Budgets (GA 645833). This work is also funded in part by the European Union under the Horizon 2020 Framework Program for the project BigDataEurope (GA 644564), and by the German Ministry of Education and Research under grant no. 13N13627 (LiDaKra).

Author information


Corresponding author

Correspondence to Mikhail Galkin.


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Galkin, M., Collarana, D., Traverso-Ribón, I., Vidal, ME., Auer, S. (2017). SJoin: A Semantic Join Operator to Integrate Heterogeneous RDF Graphs. In: Benslimane, D., Damiani, E., Grosky, W., Hameurlain, A., Sheth, A., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2017. Lecture Notes in Computer Science, vol 10438. Springer, Cham. https://doi.org/10.1007/978-3-319-64468-4_16


  • DOI: https://doi.org/10.1007/978-3-319-64468-4_16


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-64467-7

  • Online ISBN: 978-3-319-64468-4

  • eBook Packages: Computer Science (R0)
