Large Scale Entity Resolution

  • Living reference work entry
Encyclopedia of Big Data Technologies

Synonyms

Data deduplication; Link discovery; Object matching; Record linkage

Definition

The goal of entity resolution is the identification of semantically equivalent objects within one data source or between different sources. In the context of Big Data, there is a growing need for large-scale entity resolution to find matching entities within very large data sources and across many data sources. This requires effectively parallelizing entity resolution tasks within cluster environments.

Overview

Entity resolution (ER) is the task of identifying semantically equivalent entities that refer to the same real-world object (e.g., persons, products, publications, or movies) within one data source or between different sources. This task is also known as data deduplication, object matching, record linkage, or link discovery. ER is of core importance for data cleaning and data integration and has been addressed for a long time in practice and research (Rahm and Do 2000; Elmagarmid et al. 2007; Christen 2012).
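As a rough illustration of the task described above, the following minimal sketch deduplicates a single small data source by comparing every pair of records with a generic string similarity. The record identifiers, the sample titles, the function names `similarity` and `find_matches`, and the threshold of 0.85 are all assumptions made for this example; the entry itself does not prescribe a particular similarity measure or matching strategy.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Return a case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_matches(records: dict[str, str], threshold: float = 0.85) -> list[tuple[str, str]]:
    """Compare every pair of records and keep pairs at or above the threshold.

    The threshold is an illustrative assumption, not a recommended value.
    """
    return [
        (id_a, id_b)
        for (id_a, text_a), (id_b, text_b) in combinations(records.items(), 2)
        if similarity(text_a, text_b) >= threshold
    ]


# Hypothetical publication records; p1 and p2 describe the same work.
records = {
    "p1": "Duplicate record detection: a survey",
    "p2": "Duplicate Record Detection - A Survey",
    "p3": "Big Data Integration",
}
matches = find_matches(records)
print(matches)  # the p1/p2 pair is reported as a match
```

Comparing all pairs grows quadratically with the number of records, which is one reason why very large inputs call for the parallel processing mentioned in the definition.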

T...

References

  • Böhm C, de Melo G, Naumann F, Weikum G (2012) LINDA: distributed Web-of-Data-scale entity matching. In: Proceedings of the conference on information and knowledge management, Maui, Hawaii

  • Chiang YH, Doan A, Naughton JF (2014) Modeling entity evolution for temporal record matching. In: Proceedings of the ACM SIGMOD, Snowbird, Utah

  • Christen P (2012) Data matching – concepts and techniques for record linkage, entity resolution, and duplicate detection. Springer

  • Christen V, Groß A, Fisher J, Wang Q, Christen P, Rahm E (2017) Temporal group linkage and evolution analysis for census data. In: Proceedings of the extending database technology, Venice

  • Dong XL, Srivastava D (2015) Big Data Integration. Morgan and Claypool, San Rafael

  • Ebraheem M, Thirumuruganathan S, Joty S, Ouzzani M, Tang N (2017) DeepER – Deep entity resolution. CoRR abs/1710.00597

  • Elmagarmid AK, Ipeirotis PG, Verykios VS (2007) Duplicate record detection: a survey. IEEE Trans Knowl Data Eng 19(1):1–16

  • Gruenheid A, Dong XL, Srivastava D (2014) Incremental record linkage. Proc VLDB Endowment 7(9):697–708

  • Hassanzadeh O, Chiang F, Lee HC, Miller RJ (2009) Framework for evaluating clustering algorithms in duplicate detection. Proc VLDB Endowment 2(1):1282–1293

  • Kolb L, Rahm E (2013) Parallel entity resolution with Dedoop. Datenbank-Spektrum 13(1):23–32

  • Kolb L, Thor A, Rahm E (2012) Load balancing for MapReduce-based entity resolution. In: Proceedings of the international conference on data engineering, Washington

  • Köpcke H, Rahm E (2010) Frameworks for entity matching: a comparison. Data Knowl Eng 69(2):197–210

  • Köpcke H, Thor A, Rahm E (2010) Evaluation of entity resolution approaches on real-world match problems. Proc VLDB Endowment 3(1–2):484–493

  • Köpcke H, Thor A, Thomas S, Rahm E (2012) Tailoring entity resolution for matching product offers. In: Proceedings of the international conference on extending database technology, Berlin, pp 545–550

  • Li P, Dong XL, Maurino A, Srivastava D (2011) Linking temporal records. Proc VLDB Endowment 4(11):956–967

  • Nentwig M, Groß A, Rahm E (2016) Holistic entity clustering for linked data. In: IEEE Data Mining Workshops (ICDMW), Barcelona

  • Nentwig M, Hartung M, Ngonga Ngomo AC, Rahm E (2017) A survey of current link discovery frameworks. Semantic Web 8(3):419–436

  • Pan X, Papailiopoulos D, Oymak S, Recht B, Ramchandran K, Jordan M (2015) Parallel correlation clustering on big graphs. In: Proceedings of the Advances in Neural Information Processing Systems, Montréal

  • Pershina M, Yakout M, Chakrabarti K (2015) Holistic entity matching across knowledge graphs. In: Proceedings of the IEEE big data conference, Santa Clara

  • Rahm E (2016) The case for holistic data integration. In: Proceedings of the advances in databases and information systems, vol 9809. Springer LNCS, Prague

  • Rahm E, Do HH (2000) Data cleaning: problems and current approaches. In: IEEE data engineering bulletin

  • Saeedi A, Peukert E, Rahm E (2017) Comparative evaluation of distributed clustering schemes for multi-source entity resolution. In: Proceedings of the advances in databases and information systems, vol 10509. Springer LNCS, Nicosia

Corresponding author

Correspondence to Erhard Rahm.

Copyright information

© 2018 Springer International Publishing AG

About this entry

Cite this entry

Rahm, E., Peukert, E. (2018). Large Scale Entity Resolution. In: Sakr, S., Zomaya, A. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-63962-8_4-1

  • DOI: https://doi.org/10.1007/978-3-319-63962-8_4-1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-63962-8

  • Online ISBN: 978-3-319-63962-8

  • eBook Packages: Springer Reference Mathematics; Reference Module Computer Science and Engineering
