The goal of entity resolution is the identification of semantically equivalent objects within one data source or between different sources. In the context of Big Data, there is a growing need for large-scale entity resolution that finds matching entities within very large data sources and across many sources. This requires effectively parallelizing entity resolution tasks within cluster environments.
Entity resolution (ER) is the task of identifying semantically equivalent entities, i.e., entities referring to the same real-world object (e.g., persons, products, publications, or movies), within one data source or between different sources. This task is also known as data deduplication, object matching, record linkage, or link discovery. ER is of core importance for data cleaning and data integration and has been addressed for a long time in practice and research (Rahm and Do 2000; Elmagarmid et al. 2007; Christen 2012).
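As a minimal illustration of the task within a single data source, the following Python sketch compares every pair of records and reports pairs whose similarity reaches a threshold. It is not a method taken from this entry: the record values, the helper names `similarity` and `find_matches`, the use of the standard library's `difflib.SequenceMatcher` as the similarity measure, and the threshold of 0.85 are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (illustrative measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_matches(records: dict[str, str], threshold: float = 0.85) -> list[tuple[str, str]]:
    """Compare all record pairs; return the id pairs at or above the threshold."""
    return [
        (id_a, id_b)
        for (id_a, val_a), (id_b, val_b) in combinations(records.items(), 2)
        if similarity(val_a, val_b) >= threshold
    ]


# Hypothetical records: r1 and r2 differ only in capitalization.
records = {
    "r1": "Data Matching",
    "r2": "data matching",
    "r3": "Duplicate Record Detection",
}
matches = find_matches(records)
print(matches)
```

Because every record is compared with every other record, the number of comparisons grows quadratically with the number of records, which is one reason large inputs call for parallel execution.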
Böhm C, de Melo G, Naumann F, Weikum G (2012) LINDA: distributed Web-of-Data-scale entity matching. In: Proceedings of the conference on information and knowledge management, Maui, Hawaii
Chiang YH, Doan A, Naughton JF (2014) Modeling entity evolution for temporal record matching. In: Proceedings of the ACM SIGMOD, Snowbird, Utah
Christen P (2012) Data matching – concepts and techniques for record linkage, entity resolution, and duplicate detection. Springer
Christen V, Groß A, Fisher J, Wang Q, Christen P, Rahm E (2017) Temporal group linkage and evolution analysis for census data. In: Proceedings of the extending database technology, Venice
Dong XL, Srivastava D (2015) Big Data Integration. Morgan and Claypool, San Rafael
Ebraheem M, Thirumuruganathan S, Joty S, Ouzzani M, Tang N (2017) DeepER – Deep entity resolution. CoRR abs/1710.00597
Elmagarmid AK, Ipeirotis PG, Verykios VS (2007) Duplicate record detection: a survey. IEEE Trans Knowl Data Eng 19(1):1–16
Gruenheid A, Dong XL, Srivastava D (2014) Incremental record linkage. Proc VLDB Endowment 7(9):697–708
Hassanzadeh O, Chiang F, Lee HC, Miller RJ (2009) Framework for evaluating clustering algorithms in duplicate detection. Proc VLDB Endowment 2(1):1282–1293
Kolb L, Rahm E (2013) Parallel entity resolution with Dedoop. Datenbank-Spektrum 13(1):23–32
Kolb L, Thor A, Rahm E (2012) Load balancing for MapReduce-based entity resolution. In: Proceedings of the international conference on data engineering, Washington
Köpcke H, Rahm E (2010) Frameworks for entity matching: a comparison. Data Knowl Eng 69(2):197–210
Köpcke H, Thor A, Rahm E (2010) Evaluation of entity resolution approaches on real-world match problems. Proc VLDB Endowment 3(1–2):484–493
Köpcke H, Thor A, Thomas S, Rahm E (2012) Tailoring entity resolution for matching product offers. In: Proceedings of the international conference on extending database technology, Berlin, pp 545–550
Li P, Dong XL, Maurino A, Srivastava D (2011) Linking temporal records. Proc VLDB Endowment 4(11):956–967
Nentwig M, Groß A, Rahm E (2016) Holistic entity clustering for linked data. In: IEEE Data Mining Workshops (ICDMW), Barcelona
Nentwig M, Hartung M, Ngonga Ngomo AC, Rahm E (2017) A survey of current link discovery frameworks. Semantic Web 8(3):419–436
Pan X, Papailiopoulos D, Oymak S, Recht B, Ramchandran K, Jordan M (2015) Parallel correlation clustering on big graphs. In: Proceedings of the Advances in Neural Information Processing Systems, Montréal
Pershina M, Yakout M, Chakrabarti K (2015) Holistic entity matching across knowledge graphs. In: Proceedings of the IEEE big data conference, Santa Clara
Rahm E (2016) The case for holistic data integration. In: Proceedings of the advances in databases and information systems, vol 9809. Springer LNCS, Prague
Rahm E, Do HH (2000) Data cleaning: problems and current approaches. In: IEEE data engineering bulletin
Saeedi A, Peukert E, Rahm E (2017) Comparative evaluation of distributed clustering schemes for multi-source entity resolution. In: Proceedings of the advances in databases and information systems, vol 10509. Springer LNCS, Nicosia
© 2018 Springer International Publishing AG
Rahm, E., Peukert, E. (2018). Large Scale Entity Resolution. In: Sakr, S., Zomaya, A. (eds) Encyclopedia of Big Data Technologies. Springer, Cham.
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63962-8
Online ISBN: 978-3-319-63962-8