Abstract
Every day the global media system produces an abundance of news stories, all containing many references to people. An important task is to automatically generate reliable lists of people by analysing news content. We describe a system that leverages large amounts of data for this purpose. Lack of structure in this data gives rise to a large number of ways to refer to any particular person. Entity matching attempts to connect references that refer to the same person, usually employing some measure of similarity between references. We use information from multiple sources in order to produce a set of similarity measures with differing strengths and weaknesses. We show how their combination can improve precision without decreasing recall.
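The idea of combining similarity measures can be illustrated with a minimal sketch. This is not the system described in the paper: the two measures (a character-level name similarity and a Jaccard overlap of context words), the equal weighting, and the 0.6 threshold are all assumptions chosen only for illustration, and every function name here is hypothetical.

```python
from difflib import SequenceMatcher


def string_similarity(a, b):
    """Character-level similarity of two name strings, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def context_similarity(ctx_a, ctx_b):
    """Jaccard overlap of the sets of words seen near each reference."""
    union = ctx_a | ctx_b
    if not union:
        return 0.0  # no context available for either reference
    return len(ctx_a & ctx_b) / len(union)


def combined_similarity(ref_a, ref_b, weight=0.5):
    """Weighted average of the two measures (the weight is an assumption).

    Each reference is a (name, context_words) pair.
    """
    name_a, ctx_a = ref_a
    name_b, ctx_b = ref_b
    return (weight * string_similarity(name_a, name_b)
            + (1 - weight) * context_similarity(ctx_a, ctx_b))


def is_match(ref_a, ref_b, threshold=0.6):
    """Decide whether two references denote the same person (illustrative threshold)."""
    return combined_similarity(ref_a, ref_b) >= threshold


if __name__ == "__main__":
    # A spelling variant with shared context: both measures agree.
    r1 = ("Barack Obama", {"president", "white", "house"})
    r2 = ("Barak Obama", {"president", "washington", "house"})
    print(is_match(r1, r2))

    # Near-identical names with unrelated context: the name measure alone
    # would accept this pair, while the combination rejects it.
    r3 = ("John Smith", {"football", "goal"})
    r4 = ("John Smyth", {"senate", "bill"})
    print(is_match(r3, r4))
```

In this sketch the context measure acts as a check on the name measure: pairs that look alike as strings but occur in unrelated contexts fall below the threshold, which is one simple way a combination can remove false matches that a single measure would accept.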
© 2010 IFIP
About this paper
Cite this paper
Ali, O., Cristianini, N. (2010). Information Fusion for Entity Matching in Unstructured Data. In: Papadopoulos, H., Andreou, A.S., Bramer, M. (eds) Artificial Intelligence Applications and Innovations. AIAI 2010. IFIP Advances in Information and Communication Technology, vol 339. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16239-8_23
DOI: https://doi.org/10.1007/978-3-642-16239-8_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16238-1
Online ISBN: 978-3-642-16239-8
eBook Packages: Computer Science (R0)