Abstract
Two RDF instances are said to corefer when they are intended to denote the same thing in the world, for example, when two nodes of type foaf:Person describe the same individual. Coreference resolution is central to integrating and inter-linking semi-structured datasets. We are developing an online, unsupervised coreference resolution framework for heterogeneous, semi-structured data. Because the framework is online, it must process new instances as they appear rather than as a batch. The instances are heterogeneous in that they may contain terms from different ontologies whose alignments are not known in advance. Our framework encompasses a two-phased clustering algorithm that is both flexible and distributable, a probabilistic multidimensional attribute model that will support robust schema mappings, and a consolidation algorithm that will merge coreferent instances to improve accuracy over time by addressing data sparseness.
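To make the online, two-phase idea concrete, here is a minimal sketch under stated assumptions. It is not the author's algorithm: the class and function names (OnlineClusterer, tokens, jaccard), the Jaccard token similarity, and the 0.5 threshold are all hypothetical choices for illustration. Instances are modeled as plain dictionaries from predicate to literal value. Predicate names are deliberately ignored because heterogeneous instances may use different ontologies. Phase 1 uses a cheap inverted index to pick candidate clusters; phase 2 scores only those candidates. Each instance is handled as it arrives, never as a batch.

```python
def tokens(instance):
    """Lower-cased whitespace tokens from all values; predicate names are
    ignored because instances may use different, unaligned ontologies."""
    out = set()
    for value in instance.values():
        out.update(str(value).lower().split())
    return out


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


class OnlineClusterer:
    """Hypothetical online two-phase clusterer (illustrative sketch only).

    Phase 1: an inverted token index selects candidate clusters sharing
    at least one token with the new instance (cheap blocking).
    Phase 2: a more expensive similarity is computed only against those
    candidates; the instance joins the best cluster if the score clears
    `threshold`, otherwise it starts a new cluster.
    """

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.profiles = []  # one token set per cluster
        self.members = []   # one list of instance ids per cluster
        self.index = {}     # token -> set of cluster ids

    def add(self, instance_id, instance):
        toks = tokens(instance)
        # Phase 1: candidate selection through the inverted index.
        candidates = set()
        for t in toks:
            candidates |= self.index.get(t, set())
        # Phase 2: score only the candidate clusters.
        best_id, best_score = None, 0.0
        for cid in sorted(candidates):
            score = jaccard(toks, self.profiles[cid])
            if score > best_score:
                best_id, best_score = cid, score
        if best_id is None or best_score < self.threshold:
            best_id = len(self.profiles)
            self.profiles.append(set())
            self.members.append([])
        # Merging tokens into the cluster profile is a crude stand-in for
        # instance consolidation: sparse instances gain context over time.
        self.profiles[best_id] |= toks
        self.members[best_id].append(instance_id)
        for t in toks:
            self.index.setdefault(t, set()).add(best_id)
        return best_id


clusterer = OnlineClusterer(threshold=0.5)
id_a = clusterer.add("ex:a", {"foaf:name": "Jennifer Sleeman", "foaf:mbox": "js@example.org"})
id_b = clusterer.add("ex:b", {"schema:name": "Jennifer Sleeman", "schema:email": "js@example.org"})
id_c = clusterer.add("ex:c", {"foaf:name": "Tim Finin"})
print(id_a, id_b, id_c)  # 0 0 1
```

The two instances about the same person land in one cluster even though they use different vocabularies (foaf vs. schema), while the third starts its own. Because the candidate step touches only clusters that share a token, the expensive comparison is restricted to a small subset, which is also what makes this kind of scheme amenable to distribution.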
Advisor: Tim Finin.
Keywords
- Linked Open Data
- Coreference Resolution
- Linked Open Data Cloud
- Ontology Alignment Evaluation Initiative
- Consolidation Algorithm
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sleeman, J. (2012). Online Unsupervised Coreference Resolution for Semi-structured Heterogeneous Data. In: Cudré-Mauroux, P., et al. The Semantic Web – ISWC 2012. ISWC 2012. Lecture Notes in Computer Science, vol 7650. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35173-0_39
DOI: https://doi.org/10.1007/978-3-642-35173-0_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35172-3
Online ISBN: 978-3-642-35173-0
eBook Packages: Computer Science, Computer Science (R0)