DOI: 10.1145/2467696.2467709

A relevance feedback approach for the author name disambiguation problem

Published: 22 July 2013

Abstract

This paper presents a new name disambiguation method that exploits user feedback on ambiguous references across iterations. An unsupervised step is used to define pure training samples, and a hybrid supervised step is employed to learn a classification model for assigning references to authors. Our classification scheme combines the Optimum-Path Forest (OPF) classifier with complex reference similarity functions generated by a Genetic Programming framework. Experiments demonstrate that the proposed method yields better results than state-of-the-art disambiguation methods on two traditional datasets.
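To make the classification step more concrete, the following is a minimal sketch of the textbook supervised Optimum-Path Forest classifier on a complete graph with the maximum-arc path cost, as described in the general OPF literature. It is not the authors' hybrid scheme: the feedback loop and the Genetic Programming search are omitted, and the hand-written `jaccard_distance` is only an assumed stand-in for the GP-generated similarity functions. All function names and the toy records are hypothetical.

```python
import math


def jaccard_distance(a, b):
    """Assumed stand-in distance between two references given as token sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def train_opf(samples, labels, dist):
    """Fit an OPF on labeled samples; returns (cost, propagated_labels)."""
    n = len(samples)
    d = [[dist(samples[i], samples[j]) for j in range(n)] for i in range(n)]

    # Step 1: Prim's minimum spanning tree. Endpoints of tree edges that
    # join samples of different classes become prototypes.
    in_tree = [False] * n
    key = [math.inf] * n
    parent = [-1] * n
    key[0] = 0.0
    prototypes = set()
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=key.__getitem__)
        in_tree[u] = True
        if parent[u] != -1 and labels[parent[u]] != labels[u]:
            prototypes.update((parent[u], u))
        for v in range(n):
            if not in_tree[v] and d[u][v] < key[v]:
                key[v] = d[u][v]
                parent[v] = u
    if not prototypes:  # only one class present: fall back to one root
        prototypes.add(0)

    # Step 2: Dijkstra-like competition where a path costs its largest arc.
    cost = [0.0 if i in prototypes else math.inf for i in range(n)]
    out_labels = list(labels)
    done = [False] * n
    for _ in range(n):
        s = min((i for i in range(n) if not done[i]), key=cost.__getitem__)
        done[s] = True
        for t in range(n):
            if not done[t]:
                c = max(cost[s], d[s][t])
                if c < cost[t]:
                    cost[t] = c
                    out_labels[t] = out_labels[s]
    return cost, out_labels


def classify(x, samples, cost, out_labels, dist):
    """Give x the label of the training sample offering the cheapest path."""
    best = min(range(len(samples)),
               key=lambda i: max(cost[i], dist(samples[i], x)))
    return out_labels[best]


# Toy references: each is a set of coauthor names and title tokens.
refs = [
    {"a. gupta", "j. smith", "databases"},
    {"a. gupta", "j. smith", "indexing"},
    {"a. gupta", "l. chen", "vision"},
    {"a. gupta", "l. chen", "images"},
]
authors = ["A1", "A1", "A2", "A2"]
cost, out_labels = train_opf(refs, authors, jaccard_distance)
query = {"a. gupta", "l. chen", "retrieval"}
predicted = classify(query, refs, cost, out_labels, jaccard_distance)
```

Because `dist` is passed in as a parameter, a learned similarity function could be swapped in without touching the classifier, which is one plausible way to read the combination the abstract describes.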




Published In

JCDL '13: Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries
July 2013
480 pages
ISBN:9781450320771
DOI:10.1145/2467696
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. genetic programming
  2. name disambiguation
  3. optimum-path forest classifier
  4. relevance feedback

Qualifiers

  • Research-article

Conference

JCDL '13: 13th ACM/IEEE-CS Joint Conference on Digital Libraries
July 22 - 26, 2013
Indianapolis, Indiana, USA

Acceptance Rates

JCDL '13 Paper Acceptance Rate 28 of 95 submissions, 29%;
Overall Acceptance Rate 415 of 1,482 submissions, 28%



Cited By

  • (2024) Towards Effective Author Name Disambiguation by Hybrid Attention. Journal of Computer Science and Technology 39:4 (929-950). DOI: 10.1007/s11390-023-2070-z. Online publication date: 1-Jul-2024
  • (2022) Combination of classifiers with incomplete frames of discernment. Chinese Journal of Aeronautics 35:5 (145-157). DOI: 10.1016/j.cja.2021.04.020. Online publication date: May-2022
  • (2019) Dirichlet process gaussian mixture for active online name disambiguation by particle filter. Proceedings of the 18th Joint Conference on Digital Libraries (269-278). DOI: 10.1109/JCDL.2019.00045. Online publication date: 2-Jun-2019
  • (2018) Data-Fusion Techniques for Open-Set Recognition Problems. IEEE Access 6 (21242-21265). DOI: 10.1109/ACCESS.2018.2824240. Online publication date: 2018
  • (2015) Combining Classifiers and User Feedback for Disambiguating Author Names. Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries (259-260). DOI: 10.1145/2756406.2756964. Online publication date: 21-Jun-2015
  • (2015) Dynamic author name disambiguation for growing digital libraries. Information Retrieval Journal 18:5 (379-412). DOI: 10.1007/s10791-015-9261-3. Online publication date: 21-Jul-2015
  • (2015) Author Profile Enrichment for Cross-Linking Digital Libraries. Research and Advanced Technology for Digital Libraries (124-136). DOI: 10.1007/978-3-319-24592-8_10. Online publication date: 28-Nov-2015
