DOI: 10.1145/1134271.1134284

Discovering missing links in Wikipedia

Published: 21 August 2005

Abstract

In this paper we address the problem of discovering missing hypertext links in Wikipedia. The method we propose consists of two steps: first, we compute a cluster of highly similar pages around a given page, and then we identify candidate links from those similar pages that might be missing on the given page. The main innovation is in the algorithm that we use for identifying similar pages, LTRank, which ranks pages using co-citation and page title information. Both LTRank and the link discovery method are manually evaluated and show acceptable results, especially given the simplicity of the methods and conservativeness of the evaluation criteria.
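The two-step method described in the abstract can be sketched roughly as follows. This is a minimal illustration, not LTRank itself: it ranks similar pages by plain co-citation counts only and ignores page title information, and the function names, the toy link graph, and the parameters `k` and `min_support` are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical toy link graph: page -> set of pages it links to.
links = {
    "Amsterdam": {"Netherlands"},
    "Rotterdam": {"Netherlands", "Port", "North Sea"},
    "Utrecht": {"Netherlands", "Canal"},
    "The Hague": {"Netherlands", "North Sea"},
    "Dutch cities": {"Amsterdam", "Rotterdam", "The Hague"},
    "Holland": {"Amsterdam", "Rotterdam", "The Hague", "Utrecht"},
    "Randstad": {"Amsterdam", "Rotterdam", "Utrecht"},
}


def similar_pages(page, links, k=3):
    """Step 1: rank pages by co-citation with `page`.

    Two pages are co-cited when some third page links to both of them.
    The score of a page is the number of such third pages.
    """
    scores = Counter()
    for targets in links.values():
        if page in targets:
            for other in targets:
                if other != page:
                    scores[other] += 1
    # Highest count first; ties broken by name so the result is deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


def candidate_links(page, links, k=3, min_support=2):
    """Step 2: propose links found on similar pages but absent from `page`.

    A target is proposed only if at least `min_support` of the k most
    similar pages link to it (an assumed threshold, not from the paper).
    """
    existing = links.get(page, set())
    support = Counter()
    for neighbour in similar_pages(page, links, k):
        for target in links.get(neighbour, set()):
            if target != page and target not in existing:
                support[target] += 1
    return sorted(t for t, n in support.items() if n >= min_support)


print(similar_pages("Amsterdam", links))
print(candidate_links("Amsterdam", links))
```

In this toy graph, "Rotterdam", "The Hague" and "Utrecht" are co-cited with "Amsterdam", and "North Sea" is the only target linked from at least two of them that "Amsterdam" lacks, so it is the single candidate link. Raising `min_support` makes the suggestions more conservative at the cost of recall.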


Published In

LinkKDD '05: Proceedings of the 3rd international workshop on Link discovery
August 2005
101 pages
ISBN: 1595932151
DOI: 10.1145/1134271
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. co-citation
  2. link analysis
  3. wikipedia

Qualifiers

  • Article

Conference

KDD05
