DOI: 10.1145/2872427.2883077
research-article

Growing Wikipedia Across Languages via Recommendation

Published: 11 April 2016

Abstract

The different Wikipedia language editions vary dramatically in how comprehensive they are. As a result, most language editions contain only a small fraction of the sum of information that exists across all Wikipedias. In this paper, we present an approach to filling gaps in article coverage across different Wikipedia editions. Our main contribution is an end-to-end system for recommending articles for creation that exist in one language but are missing in another. The system involves identifying missing articles, ranking the missing articles according to their importance, and recommending important missing articles to editors based on their interests. We empirically validate our models in a controlled experiment involving 12,000 French Wikipedia editors. We find that personalizing recommendations increases editor engagement by a factor of two. Moreover, recommending articles increases their chance of being created by a factor of 3.2. Finally, articles created as a result of our recommendations are of comparable quality to organically created articles. Overall, our system leads to more engaged editors and faster growth of Wikipedia with no effect on its quality.
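
The three stages named in the abstract (identify missing articles, rank them by importance, recommend them according to editor interests) can be illustrated with a toy sketch. This is not the paper's implementation: the `Concept` record, the numeric `importance` field, and the use of plain set overlap for interest matching are all assumptions made for illustration, and the sample data is invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """Hypothetical record for a language-independent topic."""
    concept_id: str
    editions: frozenset   # language codes that already have an article
    importance: float     # placeholder score; a real system would estimate this
    topics: frozenset     # coarse topic labels, assumed to be given


def find_missing(concepts, source, target):
    """Stage 1: concepts with an article in `source` but none in `target`."""
    return [c for c in concepts
            if source in c.editions and target not in c.editions]


def rank_by_importance(missing):
    """Stage 2: order missing articles from most to least important."""
    return sorted(missing, key=lambda c: c.importance, reverse=True)


def recommend(ranked, editor_interests, k=2):
    """Stage 3: keep the top-k ranked articles sharing a topic with the editor.

    Set overlap is a stand-in chosen for brevity, not the paper's model.
    """
    matches = [c for c in ranked if c.topics & editor_interests]
    return matches[:k]


# Invented sample data: five concepts across English and French editions.
concepts = [
    Concept("Q1", frozenset({"en", "fr"}), 0.9, frozenset({"physics"})),
    Concept("Q2", frozenset({"en"}), 0.7, frozenset({"biology"})),
    Concept("Q3", frozenset({"en"}), 0.8, frozenset({"physics"})),
    Concept("Q4", frozenset({"fr"}), 0.5, frozenset({"physics"})),
    Concept("Q5", frozenset({"en"}), 0.2, frozenset({"physics", "history"})),
]

missing = find_missing(concepts, source="en", target="fr")
ranked = rank_by_importance(missing)
picks = recommend(ranked, editor_interests=frozenset({"physics"}), k=2)
print([c.concept_id for c in picks])
```

Keeping the stages as separate functions means each one can be replaced independently, for example by swapping the overlap test for a learned interest model.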



Published In

WWW '16: Proceedings of the 25th International Conference on World Wide Web
April 2016
1482 pages
ISBN: 9781450341431

Sponsors

  • IW3C2: International World Wide Web Conference Committee

Publisher

International World Wide Web Conferences Steering Committee

Republic and Canton of Geneva, Switzerland

Author Tags

  1. recommendation systems
  2. translation
  3. wikipedia

Qualifiers

  • Research-article

Conference

WWW '16
Sponsor:
  • IW3C2
WWW '16: 25th International World Wide Web Conference
April 11 - 15, 2016
Montréal, Québec, Canada

Acceptance Rates

WWW '16 Paper Acceptance Rate 115 of 727 submissions, 16%;
Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 24
  • Downloads (Last 6 weeks): 4
Reflects downloads up to 25 Nov 2024.

Citations

Cited By

  • (2024) Low-Resourced Languages and Online Knowledge Repositories: A Need-Finding Study. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, pp. 1-21. DOI: 10.1145/3613904.3642605. Online publication date: 11-May-2024
  • (2023) Detecting Cross-Lingual Information Gaps in Wikipedia. Companion Proceedings of the ACM Web Conference 2023, pp. 581-585. DOI: 10.1145/3543873.3587539. Online publication date: 30-Apr-2023
  • (2023) Companies in Multilingual Wikipedia: Articles Quality and Important Sources of Information. Information Technology for Management: Approaches to Improving Business and Society, pp. 48-67. DOI: 10.1007/978-3-031-29570-6_3. Online publication date: 28-Mar-2023
  • (2021) Wikipedia citations: A comprehensive data set of citations with identifiers extracted from English Wikipedia. Quantitative Science Studies 2:1, pp. 1-19. DOI: 10.1162/qss_a_00105. Online publication date: 8-Apr-2021
  • (2021) Automatically Labeling Low Quality Content on Wikipedia By Leveraging Patterns in Editing Behaviors. Proceedings of the ACM on Human-Computer Interaction 5:CSCW2, pp. 1-23. DOI: 10.1145/3479503. Online publication date: 18-Oct-2021
  • (2021) How Inclusive Are Wikipedia’s Hyperlinks in Articles Covering Polarizing Topics? 2021 IEEE International Conference on Big Data (Big Data), pp. 1300-1307. DOI: 10.1109/BigData52589.2021.9671943. Online publication date: 15-Dec-2021
  • (2021) Information asymmetry in Wikipedia across different languages: A statistical analysis. Journal of the Association for Information Science and Technology. DOI: 10.1002/asi.24553. Online publication date: 21-Jul-2021
  • (2020) Exposure to social engagement metrics increases vulnerability to misinformation. Harvard Kennedy School Misinformation Review. DOI: 10.37016/mr-2020-033. Online publication date: 25-Jul-2020
  • (2020) Quantifying Engagement with Citations on Wikipedia. Proceedings of The Web Conference 2020, pp. 2365-2376. DOI: 10.1145/3366423.3380300. Online publication date: 20-Apr-2020
  • (2019) EncyCatalogRec: catalog recommendation for encyclopedia article completion. Frontiers of Information Technology & Electronic Engineering. DOI: 10.1631/FITEE.1800363. Online publication date: 5-Sep-2019
