DOI: 10.1145/3173574.3174140
research-article

Examining Wikipedia With a Broader Lens: Quantifying the Value of Wikipedia's Relationships with Other Large-Scale Online Communities

Published: 21 April 2018

Abstract

The extensive Wikipedia literature has largely considered Wikipedia in isolation, outside of the context of its broader Internet ecosystem. Very recent research has demonstrated the significance of this limitation, identifying critical relationships between Google and Wikipedia that are highly relevant to many areas of Wikipedia-based research and practice. This paper extends this recent research beyond search engines to examine Wikipedia's relationships with large-scale online communities, Stack Overflow and Reddit in particular. We find evidence of consequential, albeit unidirectional, relationships. Wikipedia provides substantial value to both communities, with Wikipedia content increasing visitation, engagement, and revenue, but we find little evidence that these websites contribute to Wikipedia in return. Overall, these findings highlight important connections between Wikipedia and its broader ecosystem that should be considered by researchers studying Wikipedia. Critically, our results also emphasize the key role that volunteer-created Wikipedia content plays in improving other websites, even contributing to revenue generation.

    Information & Contributors

    Information

    Published In

    CHI '18: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems
    April 2018
    8489 pages
    ISBN: 9781450356206
    DOI: 10.1145/3173574
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Badges

    • Best Paper

    Author Tags

    1. online communities
    2. peer production
    3. reddit
    4. stack overflow
    5. wikipedia

    Qualifiers

    • Research-article

    Funding Sources

    • U.S. NSF

    Conference

    CHI '18
    Acceptance Rates

    CHI '18 Paper Acceptance Rate 666 of 2,590 submissions, 26%;
    Overall Acceptance Rate 6,199 of 26,314 submissions, 24%

    Article Metrics

    • Downloads (Last 12 months): 61
    • Downloads (Last 6 weeks): 2
    Reflects downloads up to 18 Nov 2024

    Cited By

    • (2024) Life Histories of Taboo Knowledge Artifacts. Proceedings of the ACM on Human-Computer Interaction 8:CSCW2 (1-32). DOI: 10.1145/3687044. Online publication date: 8-Nov-2024
    • (2024) An Efficient Approach to Store and Access Wikipedia's Revision History for Large-Scale Analysis. Proceedings of the 35th ACM Conference on Hypertext and Social Media (309-315). DOI: 10.1145/3648188.3675150. Online publication date: 10-Sep-2024
    • (2024) U. S. Users’ Exposure to YouTube Videos On- and Off-platform. Proceedings of the 16th ACM Web Science Conference (70-80). DOI: 10.1145/3614419.3644027. Online publication date: 21-May-2024
    • (2024) Reproducibility of issues reported in stack overflow questions: Challenges, impact & estimation. Journal of Systems and Software 217 (112158). DOI: 10.1016/j.jss.2024.112158. Online publication date: Nov-2024
    • (2023) Taboo and Collaborative Knowledge Production: Evidence from Wikipedia. Proceedings of the ACM on Human-Computer Interaction 7:CSCW2 (1-25). DOI: 10.1145/3610090. Online publication date: 4-Oct-2023
    • (2023) The Dimensions of Data Labor: A Road Map for Researchers, Activists, and Policymakers to Empower Data Producers. Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (1151-1161). DOI: 10.1145/3593013.3594070. Online publication date: 12-Jun-2023
    • (2022) Quantifying the Selective, Stochastic, and Complementary Drivers of Institutional Evolution in Online Communities. Entropy 24:9 (1185). DOI: 10.3390/e24091185. Online publication date: 25-Aug-2022
    • (2022) Representing COVID-19 information in collaborative knowledge graphs: The case of Wikidata. Semantic Web 13:2 (233-264). DOI: 10.3233/SW-210444. Online publication date: 3-Feb-2022
    • (2022) Ethical Tensions, Norms, and Directions in the Extraction of Online Volunteer Work. Companion Publication of the 2022 Conference on Computer Supported Cooperative Work and Social Computing (273-277). DOI: 10.1145/3500868.3560923. Online publication date: 8-Nov-2022
    • (2022) Templates and Trust-o-meters: Towards a widely deployable indicator of trust in Wikipedia. Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (1-17). DOI: 10.1145/3491102.3517523. Online publication date: 29-Apr-2022
