
Interactive Quality Analytics of User-generated Content: An Integrated Toolkit for the Case of Wikipedia

Published: 27 March 2019

Abstract

Digital libraries and services enable users to access large amounts of data on demand. Yet assessing the quality of information encountered on the Internet remains an elusive open issue. For example, Wikipedia, one of the most visited platforms on the Web, hosts thousands of user-generated articles and undergoes 12 million edits/contributions per month. User-generated content is undoubtedly one of the keys to its success but also a hindrance to good quality. Although Wikipedia has established guidelines for the “perfect article,” authors find it difficult to ascertain whether their contributions comply with them, and reviewers cannot cope with the ever-growing number of articles pending review. Great efforts have been invested in algorithmic methods for automatic classification of Wikipedia articles (as featured or non-featured) and for quality flaw detection. Instead, our contribution is an interactive toolkit that combines automatic classification methods with human interaction, whereby experts can experiment with new quality metrics and share them with authors who need to identify weaknesses to improve a particular article. A design study shows that experts are able to effectively create complex quality metrics in a visual analytics environment. In turn, a user study provides evidence that regular users can identify flaws, as well as high-quality content, based on the inspection of automatic quality scores.

Cited By

  • (2021) Personalizing alternatives for diverse learner groups: readability tools. Intelligent Systems and Learning Data Analytics in Online Education, 301-321. DOI: 10.1016/B978-0-12-823410-5.00003-6. Online publication date: 2021
  • (2020) How can Wikipedia be used to support the process of automatically building multilingual domain modules? A case study. Information Processing & Management 57:4 (102232). DOI: 10.1016/j.ipm.2020.102232. Online publication date: Jul-2020

        Published In

        ACM Transactions on Interactive Intelligent Systems, Volume 9, Issue 2-3
        Special Issue on Highlights of ACM IUI 2017
        September 2019
        324 pages
        ISSN:2160-6455
        EISSN:2160-6463
        DOI:10.1145/3320251
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 27 March 2019
        Accepted: 01 May 2018
        Revised: 01 March 2018
        Received: 01 June 2017

        Author Tags

        1. Text analytics
        2. Wikipedia
        3. information quality assessment
        4. user-generated content
        5. visual analytics

        Qualifiers

        • Research-article
        • Research
        • Refereed

        Funding Sources

        • Austrian COMET Program
        • WIQ-EI project within EC FP7-PEOPLE program
        • H2020 AFEL project

        Article Metrics

        • Downloads (Last 12 months): 17
        • Downloads (Last 6 weeks): 1
        Reflects downloads up to 19 Nov 2024
