DOI:10.1145/2835776.2835798
research-article
Open access

Evolution of Privacy Loss in Wikipedia

Published: 08 February 2016

Abstract

The cumulative effect of collective online participation has an important and adverse impact on individual privacy. As an online system evolves over time, new digital traces of individual behavior may uncover previously hidden statistical links between an individual's past actions and her private traits. To quantify this effect, we analyze the evolution of individual privacy loss by studying the edit history of Wikipedia over 13 years, including more than 117,523 different users performing 188,805,088 edits. We trace each Wikipedia contributor using apparently harmless features, such as the number of edits performed on predefined broad categories in a given time period (e.g., Mathematics, Culture or Nature). We show that even at this unspecific level of behavior description, it is possible to use off-the-shelf machine learning algorithms to uncover usually undisclosed personal traits, such as gender, religion or education. We provide empirical evidence that the prediction accuracy for almost all private traits consistently improves over time. Surprisingly, the prediction performance for users who stopped editing after a given time still improves. The activities performed by new users seem to have contributed more to this effect than additional activities from existing (but still active) users. Insights from this work should help users, system designers, and policy makers understand and make long-term design choices in online content creation systems.
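
The feature construction described in the abstract (edit counts per broad category) can be illustrated with a minimal sketch. This is not the authors' pipeline: the toy edit log, the user names, the trait labels "A" and "B", and the nearest-centroid classifier are all assumptions chosen for illustration, standing in for the unspecified off-the-shelf machine learning algorithms.

```python
from collections import Counter, defaultdict
from math import dist

# Broad categories named in the abstract; the real study may use more.
CATEGORIES = ["Mathematics", "Culture", "Nature"]


def edit_count_features(edits, categories=CATEGORIES):
    """Map each user to a vector of edit counts, one entry per category."""
    counts = defaultdict(Counter)
    for user, category in edits:
        counts[user][category] += 1
    return {user: [c[cat] for cat in categories] for user, c in counts.items()}


def normalize(vector):
    """Turn raw counts into shares so heavy and light editors are comparable."""
    total = sum(vector)
    return [v / total for v in vector] if total else list(vector)


def fit_centroids(features, labels):
    """Average the normalized vectors of labelled users for each trait value."""
    grouped = defaultdict(list)
    for user, label in labels.items():
        grouped[label].append(normalize(features[user]))
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in grouped.items()
    }


def predict(centroids, vector):
    """Return the trait value whose centroid is closest to the user's profile."""
    v = normalize(vector)
    return min(centroids, key=lambda label: dist(centroids[label], v))


# Synthetic toy edit log (hypothetical users, not Wikipedia data).
edits = (
    [("u1", "Mathematics")] * 5 + [("u1", "Culture")]
    + [("u2", "Mathematics")] * 4 + [("u2", "Nature")]
    + [("u3", "Culture")] * 6
    + [("u4", "Culture")] * 3 + [("u4", "Nature")] * 2
    + [("u5", "Mathematics")] * 3
)
# Hypothetical disclosed trait values for a subset of users.
labels = {"u1": "A", "u2": "A", "u3": "B", "u4": "B"}

features = edit_count_features(edits)
centroids = fit_centroids(features, labels)
print(predict(centroids, features["u5"]))  # trait guessed for the unlabelled user
```

In this sketch, adding labelled users changes the centroids and therefore the prediction for a user whose own edit history is unchanged, which is the kind of effect the abstract reports for users who stopped editing.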



Published In

WSDM '16: Proceedings of the Ninth ACM International Conference on Web Search and Data Mining
February 2016
746 pages
ISBN:9781450337168
DOI:10.1145/2835776
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. de-anonymization
  2. online privacy
  3. temporal loss of privacy

Qualifiers

  • Research-article

Conference

WSDM 2016
WSDM 2016: Ninth ACM International Conference on Web Search and Data Mining
February 22 - 25, 2016
San Francisco, California, USA

Acceptance Rates

WSDM '16 Paper Acceptance Rate 67 of 368 submissions, 18%;
Overall Acceptance Rate 498 of 2,863 submissions, 17%

Cited By

  • (2025) Generalizing Hate Speech Detection Using Multi-Task Learning: A Case Study of Political Public Figures. Computer Speech & Language 89 (101690). DOI:10.1016/j.csl.2024.101690. Online publication date: Jan-2025
  • (2023) Taboo and Collaborative Knowledge Production: Evidence from Wikipedia. Proceedings of the ACM on Human-Computer Interaction 7:CSCW2 (1-25). DOI:10.1145/3610090. Online publication date: 4-Oct-2023
  • (2023) Predicting Relationship Labels and Individual Personality Traits From Telecommunication History in Social Networks Using Hawkes Processes. IEEE Access 11 (8492-8503). DOI:10.1109/ACCESS.2023.3238970. Online publication date: 2023
  • (2022) Interval-censored Hawkes processes. The Journal of Machine Learning Research 23:1 (15236-15319). DOI:10.5555/3586589.3586927. Online publication date: 1-Jan-2022
  • (2022) An Analysis of Content Gaps Versus User Needs in the Wikidata Knowledge Graph. The Semantic Web – ISWC 2022 (354-374). DOI:10.1007/978-3-031-19433-7_21. Online publication date: 16-Oct-2022
  • (2021) The Role of Local Content in Wikipedia: A Study on Reader and Editor Engagement. Área Abierta 21:2 (123-151). DOI:10.5209/arab.72801. Online publication date: 17-May-2021
  • (2021) Petit mode d’emploi des médias sociaux à l’usage des personnes malveillantes / Short instructions for villains on how to use social media / Breve manual sobre el uso de los medios sociales por personas malintencionadas. Revue d'anthropologie des connaissances 15:1. DOI:10.4000/rac.19727. Online publication date: 1-Mar-2021
  • (2021) Exploring the gender gap in the Spanish Wikipedia: Differences in engagement and editing practices. PLOS ONE 16:2 (e0246702). DOI:10.1371/journal.pone.0246702. Online publication date: 23-Feb-2021
  • (2021) Inferring Sociodemographic Attributes of Wikipedia Editors: State-of-the-art and Implications for Editor Privacy. Companion Proceedings of the Web Conference 2021 (616-622). DOI:10.1145/3442442.3452350. Online publication date: 19-Apr-2021
  • (2018) A Villain's Guide To Social Media And Web Science. Proceedings of the 29th on Hypertext and Social Media (246-250). DOI:10.1145/3209542.3210576. Online publication date: 3-Jul-2018
