research-article

Disguising Reddit sources and the efficacy of ethical research

Published: 01 September 2022

Abstract

Concerned researchers of online forums might implement what Bruckman (2002) referred to as disguise. Heavy disguise, for example, elides usernames and rewords quoted prose so that sources are difficult to locate via search engines. This can protect users (who might be members of vulnerable populations, including minors) from additional harms (such as harassment or additional identification). But does disguise work? I analyze 22 Reddit research reports: 3 of light disguise, using verbatim quotes, and 19 of heavier disguise, using reworded phrases. I test if their sources can be located via three different search services (i.e., Reddit, Google, and RedditSearch). I also interview 10 of the reports’ authors about their sourcing practices, influences, and experiences. Disguising sources is effective only if done and tested rigorously; I was able to locate all of the verbatim sources (3/3) and many of the reworded sources (11/19). There is a lack of understanding, among users and researchers, about how online messages can be located, especially after deletion. Researchers should conduct similar site-specific investigations and develop practical guidelines and tools for improving the ethical use of online sources.

References

[1]
Andalibi, N., Ozturk, P., & Forte, A. (2017). Sensitive self-disclosures, responses, and social support on Instagram. Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing.
[2]
Reagle, J., & Gaur, M. (2022). Spinning words as disguise: Shady services for ethical research? First Monday.
[3]
Ayers, J. W., Caputi, T. L., Nebeker, C., & Dredze, M. (2018). Don’t quote me: Reverse identification of research participants in social media studies. NPJ Digital Medicine, 1(1).
[4]
Backes, M., Berrang, P., Goga, O., Gummadi, K. P., & Manoharan, P. (2016). On profile linkability despite anonymity in social media systems. Proceedings of the 2016 ACM on Workshop on Privacy in the Electronic Society - WPES’16.
[5]
Balamuta, J. (2018, November 13). Using Google BigQuery to obtain Reddit comment phrase counts. The Coatless Professor. https://thecoatlessprofessor.com/programming/sql/using-google-bigquery-to-obtain-reddit-comment-phrase-counts/
[6]
Barbaro, M., & Zeller, T. Jr. (2006, August 9). A face is exposed for AOL searcher no. 4417749. The New York Times. https://www.nytimes.com/2006/08/09/technology/09aol.html
[7]
Baumgartner, J. (2016, September 19). pushshift.io: API documentation: List of endpoints. pushshift.io. https://pushshift.io/api-parameters/
[8]
Baumgartner, J., Zannettou, S., Keegan, B., Squire, M., & Blackburn, J. (2020). The Pushshift Reddit dataset. Proceedings of The International AAAI Conference on Web and Social Media, 14(1), 830–839. https://ojs.aaai.org/index.php/ICWSM/article/view/7347
[9]
boyd, danah. (2007). Why youth heart social network sites. In D. Buckingham (Ed.), Youth, identity, and digital media. MIT Press.
[10]
Buckingham, D. (Ed.). (2007). Youth, identity, and digital media. MIT Press.
[11]
Brown, A., & Abramson, M. (2015). Twitter fingerprints as active authenticators. 2015 IEEE International Conference on Data Mining Workshop (ICDMW).
[12]
Bruckman, A. (2002). Studying the amateur artist: a perspective on disguising data collected in human subjects research on the Internet. Ethics and Information Technology, 4(3). http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.432.1591&rep=rep1&type=pdf
[13]
Bruckman, A., Luther, K., & Fiesler, C. (2015). When should we use real names in published accounts of internet research? In E. Hargittai & C. Sandvig (Eds.), Digital research confidential: The secrets of studying behavior online. MIT Press.
[14]
Brunton, F., & Nissenbaum, H. (2015). Obfuscation: A user’s guide for privacy and protest. MIT Press. https://we.riseup.net/assets/355198/Obfuscation.pdf
[15]
Chen, Y., Sherren, K., Smit, M., & Lee, K. Y. (2021). Using social media images as data in social science research. New Media & Society, 146144482110387.
[16]
ConvoKit (2018, October 31). Reddit corpus (by subreddit). Cornell. https://convokit.cornell.edu/documentation/subreddit.html
[17]
Dym, B., & Fiesler, C. (2020). Ethical and privacy considerations for research using online fandom data. Transformative Works and Cultures, 33.
[18]
Ess, C., & AoIR Ethics Working Committee. (2002). Ethical decision-making and Internet research: recommendations from the AOIR Ethics Working Committee. http://aoir.org/reports/ethics.pdf
[19]
Eysenbach, G., & Till, J. E. (2001). Ethical issues in qualitative research on internet communities. BMJ, 323(7321), 1103–1105. http://bmj.bmjjournals.com/cgi/content/full/323/7321/1103
[20]
Fiesler, C., & Proferes, N. (2018). “Participant” perceptions of Twitter research ethics. Social Media + Society, 4(1).
[21]
Finn, J., & Lavitt, M. (1994). Computer-based self-help groups for sexual abuse survivors. Social Work With Groups, 17(1–2), 21–46.
[22]
Flicker, S., Haans, D., & Skinner, H. (2004). Ethical dilemmas in research on internet communities. Qualitative Health Research, 14(1), 124–134.
[23]
Franzke, A. S., Bechmann, A., Zimmer, M., Ess, C., & AoIR. (2020). Internet research: Ethical guidelines 3.0. AoIR. https://aoir.org/reports/ethics3.pdf
[24]
Gaffney, D., & Matias, J. N. (2018). Caveat emptor, computational social science: Large-scale missing data in a widely-published Reddit corpus. PLOS ONE, 13(7).
[25]
Guarino, A. (2013). Digital forensics as a big data challenge. ISSE 2013 Securing Electronic Business Processes, 197–203.
[27]
Haimson, O. L., Andalibi, N., & Pater, J. (2016, December 20). Ethical use of visual social media content in research publications. AHRECS. https://ahrecs.com/ethical-use-visual-social-media-content-research-publications/
[28]
Johansson, F., Kaati, L., & Shrestha, A. (2015). Timeprints for identifying social media users with multiple aliases. Security Informatics, 4(1).
[29]
King, S. A. (1996). Researching internet communities: Proposed ethical guidelines for the reporting of results. The Information Society, 12(2).
[30]
Kozinets, R. V. (2015). Netnography: Redefined (Kindle). SAGE Publications Limited.
[31]
Mann, C., & Stewart, F. (2000). Internet communication and qualitative research: a handbook for researching online. Sage.
[32]
Markham, A. (2012). Fabrication as ethical practice: Qualitative inquiry in ambiguous Internet contexts. Information, Communication & Society, 15(3).
[33]
Narayanan, A., Paskov, H., Gong, N. Z., Bethencourt, J., Stefanov, E., Shin, E. C. R., & Song, D. (2012). On the feasibility of internet-scale author identification. 2012 IEEE Symposium on Security and Privacy.
[34]
Narayanan, A., & Shmatikov, V. (2009). De-anonymizing social networks. 2009 30th IEEE Symposium on Security and Privacy.
[35]
Nguyen, H., & Cavallari, S. (2020). Neural multi-task text normalization and sanitization with pointer-generator. Proceedings of the First Workshop on Natural Language Interfaces.
[36]
Ohm, P. (2010). Broken promises of privacy: Responding to the surprising failure of anonymization. UCLA Law Review, 58(2). https://www.uclalawreview.org/broken-promises-of-privacy-responding-to-the-surprising-failure-of-anonymization-2/
[37]
Pentzold, C. (2017). “What are these researchers doing in my Wikipedia?”: Ethical premises and practical judgment in internet-based ethnography. Ethics and Information Technology, 19(2), 143–155.
[38]
Proferes, N., Jones, N., Gilbert, S., Fiesler, C., & Zimmer, M. (2021). Studying Reddit: A systematic overview of disciplines, approaches, methods, and ethics. Social Media + Society, 7(2).
[39]
Reddit (2021, January 27). Reddit by the numbers. RedditInc. https://www.redditinc.com/press
[40]
Reddit Search. (2021, January 14). Reddit. https://www.reddit.com/wiki/search
[41]
Reid, E. (1996). Informed consent in the study of online communities: A reflection on the effects of computer-mediated social research. Information Science, 12(2).
[42]
Reyes, V. (2017). Three models of transparency in ethnographic research: Naming places, naming people, and sharing data. Ethnography, 19(2).
[43]
Rodham, K., & Gavin, J. (2006). The ethics of using the internet to collect qualitative research data. Research Ethics, 2(3), 92–97.
[44]
Sharf, B. (1999). Beyond netiquette: The ethics of doing naturalistic discourse research on the Internet. In S. Jones (Ed.), Doing internet research: Critical issues and methods for examining the net. Sage.
[45]
Shklovski, I., & Vertesi, J. (2013, April 27). “UnGoogling” publications: The ethics and problems of anonymization. Proceedings of CHI 2013. https://pure.itu.dk/portal/files/80190129/p2169_shklovski.pdf
[46]
Siang, S. (1999). Researching ethically with human subjects in cyberspace. Professional Ethics Report, 22(4). http://www.aaas.org/spp/sfrl/per/per19.htm
[47]
Singal, J. (2016, March 9). 3 lingering questions from the Alice Goffman controversy. The Cut. https://www.thecut.com/2016/01/3-lingering-questions-about-alice-goffman.html
[48]
Singal, J. (2015, June 18). The internet accused Alice Goffman of faking details in her study of a black neighborhood. I went to Philadelphia to check. The Cut. https://www.thecut.com/2015/06/i-fact-checked-alice-goffman-with-her-subjects.html
[50]
Smith, J. S., & Murray, C. D. (2001). Pearls, pith, and provocation: Ethical issues in the documentary data analysis of internet posts and archives. Qualitative Health Research, 11(3).
[51]
Staff, R. (2022, April 14). New on Reddit: Comment search, improved search results relevance, updated search design. Reddit Inc. https://www.redditinc.com/blog/new-on-reddit-comment-search-improved-search-results-relevance-updated-search-design
[52]
Stuck_In_the_Matrix (2019, April 8). Pushshift will now be opting in by default to quarantined subreddits. r/pushshift. https://www.reddit.com/r/pushshift/comments/bazctc/pushshift_will_now_be_opting_in_by_default_to/.
[53]
Stuck_In_the_Matrix (2015, September 8). Reddit data for ~ 900,000 subreddits (includes both public and private subreddits). r/datasets. https://www.reddit.com/r/datasets/comments/3k3mr9/reddit_data_for_900000_subreddits_includes_both/
[54]
Waskul, D., & Douglas, M. (1996). Considering the electronic participant: polemical observations on the ethics of online research. The Information Society, 12, 129–139.
[55]
Zhou, X., Liang, X., Zhang, H., & Ma, Y. (2016). Cross-platform identification of anonymous identical users in multiple social media networks. IEEE Transactions on Knowledge and Data Engineering, 28(2), 411–424.
[56]
Zimmer, M. (2010). “But the data is already public”: On the ethics of research in Facebook. Ethics and Information Technology, 12(4).

Published In

Ethics and Information Technology, Volume 24, Issue 3
Sep 2022
197 pages

Publisher

Kluwer Academic Publishers
United States

Publication History

Published: 01 September 2022
Accepted: 19 July 2022

Author Tags

1. Ethics
2. Research
3. Online
4. Reddit
5. Disguise
6. Fabrication

Qualifiers

• Research-article

Cited By

  • (2024) Whose Knowledge is Valued? Epistemic Injustice in CSCW Applications. Proceedings of the ACM on Human-Computer Interaction 8:CSCW2 (1–28). DOI: 10.1145/3687062. Online publication date: 8-Nov-2024
  • (2024) Remember the Human: A Systematic Review of Ethical Considerations in Reddit Research. Proceedings of the ACM on Human-Computer Interaction 8:GROUP (1–33). DOI: 10.1145/3633070. Online publication date: 16-Feb-2024
  • (2024) Here Be Livestreams: Trade-offs in Creating Temporal Maps of Reddit. Proceedings of the 16th ACM Web Science Conference (81–91). DOI: 10.1145/3614419.3643999. Online publication date: 21-May-2024
  • (2023) Using Online Discussions to Understand Challenges and Design Opportunities in Dementia Care. Proceedings of the 35th Australian Computer-Human Interaction Conference (211–220). DOI: 10.1145/3638380.3638394. Online publication date: 2-Dec-2023
  • (2023) A Systematic Review of Ethics Disclosures in Predictive Mental Health Research. Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (1311–1323). DOI: 10.1145/3593013.3594082. Online publication date: 12-Jun-2023
  • (2023) Sliding into My DMs: Detecting Uncomfortable or Unsafe Sexual Risk Experiences within Instagram Direct Messages Grounded in the Perspective of Youth. Proceedings of the ACM on Human-Computer Interaction 7:CSCW1 (1–29). DOI: 10.1145/3579522. Online publication date: 16-Apr-2023
