2.1 Transparency in Content Moderation
Moderation systems on social media platforms are designed to regulate inappropriate user behaviors and often impose measures such as removing content, muting, or banning offenders [11, 15]. These measures are implemented by content moderators, who may either be volunteers among the platform’s user base or commercial content moderators hired by the platform [50, 64]. More recently, AI-driven tools have been used to assist in moderation processes [1, 16, 27, 28, 33]. We focus here on transparency in end-users’ experience with moderation processes.
Transparency implies opening up “the working procedures not immediately visible to those not directly involved to demonstrate the good working of an institution” [47].
We situate our work within a line of research that examines the impact of content moderation on end-users. Scholars have investigated the impact of both user-level [19, 21, 63, 73] and community-wide sanctions [5, 6], using a variety of methods such as interviews [27], design workshops [72], surveys [19, 74], and log analyses [5, 6, 21]. Prior work has also highlighted how offering moderation explanations benefits sanctioned users [19, 22]. We focus instead on end-users who witness moderation sanctions but are not directly affected by them. By doing so, we contribute to building a theory [34] that prescribes to community managers which moderation interventions should be deployed, under what circumstances, and with what expected outcomes.
In examining the complexities of enacting content moderation, researchers have identified several issues regarding transparency in the procedures platforms follow when applying punitive measures [40]. First, the criteria for inappropriate content might not be well established before moderation decisions are made [62]. Legal experts have raised concerns that, despite publicly sharing their content policies, social media platforms often fail to adequately consider the contextual factors surrounding content, such as its localized meaning and the identities of the speakers and audiences, when evaluating its appropriateness [75]. Second, there are inter-platform differences in how norm violations are conceptualized. For example, an HCI study comparing the content policies of 15 platforms found a lack of consensus in defining what qualifies as online harassment and how forcefully content deemed to be harassment should be moderated [49]. Consequently, when these vague content policies are applied to regulate content, ambiguity can arise in resolving moderation cases [75]. Finally, and most pertinent to our study, communication with end-users about moderation decisions is often found to be deficient in details [67, 74].
2.2 Removal Explanations and Bystanders to Norm Violations
Prior research has emphasized the significance of incorporating moderation notifications and explanations into the design of moderation systems [22, 36, 38, 71]. For example, researchers have shown that when Facebook and Reddit do not inform users about their content removal [67], users are left to wonder which platform policy they have violated [19, 74]. Beyond removal notifications, users desire a justification for why their posts were removed, deeming it a significant factor in their perception of moderation fairness [19]. Users also express dissatisfaction with the inconsistent punishments meted out to them versus others, leading them to further request explanations [39, 71]. Many studies have empirically shown the benefits of offering removal explanations in improving the behavior of moderated users [19, 22, 70]. For example, Tyler et al. found that users who were educated about platform rules in the week following their post removal were less likely to post new violating content [70]. We extend this research by investigating the utility of explanations in influencing the behavior of bystanders.
Curiously, Reddit moderators offer explanations publicly by commenting on the removed submission. While this is not the sole communication mode (indeed, many moderators privately message users to inform them about moderation [22, 54]), prior research has argued that public explanations serve to enhance broader transparency efforts [19, 22]. On Reddit, users already engaging with a post retain access to it even after it is removed from the main subreddit; in this sense, removed submissions are not really removed, just hidden from public view. By publicly explaining the reason behind a post’s removal, explanation comments serve both users who stumble upon the removed submission and those already engaged with it.
We extend prior inquiries into using Deterrence Theory [66] to evaluate the impact of punishments on deterring inappropriate behaviors online [12, 63]. Deterrence Theory distinguishes between specific and general deterrence: specific deterrence refers to the effect of punitive measures on the individuals subjected to them, whereas general deterrence pertains to the impact of the potential threat of such measures on uninvolved observers. By focusing on bystanders, we examine the effects of general deterrence in shaping user behavior. Seering et al. showed that banning any type of behavior on Twitch significantly reduced the frequency of that behavior in subsequent messages posted by others [63]. Building upon this, we examine whether clarifying, through explanation messages, which aspects of submissions prompt sanctions influences observers’ subsequent actions.
Encouraging voluntary compliance with behavioral norms in a community requires that community members know the norms and keep them in mind while participating in the community. Kiesler et al. [34] argue that people learn community norms in three ways: (1) observing other people’s behavior and its consequences, (2) seeing codes of conduct, and (3) behaving and directly receiving feedback. Prior research has demonstrated the importance of users seeing codes of conduct [42] and directly receiving feedback [22, 70] in improving their subsequent behavior. We focus here on establishing the utility of bystanders observing other people’s norm violations and the resulting consequences.
In terms of reducing the posting of norm-violating content, some research has focused on the roles bystanders can play in the context of online harassment. Blackwell et al. found that labeling a variety of technology-enabled abusive experiences as ‘online harassment’ helps bystanders understand the breadth and depth of this problem [3]. Further, designs that motivate bystander intervention discourage harassment through normative enforcement [2]. Taylor et al. [68] additionally found that design solutions that encourage empathy and accountability can promote bystander intervention in cyberbullying. Extending this line of research to a broader range of norm violations, we analyze how bystanders are affected by their exposure to post-removal explanations.