2.1 Transparency in Content Moderation
Moderation systems on social media platforms are designed to regulate inappropriate user behaviors and often impose measures such as removing content, muting, or banning offenders [11, 15]. These measures are implemented by content moderators, who may either be volunteers among the platform’s user base or commercial content moderators hired by the platform [50, 64]. More recently, AI-driven tools have been used to assist in moderation processes [1, 16, 27, 28, 33]. We focus here on transparency in end-users’ experience with moderation processes.
Transparency implies opening up “the working procedures not immediately visible to those not directly involved to demonstrate the good working of an institution” [47].
We situate our work within a line of research that examines the impact of content moderation on end-users. Scholars have investigated the impact of both user-level [19, 21, 63, 73] and community-wide sanctions [5, 6], using a variety of methods such as interviews [27], design workshops [72], surveys [19, 74], and log analyses [5, 6, 21]. Prior work has also highlighted how offering moderation explanations benefits sanctioned users [19, 22]. We focus instead on end-users who witness moderation sanctions but are not directly affected by them. By doing so, we contribute to building a theory [34] that prescribes to community managers which moderation interventions should be deployed, under what circumstances, and with what expected outcomes.
In examining the complexities of enacting content moderation, researchers have identified several issues regarding transparency in the procedures platforms follow when applying punitive measures [40]. First, the criteria for inappropriate content might not be well established before moderation decisions are made [62]. Legal experts have raised concerns that, despite publicly sharing their content policies, social media platforms often fail to adequately consider the contextual factors surrounding content, such as its localized meaning and the identities of the speakers and audiences, when evaluating its appropriateness [75]. Second, there are inter-platform differences in how norm violations are conceptualized. For example, an HCI study comparing the content policies of 15 platforms found a lack of consensus in defining what qualifies as online harassment and how forcefully content deemed to be harassment should be moderated [49]. Consequently, when these vague content policies are applied to regulate content, ambiguity can arise in resolving moderation cases [75]. Finally, and most pertinent to our study, communication with end-users about moderation decisions is often found to be deficient in details [67, 74].
2.2 Removal Explanations and Bystanders to Norm Violations
Prior research has emphasized the significance of incorporating moderation notifications and explanations into the design of moderation systems [22, 36, 38, 71]. For example, researchers have shown that when Facebook and Reddit do not inform users about their content removal [67], users are left to wonder which platform policy they have violated [19, 74]. Beyond removal notifications, users desire a justification for why their posts were removed, deeming it a significant factor in their perception of moderation fairness [19]. Users also express dissatisfaction with the inconsistent punishments meted out to them versus others, leading them to further request explanations [39, 71]. Many studies have empirically shown the benefits of offering removal explanations in improving the behavior of moderated users [19, 22, 70]. For example, Tyler et al. found that users who were educated about platform rules in the week following their post removal were less likely to post new violating content [70]. We extend this research by investigating the utility of explanations in influencing the behavior of bystanders.
Curiously, Reddit moderators offer explanations publicly by commenting on the removed submission. While this is not the sole communication mode (indeed, many moderators privately message users to inform them about moderation [22, 54]), prior research has argued that public explanations serve to enhance broader transparency efforts [19, 22]. On Reddit, users already engaging with a post retain access to it even after it is removed from the main subreddit; in this sense, removed submissions are not really removed, just hidden from public view. By publicly explaining the reason behind a post’s removal, explanation comments serve both users who stumble upon the removed submission and those already engaged with it.
We extend prior inquiries into using Deterrence Theory [66] to evaluate the impact of punishments on deterring inappropriate behaviors online [12, 63]. Deterrence Theory distinguishes between specific and general deterrence: specific deterrence refers to the effect of punitive measures on the individuals subjected to them, whereas general deterrence pertains to the impact of the potential threat of such measures on uninvolved observers. By focusing on bystanders, we examine the effects of general deterrence in shaping user behavior. Seering et al. showed that banning any type of behavior on Twitch significantly reduced the frequency of that behavior in subsequent messages posted by others [63]. Building upon this, we examine whether clarifying, through explanation messages, which aspects of submissions prompt sanctions influences observers’ subsequent actions.
Encouraging voluntary compliance with behavioral norms in a community requires that community members know the norms and keep them in mind while participating in the community. Kiesler et al. [34] argue that people learn community norms in three ways: (1) observing other people’s behavior and its consequences, (2) seeing codes of conduct, and (3) behaving and directly receiving feedback. Prior research has demonstrated the importance of users seeing codes of conduct [42] and directly receiving feedback [22, 70] in improving their subsequent behavior. We focus here on establishing the utility of bystanders observing other people’s norm violations and the resulting consequences.
In terms of reducing the posting of norm-violating content, some research has focused on the roles bystanders can play in the context of online harassment. Blackwell et al. found that labeling a variety of technology-enabled abusive experiences as ‘online harassment’ helps bystanders understand the breadth and depth of this problem [3]. Further, designs that motivate bystander intervention discourage harassment through normative enforcement [2]. Taylor et al. [68] additionally found that design solutions that encourage empathy and accountability can promote bystander intervention in cyberbullying. Extending this line of research to a broader range of norm violations, we analyze how bystanders are affected by their exposure to post-removal explanations.