What It Is
An AI-powered intervention that prompts users to revise their comment before posting if it receives a high toxicity score. The intervention is most often powered by Jigsaw's Perspective API, which rates a comment's likely toxicity.
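To make the mechanics concrete, here is a minimal sketch of how a platform might score a draft comment with the Perspective API and decide whether to show the prompt. The helper names and the 0.8 threshold are illustrative assumptions, not values drawn from the studies cited below; each platform tunes its own cutoff.

```python
import requests

# Perspective API endpoint (see https://developers.perspectiveapi.com)
PERSPECTIVE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"

def toxicity_score(text: str, api_key: str) -> float:
    """Return Perspective's TOXICITY summary score for `text` (0.0 to 1.0)."""
    response = requests.post(
        PERSPECTIVE_URL,
        params={"key": api_key},
        json={
            "comment": {"text": text},
            "requestedAttributes": {"TOXICITY": {}},
        },
        timeout=10,
    )
    response.raise_for_status()
    scores = response.json()["attributeScores"]
    return scores["TOXICITY"]["summaryScore"]["value"]

def should_prompt_revision(text: str, api_key: str, threshold: float = 0.8) -> bool:
    # Hypothetical threshold: above it, ask the user to reconsider
    # before posting rather than blocking the comment outright.
    return toxicity_score(text, api_key) >= threshold
```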
Civic Signal Being Amplified
When To Use It
What Is Its Intended Impact
This intervention reduces the number of toxic comments posted. It is particularly geared toward "road rage" style comments, in which an otherwise well-intentioned user lashes out in the heat of the moment.
Evidence That It Works
A private study by OpenWeb (2020) on their own platform found that about half of users either revised their comment or decided not to post it when warned that it might be inflammatory. OpenWeb also observed a 12.5% increase in civil and thoughtful comments posted overall, and reported that the intervention led to healthier conversations and more opportunities for well-intentioned users to participate, which in turn boosted loyalty and overall community health.
In a randomized controlled experiment on Twitter (Katsaros et al., 2021), users who received the intervention posted 6% fewer offensive Tweets than non-prompted users in the control group. The decrease in offensive content was due not only to the deletion and revision of prompted Tweets, but also to a drop in recidivism and in the number of offensive replies to prompted Tweets. The authors concluded that interventions that ask users to reconsider their comments can be an effective mechanism for reducing offensive content online.
Why It Matters
Most of the edits made in response to the prompt were done in good faith, suggesting that users are generally well intentioned and open to positive change; they often only need to be reminded of a comment's potential harm in a moment when their judgment is clouded.
Special Considerations
While most edits in response to the prompt were made in good faith, there can be backlash and attempts to circumvent the intervention. In one study (Katsaros et al., 2021), for example, in 3% of cases where the intervention was shown, users edited their posts to add even more slurs, attacks, or profanity than they had originally intended to post.
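One way a platform might guard against such circumvention is to re-score the revised text before accepting it, rather than trusting any edit made after the prompt. A minimal sketch, reusing the hypothetical toxicity_score helper from the example above:

```python
def accept_revision(revised_text: str, api_key: str, threshold: float = 0.8) -> bool:
    """Re-score an edited comment so that a bad-faith revision
    (e.g., one with added slurs) is caught rather than waved through."""
    return toxicity_score(revised_text, api_key) < threshold
```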
And because any system that rates the toxicity of comments is built and trained by humans, it will naturally carry the perspectives and biases of its creators and its training data.