Twitter Will Flag 'Offensive' Tweets Before You Send Them

Users will now see a prompt asking if they want to review their “potentially harmful” tweets before sending them

Twitter will now ask users to think twice before sending mean tweets, the company said on Wednesday.

Starting immediately, Twitter will hit users with a prompt that encourages them “to pause and reconsider a potentially harmful or offensive reply before they hit send.”

What does Twitter consider “potentially harmful and offensive?” That’s not really clear. In its blog post announcing the decision, the company didn’t share any examples of words or phrases that would catch the attention of its internal censors, and a company representative declined to share any examples with TheWrap.

Twitter did, however, share a few details about what it’s incorporated into the system that determines when to send a prompt, which you can find below:

Consideration of the nature of the relationship between the author and replier, including how often they interact. For example, if two accounts follow and reply to each other often, there’s a higher likelihood that they have a better understanding of preferred tone of communication.
Adjustments to our technology to better account for situations in which language may be reclaimed by underrepresented communities and used in non-harmful ways.
Improvement to our technology to more accurately detect strong language, including profanity.
Created an easier way for people to let us know if they found the prompt helpful or relevant.

The San Francisco-based company also shared an example of the prompt:

Twitter said the feature has shown promising results in testing; 34% of users revised or decided to scrap a tweet altogether when prompted, and on average, users sent out 11% fewer “offensive” replies after being hit with the prompt for the first time.

If this sounds a bit familiar, that’s because Twitter has leaned on similar measures in recent years to improve what CEO Jack Dorsey calls the “health” of conversations on the app. For example, Twitter started hiding rude tweets at the bottom of reply threads back in 2018. But that apparently wasn’t enough to weed out mean comments on the platform.