Twitter is testing a new feature aimed at reducing abusive and trolling content to keep the platform safe and clean. The feature prompts users to pause and reconsider a reply to a tweet when the AI detects it contains “harmful language.” The company says it is rolling out improved versions of the prompts to English-language users on both iOS and Android.
Although prompts like these are not a surefire way to stop hateful comments, small nudges and psychological tricks often help change minds. Studies have indicated that introducing a nudge like this can lead people to edit or cancel posts they would otherwise have regretted.
Tests conducted by Twitter found that about 34% of people either revised their initial reply after seeing the prompt or chose not to send it at all. On average, after being prompted once, people composed 11% fewer offensive replies in the future, which suggests the prompts really do reduce bad behavior online.
However, Twitter’s early algorithms often failed to detect foul language correctly. The company has been improving the system on the backend and says detection is now more accurate. Twitter will also take the relationship between the author and the replier into consideration: if two accounts follow and reply to each other often, they are more likely to share an understanding of the preferred tone of communication than accounts that don’t interact.
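To illustrate the kind of heuristic described above, here is a minimal sketch in Python. The function name, signature, scores, and thresholds are all hypothetical assumptions for illustration; Twitter has not published how its system actually weighs these signals.

```python
def should_prompt(toxicity_score: float,
                  mutual_follow: bool,
                  prior_replies: int,
                  base_threshold: float = 0.8) -> bool:
    """Hypothetical sketch: decide whether to show a 'reconsider your
    reply' prompt.

    toxicity_score: assumed 0.0-1.0 output of a language classifier.
    mutual_follow:  whether author and replier follow each other.
    prior_replies:  how often the two accounts have replied to each other.
    """
    threshold = base_threshold
    # Accounts that interact frequently likely understand each other's
    # tone, so raise the bar before prompting (per the article's logic).
    if mutual_follow and prior_replies >= 3:
        threshold += 0.1
    return toxicity_score >= min(threshold, 1.0)
```

In this sketch, the same borderline reply triggers a prompt between strangers but not between accounts with an established back-and-forth, matching the relationship consideration the article describes.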