Research has found that abusive posts are less likely to be detected if they include emojis.
A new study from the University of Oxford suggests that emoji-based hate is a key emerging challenge for online hate detection.
According to the Oxford Internet Institute, abusive posts containing emojis can be missed by detection models, while acceptable posts can be misidentified as abusive.
The findings are based on the institute’s HatemojiCheck, a test suite of 3,930 short-form statements for evaluating how detection models perform on hateful language expressed with emojis.
Using the test suite, the researchers were able to expose weaknesses in existing hate detection models.
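As a rough illustration of how such a test suite can expose an emoji-blind classifier, consider the toy Python harness below. The statements, labels and keyword classifier are illustrative stand-ins, not the institute’s actual data or models.

```python
def naive_classifier(text: str) -> bool:
    """Flags text as hateful only on explicit keywords: emoji-blind by design."""
    return any(word in text.lower() for word in ("hate", "despise"))

# Each case pairs a statement with its true label; emoji variants probe the gap.
test_cases = [
    ("I hate people like you", True),
    ("I \U0001F52A people like you", True),   # an emoji replaces the hateful verb
    ("I \U00002764 people like you", False),  # acceptable post, must not be flagged
]

correct = sum(naive_classifier(text) == label for text, label in test_cases)
print(f"accuracy: {correct}/{len(test_cases)}")
# Output: accuracy: 2/3 - the emoji-based hateful statement slips straight through.
```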
The institute said detecting online hate is a complex task, and that low-performing detection models have “harmful consequences” when used for sensitive applications such as content moderation.
To address these weaknesses, the researchers built their dataset using a human-and-model-in-the-loop approach, in which annotators write posts designed to fool the current model, which is then retrained on the examples that succeed.
Models trained on these 5,912 adversarial examples performed substantially better at detecting emoji-based hate, while retaining strong performance on text-only hate.
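The human-and-model-in-the-loop idea can be sketched in miniature as below. The stub model, the hand-written “annotator” attempts and the crude update rule are all hypothetical, showing only the shape of one loop iteration, not the study’s actual pipeline.

```python
class KeywordModel:
    """Stub detector: flags any text containing a known hateful token."""
    def __init__(self, vocab):
        self.vocab = set(vocab)

    def predict(self, text: str) -> bool:
        return any(tok in text for tok in self.vocab)

    def update(self, examples):
        # Crude stand-in for fine-tuning: absorb tokens from missed hate.
        for text, label in examples:
            if label:
                self.vocab.update(text.split())


model = KeywordModel(vocab=["hate"])

# Annotators try to fool the current model; only the fooling cases are kept.
attempts = [
    ("I hate you", True),               # caught by the model, so discarded
    ("I \U0001F52A you", True),         # emoji-based hate slips through: kept
    ("lovely day \U00002600", False),   # correctly left alone, so discarded
]
adversarial = [(t, y) for t, y in attempts if model.predict(t) != y]

model.update(adversarial)                 # retrain, then run the next round
print(model.predict("I \U0001F52A you"))  # True: the emoji gap is now covered
```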