Detecting hateful content with AI is difficult, and it is even more difficult when the content is multimodal, as with a meme. Humans understand memes because we do not interpret the words and the image independently; we combine the two. In contrast, most AI systems analyze text and image separately and never learn a joint representation. This is both inefficient and flawed: such systems are likely to fail when a non-hateful image is combined with non-hateful text to produce content that is nonetheless hateful. For AI to detect this sort of hate, it must learn to understand content the way people do: holistically.
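To make the contrast concrete, the sketch below shows the simplest form of a joint model: "early fusion", where text and image embeddings are concatenated before a single classification head, so the decision can depend on interactions between the two modalities. This is a minimal illustration, not the challenge's reference implementation; the embedding dimensions and layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class EarlyFusionClassifier(nn.Module):
    """Toy 'joint' model: fuse text and image features before classifying,
    rather than scoring each modality independently."""

    def __init__(self, text_dim=768, image_dim=2048, hidden_dim=512):
        super().__init__()
        # A single head operates on the concatenated (joint) representation,
        # so it can pick up on text-image combinations that neither
        # modality signals on its own.
        self.classifier = nn.Sequential(
            nn.Linear(text_dim + image_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),  # hateful vs. not_hateful logit
        )

    def forward(self, text_emb, image_emb):
        joint = torch.cat([text_emb, image_emb], dim=-1)
        return self.classifier(joint)
```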
To accelerate research on multimodal understanding and detection of hate speech, Facebook AI created the Hateful Memes Challenge in 2020 and released a dataset of more than 10,000 annotated memes. We now present this dataset for the WOAH 5 Shared Task with additional newly created fine-grained labels for the protected category that is attacked (e.g., women, Black people, immigrants) as well as the type of attack (e.g., inciting violence, dehumanizing, mocking the group).
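Concretely, each meme pairs an image with its overlaid text and carries both a binary hatefulness label and the new fine-grained labels. The record below is only illustrative; the field names are assumptions and may differ from the keys in the released data.

```python
# Illustrative annotation record (hypothetical field names; the
# released data may use different keys).
example = {
    "id": 42953,                          # unique meme identifier
    "img": "img/42953.png",               # path to the meme image
    "text": "<overlaid caption here>",    # text extracted from the meme
    "label": 1,                           # 1 = hateful, 0 = not_hateful
    "protected_category": ["religion"],   # Task A gold labels (multi-label)
    "attack_type": ["mocking"],           # Task B gold labels (multi-label)
}
```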
Tasks
Task A (multi-label): For each meme, detect the protected category that is attacked. Protected categories are: race, disability, religion, nationality, sex. If the meme is not_hateful, the protected category is: pc_empty.
Task B (multi-label): For each meme, detect the attack type. Attack types are: contempt, mocking, inferiority, slurs, exclusion, dehumanizing, inciting_violence. If the meme is not_hateful, the attack type is: attack_empty.
Tasks A and B are multi-label because memes can contain attacks against multiple protected categories and can involve multiple attack types.
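Because the gold labels are sets, both tasks reduce to multi-label classification, and labels can be encoded as binary indicator vectors. A minimal encoding sketch with scikit-learn follows; treating pc_empty as an ordinary class in the vector is a simplifying assumption for illustration, and Task B works the same way with the attack-type labels.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Task A label space; pc_empty marks not_hateful memes. (Treating it as
# an ordinary class here is a simplifying assumption.)
PC_CLASSES = ["pc_empty", "race", "disability", "religion", "nationality", "sex"]

mlb = MultiLabelBinarizer(classes=PC_CLASSES)

# One meme can attack several protected categories at once, so a row
# may contain more than one 1.
y = mlb.fit_transform([
    ["religion"],              # single category
    ["race", "nationality"],   # two categories in one meme
    ["pc_empty"],              # not_hateful meme
])
print(y)
# [[0 0 0 1 0 0]
#  [0 1 0 0 1 0]
#  [1 0 0 0 0 0]]
```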