Supervised Event Coding from Text Written in Arabic: Introducing Hadath

Javier Osorio, Alejandro Reyes, Alejandro Beltrán, Atal Ahmadzai


Abstract
This article introduces Hadath, a supervised protocol for coding event data from text written in Arabic. Hadath contributes to recent efforts in advancing multi-language event coding using computer-based solutions. In this application, we focus on extracting event data about the conflict in Afghanistan from 2008 to 2018 using Arabic information sources. The implementation relies first on a Machine Learning algorithm to classify news stories relevant to the Afghan conflict. Then, using Hadath, we implement the Natural Language Processing component for event coding from Arabic script. The output database contains daily geo-referenced information at the district level on who did what to whom, when and where in the Afghan conflict. The data helps to identify trends in the dynamics of violence, the provision of governance, and traditional conflict resolution in Afghanistan for different actors over time and across space.
Anthology ID:
2020.aespen-1.9
Volume:
Proceedings of the Workshop on Automated Extraction of Socio-political Events from News 2020
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Ali Hürriyetoğlu, Erdem Yörük, Vanni Zavarella, Hristo Tanev
Venue:
AESPEN
Publisher:
European Language Resources Association (ELRA)
Pages:
49–56
Language:
English
URL:
https://aclanthology.org/2020.aespen-1.9
Cite (ACL):
Javier Osorio, Alejandro Reyes, Alejandro Beltrán, and Atal Ahmadzai. 2020. Supervised Event Coding from Text Written in Arabic: Introducing Hadath. In Proceedings of the Workshop on Automated Extraction of Socio-political Events from News 2020, pages 49–56, Marseille, France. European Language Resources Association (ELRA).
Cite (Informal):
Supervised Event Coding from Text Written in Arabic: Introducing Hadath (Osorio et al., AESPEN 2020)
PDF:
https://aclanthology.org/2020.aespen-1.9.pdf