ArabicNLU 2024 Shared Task

Event Notification Type: Call for Papers
Abbreviated Title: 2024 Shared Task
Location: Bangkok
Date: Sunday, 11 August 2024
Country: Thailand
Contact Email: mustafajarrar@gmail.com
City: Bangkok
Contact: Mustafa Jarrar
Website: https://sina.birzeit.edu/nlu_sharedtask2024/
Submission Deadline: Wednesday, 10 April 2024

ArabicNLU 2024 Shared Task
(at ArabicNLP - co-located with ACL 2024)
https://sina.birzeit.edu/nlu_sharedtask2024/

Subtask 1: Word Sense Disambiguation (WSD)
================================================
WSD aims to identify the meaning of a word as it is used in context. Given a context (i.e., a sentence), a target word in that context, and a set of candidate senses (i.e., glosses or definitions) for the target word, the goal of the WSD task is to determine which of these senses is the intended meaning of the target word.
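To make the input/output shape of the task concrete, here is a minimal sketch of a classic knowledge-based baseline, simplified Lesk: choose the candidate whose gloss shares the most words with the context. This is a swapped-in illustrative technique, not the organizers' WSD system, and the `WSDInstance` structure, its field names, and the sense IDs are hypothetical rather than the SALMA data format.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class WSDInstance:
    """A single WSD example (hypothetical structure, not the SALMA file format)."""

    context: str  # the sentence containing the target word
    target: str  # the word to disambiguate
    candidate_glosses: dict[str, str]  # hypothetical sense_id -> gloss text


def tokenize(text: str) -> set[str]:
    # In Python 3, \w is Unicode-aware, so Arabic letters count as word characters.
    return set(re.findall(r"\w+", text.lower()))


def simplified_lesk(instance: WSDInstance) -> Optional[str]:
    """Pick the sense whose gloss shares the most words with the context."""
    context_words = tokenize(instance.context) - {instance.target.lower()}
    best_sense: Optional[str] = None
    best_overlap = -1
    for sense_id, gloss in instance.candidate_glosses.items():
        overlap = len(context_words & tokenize(gloss))
        if overlap > best_overlap:  # on ties, the first candidate seen is kept
            best_sense, best_overlap = sense_id, overlap
    return best_sense


example = WSDInstance(
    context="شربنا من عين الماء في الجبل",  # "We drank from the water spring in the mountain"
    target="عين",
    candidate_glosses={
        "eye": "عضو الإبصار في الإنسان والحيوان",  # organ of sight
        "spring": "ينبوع الماء الذي يخرج من الأرض",  # water source emerging from the ground
    },
)
print(simplified_lesk(example))  # → spring
```

Raw overlap also counts function words such as "من" and "في"; a stronger baseline would at least remove stopwords and compare lemmas, since Arabic morphology makes surface-form matching brittle.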

Subtask 2: Location Mention Disambiguation (LMD)
================================================
Offering LMD as a separate subtask supports the development of precise models that can accurately resolve Location Mentions (LMs) in microblogs and link them to toponyms in geo-positioning databases. LMD poses challenging retrieval and classification problems arising from the lack of context, toponymic polysemy, and toponymic homonymy, to name a few.
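The homonymy problem can be seen in a minimal, hypothetical candidate ranker: normalize Arabic spelling, prefer candidates whose name matches the mention exactly, and break ties with a prominence prior. The `Candidate` structure, the toponym IDs, and the importance values are invented for illustration; this is not the task's baseline.

```python
import re
from dataclasses import dataclass

# Arabic diacritics (tashkeel): fathatan U+064B through sukun U+0652
DIACRITICS = re.compile(r"[\u064B-\u0652]")


def normalize_arabic(text: str) -> str:
    """Light orthographic normalization commonly applied to Arabic text."""
    text = DIACRITICS.sub("", text)
    text = re.sub(r"[\u0623\u0625\u0622]", "\u0627", text)  # hamza/madda alef variants -> bare alef
    text = text.replace("\u0629", "\u0647")  # ta marbuta -> ha
    text = text.replace("\u0649", "\u064A")  # alef maqsura -> ya
    return text.strip()


@dataclass
class Candidate:
    toponym_id: str  # hypothetical gazetteer identifier
    names: list[str]  # names/aliases for this toponym
    importance: float = 0.0  # hypothetical prominence prior in [0, 1]


def rank_candidates(mention: str, candidates: list[Candidate]) -> list[Candidate]:
    """Rank by exact normalized name match first, then by the importance prior."""
    norm_mention = normalize_arabic(mention)

    def score(c: Candidate) -> tuple[bool, float]:
        name_match = any(normalize_arabic(n) == norm_mention for n in c.names)
        return (name_match, c.importance)

    return sorted(candidates, key=score, reverse=True)


# "طرابلس" (Tripoli) is homonymous: it names cities in both Libya and Lebanon.
ranked = rank_candidates("طرابلس", [
    Candidate("tripoli_lebanon", ["طرابلس"], importance=0.6),
    Candidate("trabzon", ["طرابزون"], importance=0.9),
    Candidate("tripoli_libya", ["طرابلس"], importance=0.8),
])
print([c.toponym_id for c in ranked])  # → ['tripoli_libya', 'tripoli_lebanon', 'trabzon']
```

Because the tie-break ignores the post entirely, this ranker resolves every "طرابلس" to the same city; telling the two apart requires using whatever context the microblog offers, which is exactly the difficulty the subtask targets.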

Datasets
============
SALMA is our WSD dataset. It includes 1,440 sentences and 34K tokens (8,760 unique tokens with 3,875 unique lemmas). All tokens were manually sense-annotated, with a total of 4,151 senses. More details can be found in our article.

IDRISI-DA is the first Arabic LMD dataset. It comprises 2,869 posts from diverse dialects, featuring 3,893 locations. More details on the LMD dataset can be found in our article.

Google Colab Notebooks
========================
WSD Colab: To help you experiment with the baseline, we authored a Google Colab notebook that demonstrates how to load and evaluate the data and how to use our WSD system to predict the senses of the target words.
https://drive.google.com/file/u/0/d/11ZFqz5rZ9WRv8sBGbSQHBYpbJJfct4t_/edit

LMD Colab: This notebook demonstrates how to pull the data from Hugging Face, call the OpenStreetMap (OSM) API to retrieve candidate toponyms, and evaluate the OSM results.
https://colab.research.google.com/gist/mohammedkhalilia/2b682b67f33c922f...
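For readers who want to see what retrieving candidate toponyms from OSM can look like outside the notebook, here is a minimal sketch. It assumes the public Nominatim search endpoint (the geocoder behind openstreetmap.org); the notebook may use a different endpoint, parameters, or client library, and the User-Agent string below is a placeholder.

```python
import json
import urllib.parse
import urllib.request

NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def build_search_url(mention: str, limit: int = 10) -> str:
    """Build a Nominatim free-text search URL for one location mention."""
    params = {"q": mention, "format": "json", "limit": limit, "accept-language": "ar"}
    return f"{NOMINATIM_SEARCH}?{urllib.parse.urlencode(params)}"


def parse_candidates(payload: str) -> list[dict]:
    """Turn a Nominatim JSON response into simple candidate-toponym records."""
    return [
        {
            "osm_id": f"{r['osm_type']}/{r['osm_id']}",  # e.g. "relation/123"
            "name": r["display_name"],
            "lat": float(r["lat"]),  # Nominatim returns coordinates as strings
            "lon": float(r["lon"]),
            "importance": float(r.get("importance", 0.0)),
        }
        for r in json.loads(payload)
    ]


def fetch_candidates(mention: str, limit: int = 10) -> list[dict]:
    """Query Nominatim over the network (placeholder User-Agent; replace it)."""
    request = urllib.request.Request(
        build_search_url(mention, limit),
        headers={"User-Agent": "lmd-sketch/0.1 (contact@example.com)"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_candidates(response.read().decode("utf-8"))
```

Nominatim's public usage policy asks for an identifying User-Agent and limits clients to roughly one request per second, so batch runs over the full dataset should cache results or use a self-hosted instance.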

Key Dates:
============
· March 15, 2024: Data-sharing and Evaluation on Development Set Available
· April 10, 2024: Shared Task Registration Deadline
· April 22, 2024: Evaluation on Test Set (TEST) Deadline
· May 17, 2024: Shared Task System Paper Submission Due

Organizers:
============
- Mohammed Khalilia, Qualtrics/Birzeit University, USA (Contact Person)
- Imed Zitouni, Google, USA
- Mustafa Jarrar, Birzeit University, Palestine
- Tamer Elsayed, Qatar University, Qatar
- Sanad Malaysha, Birzeit University, Palestine
- Ala’ Jabari, Birzeit University, Palestine
- Reem Suwaileh, Hamad Bin Khalifa University, Qatar

Contact:
============
For any questions related to this task, please contact the organizers directly using the following email address: NLUSharedtask2024 [at] gmail.com.
