Shared Task on Low-level NLP Tools for Magahi and Bhojpuri

Event Notification Type: Call for Papers
Location: University of Trento
Dates: Wednesday, 11 September 2019 to Thursday, 12 September 2019
City: Trento
Country: Italy
Submission Deadline: Sunday, 30 June 2019

Apologies for cross-posting. Please circulate this call for wider publicity.

----------------------------------------------------------------------------
Workshop Date: 11-12 September 2019
Venue: Department of Information Engineering and Computer Science (DISI), University of Trento, Italy. Organized under the Workshop on NLP Solutions for Under Resourced Languages (11-12 September 2019).

Website:
Main website: http://nsurl.org/
Shared Task for Magahi: http://nsurl.org/tasks/task-9-low-level-nlp-tools-for-magahi-language/
Shared Task for Bhojpuri: http://nsurl.org/tasks/task-10-low-level-nlp-tools-for-bhojpuri-language/
Registration Link: https://docs.google.com/forms/d/1UHzelcIpitpD4njv3hTIkwRiOAye5Fzkz_aCeJi...
Paper submission: https://easychair.org/conferences/?conf=nsurl2019

-----------------------------------------------------------------------------

Task Description
=================
The task is to develop low-level NLP tools for Magahi and Bhojpuri. Both are Eastern Indo-Aryan languages spoken largely in the eastern states of Bihar, Jharkhand and Uttar Pradesh in India. They are part of what is considered a dialect continuum running from the eastern part of India to its western part and consisting of approximately 50 languages / varieties. Hindi, an official language of India, is part of the same continuum, so these languages are closely related to it and to each other. Despite this similarity, they diverge considerably in lexicon as well as in morphological make-up. As a result, most of the tools developed for Hindi do not perform well on the other languages. For this task, we are providing small annotated datasets for Magahi and Bhojpuri for developing a part-of-speech tagger and a morphological analyser for each language. The datasets are annotated with the part-of-speech categories and morphological features from the Universal Dependencies tagset.

Sub-tasks
===========
Tasks 9 and 10 each have two sub-tasks:
A. POS tagger for each language
B. Number, Gender, Person, Tense, Aspect, Honorificity and Case relation analyser for each language

Data
======
We will provide 5,000 annotated sentences (in CoNLL-U format) for each of the two languages. In addition, participants are encouraged to use the Hindi dataset available from the Universal Dependencies project. They are also free to use any other dataset, as long as it is freely available for research.
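
For participants unfamiliar with CoNLL-U, the following is a minimal sketch of how such a file could be read using only the Python standard library. The function names (read_conllu, parse_feats) and the sample sentence are illustrative assumptions: the tokens w1 and w2 are invented placeholders, not Magahi or Bhojpuri data, and the released files may contain comment lines or columns that need further handling.

```python
from typing import Dict, List

# The ten tab-separated columns defined by the CoNLL-U format.
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]


def parse_feats(feats: str) -> Dict[str, str]:
    """Turn 'Case=Nom|Number=Sing' into a dict; '_' means no features."""
    if not feats or feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def read_conllu(text: str) -> List[List[dict]]:
    """Split CoNLL-U text into sentences, each a list of token dicts."""
    sentences: List[List[dict]] = []
    current: List[dict] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment such as '# sent_id = ...'
        token = dict(zip(CONLLU_FIELDS, line.split("\t")))
        # Skip multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
        if "-" in token["id"] or "." in token["id"]:
            continue
        token["feats"] = parse_feats(token["feats"])
        current.append(token)
    if current:
        sentences.append(current)
    return sentences


# Invented placeholder sentence (hypothetical, for illustration only).
SAMPLE = (
    "# sent_id = demo-1\n"
    "1\tw1\t_\tNOUN\t_\tCase=Nom|Gender=Masc|Number=Sing\t_\t_\t_\t_\n"
    "2\tw2\t_\tVERB\t_\tAspect=Perf|Person=3\t_\t_\t_\t_\n"
    "3\t.\t_\tPUNCT\t_\t_\t_\t_\t_\t_\n"
    "\n"
)

if __name__ == "__main__":
    for sentence in read_conllu(SAMPLE):
        for tok in sentence:
            print(tok["form"], tok["upos"], tok["feats"])
```

The UPOS column corresponds to sub-task A and the FEATS column to sub-task B; keeping the features as a dictionary makes it straightforward to score each feature separately.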

Evaluation Procedure
=======================
Teams will be evaluated and ranked using macro-averaged F1 scores.
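
As an illustration of the metric, the sketch below computes a macro-averaged F1 score over token-level tags. It is not the organisers' scoring script: the function name macro_f1 is hypothetical, and the choice to average over the union of gold and predicted labels is an assumption (an official scorer might, for example, average over gold labels only).

```python
from collections import Counter
from typing import Sequence


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of per-label F1 over all labels seen in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # p was predicted but is wrong
            fn[g] += 1  # g was missed
    labels = set(gold) | set(pred)  # assumption: union of both label sets
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); the denominator is non-zero for any
        # label that occurs in gold or pred, but guard against it anyway.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    gold = ["NOUN", "VERB", "NOUN", "ADJ"]
    pred = ["NOUN", "NOUN", "NOUN", "ADJ"]
    print(round(macro_f1(gold, pred), 3))
```

Because every label contributes equally to the average, rare tags weigh as much as frequent ones, so a system that ignores infrequent categories is penalised more heavily than it would be under plain accuracy.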

Baseline
=========
A simple probabilistic baseline (the most frequent tag is assigned to each token) will be provided by the organisers.
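
To give a sense of what a most-frequent-tag baseline can look like, here is a minimal sketch. It is one common reading of the idea, not the organisers' implementation: it assumes each known word form receives the tag it carried most often in training, and unseen forms fall back to the overall most frequent tag. The names train_baseline and tag_tokens and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

TaggedSentence = List[Tuple[str, str]]  # (word form, tag) pairs


def train_baseline(sentences: Iterable[TaggedSentence]) -> Tuple[Dict[str, str], str]:
    """Return (form -> most frequent tag, overall most frequent tag)."""
    per_form = defaultdict(Counter)
    overall = Counter()
    for sentence in sentences:
        for form, tag in sentence:
            per_form[form][tag] += 1
            overall[tag] += 1
    if not overall:
        raise ValueError("training data is empty")
    lexicon = {form: counts.most_common(1)[0][0]
               for form, counts in per_form.items()}
    default_tag = overall.most_common(1)[0][0]
    return lexicon, default_tag


def tag_tokens(forms: List[str], lexicon: Dict[str, str], default_tag: str) -> List[str]:
    """Tag known forms from the lexicon; unseen forms get the default tag."""
    return [lexicon.get(form, default_tag) for form in forms]


if __name__ == "__main__":
    # Toy training data with invented placeholder forms.
    train = [
        [("a", "NOUN"), ("b", "VERB")],
        [("a", "NOUN"), ("c", "NOUN")],
        [("b", "VERB"), ("a", "ADJ")],
    ]
    lexicon, default_tag = train_baseline(train)
    print(tag_tokens(["a", "b", "z"], lexicon, default_tag))
```

The same scheme extends to sub-task B by treating each morphological feature (or the full feature string) as the label to be counted.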

Important Dates
================
The training dataset will be made available by 15 April 2019. Other deadlines follow the workshop schedule.

Results
=========
Results will be made available as per the workshop schedule.

Paper submission
==================
Paper submission instructions will be the same as for the workshop.

Task Organizers
================
The task organizers are:

Ritesh Kumar, Dr. B.R. Ambedkar University, Agra, India
Atul Kumar Ojha, Panlingua Language Processing LLP and JNU, New Delhi, India

Queries:

For any queries regarding this task, please send an email to nlp-lrl@googlegroups.com