Second Biomedical Abbreviation Recognition and Resolution track

Event Notification Type: 
Call for Participation
Abbreviated Title: 
BARR2
Date: 
Tuesday, 18 September 2018
City: 
Seville
State: 
Andalucia
Country: 
Spain
Contact Email: 
krallinger.martin@gmail.com
montserrat.marimon@gmail.com
Submission Deadline: 
Sunday, 20 May 2018

Overview

The recognition and resolution of abbreviations, acronyms and symbols is a critical step for a number of tasks, including named entity recognition (NER), machine translation, information retrieval/indexing and document categorization. The implementation and availability of abbreviation recognition systems therefore has great practical impact on text mining and language processing.

In domains such as biomedicine and clinical research, abbreviations are particularly frequent and often refer to important entities and concepts such as genes, diseases, symptoms, drugs/chemicals or treatments. NER, relation extraction and clinical document coding systems usually need to recognize short forms or abbreviations correctly.

An abbreviation can be regarded as a short form (SF) that denotes a longer word or phrase, the long form (LF), typically its definition. Different strategies have been tested to detect short forms in English biomedical texts (Torii et al., 2007), for instance alignment-based approaches, machine learning methods and rule-based strategies, and some manually annotated corpora exist (e.g. MEDSTRACT, Ab3P or BIOADI; see Islamaj Doğan et al., 2014). Far less effort has been made to detect short form-long form pairs in text written in other languages.
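To make the notion of a short form-long form pair concrete, the following Python sketch illustrates one simple rule-based, alignment-style heuristic, in the spirit of the Schwartz and Hearst (2003) algorithm. It is not the method used by the track organizers or by any of the systems cited above. The function names, the parenthesis pattern, the window size and the requirement that a short form contains an uppercase letter are all assumptions made for illustration.

```python
import re

# Assumption: a short form is 2-10 characters inside parentheses.
PAREN = re.compile(r"\(([^()]{2,10})\)")


def find_long_form(short_form, candidate):
    """Align short_form against candidate from right to left.

    Returns the shortest trailing part of candidate that contains the
    short-form characters in order, or None when no alignment exists.
    """
    s = len(short_form) - 1
    c = len(candidate) - 1
    while s >= 0:
        ch = short_form[s].lower()
        if not ch.isalnum():          # skip punctuation in the short form
            s -= 1
            continue
        # Move left until the character matches. The first short-form
        # character must also sit at the start of a word.
        while c >= 0 and (
            candidate[c].lower() != ch
            or (s == 0 and c > 0 and candidate[c - 1].isalnum())
        ):
            c -= 1
        if c < 0:
            return None
        s -= 1
        c -= 1
    return candidate[c + 1:]


def extract_pairs(sentence):
    """Return (short form, long form) pairs of the shape 'long form (SF)'."""
    pairs = []
    for match in PAREN.finditer(sentence):
        sf = match.group(1).strip()
        # Assumption: a plausible short form has an uppercase letter,
        # which filters out years and other parenthesised numbers.
        if not any(ch.isupper() for ch in sf):
            continue
        words = sentence[:match.start()].split()
        window = min(len(sf) + 5, len(sf) * 2)   # candidate words to inspect
        lf = find_long_form(sf, " ".join(words[-window:]))
        if lf:
            pairs.append((sf, lf))
    return pairs


example = "El paciente fue sometido a una tomografía computarizada (TC) de tórax."
print(extract_pairs(example))
# [('TC', 'tomografía computarizada')]
```

A heuristic of this kind only covers explicit definitions written as "long form (SF)". It says nothing about short forms whose definition never appears in the document.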

There is a growing number of biomedical and clinical documents written in Spanish, such as medical literature, medical agency reports, patents and particularly electronic health records. Moreover, according to some estimates there are over 500 million Spanish speakers worldwide.

As part of the IBEREVAL 2018 (http://cabrillo.lsi.uned.es/n) initiative we have proposed the Second Biomedical Abbreviation Recognition and Resolution (BARR2) track with the aim of promoting the development and evaluation of biomedical abbreviation identification systems.

While the previous BARR track (http://temu.bsc.es/BARR) focused on biomedical literature in Spanish, the BARR2 track aims to promote the development and evaluation of clinical abbreviation identification systems. To that end it provides Gold Standard training and test corpora, manually annotated by domain experts with abbreviation-definition pairs, drawn from abstracts of clinical texts and clinical case studies written in Spanish.

The results of the previous BARR track were published in Intxaurrondo et al. (2017): The Biomedical Abbreviation Recognition and Resolution (BARR) track: benchmarking, evaluation and importance of abbreviation recognition systems applied to Spanish biomedical abstracts, accessible online at http://ceur-ws.org/Vol-1881/

The BARR2 track will be structured into two sub-tasks, namely:

Sub-track 1: asking participating teams to provide systems able to detect only explicit occurrences of abbreviation-definition pairs

Sub-track 2: asking participating teams to provide systems able to resolve short forms regardless of whether their definitions are mentioned within the actual document

To carry out these tasks we will release the BARR2 corpus, a manually labeled collection of Spanish abstracts of clinical texts and clinical case studies constructed using a customized version of AnnotateIt, BRAT and the Markyt annotation system. The BARR2 corpus will be split into a training set and a test set, each manually labeled with the corresponding offset annotations by a team of domain experts. The primary evaluation metrics for the BARR2 track will be precision, recall and F-score of the predictions against the manual gold standard. A larger background set of additional automatically labeled Spanish clinical case reports will be released together with the BARR2 corpus.
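As an illustration of how precision, recall and F-score relate predictions to a gold standard, here is a minimal Python sketch. It assumes exact matching of whole annotations and the balanced F-score (F1). The tuple layout (document id, start offset, end offset, short form, long form) and the function name are hypothetical. The official matching criteria and file formats are those described on the track website, not the ones shown here.

```python
def evaluate(gold, predicted):
    """Return (precision, recall, f1) under exact-match comparison.

    Both arguments are iterables of hashable annotations, for example
    (doc_id, start, end, short_form, long_form) tuples.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations, invented purely for this example.
gold = [
    ("doc1", 31, 55, "TC", "tomografía computarizada"),
    ("doc1", 90, 117, "IRC", "insuficiencia renal crónica"),
    ("doc2", 12, 34, "RM", "resonancia magnética"),
    ("doc2", 60, 83, "HTA", "hipertensión arterial"),
]
predicted = [
    ("doc1", 31, 55, "TC", "tomografía computarizada"),
    ("doc2", 12, 34, "RM", "resonancia magnética"),
    ("doc2", 60, 70, "HTA", "hipertensión"),  # wrong span and long form
]

precision, recall, f1 = evaluate(gold, predicted)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
# P=0.667 R=0.500 F1=0.571
```

Under exact matching, a prediction with a partially correct long form counts as both a false positive and a missed gold annotation, which is why the third prediction lowers precision and recall at the same time.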

Additional details, sample sets, FAQ and registration details can be found at the BARR2 track website: http://temu.bsc.es/BARR2

Important Dates

15th April 2018: Training corpus available
30th April 2018: Test corpus available
20th May 2018: Submission of the results
25th May 2018: Publication of results
11th June 2018: Working notes submission
1st July 2018: Release of the working notes reviews
15th July 2018: Camera ready paper submission
18th September 2018: IberEval 2018 workshop

BARR2 track organizers

Martin Krallinger, Biological Text Mining Unit (Bio-TeMUC), CNIO, Spain
Alfonso Valencia, Structural Computational Biology Group, CNIO, Spain
Nuria Bel, UPF, Barcelona, Spain
Ander Intxaurrondo, Biological Text Mining Unit (Bio-TeMUC), CNIO, Spain
Marta Villegas, Barcelona Supercomputing Center (Bio-TeMUC), CNIO, Spain
Jose Antonio Lopez, Hospital 12 de Octubre, Madrid, Spain
Montserrat Marimon, Barcelona Supercomputing Center (Bio-TeMUC), Spain
Aitor Gonzalez-Agirre, Barcelona Supercomputing Center (Bio-TeMUC), CNIO, Spain