SemEval 2014 Task 7 - Analysis of Clinical Text
The purpose of this task is to enhance current research in natural language processing methods used in the clinical domain.
The aim of the task is to identify entities in the clinical domain and to map entities to UMLS CUIs (Concept Unique Identifiers). In this task, the focus will be to identify and disambiguate disorder mentions.
The task is a continuation of the CLEF/eHealth ShARe 2013 Shared Task. Significant additional annotations will be provided for subtasks A and B with the aim of correcting any existing errors and creating additional data to address sparsity issues.
Subtask A covers the recognition of mentions of concepts that belong to the UMLS semantic group disorders.
Here are a few examples; more are provided in the annotation guidelines and on the task website (under Datasets).
- The rhythm appears to be atrial fibrillation.
- The left atrium is moderately dilated.
- 53 year old man s/p fall from ladder.
In the first and third examples, the phrases atrial fibrillation and fall from ladder belong to the disorder semantic group in the UMLS. The second example is a case of a discontiguous mention, represented as left atrium...dilated. Cases where a discontiguous phrase is the best representation of the disorder occur more often in the clinical domain than in the general domain, so such mentions are annotated as discontiguous.
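As an illustration of how a discontiguous mention might be held in memory, here is a minimal Python sketch. The class name DisorderMention, the helper locate, and the character-offset representation are assumptions made for this example; they are not the annotation format distributed with the task data.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DisorderMention:
    """Hypothetical container: one mention made of one or more text spans."""
    spans: Tuple[Tuple[int, int], ...]  # (start, end) character offsets, end exclusive

    @property
    def is_discontiguous(self) -> bool:
        # More than one span means the mention skips over intervening text.
        return len(self.spans) > 1

    def text(self, document: str, joiner: str = "...") -> str:
        # Join the covered fragments, marking each gap with the joiner.
        return joiner.join(document[start:end] for start, end in self.spans)


def locate(document: str, fragments: List[str]) -> DisorderMention:
    """Find each fragment in order, left to right, and record its offsets."""
    spans = []
    cursor = 0
    for fragment in fragments:
        start = document.index(fragment, cursor)  # raises ValueError if absent
        end = start + len(fragment)
        spans.append((start, end))
        cursor = end
    return DisorderMention(tuple(spans))


sentence = "The left atrium is moderately dilated."
mention = locate(sentence, ["left atrium", "dilated"])
print(mention.spans, mention.text(sentence), mention.is_discontiguous)
```

Storing a tuple of spans rather than a single start and end keeps contiguous and discontiguous mentions in one type: a contiguous mention simply has one span.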
Subtask B involves mapping each disorder mention to a unique UMLS CUI. This is referred to as normalization, and the mapping is limited to UMLS CUIs of SNOMED codes.
The disorder entities in the examples above map to the following CUIs:
- atrial fibrillation - C0004238; UMLS preferred term atrial fibrillation
- left atrium...dilated - C0344720; UMLS preferred term left atrial dilatation
- fall from ladder - C0337212; UMLS preferred term accidental fall from ladder
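The mapping above can be pictured as a lookup from mention text to CUI. The sketch below is only a toy illustration built from the three examples listed here; the names CUI_LOOKUP and normalize are hypothetical, and returning None for an unknown mention is an assumption of this sketch, not a rule of the task. A real system would need a far larger resource and actual disambiguation.

```python
from typing import Dict, Optional

# Toy table built only from the three example mentions above.
CUI_LOOKUP: Dict[str, str] = {
    "atrial fibrillation": "C0004238",
    "left atrium...dilated": "C0344720",
    "fall from ladder": "C0337212",
}


def normalize(mention_text: str) -> Optional[str]:
    """Return the CUI for a mention, or None if it is not in the toy table."""
    # Lowercase and collapse whitespace so trivial surface variation still matches.
    key = " ".join(mention_text.lower().split())
    return CUI_LOOKUP.get(key)


print(normalize("Atrial  Fibrillation"))
print(normalize("fall from ladder"))
print(normalize("headache"))
```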
The following tarball contains trial data along with their annotations:
Access to the full training data requires a Data User Agreement (DUA). Details are provided on the task website under the "Data and Tools" tab.
Participants are free to take part in either or both subtasks.
Trial data ready October 31, 2013
Training data ready December 15, 2013
Evaluation period March 15-30, 2014
Paper submission due April 30, 2014 [TBC]
SemEval workshop August 23-24, 2014, co-located with COLING and *SEM in Dublin, Ireland.
The SemEval-2014 Task 7 website includes details on the training data, evaluation, and examples of the comparison types:
Sameer S. Pradhan, Harvard University
Suresh Manandhar, University of York, UK
Wendy W. Chapman, University of Utah
Noemie Elhadad, Columbia University
Guergana K. Savova, Harvard University