Temporal Information Extraction (State of the art)
Revision as of 12:02, 30 March 2015
TempEval 2007
- TempEval, Temporal Relation Identification, 2007: web page
TempEval 2010
- TempEval-2, Evaluating Events, Time Expressions, and Temporal Relations, 2010: web page
TempEval 2013
- TempEval-3, Evaluating Time Expressions, Events, and Temporal Relations, 2013: web page
Performance measures
Results
Task A: Temporal expression extraction and normalisation
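The table shows the best result for each system. Lower scoring runs for the same system are not shown. Identification is scored with strict and lenient span matching; normalisation is scored as the accuracy of the type and value attributes.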
System name (best run) | Short description | Main publication | Strict Pre. | Strict Rec. | Strict F1 | Lenient Pre. | Lenient Rec. | Lenient F1 | Type acc. | Value acc. | Overall score | Software | License |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
HeidelTime (t) | rule-based | Strötgen et al., 2013 | 83.85 | 78.99 | 81.34 | 93.08 | 87.68 | 90.30 | 90.91 | 85.95 | 77.61 | Download | GNU GPL v3 |
NavyTime (1,2) | rule-based | Chambers, 2013 | 78.72 | 80.43 | 79.57 | 89.36 | 91.30 | 90.32 | 88.90 | 78.58 | 70.97 | - | - |
ManTIME (4) | CRF, probabilistic post-processing pipeline, rule-based normaliser | Filannino et al., 2013 | 78.86 | 70.29 | 74.33 | 95.12 | 84.78 | 89.66 | 86.31 | 76.92 | 68.97 | Demo & Download | GNU GPL v2 |
SUTime | deterministic rule-based | Chang et al., 2013 | 78.72 | 80.43 | 79.57 | 89.36 | 91.30 | 90.32 | 88.90 | 74.60 | 67.38 | Demo & Download | GNU GPL v2 |
ATT (2) | MaxEnt, third party normalisers | Jung et al., 2013 | 90.57 | 69.57 | 78.69 | 98.11 | 75.36 | 85.25 | 91.34 | 76.91 | 65.57 | - | - |
ClearTK (1,2) | SVM, Logistic Regression, third party normaliser | Bethard, 2013 | 85.94 | 79.71 | 82.71 | 93.75 | 86.96 | 90.23 | 93.33 | 71.66 | 64.66 | Download | BSD-3 Clause |
JU_CSE | CRF, rule-based normaliser | Kolya et al., 2013 | 81.51 | 70.29 | 75.49 | 93.28 | 80.43 | 86.38 | 87.39 | 73.87 | 63.81 | - | - |
KUL (2) | Logistic regression, post-processing, rule-based normaliser | Kolomiyets et al., 2013 | 76.99 | 63.04 | 69.32 | 92.92 | 76.09 | 83.67 | 88.56 | 75.24 | 62.95 | - | - |
FSS-TimEx | rule-based | Zavarella et al., 2013 | 52.03 | 46.38 | 49.04 | 90.24 | 80.43 | 85.06 | 81.08 | 68.47 | 58.24 | - | - |
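The Identification columns compare system and gold time expression spans under two matching regimes, and the Overall score follows from two other columns: in every row above it equals lenient F1 multiplied by value accuracy (HeidelTime: 90.30 × 0.8595 ≈ 77.61). A minimal sketch, assuming spans are (start, end) character offsets and that lenient matching counts any overlap; the helper names and data layout are illustrative assumptions, not the official TempEval-3 scorer.

```python
from typing import List, Tuple

# Assumed representation: a span is (start, end) in character offsets, end exclusive.
Span = Tuple[int, int]


def prf(sys_hits: int, n_sys: int, gold_hits: int, n_gold: int) -> Tuple[float, float, float]:
    """Precision over system spans, recall over gold spans, and their harmonic mean."""
    p = sys_hits / n_sys if n_sys else 0.0
    r = gold_hits / n_gold if n_gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def overlaps(a: Span, b: Span) -> bool:
    """True if two half-open spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def span_scores(system: List[Span], gold: List[Span], lenient: bool) -> Tuple[float, float, float]:
    """Strict: offsets must be identical. Lenient (assumed): any overlap is a match."""
    match = overlaps if lenient else (lambda a, b: a == b)
    sys_hits = sum(any(match(s, g) for g in gold) for s in system)
    gold_hits = sum(any(match(g, s) for s in system) for g in gold)
    return prf(sys_hits, len(system), gold_hits, len(gold))


def value_score(lenient_f1: float, value_accuracy: float) -> float:
    """Relation observed in the Task A table: Overall = lenient F1 x value accuracy (percentages)."""
    return lenient_f1 * value_accuracy / 100.0


# Hypothetical spans for one document.
gold = [(0, 9), (20, 30), (40, 45)]
system = [(0, 9), (22, 30), (50, 55)]
strict = span_scores(system, gold, lenient=False)   # one exact match: P = R = F1 = 1/3
lenient = span_scores(system, gold, lenient=True)   # (22, 30) overlaps (20, 30): P = R = F1 = 2/3

# (lenient F1, value accuracy, overall score) copied from the Task A table.
TASK_A = {
    "HeidelTime (t)": (90.30, 85.95, 77.61),
    "NavyTime (1,2)": (90.32, 78.58, 70.97),
    "ManTIME (4)": (89.66, 76.92, 68.97),
    "SUTime": (90.32, 74.60, 67.38),
    "ATT (2)": (85.25, 76.91, 65.57),
    "ClearTK (1,2)": (90.23, 71.66, 64.66),
    "JU_CSE": (86.38, 73.87, 63.81),
    "KUL (2)": (83.67, 75.24, 62.95),
    "FSS-TimEx": (85.06, 68.47, 58.24),
}
for name, (f1, acc, overall) in TASK_A.items():
    assert abs(value_score(f1, acc) - overall) < 0.01, name
```

The check passes for all nine rows to within rounding, which is why a system with high lenient F1 but weaker normalisation (for example ClearTK) ranks below systems with slightly lower identification scores.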
Task B: Event extraction and classification
System name (best run) | Short description | Main publication | Strict Pre. | Strict Rec. | Strict F1 | Class acc. | Tense acc. | Aspect acc. | Overall score | Software | License |
---|---|---|---|---|---|---|---|---|---|---|---|
ATT (1) | - | Jung et al., 2013 | 81.44 | 80.67 | 81.05 | 88.69 | 73.37 | 90.68 | 71.88 | - | - |
KUL (2) | - | Kolomiyets et al., 2013 | 80.69 | 77.99 | 79.32 | 88.46 | - | - | 70.17 | - | - |
ClearTK (4) | - | Bethard, 2013 | 81.40 | 76.38 | 78.81 | 86.12 | 78.20 | 90.86 | 67.87 | Download | BSD-3 Clause |
NavyTime (1) | - | Chambers, 2013 | 80.73 | 79.87 | 80.30 | 84.03 | 75.79 | 91.26 | 67.48 | - | - |
Temp: (ESAfeature) | - | X, 2013 | 78.33 | 61.61 | 68.97 | 79.09 | - | - | 54.55 | - | - |
JU_CSE | - | Kolya et al., 2013 | 80.85 | 76.51 | 78.62 | 67.02 | 74.56 | 91.76 | 52.69 | - | - |
FSS-TimEx | - | Zavarella et al., 2013 | 63.13 | 67.11 | 65.06 | 66.00 | - | - | 42.94 | - | - |
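In the Task B table, every Overall score equals strict F1 multiplied by class accuracy (ATT: 81.05 × 0.8869 ≈ 71.88). The sketch below checks that relation against the rows and shows, on a toy example, one way class accuracy can be computed; it assumes accuracy is taken over events whose spans were already matched, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def attribute_accuracy(pairs: List[Tuple[str, str]]) -> float:
    """Share of span-matched events whose system class equals the gold class.

    Each pair is (gold_class, system_class) for one matched event (assumed layout).
    """
    if not pairs:
        return 0.0
    return sum(gold == pred for gold, pred in pairs) / len(pairs)


def overall_score(strict_f1: float, class_accuracy: float) -> float:
    """Relation observed in the Task B table: Overall = strict F1 x class accuracy (percentages)."""
    return strict_f1 * class_accuracy / 100.0


# (strict F1, class accuracy, overall score) copied from the Task B table.
TASK_B: Dict[str, Tuple[float, float, float]] = {
    "ATT (1)": (81.05, 88.69, 71.88),
    "KUL (2)": (79.32, 88.46, 70.17),
    "ClearTK (4)": (78.81, 86.12, 67.87),
    "NavyTime (1)": (80.30, 84.03, 67.48),
    "Temp: (ESAfeature)": (68.97, 79.09, 54.55),
    "JU_CSE": (78.62, 67.02, 52.69),
    "FSS-TimEx": (65.06, 66.00, 42.94),
}
for name, (f1, acc, overall) in TASK_B.items():
    assert abs(overall_score(f1, acc) - overall) < 0.01, name

# Toy example with TimeML event classes: three of four matched events get the right class.
matched = [("OCCURRENCE", "OCCURRENCE"), ("STATE", "OCCURRENCE"),
           ("REPORTING", "REPORTING"), ("ASPECTUAL", "ASPECTUAL")]
print(attribute_accuracy(matched))  # 0.75
```

Tense and aspect accuracy do not enter the Overall score under this relation, which is consistent with KUL and FSS-TimEx being ranked despite having no tense or aspect results.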
Task C: Annotating relations given gold entities
Task ABC: Temporal awareness evaluation
Clinical TempEval 2015
- Clinical TempEval 2015, Clinical TempEval, 2015: web page
Performance measures
Results
Time expressions
The table shows the best result for each system. Lower scoring runs for the same system are not shown.
System name (best run) | Short description | Main publication | Span P | Span R | Span F1 | Class P | Class R | Class F1 | Class accuracy (A) | Software | License |
---|---|---|---|---|---|---|---|---|---|---|---|
Baseline: memorize | - | - | 0.743 | 0.372 | 0.496 | 0.723 | 0.362 | 0.483 | 0.974 | - | - |
KPSCMI: run 1 | Rule-based | - | 0.272 | 0.782 | 0.404 | 0.223 | 0.642 | 0.331 | 0.819 | - | - |
KPSCMI: run 3 | Supervised machine learning | - | 0.693 | 0.706 | 0.699 | 0.657 | 0.669 | 0.663 | 0.948 | - | - |
UFPRSheffield-SVM: run 2 | Supervised machine learning | - | 0.741 | 0.655 | 0.695 | 0.723 | 0.640 | 0.679 | 0.977 | - | - |
UFPRSheffield-Hynx: run 5 | Rule-based | - | 0.411 | 0.795 | 0.542 | 0.391 | 0.756 | 0.516 | 0.952 | - | - |
BluLab: run 1-3 | Supervised machine learning | - | 0.797 | 0.664 | 0.725 | 0.778 | 0.652 | 0.709 | 0.978 | - | - |
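In every row of the time expressions table, the class accuracy (A) equals class F1 divided by span F1 (memorize baseline: 0.483 / 0.496 ≈ 0.974). A minimal sketch, assuming A is the share of correctly identified spans that also receive the correct class; under that assumption the ratio is exact, because F1 reduces to 2·TP / (|system| + |gold|) and the denominator is shared. The counts are hypothetical.

```python
from typing import Tuple


def f1_from_counts(tp: int, n_sys: int, n_gold: int) -> float:
    """F1 = 2PR / (P + R), which simplifies to 2 * TP / (|system| + |gold|)."""
    return 2 * tp / (n_sys + n_gold) if n_sys + n_gold else 0.0


def span_and_class_scores(n_sys: int, n_gold: int, span_tp: int, class_tp: int) -> Tuple[float, float, float]:
    """Return (span F1, class F1, accuracy).

    span_tp: system spans that match a gold span.
    class_tp: matched spans whose class is also correct (class_tp <= span_tp).
    Accuracy is assumed to be class_tp / span_tp, so it equals class F1 / span F1.
    """
    span_f1 = f1_from_counts(span_tp, n_sys, n_gold)
    class_f1 = f1_from_counts(class_tp, n_sys, n_gold)
    accuracy = class_tp / span_tp if span_tp else 0.0
    return span_f1, class_f1, accuracy


# Hypothetical counts: 10 system spans, 12 gold spans, 8 matches, 7 with the right class.
span_f1, class_f1, acc = span_and_class_scores(10, 12, 8, 7)
print(round(span_f1, 3), round(class_f1, 3), acc)  # 0.727 0.636 0.875

# (span F1, class F1, A) copied from the time expressions table.
ROWS = {
    "Baseline: memorize": (0.496, 0.483, 0.974),
    "KPSCMI: run 1": (0.404, 0.331, 0.819),
    "KPSCMI: run 3": (0.699, 0.663, 0.948),
    "UFPRSheffield-SVM: run 2": (0.695, 0.679, 0.977),
    "UFPRSheffield-Hynx: run 5": (0.542, 0.516, 0.952),
    "BluLab: run 1-3": (0.725, 0.709, 0.978),
}
for name, (s_f1, c_f1, a) in ROWS.items():
    assert abs(c_f1 / s_f1 - a) < 0.002, name
```

The same relation holds for the modality, degree, polarity and type accuracies in the event expressions table, so A isolates attribute quality from span detection quality.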
Event expressions
The table shows the best result for each system. Lower scoring runs for the same system are not shown.
System name (best run) | Short description | Main publication | Span P | Span R | Span F1 | Modality P | Modality R | Modality F1 | Modality accuracy (A) | Degree P | Degree R | Degree F1 | Degree accuracy (A) | Polarity P | Polarity R | Polarity F1 | Polarity accuracy (A) | Type P | Type R | Type F1 | Type accuracy (A) | Software | License |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Baseline | Memorize | - | 0.876 | 0.810 | 0.842 | 0.810 | 0.749 | 0.778 | 0.924 | 0.871 | 0.806 | 0.838 | 0.995 | 0.800 | 0.740 | 0.769 | 0.913 | 0.846 | 0.783 | 0.813 | 0.966 | - | - |
BluLab: run 1-3 | Supervised machine learning | - | 0.887 | 0.864 | 0.875 | 0.834 | 0.813 | 0.824 | 0.942 | 0.882 | 0.859 | 0.870 | 0.994 | 0.868 | 0.846 | 0.857 | 0.979 | 0.834 | 0.812 | 0.823 | 0.941 | - | - |
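The Memorize baselines are only named on this page. As a rough illustration, the following hypothetical sketch assumes such a baseline remembers each phrase annotated in the training data together with its most frequent attribute value, then tags case-insensitive matches of those phrases in new text; the official baseline may differ in tokenisation, multi-word handling and tie-breaking, and the function names are invented.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def train_memorize(examples: List[Tuple[str, str]]) -> Dict[str, str]:
    """Map each annotated training phrase (lowercased) to its most frequent attribute value."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for phrase, value in examples:
        counts[phrase.lower()][value] += 1
    return {phrase: c.most_common(1)[0][0] for phrase, c in counts.items()}


def apply_memorize(tokens: List[str], memory: Dict[str, str]) -> List[Tuple[int, str, str]]:
    """Tag every token seen as an annotated phrase in training (single-token phrases only)."""
    return [(i, tok, memory[tok.lower()]) for i, tok in enumerate(tokens) if tok.lower() in memory]


# Hypothetical training annotations: (EVENT text, polarity value).
training = [("pain", "POS"), ("pain", "NEG"), ("pain", "POS"), ("fever", "NEG")]
memory = train_memorize(training)
print(apply_memorize("Patient reports pain but no fever".split(), memory))
# [(2, 'pain', 'POS'), (5, 'fever', 'NEG')]
```

Because such a lookup can only fire on phrases already seen in training, it tends to trade recall for precision.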
Temporal relations
The table shows the best result for each system. Lower scoring runs for the same system are not shown.
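System name (best run) | Short description | Main publication | To Document Time P | To Document Time R | To Document Time F1 | Narrative Containers P | Narrative Containers R | Narrative Containers F1 | Narrative Containers P | Narrative Containers R | Narrative Containers F1 | Software | License |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|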
Phase 1: text only | |||||||||||||
Baseline | Memorize | - | 0.600 | 0.555 | 0.577 | - | - | - | - | - | - | - | - |
Baseline | TIMEX3 to closest EVENT | - | - | - | - | 0.368 | 0.061 | 0.104 | 0.400 | 0.061 | 0.106 | - | - |
BluLab: run 2 | Supervised machine learning | - | 0.712 | 0.693 | 0.702 | 0.080 | 0.142 | 0.102 | 0.094 | 0.179 | 0.123 | - | - |
Phase 2: manual EVENTs and TIMEX3s | |||||||||||||
Baseline | Memorize | - | - | - | 0.608 | - | - | - | - | - | - | - | - |
Baseline | TIMEX3 to closest EVENT | - | - | - | - | 0.433 | 0.162 | 0.235 | 0.469 | 0.162 | 0.240 | - | - |
BluLab: run 2 | Supervised machine learning | - | - | - | 0.791 | 0.109 | 0.210 | 0.143 | 0.140 | 0.254 | 0.181 | - | - |
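The "TIMEX3 to closest EVENT" baseline is named but not specified here. A hypothetical sketch, assuming the baseline emits one CONTAINS link per TIMEX3 pointing at the event whose start offset is nearest, and that links are scored by exact match without temporal closure; the identifiers, offsets and gold links are invented for illustration.

```python
from typing import Dict, Set, Tuple

Link = Tuple[str, str]  # (container id, contained id) for an assumed CONTAINS relation


def closest_event_baseline(timexes: Dict[str, int], events: Dict[str, int]) -> Set[Link]:
    """Link every TIMEX3 to the EVENT whose start offset is nearest (ties: first in dict order)."""
    links: Set[Link] = set()
    if not events:
        return links
    for t_id, t_start in timexes.items():
        nearest = min(events, key=lambda e_id: abs(events[e_id] - t_start))
        links.add((t_id, nearest))
    return links


def relation_prf(system: Set[Link], gold: Set[Link]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 over CONTAINS links (no temporal closure)."""
    tp = len(system & gold)
    p = tp / len(system) if system else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# Hypothetical document: start offsets of time expressions and events.
timexes = {"T1": 10, "T2": 80}
events = {"E1": 0, "E2": 25, "E3": 70}
gold = {("T1", "E2"), ("T2", "E3"), ("E2", "E1")}

predicted = closest_event_baseline(timexes, events)  # {("T1", "E1"), ("T2", "E3")}
print([round(x, 3) for x in relation_prf(predicted, gold)])  # [0.5, 0.333, 0.4]
```

Because this sketch predicts at most one link per TIMEX3 and never links two events, its recall is bounded by the share of gold CONTAINS links whose container is a time expression.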
References
- UzZaman, N., Llorens, H., Derczynski, L., Allen, J., Verhagen, M., and Pustejovsky, J. SemEval-2013 Task 1: TempEval-3: Evaluating time expressions, events, and temporal relations. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 1–9.
- Bethard, S. ClearTK-TimeML: A minimalist approach to TempEval 2013. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 10–14.
- Strötgen, J., Zell, J., and Gertz, M. HeidelTime: Tuning English and developing Spanish resources for TempEval-3. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 15–19.
- Jung, H., and Stent, A. ATT1: Temporal annotation using big windows and rich syntactic and semantic features. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 20–24.
- Filannino, M., Brown, G., and Nenadic, G. ManTIME: Temporal expression identification and normalization in the TempEval-3 challenge. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 53–57.
- Zavarella, V., and Tanev, H. FSS-TimEx for TempEval-3: Extracting temporal information from text. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 58–63.
- Kolya, A. K., Kundu, A., Gupta, R., Ekbal, A., and Bandyopadhyay, S. JU_CSE: A CRF based approach to annotation of temporal expression, event and temporal relations. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 64–72.
- Chambers, N. NavyTime: Event and time ordering from raw text. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 73–77.
- Chang, A., and Manning, C. D. SUTime: Evaluation in TempEval-3. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 78–82.
- Kolomiyets, O., and Moens, M.-F. KUL: Data-driven approach to temporal parsing of newswire articles. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 83–87.
- Laokulrat, N., Miwa, M., Tsuruoka, Y., and Chikayama, T. UTTime: Temporal relation classification using deep syntactic features. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), Association for Computational Linguistics, pp. 88–92.