MUC-7 (State of the art)

* '''Performance measure:''' F = 2 * Precision * Recall / (Recall + Precision) (a simplified sketch of the computation follows this list)
* '''Precision:''' percentage of named entities found by the algorithm that are correct
* '''Recall:''' percentage of named entities defined in the corpus that were found by the program
* Exact calculation of precision and recall is explained in the [http://www.itl.nist.gov/iad/894.02/related_projects/muc/muc_sw/muc_sw_manual.html MUC scoring software]
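
A minimal sketch of how these measures combine, assuming entities are compared by exact match on type and character offsets. The function name and tuple format are illustrative assumptions; the official MUC scoring software additionally awards partial and type-only credit, which this sketch does not attempt.

<pre>
# Illustrative only: strict exact-match precision/recall/F,
# not the official MUC scorer (which also grants partial credit).

def exact_match_scores(gold, predicted):
    """gold, predicted: iterables of (entity_type, start, end) tuples (hypothetical format)."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)                      # entities found and correct
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f

if __name__ == "__main__":
    gold = {("PERSON", 0, 12), ("ORGANIZATION", 20, 35), ("LOCATION", 40, 48)}
    pred = {("PERSON", 0, 12), ("ORGANIZATION", 20, 35), ("ORGANIZATION", 50, 60)}
    p, r, f = exact_match_scores(gold, pred)
    print(f"precision={p:.2%} recall={r:.2%} F={f:.2%}")
    # precision=66.67% recall=66.67% F=66.67%
</pre>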

* '''Training data:''' Training section of MUC-7 dataset
* '''Dryrun data:''' Dryrun section of MUC-7 dataset
* '''Testing data:''' Formal section of MUC-7 dataset

== Table of results ==

{|
! System name
! Short description
! System type (1)
! Main publications
! Software
! Results
|-
| Annotator
| Human annotator
| -
| MUC-7 proceedings
| -
| 97.60%
|-
| LTG
| Best MUC-7 participant
| H
| Mikheev, Grover and Moens (1998)
| -
| 93.39%
|-
| Balie
| Unsupervised approach: no prior training
| U
| Nadeau, Turney and Matwin (2006)
| [http://balie.sourceforge.net sourceforge.net]
| 77.71% (2)
|-
| Baseline
| Vocabulary transfer from training to testing
| S
| Whitelaw and Patrick (2003)
| -
| 58.89% (2)
|}

* (1) '''System type''': R = hand-crafted rules, S = supervised learning, U = unsupervised learning, H = hybrid
* (2) Calculated on Enamex types only.

== References ==

Mikheev, A., Grover, C. and Moens, M. (1998). [http://www-nlpir.nist.gov/related_projects/muc/proceedings/muc_7_proceedings/ltg_muc7.pdf Description of the LTG system used for MUC-7]. ''Proceedings of the Seventh Message Understanding Conference (MUC-7)''. Fairfax, Virginia.

Nadeau, D., Turney, P. D. and Matwin, S. (2006). [http://iit-iti.nrc-cnrc.gc.ca/publications/nrc-48727_e.html Unsupervised Named-Entity Recognition: Generating Gazetteers and Resolving Ambiguity]. ''Proceedings of the 19th Canadian Conference on Artificial Intelligence''. Québec, Canada.

Whitelaw, C. and Patrick, J. (2003). [http://www.springerlink.com/content/ju66c6a2734fl20u/ Evaluating Corpora for Named Entity Recognition Using Character-Level Features]. ''Proceedings of the 16th Australian Conference on AI''. Perth, Australia.

== See also ==

* [[Named Entity Recognition (State of the art)|Named Entity Recognition]]
* [[State of the art]]

[[Category:State of the art]]