ACL Wiki - User contributions [en]

POS Tagging (State of the art)

2014-11-26T09:11:27Z

Terrance1026:

==Test collections==
* '''Performance measure:''' per-token accuracy. (By convention, this is measured on all tokens, including punctuation tokens and other unambiguous tokens.)
* '''English'''
** '''Penn Treebank''' ''Wall Street Journal'' (WSJ) release 3 (LDC99T42). The splits of data for this task were not standardized early on (unlike for parsing) and early work uses various data splits defined by counts of tokens or by sections. Most work from 2002 on adopts the following data splits, introduced by Collins (2002):
*** '''Training data:''' sections 0-18
*** '''Development test data:''' sections 19-21
*** '''Testing data:''' sections 22-24

* '''French'''
** '''French TreeBank''' (FTB, Abeillé et al., 2003) ''Le Monde'', December 2007 version, 28-tag tagset (CC tagset, Crabbé and Candito, 2008). Classical data split (10-10-80):
*** '''Training data:''' sentences 2471 to 12351
*** '''Development test data:''' sentences 1236 to 2470
*** '''Testing data:''' sentences 1 to 1235
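
The splits and the performance measure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (<code>wsj_split</code>, <code>ftb_split</code>, <code>token_accuracy</code>) are hypothetical and do not come from any of the listed taggers, and the tag sequences are assumed to be already aligned token by token.

```python
from typing import Sequence


def wsj_split(section: int) -> str:
    """Map a WSJ section number (0-24) to its split (Collins 2002 convention)."""
    if not 0 <= section <= 24:
        raise ValueError(f"WSJ section out of range: {section}")
    if section <= 18:
        return "train"
    if section <= 21:
        return "dev"
    return "test"


def ftb_split(sentence: int) -> str:
    """Map an FTB sentence number (1-12351) to its split (10-10-80 convention)."""
    if not 1 <= sentence <= 12351:
        raise ValueError(f"FTB sentence number out of range: {sentence}")
    if sentence <= 1235:
        return "test"
    if sentence <= 2470:
        return "dev"
    return "train"


def token_accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Per-token accuracy over all tokens, punctuation included."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted tag sequences differ in length")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)
```

No token is filtered out before scoring, which matches the convention that punctuation and other unambiguous tokens count toward the reported accuracy.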

== Tables of results ==

===WSJ===

{| border="1" cellpadding="5" cellspacing="1" width="100%"
|-
! System name
! Short description
! Main publication
! Software
! Extra Data?***
! All tokens
! Unknown words
! License
|-
| TnT*
| Hidden Markov model
| Brants (2000)
| [http://www.coli.uni-saarland.de/~thorsten/tnt/ TnT]
| No
| 96.46%
| 85.86%
| Academic/research use only ([http://www.coli.uni-saarland.de/~thorsten/tnt/tnt-license.html license])
|-
| MElt
| MEMM with external lexical information
| Denis and Sagot (2009)
| [https://gforge.inria.fr/projects/lingwb/ Alpage linguistic workbench]
| No
| 96.96%
| 91.29%
| CeCILL-C
|-
| GENiA Tagger**
| Maximum entropy cyclic dependency network
| Tsuruoka et al. (2005)
| [http://www.nactem.ac.uk/tsujii/GENIA/tagger/ GENiA]
| No
| 97.05%
| Not available
| Gratis for non-commercial usage
|-
| Averaged Perceptron
| Averaged perceptron discriminative sequence model
| Collins (2002)
| Not available
| No
| 97.11%
| Not available
| Unknown
|-
| Maxent easiest-first
| Maximum entropy bidirectional easiest-first inference
| Tsuruoka and Tsujii (2005)
| [http://www-tsujii.is.s.u-tokyo.ac.jp/~tsuruoka/postagger/ Easiest-first]
| No
| 97.15%
| Not available
| Unknown
|-
| SVMTool
| SVM-based tagger and tagger generator
| Giménez and Márquez (2004)
| [http://www.lsi.upc.es/~nlp/SVMTool/ SVMTool]
| No
| 97.16%
| 89.01%
| LGPL 2.1
|-
| LAPOS
| Perceptron-based training with lookahead
| Tsuruoka, Miyao, and Kazama (2011)
| [http://www.logos.t.u-tokyo.ac.jp/~tsuruoka/lapos/ LAPOS]
| No
| 97.22%
| Not available
| MIT
|-
| Morče/COMPOST
| Averaged Perceptron
| Spoustová et al. (2009)
| [http://ufal.mff.cuni.cz/compost COMPOST]
| No
| 97.23%
| Not available
| Non-free ([http://ufal.mff.cuni.cz/compost/register.php academic-only])
|-
| Morče/COMPOST
| Averaged Perceptron
| Spoustová et al. (2009)
| [http://ufal.mff.cuni.cz/compost COMPOST]
| Yes
| 97.44%
| Not available
| Unknown
|-
| Stanford Tagger 1.0
| Maximum entropy cyclic dependency network
| Toutanova et al. (2003)
| [http://nlp.stanford.edu/software/tagger.shtml Stanford Tagger]
| No
| 97.24%
| 89.04%
| GPL v2+
|-
| Stanford Tagger 2.0
| Maximum entropy cyclic dependency network
| Manning (2011)
| [http://nlp.stanford.edu/software/tagger.shtml Stanford Tagger]
| No
| 97.29%
| 89.70%
| GPL v2+
|-
| Stanford Tagger 2.0
| Maximum entropy cyclic dependency network
| Manning (2011)
| [http://nlp.stanford.edu/software/tagger.shtml Stanford Tagger]
| Yes
| 97.32%
| 90.79%
| GPL v2+
|-
| LTAG-spinal
| Bidirectional perceptron learning
| Shen et al. (2007)
| [http://www.cis.upenn.edu/~xtag/spinal/ LTAG-spinal]
| No
| 97.33%
| Not available
| Unknown
|-
| SCNN
| Semi-supervised condensed nearest neighbor
| Søgaard (2011)
| [http://cst.dk/anders/scnn/ SCNN]
| Yes
| 97.50%
| Not available
| Unknown
|-
| structReg
| CRFs with structure regularization
| Sun (2014)
| Not available
| No
| 97.36%
| Not available
| Unknown
|}

(*) TnT: Accuracy is as reported by Giménez and Márquez (2004) for the given test collection. Brants (2000) reports 96.7% token accuracy and 85.5% unknown word accuracy on a 10-fold cross-validation of the Penn WSJ corpus.

(**) GENiA: Results are for models trained and tested on the given corpora (to be comparable to other results). The distributed GENiA tagger is trained on a mixed training corpus and gets 96.94% on WSJ, and 98.26% on GENiA biomedical English.

(***) Extra data: Whether system training exploited (usually large amounts of) extra unlabeled text, such as by semi-supervised learning, self-training, or using distributional similarity features, beyond the standard supervised training data.

===FTB===

{| border="1" cellpadding="5" cellspacing="1" width="100%"
|-
! System name
! Short description
! Main publication
! Software
! Extra Data?***
! All tokens
! Unknown words
! License
|-
| Morfette
| Perceptron with external lexical information*
| Chrupała et al. (2008), Seddah et al. (2010)
| [http://sites.google.com/site/morfetteweb/ Morfette]
| No
| 97.68%
| 90.52%
| New BSD
|-
| SEM
| CRF with external lexical information*
| Constant et al. (2011)
| [http://www.univ-orleans.fr/lifo/Members/Isabelle.Tellier/SEM.html SEM]
| No
| 97.7%
| Not available
| "GNU"(?)
|-
| MElt
| MEMM with external lexical information*
| Denis and Sagot (2009)
| [https://gforge.inria.fr/projects/lingwb/ Alpage linguistic workbench]
| No
| 97.80%
| 91.77%
| CeCILL-C
|}

(*) External lexical information from the Lefff lexicon (Sagot 2010, [https://gforge.inria.fr/frs/?group_id=482 Alexina project])

== References ==

* Brants, Thorsten. 2000. [http://acl.ldc.upenn.edu/A/A00/A00-1031.pdf TnT -- A Statistical Part-of-Speech Tagger]. ''6th Applied Natural Language Processing Conference''.

* Chrupała, Grzegorz, Dinu, Georgiana and van Genabith, Josef. 2008. [http://www.lrec-conf.org/proceedings/lrec2008/pdf/594_paper.pdf Learning Morphology with Morfette]. ''LREC 2008''.

* Collins, Michael. 2002. [http://people.csail.mit.edu/mcollins/papers/tagperc.pdf Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms]. ''EMNLP 2002''.

* Constant, Matthieu, Tellier, Isabelle, Duchier, Denys, Dupont, Yoann, Sigogne, Anthony, and Billot, Sylvie. 2011. [http://www.lirmm.fr/~lopez/TALN2011/Longs-TALN+RECITAL/Tellier_taln11_submission_54.pdf Intégrer des connaissances linguistiques dans un CRF : application à l'apprentissage d'un segmenteur-étiqueteur du français]. ''TALN'11''.

* Denis, Pascal and Sagot, Benoît. 2009. [http://alpage.inria.fr/~sagot/pub/paclic09tagging.pdf Coupling an annotated corpus and a morphosyntactic lexicon for state-of-the-art POS tagging with less human effort]. ''PACLIC 2009''.

* Giménez, J., and Márquez, L. 2004. [http://www.lsi.upc.es/~nlp/SVMTool/lrec2004-gm.pdf SVMTool: A general POS tagger generator based on Support Vector Machines]. ''Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC'04)''. Lisbon, Portugal.

* Manning, Christopher D. 2011. Part-of-Speech Tagging from 97% to 100%: Is It Time for Some Linguistics? In Alexander Gelbukh (ed.), Computational Linguistics and Intelligent Text Processing, 12th International Conference, CICLing 2011, Proceedings, Part I. Lecture Notes in Computer Science 6608, pp. 171--189. Springer.

* Seddah, Djamé, Chrupała, Grzegorz, Çetinoglu, Özlem and Candito, Marie. 2010. [http://aclweb.org/anthology-new/W/W10/W10-1410.pdf Lemmatization and Lexicalized Statistical Parsing of Morphologically Rich Languages: the Case of French]. ''SPMRL 2010 (NAACL 2010 workshop)''.

* Shen, L., Satta, G., and Joshi, A. 2007. [http://acl.ldc.upenn.edu/P/P07/P07-1096.pdf Guided learning for bidirectional sequence classification]. ''Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL 2007)'', pages 760-767.

* Søgaard, Anders. 2011. Semi-supervised condensed nearest neighbor for part-of-speech tagging. The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT). Portland, Oregon.

* Spoustová, Drahomíra "Johanka", Jan Hajič, Jan Raab and Miroslav Spousta. 2009. Semi-supervised Training for the Averaged Perceptron POS Tagger. ''Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)'', pages 763-771.

* Toutanova, K., Klein, D., Manning, C.D., and Singer, Y. 2003. [http://nlp.stanford.edu/kristina/papers/tagging.pdf Feature-rich part-of-speech tagging with a cyclic dependency network]. ''Proceedings of HLT-NAACL 2003'', pages 252-259.

* Tsuruoka, Yoshimasa, Yuka Tateishi, Jin-Dong Kim, Tomoko Ohta, John McNaught, Sophia Ananiadou, and Jun'ichi Tsujii. 2005. "[http://www-tsujii.is.s.u-tokyo.ac.jp/~tsuruoka/papers/pci05.pdf Developing a Robust Part-of-Speech Tagger for Biomedical Text]", ''Advances in Informatics - 10th Panhellenic Conference on Informatics'', '''LNCS 3746''', pp. 382-392.

* Tsuruoka, Yoshimasa, Yusuke Miyao, and Jun’ichi Kazama. 2011. "[http://aclweb.org/anthology-new/W/W11/W11-0328.pdf Learning with Lookahead: Can History-Based Models Rival Globally Optimized Models?]" ''Proceedings of the Fifteenth Conference on Computational Natural Language Learning'', pp 238–246, 2011.

* Tsuruoka, Yoshimasa and Jun'ichi Tsujii. 2005. "[http://www-tsujii.is.s.u-tokyo.ac.jp/~tsuruoka/papers/emnlp05bidir.pdf Bidirectional Inference with the Easiest-First Strategy for Tagging Sequence Data]", ''Proceedings of HLT/EMNLP 2005'', pp. 467-474.

* Sun, Xu. 2014. "[http://papers.nips.cc/paper/5643-structure-regularization-for-structured-prediction.pdf Structure Regularization for Structured Prediction]", ''Advances in Neural Information Processing Systems (NIPS)'', pp. 2402-2410.

== See also ==
* [[POS Induction (State of the art)]]
* [[Part-of-speech tagging]]
* [[State of the art]]

[[Category:State of the art]]

NP Chunking (State of the art)

2009-01-11T05:45:12Z

Terrance1026: /* Table of results */

* '''Performance measure:''' F = 2 * Precision * Recall / (Recall + Precision)
* '''Precision:''' percentage of NPs found by the algorithm that are correct
* '''Recall:''' percentage of NPs defined in the corpus that were found by the chunking program
* '''Training data:''' sections 15-18 of Wall Street Journal corpus (Ramshaw and Marcus)
* '''Testing data:''' section 20 of Wall Street Journal corpus (Ramshaw and Marcus)
* '''Data source:''' the original data of the NP chunking experiments by Lance Ramshaw and Mitch Marcus
* '''Data format:''' one word per line; each line contains six fields, of which only the first three are relevant: the word, the part-of-speech tag assigned by the Brill tagger, and the correct IOB tag
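
The measure and the data format described above can be illustrated with a short Python sketch. It is not the official scoring script. It assumes that fields are whitespace-separated and that each chunk tag starts with <code>I</code>, <code>O</code>, or <code>B</code>, where <code>B</code> marks an NP that begins right after another NP. The function names are hypothetical.

```python
from typing import List, Set, Tuple

Chunk = Tuple[int, int]  # (start, end) token offsets, end exclusive


def read_rows(lines: List[str]) -> List[Tuple[str, str, str]]:
    """Keep the first three fields of each data line: word, POS tag, IOB tag."""
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        rows.append((fields[0], fields[1], fields[2]))
    return rows


def np_chunks(tags: List[str]) -> Set[Chunk]:
    """Extract NP spans from a sequence of I/O/B tags."""
    chunks: Set[Chunk] = set()
    start = None
    for i, tag in enumerate(tags):
        kind = tag[0]
        # A B or O tag closes any NP that is currently open.
        if kind in ("B", "O") and start is not None:
            chunks.add((start, i))
            start = None
        # A B tag, or an I tag outside an NP, opens a new NP.
        if kind == "B" or (kind == "I" and start is None):
            start = i
    if start is not None:
        chunks.add((start, len(tags)))
    return chunks


def precision_recall_f(gold: List[str], predicted: List[str]) -> Tuple[float, float, float]:
    """Chunk-level precision, recall and F = 2 * P * R / (R + P)."""
    gold_chunks = np_chunks(gold)
    pred_chunks = np_chunks(predicted)
    correct = len(gold_chunks & pred_chunks)  # exact span matches only
    precision = correct / len(pred_chunks) if pred_chunks else 0.0
    recall = correct / len(gold_chunks) if gold_chunks else 0.0
    total = precision + recall
    f = 2 * precision * recall / total if total else 0.0
    return precision, recall, f
```

A predicted NP counts as correct only if both of its boundaries match a gold NP exactly. When scoring a whole test set as one sequence, an <code>O</code> tag would need to be placed between sentences so that NPs cannot span a sentence boundary.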

== Table of results ==

{| border="1" cellpadding="5" cellspacing="1" width="100%"
|-
! System name
! Short description
! Main publications
! Software
! Results (F)
|-
| KM00
| B-I-O tagging using SVM classifiers with polynomial kernel
| Kudo and Matsumoto (2000), CONLL
| [http://chasen.org/~taku/software/yamcha/ YAMCHA Toolkit] (but models are not provided)
| 93.79%
|-
| KM01
| learning as in KM00, but voting between different representations
| Kudo and Matsumoto (2001), NAACL
| No
| 94.22%
|-
| SP03
| Second order conditional random fields
| Fei Sha and Fernando Pereira (2003), HLT/NAACL
| No
| 94.3%
|-
| SS05
| specialized HMM + voting between different representations
| Shen and Sarkar (2005)
| No
| 95.23%
|-
| M05
| Second order conditional random fields + multi-label classification
| Ryan McDonald, Koby Crammer and Fernando Pereira (2005), HLT/EMNLP
| No
| 94.29%
|-
| V06
| Conditional random fields + Stochastic Meta-Descent (SMD)
| S. V. N. Vishwanathan, Nicol N. Schraudolph, Mark Schmidt, and Kevin Murphy (2006), ICML
| No
| 93.6%
|-
| S08
| Second order latent-dynamic conditional random fields + an improved inference method based on A* search
| Xu Sun, Louis-Philippe Morency, Daisuke Okanohara and Jun'ichi Tsujii (2008), COLING
| HCRF Library
| 94.34%
|-
|}

== References ==

T. Kudo and Y. Matsumoto (2000). [http://acl.ldc.upenn.edu/W/W00/W00-0730.pdf Use of support vector learning for chunk identification]. ''Proceedings of the 4th Conference on CoNLL-2000 and LLL-2000'', pages 142-144, Lisbon, Portugal.

T. Kudo and Y. Matsumoto (2001). [http://acl.ldc.upenn.edu/N/N01/N01-1025.pdf Chunking with support vector machines]. ''Proceedings of NAACL-2001''.

F. Sha and F. Pereira (2003). [http://www-rcf.usc.edu/~feisha/htmls/Papers.html Shallow Parsing with Conditional Random Fields]. ''Proceedings of HLT-NAACL 2003'', pages 213-220. Edmonton, Canada.

H. Shen and A. Sarkar (2005). [http://www.cs.sfu.ca/~anoop/papers/pdf/ai05.pdf Voting between multiple data representations for text chunking]. ''Proceedings of the Eighteenth Meeting of the Canadian Society for Computational Intelligence, Canadian AI 2005''.

R. McDonald, K. Crammer and F. Pereira (2005). [http://ryanmcd.googlepages.com/segmentationHLT-EMNLP2005.pdf Flexible Text Segmentation with Structured Multilabel Classification]. ''Human Language Technologies and Empirical Methods in Natural Language Processing (HLT-EMNLP), 2005''

S. V. N. Vishwanathan, N. Schraudolph, M. Schmidt, and K. Murphy (2006). Accelerated Training of Conditional Random Fields with Stochastic Gradient Methods. ''Proceedings of the International Conference on Machine Learning'', pages 969-976. ACM Press, New York, NY, USA.

X. Sun, L.P. Morency, D. Okanohara and J. Tsujii (2008). [http://www.aclweb.org/anthology-new/C/C08/C08-1106.pdf Modeling Latent-Dynamic in Shallow Parsing: A Latent Conditional Model with Improved Inference]. ''Proceedings of The 22nd International Conference on Computational Linguistics (COLING 2008)''. Pages 841-848. Manchester, UK.

== See also ==

* [[State of the art]]

== External links ==

* dataset is available from [ftp://ftp.cis.upenn.edu/pub/chunker/ ftp://ftp.cis.upenn.edu/pub/chunker/]
* more information is available from [http://ifarm.nl/erikt/research/np-chunking.html NP Chunking]

[[Category:State of the art]]

NP Chunking (State of the art)

2009-01-10T18:18:01Z

Terrance1026: /* Table of results */

* '''Performance measure:''' F = 2 * Precision * Recall / (Recall + Precision)
* '''Precision:''' percentage of NPs found by the algorithm that are correct
* '''Recall:''' percentage of NPs defined in the corpus that were found by the chunking program
* '''Training data:''' sections 15-18 of Wall Street Journal corpus (Ramshaw and Marcus)
* '''Testing data:''' section 20 of Wall Street Journal corpus (Ramshaw and Marcus)
* original data of the NP chunking experiments by Lance Ramshaw and Mitch Marcus
* data contains one word per line and each line contains six fields of which only the first three fields are relevant: the word, the part-of-speech tag assigned by the Brill tagger, and the correct IOB tag

== Table of results ==

{| border="1" cellpadding="5" cellspacing="1" width="100%"
|-
! System name
! Short description
! Main publications
! Software
! Results (F)
|-
| KM00
| B-I-O tagging using SVM classifiers with polynomial kernel
| Kudo and Matsumoto (2000), CONLL
| [http://chasen.org/~taku/software/yamcha/ YAMCHA Toolkit] (but models are not provided)
| 93.79%
|-
| KM01
| learning as in KM00, but voting between different representations
| Kudo and Matsumoto (2001), NAACL
| No
| 94.22%
|-
| SP03
| Second order conditional random fields
| Fei Sha and Fernando Pereira (2003), HLT/NAACL
| No
| 94.3%
|-
| SS05
| specialized HMM + voting between different representations
| Shen and Sarkar (2005)
| No
| 95.23%
|-
| M05
| Second order conditional random fields + multi-label classification
| Ryan McDonald, Koby Crammer and Fernando Pereira (2005), HLT/EMNLP
| No
| 94.29%
|-
| V06
| Conditional random fields + Stochastic Meta-Descent (SMD)
| S. V. N. Vishwanathan, Nicol N. Schraudolph, Mark Schmidt, and Kevin Murphy (2006), ICML
| No
| 93.6%
|-
| S08
| Second order latent-dynamic conditional random fields + an improved inference method based on A* search
| Xu Sun, Louis-Philippe Morency, Daisuke Okanohara and Jun'ichi Tsujii (2008), COLING
| HCRF Library
| 94.34%
|-
|}

== References ==

T. Kudo and Y. Matsumoto (2000). [http://acl.ldc.upenn.edu/W/W00/W00-0730.pdf Use of support vector learning for chunk identification]. ''Proceedings of the 4th Conference on CoNLL-2000 and LLL-2000'', pages 142-144, Lisbon, Portugal.

T. Kudo and Y. Matsumoto (2001). [http://acl.ldc.upenn.edu/N/N01/N01-1025.pdf Chunking with support vector machines]. ''Proceedings of NAACL-2001''.

F. Sha and F. Pereira (2003). [http://www-rcf.usc.edu/~feisha/htmls/Papers.html Shallow Parsing with Conditional Random Fields]. ''Proceedings of HLT-NAACL 2003'', pages 213-220. Edmonton, Canada.

H. Shen and A. Sarkar (2005). [http://www.cs.sfu.ca/~anoop/papers/pdf/ai05.pdf Voting between multiple data representations for text chunking]. ''Proceedings of the Eighteenth Meeting of the Canadian Society for Computational Intelligence, Canadian AI 2005''.

R. McDonald, K. Crammer and F. Pereira (2005). [http://ryanmcd.googlepages.com/segmentationHLT-EMNLP2005.pdf Flexible Text Segmentation with Structured Multilabel Classification]. ''Human Language Technologies and Empirical Methods in Natural Language Processing (HLT-EMNLP), 2005''

S. V. N. Vishwanathan, N. Schraudolph, M. Schmidt, and K. Murphy (2006). Accelerated Training of Conditional Random Fields with Stochastic Gradient Methods. ''Proceedings of the International Conference on Machine Learning'', pages 969-976. ACM Press, New York, NY, USA.

X. Sun, L.P. Morency, D. Okanohara and J. Tsujii (2008). [http://www.aclweb.org/anthology-new/C/C08/C08-1106.pdf Modeling Latent-Dynamic in Shallow Parsing: A Latent Conditional Model with Improved Inference]. ''Proceedings of The 22nd International Conference on Computational Linguistics (COLING 2008)''. Pages 841-848. Manchester, UK.

== See also ==

* [[State of the art]]

== External links ==

* dataset is available from [ftp://ftp.cis.upenn.edu/pub/chunker/ ftp://ftp.cis.upenn.edu/pub/chunker/]
* more information is available from [http://ifarm.nl/erikt/research/np-chunking.html NP Chunking]

[[Category:State of the art]]

NP Chunking (State of the art)

2009-01-10T18:13:21Z

Terrance1026: /* References */

* '''Performance measure:''' F = 2 * Precision * Recall / (Recall + Precision)
* '''Precision:''' percentage of NPs found by the algorithm that are correct
* '''Recall:''' percentage of NPs defined in the corpus that were found by the chunking program
* '''Training data:''' sections 15-18 of Wall Street Journal corpus (Ramshaw and Marcus)
* '''Testing data:''' section 20 of Wall Street Journal corpus (Ramshaw and Marcus)
* original data of the NP chunking experiments by Lance Ramshaw and Mitch Marcus
* data contains one word per line and each line contains six fields of which only the first three fields are relevant: the word, the part-of-speech tag assigned by the Brill tagger, and the correct IOB tag

== Table of results ==

{| border="1" cellpadding="5" cellspacing="1" width="100%"
|-
! System name
! Short description
! Main publications
! Software
! Results (F)
|-
| KM00
| B-I-O tagging using SVM classifiers with polynomial kernel
| Kudo and Matsumoto (2000), CONLL
| [http://chasen.org/~taku/software/yamcha/ YAMCHA Toolkit] (but models are not provided)
| 93.79%
|-
| KM01
| learning as in KM00, but voting between different representations
| Kudo and Matsumoto (2001), NAACL
| No
| 94.22%
|-
| SP03
| Second order conditional random fields
| Fei Sha and Fernando Pereira (2003), HLT/NAACL
| No
| 94.3%
|-
| SS05
| specialized HMM + voting between different representations
| Shen and Sarkar (2005)
| No
| 95.23%
|-
| M05
| Second order conditional random fields + multi-label classification
| Ryan McDonald, Koby Crammer and Fernando Pereira (2005), HLT/EMNLP
| No
| 94.29%
|-
| S08
| Second order latent-dynamic conditional random fields + an improved inference method based on A* search
| Xu Sun, Louis-Philippe Morency, Daisuke Okanohara and Jun'ichi Tsujii (2008), COLING
| HCRF Library
| 94.34%
|-
|}

== References ==

T. Kudo and Y. Matsumoto (2000). [http://acl.ldc.upenn.edu/W/W00/W00-0730.pdf Use of support vector learning for chunk identification]. ''Proceedings of the 4th Conference on CoNLL-2000 and LLL-2000'', pages 142-144, Lisbon, Portugal.

T. Kudo and Y. Matsumoto (2001). [http://acl.ldc.upenn.edu/N/N01/N01-1025.pdf Chunking with support vector machines]. ''Proceedings of NAACL-2001''.

F. Sha and F. Pereira (2003). [http://www-rcf.usc.edu/~feisha/htmls/Papers.html Shallow Parsing with Conditional Random Fields]. ''Proceedings of HLT-NAACL 2003'', pages 213-220. Edmonton, Canada.

H. Shen and A. Sarkar (2005). [http://www.cs.sfu.ca/~anoop/papers/pdf/ai05.pdf Voting between multiple data representations for text chunking]. ''Proceedings of the Eighteenth Meeting of the Canadian Society for Computational Intelligence, Canadian AI 2005''.

R. McDonald, K. Crammer and F. Pereira (2005). [http://ryanmcd.googlepages.com/segmentationHLT-EMNLP2005.pdf Flexible Text Segmentation with Structured Multilabel Classification]. ''Human Language Technologies and Empirical Methods in Natural Language Processing (HLT-EMNLP), 2005''

S. V. N. Vishwanathan, N. Schraudolph, M. Schmidt, and K. Murphy (2006). Accelerated Training of Conditional Random Fields with Stochastic Gradient Methods. ''Proceedings of the International Conference on Machine Learning'', pages 969-976. ACM Press, New York, NY, USA.

X. Sun, L.P. Morency, D. Okanohara and J. Tsujii (2008). [http://www.aclweb.org/anthology-new/C/C08/C08-1106.pdf Modeling Latent-Dynamic in Shallow Parsing: A Latent Conditional Model with Improved Inference]. ''Proceedings of The 22nd International Conference on Computational Linguistics (COLING 2008)''. Pages 841-848. Manchester, UK.

== See also ==

* [[State of the art]]

== External links ==

* dataset is available from [ftp://ftp.cis.upenn.edu/pub/chunker/ ftp://ftp.cis.upenn.edu/pub/chunker/]
* more information is available from [http://ifarm.nl/erikt/research/np-chunking.html NP Chunking]

[[Category:State of the art]]

NP Chunking (State of the art)

2009-01-10T16:37:13Z

Terrance1026: /* References */

* '''Performance measure:''' F = 2 * Precision * Recall / (Recall + Precision)
* '''Precision:''' percentage of NPs found by the algorithm that are correct
* '''Recall:''' percentage of NPs defined in the corpus that were found by the chunking program
* '''Training data:''' sections 15-18 of Wall Street Journal corpus (Ramshaw and Marcus)
* '''Testing data:''' section 20 of Wall Street Journal corpus (Ramshaw and Marcus)
* original data of the NP chunking experiments by Lance Ramshaw and Mitch Marcus
* data contains one word per line and each line contains six fields of which only the first three fields are relevant: the word, the part-of-speech tag assigned by the Brill tagger, and the correct IOB tag

== Table of results ==

{| border="1" cellpadding="5" cellspacing="1" width="100%"
|-
! System name
! Short description
! Main publications
! Software
! Results (F)
|-
| KM00
| B-I-O tagging using SVM classifiers with polynomial kernel
| Kudo and Matsumoto (2000), CONLL
| [http://chasen.org/~taku/software/yamcha/ YAMCHA Toolkit] (but models are not provided)
| 93.79%
|-
| KM01
| learning as in KM00, but voting between different representations
| Kudo and Matsumoto (2001), NAACL
| No
| 94.22%
|-
| SP03
| Second order conditional random fields
| Fei Sha and Fernando Pereira (2003), HLT/NAACL
| No
| 94.3%
|-
| SS05
| specialized HMM + voting between different representations
| Shen and Sarkar (2005)
| No
| 95.23%
|-
| M05
| Second order conditional random fields + multi-label classification
| Ryan McDonald, Koby Crammer and Fernando Pereira (2005), HLT/EMNLP
| No
| 94.29%
|-
| S08
| Second order latent-dynamic conditional random fields + an improved inference method based on A* search
| Xu Sun, Louis-Philippe Morency, Daisuke Okanohara and Jun'ichi Tsujii (2008), COLING
| HCRF Library
| 94.34%
|-
|}

== References ==

T. Kudo and Y. Matsumoto (2000). [http://acl.ldc.upenn.edu/W/W00/W00-0730.pdf Use of support vector learning for chunk identification]. ''Proceedings of the 4th Conference on CoNLL-2000 and LLL-2000'', pages 142-144, Lisbon, Portugal.

T. Kudo and Y. Matsumoto (2001). [http://acl.ldc.upenn.edu/N/N01/N01-1025.pdf Chunking with support vector machines]. ''Proceedings of NAACL-2001''.

F. Sha and F. Pereira (2003). [http://www-rcf.usc.edu/~feisha/htmls/Papers.html Shallow Parsing with Conditional Random Fields]. ''Proceedings of HLT-NAACL 2003'', pages 213-220. Edmonton, Canada.

H. Shen and A. Sarkar (2005). [http://www.cs.sfu.ca/~anoop/papers/pdf/ai05.pdf Voting between multiple data representations for text chunking]. ''Proceedings of the Eighteenth Meeting of the Canadian Society for Computational Intelligence, Canadian AI 2005''.

R. McDonald, K. Crammer and F. Pereira (2005). [http://ryanmcd.googlepages.com/segmentationHLT-EMNLP2005.pdf Flexible Text Segmentation with Structured Multilabel Classification]. ''Human Language Technologies and Empirical Methods in Natural Language Processing (HLT-EMNLP), 2005''

X. Sun, L.P. Morency, D. Okanohara and J. Tsujii (2008). [http://www.aclweb.org/anthology-new/C/C08/C08-1106.pdf Modeling Latent-Dynamic in Shallow Parsing: A Latent Conditional Model with Improved Inference]. ''Proceedings of The 22nd International Conference on Computational Linguistics (COLING 2008)''. Pages 841-848. Manchester, UK.

== See also ==

* [[State of the art]]

== External links ==

* dataset is available from [ftp://ftp.cis.upenn.edu/pub/chunker/ ftp://ftp.cis.upenn.edu/pub/chunker/]
* more information is available from [http://ifarm.nl/erikt/research/np-chunking.html NP Chunking]

[[Category:State of the art]]

NP Chunking (State of the art)

2009-01-10T16:35:55Z

Terrance1026: /* References */

* '''Performance measure:''' F = 2 * Precision * Recall / (Recall + Precision)
* '''Precision:''' percentage of NPs found by the algorithm that are correct
* '''Recall:''' percentage of NPs defined in the corpus that were found by the chunking program
* '''Training data:''' sections 15-18 of Wall Street Journal corpus (Ramshaw and Marcus)
* '''Testing data:''' section 20 of Wall Street Journal corpus (Ramshaw and Marcus)
* original data of the NP chunking experiments by Lance Ramshaw and Mitch Marcus
* data contains one word per line and each line contains six fields of which only the first three fields are relevant: the word, the part-of-speech tag assigned by the Brill tagger, and the correct IOB tag

== Table of results ==

{| border="1" cellpadding="5" cellspacing="1" width="100%"
|-
! System name
! Short description
! Main publications
! Software
! Results (F)
|-
| KM00
| B-I-O tagging using SVM classifiers with polynomial kernel
| Kudo and Matsumoto (2000), CONLL
| [http://chasen.org/~taku/software/yamcha/ YAMCHA Toolkit] (but models are not provided)
| 93.79%
|-
| KM01
| learning as in KM00, but voting between different representations
| Kudo and Matsumoto (2001), NAACL
| No
| 94.22%
|-
| SP03
| Second order conditional random fields
| Fei Sha and Fernando Pereira (2003), HLT/NAACL
| No
| 94.3%
|-
| SS05
| specialized HMM + voting between different representations
| Shen and Sarkar (2005)
| No
| 95.23%
|-
| M05
| Second order conditional random fields + multi-label classification
| Ryan McDonald, Koby Crammer and Fernando Pereira (2005), HLT/EMNLP
| No
| 94.29%
|-
| S08
| Second order latent-dynamic conditional random fields + an improved inference method based on A* search
| Xu Sun, Louis-Philippe Morency, Daisuke Okanohara and Jun'ichi Tsujii (2008), COLING
| HCRF Library
| 94.34%
|-
|}

== References ==

T. Kudo and Y. Matsumoto (2000). [http://acl.ldc.upenn.edu/W/W00/W00-0730.pdf Use of support vector learning for chunk identification]. ''Proceedings of the 4th Conference on CoNLL-2000 and LLL-2000'', pages 142-144, Lisbon, Portugal.

T. Kudo and Y. Matsumoto (2001). [http://acl.ldc.upenn.edu/N/N01/N01-1025.pdf Chunking with support vector machines]. ''Proceedings of NAACL-2001''.

F. Sha and F. Pereira (2003). [http://www-rcf.usc.edu/~feisha/htmls/Papers.html Shallow Parsing with Conditional Random Fields]. ''Proceedings of HLT-NAACL 2003'', pages 213-220. Edmonton, Canada.

H. Shen and A. Sarkar (2005). [http://www.cs.sfu.ca/~anoop/papers/pdf/ai05.pdf Voting between multiple data representations for text chunking]. ''Proceedings of the Eighteenth Meeting of the Canadian Society for Computational Intelligence, Canadian AI 2005''.

R. McDonald, K. Crammer and F. Pereira (2005). [http://ryanmcd.googlepages.com/segmentationHLT-EMNLP2005.pdf Flexible Text Segmentation with Structured Multilabel Classification]. ''Human Language Technologies and Empirical Methods in Natural Language Processing (HLT-EMNLP), 2005''

X. Sun, L.P. Morency, D. Okanohara and J. Tsujii (2008). [http://www.aclweb.org/anthology-new/C/C08/C08-1106.pdf Modeling Latent-Dynamic in Shallow Parsing: A Latent Conditional Model with Improved Inference]. ''Proceedings of The 22nd International Conference on Computational Linguistics (COLING 2008)''. Pages 841-848. Manchester, UK.

== See also ==

* [[State of the art]]

== External links ==

* dataset is available from [ftp://ftp.cis.upenn.edu/pub/chunker/ ftp://ftp.cis.upenn.edu/pub/chunker/]
* more information is available from [http://ifarm.nl/erikt/research/np-chunking.html NP Chunking]

[[Category:State of the art]]

NP Chunking (State of the art)

2009-01-10T16:34:22Z

Terrance1026: /* References */

* '''Performance measure:''' F = 2 * Precision * Recall / (Recall + Precision)
* '''Precision:''' percentage of NPs found by the algorithm that are correct
* '''Recall:''' percentage of NPs defined in the corpus that were found by the chunking program
* '''Training data:''' sections 15-18 of Wall Street Journal corpus (Ramshaw and Marcus)
* '''Testing data:''' section 20 of Wall Street Journal corpus (Ramshaw and Marcus)
* original data of the NP chunking experiments by Lance Ramshaw and Mitch Marcus
* data contains one word per line and each line contains six fields of which only the first three fields are relevant: the word, the part-of-speech tag assigned by the Brill tagger, and the correct IOB tag

== Table of results ==

{| border="1" cellpadding="5" cellspacing="1" width="100%"
|-
! System name
! Short description
! Main publications
! Software
! Results (F)
|-
| KM00
| B-I-O tagging using SVM classifiers with polynomial kernel
| Kudo and Matsumoto (2000), CONLL
| [http://chasen.org/~taku/software/yamcha/ YAMCHA Toolkit] (but models are not provided)
| 93.79%
|-
| KM01
| learning as in KM00, but voting between different representations
| Kudo and Matsumoto (2001), NAACL
| No
| 94.22%
|-
| SP03
| Second order conditional random fields
| Fei Sha and Fernando Pereira (2003), HLT/NAACL
| No
| 94.3%
|-
| SS05
| specialized HMM + voting between different representations
| Shen and Sarkar (2005)
| No
| 95.23%
|-
| M05
| Second order conditional random fields + multi-label classification
| Ryan McDonald, Koby Crammer and Fernando Pereira (2005), HLT/EMNLP
| No
| 94.29%
|-
| S08
| Second order latent-dynamic conditional random fields + an improved inference method based on A* search
| Xu Sun, Louis-Philippe Morency, Daisuke Okanohara and Jun'ichi Tsujii (2008), COLING
| HCRF Library
| 94.34%
|-
|}

== References ==

T. Kudo and Y. Matsumoto (2000). [http://acl.ldc.upenn.edu/W/W00/W00-0730.pdf Use of support vector learning for chunk identification]. ''Proceedings of the 4th Conference on CoNLL-2000 and LLL-2000'', pages 142-144, Lisbon, Portugal.

T. Kudo and Y. Matsumoto (2001). [http://acl.ldc.upenn.edu/N/N01/N01-1025.pdf Chunking with support vector machines]. ''Proceedings of NAACL-2001''.

F. Sha and F. Pereira (2003). [http://www-rcf.usc.edu/~feisha/htmls/Papers.html Shallow Parsing with Conditional Random Fields]. ''Proceedings of HLT-NAACL 2003'', pages 213-220. Edmonton, Canada.

H. Shen and A. Sarkar (2005). [http://www.cs.sfu.ca/~anoop/papers/pdf/ai05.pdf Voting between multiple data representations for text chunking]. ''Proceedings of the Eighteenth Meeting of the Canadian Society for Computational Intelligence, Canadian AI 2005''.

R. McDonald, K. Crammer and F. Pereira (2005). [http://ryanmcd.googlepages.com/segmentationHLT-EMNLP2005.pdf Flexible Text Segmentation with Structured Multilabel Classification]. ''Human Language Technologies and Empirical Methods in Natural Language Processing (HLT-EMNLP), 2005''

X. Sun, L.P. Morency, D. Okanohara and J. Tsujii (2008). [http://www.aclweb.org/anthology-new/C/C08/C08-1106.pdf Modeling Latent-Dynamic in Shallow Parsing: A Latent Conditional Model with Improved Inference]. ''Proceedings of The 22nd International Conference on Computational Linguistics (COLING 2008)''. Pages 841-848. Manchester, UK.

== See also ==

* [[State of the art]]

== External links ==

* dataset is available from [ftp://ftp.cis.upenn.edu/pub/chunker/ ftp://ftp.cis.upenn.edu/pub/chunker/]
* more information is available from [http://ifarm.nl/erikt/research/np-chunking.html NP Chunking]

[[Category:State of the art]]

NP Chunking (State of the art)

2009-01-10T16:28:34Z

Terrance1026: /* Table of results */

* '''Performance measure:''' F = 2 * Precision * Recall / (Recall + Precision)
* '''Precision:''' percentage of NPs found by the algorithm that are correct
* '''Recall:''' percentage of NPs defined in the corpus that were found by the chunking program
* '''Training data:''' sections 15-18 of Wall Street Journal corpus (Ramshaw and Marcus)
* '''Testing data:''' section 20 of Wall Street Journal corpus (Ramshaw and Marcus)
* original data of the NP chunking experiments by Lance Ramshaw and Mitch Marcus
* data contains one word per line and each line contains six fields of which only the first three fields are relevant: the word, the part-of-speech tag assigned by the Brill tagger, and the correct IOB tag
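
The performance measure defined above can be computed directly from chunk counts. The following is a minimal sketch; the function names and the example counts are made up for illustration, and precision and recall are handled as fractions in [0, 1] rather than as percentages.

```python
def f_measure(precision, recall):
    """F = 2 * Precision * Recall / (Recall + Precision)."""
    if precision + recall == 0:
        # Convention chosen for this sketch: F is 0 when both inputs are 0.
        return 0.0
    return 2 * precision * recall / (recall + precision)


def scores_from_counts(correct, found, in_corpus):
    """Derive precision, recall and F from raw NP counts.

    correct:   NPs found by the chunker that are correct
    found:     all NPs found by the chunker
    in_corpus: all NPs defined in the corpus
    """
    precision = correct / found if found else 0.0
    recall = correct / in_corpus if in_corpus else 0.0
    return precision, recall, f_measure(precision, recall)


# Hypothetical counts, not taken from any system in the table.
p, r, f = scores_from_counts(correct=90, found=100, in_corpus=150)
```

Multiplying the returned values by 100 gives figures on the same scale as the results table.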

== Table of results ==

{| border="1" cellpadding="5" cellspacing="1" width="100%"
|-
! System name
! Short description
! Main publications
! Software
! Results (F)
|-
| KM00
| B-I-O tagging using SVM classifiers with polynomial kernel
| Kudo and Matsumoto (2000), CoNLL
| [http://chasen.org/~taku/software/yamcha/ YAMCHA Toolkit] (but models are not provided)
| 93.79%
|-
| KM01
| learning as in KM00, but voting between different representations
| Kudo and Matsumoto (2001), NAACL
| No
| 94.22%
|-
| SP03
| Second order conditional random fields
| Fei Sha and Fernando Pereira (2003), HLT/NAACL
| No
| 94.3%
|-
| SS05
| specialized HMM + voting between different representations
| Shen and Sarkar (2005)
| No
| 95.23%
|-
| M05
| Second order conditional random fields + multi-label classification
| Ryan McDonald, Koby Crammer and Fernando Pereira (2005), HLT/EMNLP
| No
| 94.29%
|-
| S08
| Second order latent-dynamic conditional random fields + an improved inference method based on A* search
| Xu Sun, Louis-Philippe Morency, Daisuke Okanohara and Jun'ichi Tsujii (2008), COLING
| HCRF Library
| 94.34%
|-
|}

== References ==

Kudo, T., and Matsumoto, Y. (2000). [http://acl.ldc.upenn.edu/W/W00/W00-0730.pdf Use of support vector learning for chunk identification]. ''Proceedings of the 4th Conference on CoNLL-2000 and LLL-2000'', pages 142-144, Lisbon, Portugal.

Kudo, T., and Matsumoto, Y. (2001). [http://acl.ldc.upenn.edu/N/N01/N01-1025.pdf Chunking with support vector machines]. ''Proceedings of NAACL-2001''.

Sha, F., and F. Pereira (2003). Shallow Parsing with Conditional Random Fields. ''Proceedings of HLT-NAACL 2003'', pages 213-220. Edmonton, Canada.

Shen, H., and Sarkar, A. (2005). [http://www.cs.sfu.ca/~anoop/papers/pdf/ai05.pdf Voting between multiple data representations for text chunking]. ''Proceedings of the Eighteenth Meeting of the Canadian Society for Computational Intelligence, Canadian AI 2005''.

McDonald, R., K. Crammer and F. Pereira (2005). [http://ryanmcd.googlepages.com/segmentationHLT-EMNLP2005.pdf Flexible Text Segmentation with Structured Multilabel Classification]. ''Human Language Technologies and Empirical Methods in Natural Language Processing (HLT-EMNLP), 2005''

Sun, X., L.P. Morency, D. Okanohara and J. Tsujii (2008). [http://www.aclweb.org/anthology-new/C/C08/C08-1106.pdf Modeling Latent-Dynamic in Shallow Parsing: A Latent Conditional Model with Improved Inference]. ''Proceedings of The 22nd International Conference on Computational Linguistics (COLING 2008)''. Pages 841-848. Manchester, UK.

== See also ==

* [[State of the art]]

== External links ==

* dataset is available from [ftp://ftp.cis.upenn.edu/pub/chunker/ ftp://ftp.cis.upenn.edu/pub/chunker/]
* more information is available from [http://ifarm.nl/erikt/research/np-chunking.html NP Chunking]

[[Category:State of the art]]

NP Chunking (State of the art)

2009-01-10T16:27:18Z

Terrance1026: /* Table of results */

* '''Performance measure:''' F = 2 * Precision * Recall / (Recall + Precision)
* '''Precision:''' percentage of NPs found by the algorithm that are correct
* '''Recall:''' percentage of NPs defined in the corpus that were found by the chunking program
* '''Training data:''' sections 15-18 of Wall Street Journal corpus (Ramshaw and Marcus)
* '''Testing data:''' section 20 of Wall Street Journal corpus (Ramshaw and Marcus)
* original data of the NP chunking experiments by Lance Ramshaw and Mitch Marcus
* data contains one word per line and each line contains six fields of which only the first three fields are relevant: the word, the part-of-speech tag assigned by the Brill tagger, and the correct IOB tag
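
Precision and recall as defined above are counted over whole NPs, not over individual tags, so the IOB tags first have to be turned into chunk spans. The sketch below is one possible way to do this. It assumes the tags are the plain strings <code>I</code>, <code>O</code> and <code>B</code> (only the first character is inspected), treats an <code>I</code> after <code>O</code> as the start of a chunk, and counts a predicted NP as correct only if both of its boundaries match; the function names are hypothetical.

```python
def extract_chunks(tags):
    """Return the set of (start, end) NP spans, with end exclusive."""
    chunks = set()
    start = None
    for i, tag in enumerate(tags):
        kind = tag[:1]
        # Both O and B close any chunk that is currently open.
        if kind in ("O", "B") and start is not None:
            chunks.add((start, i))
            start = None
        # B always opens a chunk; I opens one only if none is open.
        if kind == "B" or (kind == "I" and start is None):
            start = i
    if start is not None:
        chunks.add((start, len(tags)))
    return chunks


def precision_recall(gold_tags, predicted_tags):
    """Chunk-level precision and recall as fractions in [0, 1]."""
    gold = extract_chunks(gold_tags)
    predicted = extract_chunks(predicted_tags)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Toy sequences, not taken from the corpus.
gold = ["I", "I", "O", "I", "B", "I", "O"]
pred = ["I", "I", "O", "I", "I", "I", "O"]
gold_chunks = extract_chunks(gold)
precision, recall = precision_recall(gold, pred)
```

In the toy example the gold tags contain three NPs and the prediction merges the last two into one, so only the first NP counts as correct.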

== Table of results ==

{| border="1" cellpadding="5" cellspacing="1" width="100%"
|-
! System name
! Short description
! Main publications
! Software
! Results (F)
|-
| KM00
| B-I-O tagging using SVM classifiers with polynomial kernel
| Kudo and Matsumoto (2000)
| [http://chasen.org/~taku/software/yamcha/ YAMCHA Toolkit] (but models are not provided)
| 93.79%
|-
| KM01
| learning as in KM00, but voting between different representations
| Kudo and Matsumoto (2001), NAACL'01
| No
| 94.22%
|-
| SP03
| Second order conditional random fields
| Fei Sha and Fernando Pereira (2003), HLT/NAACL'03
| No
| 94.3%
|-
| SS05
| specialized HMM + voting between different representations
| Shen and Sarkar (2005)
| No
| 95.23%
|-
| M05
| Second order conditional random fields + multi-label classification
| Ryan McDonald, Koby Crammer and Fernando Pereira (2005), HLT/EMNLP'05
| No
| 94.29%
|-
| S08
| Second order latent-dynamic conditional random fields + an improved inference method based on A* search
| Xu Sun, Louis-Philippe Morency, Daisuke Okanohara and Jun'ichi Tsujii (2008), COLING'08
| HCRF Library
| 94.34%
|-
|}

== References ==

Kudo, T., and Matsumoto, Y. (2000). [http://acl.ldc.upenn.edu/W/W00/W00-0730.pdf Use of support vector learning for chunk identification]. ''Proceedings of the 4th Conference on CoNLL-2000 and LLL-2000'', pages 142-144, Lisbon, Portugal.

Kudo, T., and Matsumoto, Y. (2001). [http://acl.ldc.upenn.edu/N/N01/N01-1025.pdf Chunking with support vector machines]. ''Proceedings of NAACL-2001''.

Sha, F., and F. Pereira (2003). Shallow Parsing with Conditional Random Fields. ''Proceedings of HLT-NAACL 2003'', pages 213-220. Edmonton, Canada.

Shen, H., and Sarkar, A. (2005). [http://www.cs.sfu.ca/~anoop/papers/pdf/ai05.pdf Voting between multiple data representations for text chunking]. ''Proceedings of the Eighteenth Meeting of the Canadian Society for Computational Intelligence, Canadian AI 2005''.

McDonald, R., K. Crammer and F. Pereira (2005). [http://ryanmcd.googlepages.com/segmentationHLT-EMNLP2005.pdf Flexible Text Segmentation with Structured Multilabel Classification]. ''Human Language Technologies and Empirical Methods in Natural Language Processing (HLT-EMNLP), 2005''

Sun, X., L.P. Morency, D. Okanohara and J. Tsujii (2008). [http://www.aclweb.org/anthology-new/C/C08/C08-1106.pdf Modeling Latent-Dynamic in Shallow Parsing: A Latent Conditional Model with Improved Inference]. ''Proceedings of The 22nd International Conference on Computational Linguistics (COLING 2008)''. Pages 841-848. Manchester, UK.

== See also ==

* [[State of the art]]

== External links ==

* dataset is available from [ftp://ftp.cis.upenn.edu/pub/chunker/ ftp://ftp.cis.upenn.edu/pub/chunker/]
* more information is available from [http://ifarm.nl/erikt/research/np-chunking.html NP Chunking]

[[Category:State of the art]]

NP Chunking (State of the art)

2009-01-10T12:55:11Z

Terrance1026: /* References */

* '''Performance measure:''' F = 2 * Precision * Recall / (Recall + Precision)
* '''Precision:''' percentage of NPs found by the algorithm that are correct
* '''Recall:''' percentage of NPs defined in the corpus that were found by the chunking program
* '''Training data:''' sections 15-18 of Wall Street Journal corpus (Ramshaw and Marcus)
* '''Testing data:''' section 20 of Wall Street Journal corpus (Ramshaw and Marcus)
* original data of the NP chunking experiments by Lance Ramshaw and Mitch Marcus
* data contains one word per line and each line contains six fields of which only the first three fields are relevant: the word, the part-of-speech tag assigned by the Brill tagger, and the correct IOB tag

== Table of results ==

{| border="1" cellpadding="5" cellspacing="1" width="100%"
|-
! System name
! Short description
! Main publications
! Software
! Results (F)
|-
| KM00
| B-I-O tagging using SVM classifiers with polynomial kernel
| Kudo and Matsumoto (2000)
| [http://chasen.org/~taku/software/yamcha/ YAMCHA Toolkit] (but models are not provided)
| 93.79%
|-
| KM01
| learning as in KM00, but voting between different representations
| Kudo and Matsumoto (NAACL 2001)
| No
| 94.22%
|-
| SP03
| Second order conditional random fields
| Fei Sha and Fernando Pereira (HLT/NAACL 2003)
| No
| 94.3%
|-
| SS05
| specialized HMM + voting between different representations
| Shen and Sarkar (2005)
| No
| 95.23%
|-
| M05
| Second order conditional random fields + multi-label classification
| Ryan McDonald, Koby Crammer and Fernando Pereira (HLT/EMNLP 2005)
| No
| 94.29%
|-
| S08
| Second order latent-dynamic conditional random fields + an improved inference method based on A* search
| Xu Sun, Louis-Philippe Morency, Daisuke Okanohara and Jun'ichi Tsujii (COLING 2008)
| No
| 94.34%
|-
|}

== References ==

Kudo, T., and Matsumoto, Y. (2000). [http://acl.ldc.upenn.edu/W/W00/W00-0730.pdf Use of support vector learning for chunk identification]. ''Proceedings of the 4th Conference on CoNLL-2000 and LLL-2000'', pages 142-144, Lisbon, Portugal.

Kudo, T., and Matsumoto, Y. (2001). [http://acl.ldc.upenn.edu/N/N01/N01-1025.pdf Chunking with support vector machines]. ''Proceedings of NAACL-2001''.

Sha, F., and F. Pereira (2003). Shallow Parsing with Conditional Random Fields. ''Proceedings of HLT-NAACL 2003'', pages 213-220. Edmonton, Canada.

Shen, H., and Sarkar, A. (2005). [http://www.cs.sfu.ca/~anoop/papers/pdf/ai05.pdf Voting between multiple data representations for text chunking]. ''Proceedings of the Eighteenth Meeting of the Canadian Society for Computational Intelligence, Canadian AI 2005''.

McDonald, R., K. Crammer and F. Pereira (2005). [http://ryanmcd.googlepages.com/segmentationHLT-EMNLP2005.pdf Flexible Text Segmentation with Structured Multilabel Classification]. ''Human Language Technologies and Empirical Methods in Natural Language Processing (HLT-EMNLP), 2005''

Sun, X., L.P. Morency, D. Okanohara and J. Tsujii (2008). [http://www.aclweb.org/anthology-new/C/C08/C08-1106.pdf Modeling Latent-Dynamic in Shallow Parsing: A Latent Conditional Model with Improved Inference]. ''Proceedings of The 22nd International Conference on Computational Linguistics (COLING 2008)''. Pages 841-848. Manchester, UK.

== See also ==

* [[State of the art]]

== External links ==

* dataset is available from [ftp://ftp.cis.upenn.edu/pub/chunker/ ftp://ftp.cis.upenn.edu/pub/chunker/]
* more information is available from [http://ifarm.nl/erikt/research/np-chunking.html NP Chunking]

[[Category:State of the art]]

NP Chunking (State of the art)

2009-01-10T12:53:23Z

Terrance1026: /* References */

* '''Performance measure:''' F = 2 * Precision * Recall / (Recall + Precision)
* '''Precision:''' percentage of NPs found by the algorithm that are correct
* '''Recall:''' percentage of NPs defined in the corpus that were found by the chunking program
* '''Training data:''' sections 15-18 of Wall Street Journal corpus (Ramshaw and Marcus)
* '''Testing data:''' section 20 of Wall Street Journal corpus (Ramshaw and Marcus)
* original data of the NP chunking experiments by Lance Ramshaw and Mitch Marcus
* data contains one word per line and each line contains six fields of which only the first three fields are relevant: the word, the part-of-speech tag assigned by the Brill tagger, and the correct IOB tag

== Table of results ==

{| border="1" cellpadding="5" cellspacing="1" width="100%"
|-
! System name
! Short description
! Main publications
! Software
! Results (F)
|-
| KM00
| B-I-O tagging using SVM classifiers with polynomial kernel
| Kudo and Matsumoto (2000)
| [http://chasen.org/~taku/software/yamcha/ YAMCHA Toolkit] (but models are not provided)
| 93.79%
|-
| KM01
| learning as in KM00, but voting between different representations
| Kudo and Matsumoto (NAACL 2001)
| No
| 94.22%
|-
| SP03
| Second order conditional random fields
| Fei Sha and Fernando Pereira (HLT/NAACL 2003)
| No
| 94.3%
|-
| SS05
| specialized HMM + voting between different representations
| Shen and Sarkar (2005)
| No
| 95.23%
|-
| M05
| Second order conditional random fields + multi-label classification
| Ryan McDonald, Koby Crammer and Fernando Pereira (HLT/EMNLP 2005)
| No
| 94.29%
|-
| S08
| Second order latent-dynamic conditional random fields + an improved inference method based on A* search
| Xu Sun, Louis-Philippe Morency, Daisuke Okanohara and Jun'ichi Tsujii (COLING 2008)
| No
| 94.34%
|-
|}

== References ==

Kudo, T., and Matsumoto, Y. (2000). [http://acl.ldc.upenn.edu/W/W00/W00-0730.pdf Use of support vector learning for chunk identification]. ''Proceedings of the 4th Conference on CoNLL-2000 and LLL-2000'', pages 142-144, Lisbon, Portugal.

Kudo, T., and Matsumoto, Y. (2001). [http://acl.ldc.upenn.edu/N/N01/N01-1025.pdf Chunking with support vector machines]. ''Proceedings of NAACL-2001''.

Sha, F., and F. Pereira (2003). Shallow Parsing with Conditional Random Fields. ''Proceedings of HLT-NAACL 2003'', pages 213-220. Edmonton, Canada.

Shen, H., and Sarkar, A. (2005). [http://www.cs.sfu.ca/~anoop/papers/pdf/ai05.pdf Voting between multiple data representations for text chunking]. ''Proceedings of the Eighteenth Meeting of the Canadian Society for Computational Intelligence, Canadian AI 2005''.

McDonald, R., K. Crammer and F. Pereira (2005). [http://ryanmcd.googlepages.com/segmentationHLT-EMNLP2005.pdf Flexible Text Segmentation with Structured Multilabel Classification]. ''Human Language Technologies and Empirical Methods in Natural Language Processing (HLT-EMNLP), 2005''

Sun, X., L.P. Morency, D. Okanohara and J. Tsujii (2008). [http://www.aclweb.org/anthology-new/C/C08/C08-1106.pdf Modeling Latent-Dynamic in Shallow Parsing: A Latent Conditional Model with Improved Inference]. ''Proceedings of The 22nd International Conference on Computational Linguistics (COLING 2008)''. Pages 841-848. Manchester, UK.

== See also ==

* [[State of the art]]

== External links ==

* dataset is available from [ftp://ftp.cis.upenn.edu/pub/chunker/ ftp://ftp.cis.upenn.edu/pub/chunker/]
* more information is available from [http://ifarm.nl/erikt/research/np-chunking.html NP Chunking]

[[Category:State of the art]]

NP Chunking (State of the art)

2009-01-10T12:51:12Z

Terrance1026: /* References */

* '''Performance measure:''' F = 2 * Precision * Recall / (Recall + Precision)
* '''Precision:''' percentage of NPs found by the algorithm that are correct
* '''Recall:''' percentage of NPs defined in the corpus that were found by the chunking program
* '''Training data:''' sections 15-18 of Wall Street Journal corpus (Ramshaw and Marcus)
* '''Testing data:''' section 20 of Wall Street Journal corpus (Ramshaw and Marcus)
* original data of the NP chunking experiments by Lance Ramshaw and Mitch Marcus
* data contains one word per line and each line contains six fields of which only the first three fields are relevant: the word, the part-of-speech tag assigned by the Brill tagger, and the correct IOB tag

== Table of results ==

{| border="1" cellpadding="5" cellspacing="1" width="100%"
|-
! System name
! Short description
! Main publications
! Software
! Results (F)
|-
| KM00
| B-I-O tagging using SVM classifiers with polynomial kernel
| Kudo and Matsumoto (2000)
| [http://chasen.org/~taku/software/yamcha/ YAMCHA Toolkit] (but models are not provided)
| 93.79%
|-
| KM01
| learning as in KM00, but voting between different representations
| Kudo and Matsumoto (NAACL 2001)
| No
| 94.22%
|-
| SP03
| Second order conditional random fields
| Fei Sha and Fernando Pereira (HLT/NAACL 2003)
| No
| 94.3%
|-
| SS05
| specialized HMM + voting between different representations
| Shen and Sarkar (2005)
| No
| 95.23%
|-
| M05
| Second order conditional random fields + multi-label classification
| Ryan McDonald, Koby Crammer and Fernando Pereira (HLT/EMNLP 2005)
| No
| 94.29%
|-
| S08
| Second order latent-dynamic conditional random fields + an improved inference method based on A* search
| Xu Sun, Louis-Philippe Morency, Daisuke Okanohara and Jun'ichi Tsujii (COLING 2008)
| No
| 94.34%
|-
|}

== References ==

Kudo, T., and Matsumoto, Y. (2000). [http://acl.ldc.upenn.edu/W/W00/W00-0730.pdf Use of support vector learning for chunk identification]. ''Proceedings of the 4th Conference on CoNLL-2000 and LLL-2000'', pages 142-144, Lisbon, Portugal.

Kudo, T., and Matsumoto, Y. (2001). [http://acl.ldc.upenn.edu/N/N01/N01-1025.pdf Chunking with support vector machines]. ''Proceedings of NAACL-2001''.

Sha, F., and F. Pereira (2003). Shallow Parsing with Conditional Random Fields. ''Proceedings of HLT-NAACL 2003'', pages 213-220. Edmonton, Canada.

Shen, H., and Sarkar, A. (2005). [http://www.cs.sfu.ca/~anoop/papers/pdf/ai05.pdf Voting between multiple data representations for text chunking]. ''Proceedings of the Eighteenth Meeting of the Canadian Society for Computational Intelligence, Canadian AI 2005''.

McDonald, R., K. Crammer and F. Pereira (2005). [http://ryanmcd.googlepages.com/segmentationHLT-EMNLP2005.pdf Flexible Text Segmentation with Structured Multilabel Classification]. ''Human Language Technologies and Empirical Methods in Natural Language Processing (HLT-EMNLP), 2005''

Sun, X., L.P. Morency, D. Okanohara and J. Tsujii (2008). [http://www.aclweb.org/anthology-new/C/C08/C08-1106.pdf Modeling Latent-Dynamic in Shallow Parsing: A Latent Conditional Model with Improved Inference]. ''Proceedings of The 22nd International Conference on Computational Linguistics (COLING 2008)''. Pages 841-848. Manchester, UK.

== See also ==

* [[State of the art]]

== External links ==

* dataset is available from [ftp://ftp.cis.upenn.edu/pub/chunker/ ftp://ftp.cis.upenn.edu/pub/chunker/]
* more information is available from [http://ifarm.nl/erikt/research/np-chunking.html NP Chunking]

[[Category:State of the art]]

NP Chunking (State of the art)

2009-01-10T12:49:28Z

Terrance1026: /* References */

* '''Performance measure:''' F = 2 * Precision * Recall / (Recall + Precision)
* '''Precision:''' percentage of NPs found by the algorithm that are correct
* '''Recall:''' percentage of NPs defined in the corpus that were found by the chunking program
* '''Training data:''' sections 15-18 of Wall Street Journal corpus (Ramshaw and Marcus)
* '''Testing data:''' section 20 of Wall Street Journal corpus (Ramshaw and Marcus)
* original data of the NP chunking experiments by Lance Ramshaw and Mitch Marcus
* data contains one word per line and each line contains six fields of which only the first three fields are relevant: the word, the part-of-speech tag assigned by the Brill tagger, and the correct IOB tag

== Table of results ==

{| border="1" cellpadding="5" cellspacing="1" width="100%"
|-
! System name
! Short description
! Main publications
! Software
! Results (F)
|-
| KM00
| B-I-O tagging using SVM classifiers with polynomial kernel
| Kudo and Matsumoto (2000)
| [http://chasen.org/~taku/software/yamcha/ YAMCHA Toolkit] (but models are not provided)
| 93.79%
|-
| KM01
| learning as in KM00, but voting between different representations
| Kudo and Matsumoto (NAACL 2001)
| No
| 94.22%
|-
| SP03
| Second order conditional random fields
| Fei Sha and Fernando Pereira (HLT/NAACL 2003)
| No
| 94.3%
|-
| SS05
| specialized HMM + voting between different representations
| Shen and Sarkar (2005)
| No
| 95.23%
|-
| M05
| Second order conditional random fields + multi-label classification
| Ryan McDonald, Koby Crammer and Fernando Pereira (HLT/EMNLP 2005)
| No
| 94.29%
|-
| S08
| Second order latent-dynamic conditional random fields + an improved inference method based on A* search
| Xu Sun, Louis-Philippe Morency, Daisuke Okanohara and Jun'ichi Tsujii (COLING 2008)
| No
| 94.34%
|-
|}

== References ==

Kudo, T., and Matsumoto, Y. (2000). [http://acl.ldc.upenn.edu/W/W00/W00-0730.pdf Use of support vector learning for chunk identification]. ''Proceedings of the 4th Conference on CoNLL-2000 and LLL-2000'', pages 142-144, Lisbon, Portugal.

Kudo, T., and Matsumoto, Y. (2001). [http://acl.ldc.upenn.edu/N/N01/N01-1025.pdf Chunking with support vector machines]. ''Proceedings of NAACL-2001''.

Sha, F., and F. Pereira (2003). [http://www-rcf.usc.edu/~feisha/pubs/shallow03.pdf Shallow Parsing with Conditional Random Fields]. ''Proceedings of HLT-NAACL 2003'', pages 213-220. Edmonton, Canada.

Shen, H., and Sarkar, A. (2005). [http://www.cs.sfu.ca/~anoop/papers/pdf/ai05.pdf Voting between multiple data representations for text chunking]. ''Proceedings of the Eighteenth Meeting of the Canadian Society for Computational Intelligence, Canadian AI 2005''.

McDonald, R., K. Crammer and F. Pereira (2005). [http://ryanmcd.googlepages.com/segmentationHLT-EMNLP2005.pdf Flexible Text Segmentation with Structured Multilabel Classification]. ''Human Language Technologies and Empirical Methods in Natural Language Processing (HLT-EMNLP), 2005''

Sun, X., L.P. Morency, D. Okanohara and J. Tsujii (2008). Modeling Latent-Dynamic in Shallow Parsing: A Latent Conditional Model with Improved Inference. ''Proceedings of The 22nd International Conference on Computational Linguistics (COLING 2008)''. Pages 841-848. Manchester, UK.

== See also ==

* [[State of the art]]

== External links ==

* dataset is available from [ftp://ftp.cis.upenn.edu/pub/chunker/ ftp://ftp.cis.upenn.edu/pub/chunker/]
* more information is available from [http://ifarm.nl/erikt/research/np-chunking.html NP Chunking]

[[Category:State of the art]]

NP Chunking (State of the art)

2009-01-10T12:37:13Z

Terrance1026: /* Table of results */