Resources for Chinese: Difference between revisions

From ACL Wiki
Bond (talk | contribs)
Nonfree or Unknown license: added link to Lancaster
Vladob54 (talk | contribs)
Added: Araneum
Line 13:


===Nonfree or Unknown license===
* [http://ucts.uniba.sk/aranea_about/ Araneum Sinicum], Gigaword Chinese web corpus
* [http://www.chinesecomputing.com Chinese Computing]  
* [http://www.icl.pku.edu.cn/icl_groups/corpus/dwldform1.asp Word Segmented and POS tagged People Daily Corpus at ICL of Peking University]

Revision as of 19:31, 8 March 2015

Tools

Free software

  • rseg: word segmentation; written in Ruby (no compilation, no hard dependencies apart from Ruby), comes with a model (MIT license)
  • ctbparser: word segmentation, POS tagging, NER, and dependency parsing, all using Conditional Random Fields; written in C++ (LGPL license)
  • ZPar: word segmentation, POS tagging, and CFG, dependency, and CCG parsing of Chinese and English; written in C++ (GPL3 license)
  • DuDuPlus: a graph-based dependency parser for English and Chinese ("Other Open Source" license?)
    • where is the source code?

Corpora

Free license

Nonfree or Unknown license