Corpora for English: Difference between revisions

From ACL Wiki
No edit summary
Line 66:
*[http://www.cis.upenn.edu/~treebank/tokenization.html Treebank tokenization scheme]


==Arabic==
*[http://www.ldc.upenn.edu/Catalog/LDC2001T55.html Arabic Newswire Part 1]
==Bosnian==
*[http://www.tekstlab.uio.no/Bosnian/Corpus.html The Oslo Corpus of Bosnian Texts]
==Bulgarian==
*[http://www.hf.uio.no/easteur-orient/bulg/mat/ Corpus of spoken Bulgarian]
==Croatian==
*[http://riznica.ihjj.hr/en/ Croatian Language Corpus at the IHJJ]
==Czech==
*[http://ucnk.ff.cuni.cz/english/index.html Czech National Corpus]
==Danish==
*[http://korpus.dsl.dk/korpus2000/indgang.php Danish news corpus]


==Finnish==

Revision as of 20:01, 26 April 2008

For languages other than English, see List of resources by language.

English


Link collections

Corpora tools


Finnish

French

German

Haitian Creole

Italian

Japanese

Polish

Romanian

Sanskrit

Slovenian

Spanish

Swahili

Uncategorized