Corpora for English: Difference between revisions

From ACL Wiki
Chiari (talk | contribs)
m No edit summary
Chiari (talk | contribs)
No edit summary
Line 97: Line 97:
*[http://www.philol.msu.ru/~lex/corpus/ Russian Newspaper Corpus]
*[http://www.philol.msu.ru/~lex/corpus/ Russian Newspaper Corpus]
*[http://schools.keldysh.ru/uvk1838/Sciper/volume2/langres/russiclr.htm Russicon Resources]
*[http://schools.keldysh.ru/uvk1838/Sciper/volume2/langres/russiclr.htm Russicon Resources]
*[http://bokrcorpora.narod.ru Bokr Russian Reference Corpus]


==Slovak==
==Slovak==
Line 109: Line 110:
*[http://corpus.cilta.unibo.it:8080/coris_ita.html Corpus di Italiano Scritto contemporaneo (CORIS/CODIS)]
*[http://corpus.cilta.unibo.it:8080/coris_ita.html Corpus di Italiano Scritto contemporaneo (CORIS/CODIS)]
*[http://tlio.ovi.cnr.it/TLIO/ Tesoro della lingua italiana delle origini (TLIO)]
*[http://tlio.ovi.cnr.it/TLIO/ Tesoro della lingua italiana delle origini (TLIO)]
==Link Collections==
*[http://www.dcs.gla.ac.uk/idom/ir_resources/ Collections of texts and corpora]
*[http://www.bmanuel.org/clr2_mp.html Manuel Barbera: General Corpora and Corpus Linguistics Resources]
*[http://www.alphabit.net Isabella Chiari: Corpora, Software and Linguistic resources]
*[http://www.sultry.arts.usyd.edu.au/links/statnlp.html Annotated list of resources on statistical NLP and corpus-based CL]
==Corpora Tools==
*[http://www.dcs.gla.ac.uk/idom/ir_resources/linguistic_utils/stop_words List of stop words]
*[http://www.sketchengine.co.uk/ The Sketch Engine]
*[http://www.cis.upenn.edu/~treebank/tokenization.html Treebank tokenization scheme]


==Uncategorized==
==Uncategorized==
Line 118: Line 132:
*[http://odur.let.rug.nl/~vannoord/trees/ Alpino Treebank]
*[http://odur.let.rug.nl/~vannoord/trees/ Alpino Treebank]
*[http://www.cornelsen.de/international/ An Empirical Grammar of the English Verb System]
*[http://www.cornelsen.de/international/ An Empirical Grammar of the English Verb System]
*[http://www.sultry.arts.usyd.edu.au/links/statnlp.html Annotated list of resources on statistical NLP and corpus-based CL]
*[http://www.aot.ru/search1.html AOT]
*[http://www.aot.ru/search1.html AOT]
*[http://www.ldc.upenn.edu/Catalog/LDC2001T55.html Arabic Newswire Part 1]
*[http://www.ldc.upenn.edu/Catalog/LDC2001T55.html Arabic Newswire Part 1]
*[http://atilf.atilf.fr/dmf.htm Base Textuelle de Moyen Francais]
*[http://atilf.atilf.fr/dmf.htm Base Textuelle de Moyen Francais]
*[http://thetis.bl.uk/ BNC Online Service]
*[http://thetis.bl.uk/ BNC Online Service]
*[http://bokrcorpora.narod.ru Bokr Russian Reference Corpus]
*[http://info.ox.ac.uk/bnc/ BRITISH NATIONAL CORPUS - WORLD EDITION]
*[http://info.ox.ac.uk/bnc/ BRITISH NATIONAL CORPUS - WORLD EDITION]
*[http://www.dcs.gla.ac.uk/idom/ir_resources/ Collections of texts and corpora]
*[http://www.lllf.uam.es/~fmarcos/informes/corpus/corpulee.html Corpus de referencia de la lengua Espanola contemporanea: corpus oral peninsular]
*[http://www.lllf.uam.es/~fmarcos/informes/corpus/corpulee.html Corpus de referencia de la lengua Espanola contemporanea: corpus oral peninsular]
*[http://www.corpusdelespanol.org/ Corpus del Espanol]
*[http://www.corpusdelespanol.org/ Corpus del Espanol]
Line 151: Line 162:
*[http://www.vuw.ac.nz/llc/ LANGUAGE LEARNING CENTER - ACADEMIC CORPUS]
*[http://www.vuw.ac.nz/llc/ LANGUAGE LEARNING CENTER - ACADEMIC CORPUS]
*[http://www.csse.monash.edu.au/~jwb/afaq/jitadoushi.html list of Japanese transitive - intransitive verb pairs]
*[http://www.csse.monash.edu.au/~jwb/afaq/jitadoushi.html list of Japanese transitive - intransitive verb pairs]
*[http://www.dcs.gla.ac.uk/idom/ir_resources/linguistic_utils/stop_words List of stop words]
*[http://www.bmanuel.org/clr2_mp.html Manuel Barbera: General Corpora and Corpus Linguistics Resources]
*[http://www.bmanuel.org/clr2_mp.html Manuel Barbera: General Corpora and Corpus Linguistics Resources]
*[ftp://ftp.cs.cornell.edu/pub/smart/med/ Medlars collection]
*[ftp://ftp.cs.cornell.edu/pub/smart/med/ Medlars collection]
Line 177: Line 187:
*[ftp://ftp.dcs.shef.ac.uk/share/ilash/Moby/ The Moby Corpus]
*[ftp://ftp.dcs.shef.ac.uk/share/ilash/Moby/ The Moby Corpus]
*[http://www.tekstlab.uio.no/Bosnian/Corpus.html The Oslo Corpus of Bosnian Texts]
*[http://www.tekstlab.uio.no/Bosnian/Corpus.html The Oslo Corpus of Bosnian Texts]
*[http://www.sketchengine.co.uk/ The Sketch Engine]
*[http://www.hf.uio.no/tekstlab/prosjekter/SOFIE.htm The Sofie Treebank - A Parallel Treebank of North European Languages]
*[http://www.hf.uio.no/tekstlab/prosjekter/SOFIE.htm The Sofie Treebank - A Parallel Treebank of North European Languages]
*[http://www.cis.upenn.edu/~treebank/tokenization.html Treebank tokenization scheme]

Revision as of 12:38, 3 November 2006

This list needs some cleaning. Please help.

English

German

Multilingual

Russian

Slovak

Italian

Link Collections

Corpora Tools

Uncategorized