Corpora for English: Difference between revisions

From ACL Wiki
No edit summary
Line 2: Line 2:


==English==
<!-- Please keep this list in alphabetical order -->


*[http://www.elda.fr/catalogue/en/speech/S0115.html American English SpeechDat-Car]
Line 21: Line 22:
*[http://www-personal.umich.edu/~jlawler/levin.html English Verb Classes And Alternations: A Preliminary Investigation (Index)]
*[http://usna.edu/LangStudy/BNC/ Exploring Words and Phrases from the British National Corpus]
*[http://ir.dcs.gla.ac.uk/test_collections/gov2-summary.htm GOV2 Corpus], 426 gigabytes of text
*[http://ir.dcs.gla.ac.uk/test_collections/gov2-summary.htm GOV2 Corpus] - 426 gigabytes of text
*[http://www.gutenberg.org/wiki/Main_Page Gutenberg]
*[http://nora.hd.uib.no/icame.html ICAME]
Line 41: Line 42:


==German==
<!-- Please keep this list in alphabetical order -->


*[http://www.phonetik.uni-muenchen.de/Bas/BasKorporaeng.html Bavarian Archive for Speech Signals Corpora]
Line 47: Line 49:


==Multilingual==
<!-- Please keep this list in alphabetical order -->


*[http://wt.jrc.it/lt/Acquis/ ACQUIS COMMUNAUTAIRE Multilingual Corpus]
Line 90: Line 93:


==Russian==
<!-- Please keep this list in alphabetical order -->


*[http://www.sfb441.uni-tuebingen.de/b1/korpora.html Russian Corpora]
Line 101: Line 105:


==Slovak==
<!-- Please keep this list in alphabetical order -->


*[http://korpus.juls.savba.sk/index.en.html Slovak National Corpus]


==Italian==
<!-- Please keep this list in alphabetical order -->


*[http://languageserver.uni-graz.at/badip/badip/20_corpusLip.php LIP - Lessico di frequenza dell'Italiano Parlato - Access via BADIP]
Line 111: Line 117:
*[http://tlio.ovi.cnr.it/TLIO/ Tesoro della lingua italiana delle origini (TLIO)]


==Link Collections==
==Link collections==
<!-- Please keep this list in alphabetical order -->


*[http://www.dcs.gla.ac.uk/idom/ir_resources/ Collections of texts and corpora]
Line 118: Line 125:
*[http://www.sultry.arts.usyd.edu.au/links/statnlp.html Annotated list of resources on statistical NLP and corpus-based CL]


==Corpora Tools==
==Corpora tools==
<!-- Please keep this list in alphabetical order -->


*[http://www.dcs.gla.ac.uk/idom/ir_resources/linguistic_utils/stop_words List of stop words]
Line 125: Line 133:


==Uncategorized==
<!-- Please keep this list in alphabetical order -->


*[ftp://ftp.cs.cornell.edu/pub/smart/time/ 1963 Time Magazine corpus]

Revision as of 22:25, 3 November 2006

This list needs some cleaning. Please help.

English

German

Multilingual

Russian

Slovak

Italian

Link collections

Corpora tools

Uncategorized