Difference between revisions of "Corpora for English"

From ACL Wiki
Line 2: Line 2:
  
 
  ==English==
+ <!-- Please keep this list in alphabetical order -->

  *[http://www.elda.fr/catalogue/en/speech/S0115.html American English SpeechDat-Car]
Line 21: Line 22:
 
  *[http://www-personal.umich.edu/~jlawler/levin.html English Verb Classes And Alternations: A Preliminary Investigation (Index)]
  *[http://usna.edu/LangStudy/BNC/ Exploring Words and Phrases from the British National Corpus]
- *[http://ir.dcs.gla.ac.uk/test_collections/gov2-summary.htm GOV2 Corpus], 426 gigabytes of text
+ *[http://ir.dcs.gla.ac.uk/test_collections/gov2-summary.htm GOV2 Corpus] - 426 gigabytes of text
  *[http://www.gutenberg.org/wiki/Main_Page Gutenberg]
  *[http://nora.hd.uib.no/icame.html ICAME]
Line 41: Line 42:
  
 
  ==German==
+ <!-- Please keep this list in alphabetical order -->

  *[http://www.phonetik.uni-muenchen.de/Bas/BasKorporaeng.html Bavarian Archive for Speech Signals Corpora]
Line 47: Line 49:
  
 
  ==Multilingual==
+ <!-- Please keep this list in alphabetical order -->

  *[http://wt.jrc.it/lt/Acquis/ ACQUIS COMMUNAUTAIRE Multilingual Corpus]
Line 90: Line 93:
  
 
  ==Russian==
+ <!-- Please keep this list in alphabetical order -->

  *[http://www.sfb441.uni-tuebingen.de/b1/korpora.html Russian Corpora]
Line 101: Line 105:
  
 
  ==Slovak==
+ <!-- Please keep this list in alphabetical order -->

  *[http://korpus.juls.savba.sk/index.en.html Slovak National Corpus]

  ==Italian==
+ <!-- Please keep this list in alphabetical order -->

  *[http://languageserver.uni-graz.at/badip/badip/20_corpusLip.php LIP - Lessico di frequenza dell'Italiano Parlato - Access via BADIP]
Line 111: Line 117:
 
  *[http://tlio.ovi.cnr.it/TLIO/ Tesoro della lingua italiana delle origini (TLIO)]

- ==Link Collections==
+ ==Link collections==
+ <!-- Please keep this list in alphabetical order -->

  *[http://www.dcs.gla.ac.uk/idom/ir_resources/ Collections of texts and corpora]
Line 118: Line 125:
 
  *[http://www.sultry.arts.usyd.edu.au/links/statnlp.html Annotated list of resources on statistical NLP and corpus-based CL]

- ==Corpora Tools==
+ ==Corpora tools==
+ <!-- Please keep this list in alphabetical order -->

  *[http://www.dcs.gla.ac.uk/idom/ir_resources/linguistic_utils/stop_words List of stop words]
Line 125: Line 133:
  
 
  ==Uncategorized==
+ <!-- Please keep this list in alphabetical order -->

  *[ftp://ftp.cs.cornell.edu/pub/smart/time/ 1963 Time Magazine corpus]

Revision as of 16:25, 3 November 2006

This list needs some cleaning. Please help.

English

German

Multilingual

Russian

Slovak

Italian

Link collections

Corpora tools

Uncategorized