Difference between revisions of "Corpora for English"

From ACL Wiki
(HamleDT)
(7 intermediate revisions by 2 users not shown)
Line 25: Line 25:
 
*[http://usna.edu/LangStudy/BNC/ Exploring Words and Phrases from the British National Corpus]
 
*[http://ir.dcs.gla.ac.uk/test_collections/gov2-summary.htm GOV2 Corpus] - 426 gigabytes of text
 +
*[http://gmb.let.rug.nl Groningen Meaning Bank], a semantically annotated corpus
 
*[http://www.gutenberg.org/wiki/Main_Page Gutenberg]
 +
*[http://ufal.mff.cuni.cz/hamledt HamleDT], harmonized dependency treebanks of many languages in a common annotation style
 
*[http://prize.hutter1.net/ Hutter Prize for Lossless Compression of Human Knowledge 100M sample of Wikipedia]
 
*[http://nora.hd.uib.no/icame.html ICAME]
Line 43: Line 45:
 
*[http://www.grsampson.net/LucyDoc.html The LUCY Corpus - Documentation]
 
*[http://www.cs.rochester.edu/research/cisd/resources/trains.html TRAINS Dialogue Corpus]
 +
*[http://ebiquity.umbc.edu/resource/html/id/351 UMBC Webbase Corpus]
 +
*[http://www.euromatrixplus.net/multi-un/ UN parallel corpora]
 +
*[http://www.let.rug.nl/~bos/vpe/ VP Ellipsis corpus]
 
*[http://wacky.sslmit.unibo.it/ WaCky]
 
*[http://www.webcorp.org.uk/guide/ WebCorp]
 +
*[http://www.statmt.org/wmt13/translation-task.html#download WMT corpora], including [http://en.wikipedia.org/wiki/Europarl_corpus Europarl], News Commentary, and News Crawl
 +
  
 
==Link collections==
Line 51: Line 58:
 
*[http://www.dcs.gla.ac.uk/idom/ir_resources/ Collections of texts and corpora]
 
*[http://www.bmanuel.org/clr2_mp.html Manuel Barbera: General Corpora and Corpus Linguistics Resources]
*[http://www.alphabit.net Isabella Chiari: Corpora, Software and Linguistic resources]
 
 
*[http://www.sultry.arts.usyd.edu.au/links/statnlp.html Annotated list of resources on statistical NLP and corpus-based CL]
  

Revision as of 09:41, 26 May 2014

For languages other than English, see List of resources by language.


Link collections

Corpora tools