Difference between revisions of "Corpora for English"

From ACL Wiki
(start work on cleaning up this mess)
(more cleanup)
Line 13: Line 13:
 
*[http://www.cs.fit.edu/~mmahoney/compression/text.html Large Text Compression Benchmark's 1G sample of Wikipedia]
 
*[http://www.cs.fit.edu/~mmahoney/compression/text.html Large Text Compression Benchmark's 1G sample of Wikipedia]
 
*[http://www.cs.cornell.edu/People/pabo/movie-review-data/ Movie Review Data]
 
*[http://www.cs.cornell.edu/People/pabo/movie-review-data/ Movie Review Data]
*[http://www.cs.pitt.edu/mpqa/ Multi-Perspective Question Answering (MPQA)]
 
 
*[http://mwe.stanford.edu/resources/ Multiword Expression Resources]
 
*[http://mwe.stanford.edu/resources/ Multiword Expression Resources]
*[http://www.askoxford.com/oec/mainpage/?view=uk Oxford English Corpus]
 
*[http://pie.usna.edu/ Phrases in English]
 
*[http://homepages.feis.herts.ac.uk/~comrcml/Lyon-thesis.ps Restricted English Corpus from Dr. Caroline Lyon for PhD]
 
*[http://www.sketchengine.co.uk/ Sketch Engine]
 
 
*[http://www-2.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/nlp/corpora/susanne/0.html Susanne: Annotated American English Corpus]
 
*[http://www-2.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/nlp/corpora/susanne/0.html Susanne: Annotated American English Corpus]
*[http://clix.to/davidlee00 The BNC Index (for the BNCWorld Edition)]
 
 
*[http://www-users.york.ac.uk/~sp20/corpus.html The Brooklyn-Geneva-Amsterdam-Helsinki Parsed Corpus of Old English]
 
*[http://www-users.york.ac.uk/~sp20/corpus.html The Brooklyn-Geneva-Amsterdam-Helsinki Parsed Corpus of Old English]
*[http://www-rcf.usc.edu/~billmann/diversity/DDivers-site.htm The Dialogue Diversity Corpus]
 
 
*[http://www.grsampson.net/LucyDoc.html The LUCY Corpus - Documentation]
 
*[http://www.grsampson.net/LucyDoc.html The LUCY Corpus - Documentation]
 
*[http://www.cs.rochester.edu/research/cisd/resources/trains.html TRAINS Dialogue Corpus]
 
*[http://www.cs.rochester.edu/research/cisd/resources/trains.html TRAINS Dialogue Corpus]
Line 28: Line 21:
 
*[http://www.euromatrixplus.net/multi-un/ UN parallel corpora]
 
*[http://www.euromatrixplus.net/multi-un/ UN parallel corpora]
 
*[http://www.let.rug.nl/~bos/vpe/ VP Ellipsis corpus]
 
*[http://www.let.rug.nl/~bos/vpe/ VP Ellipsis corpus]
*[http://wacky.sslmit.unibo.it/ WaCky]
+
* [http://www.statmt.org/wmt15/translation-task.html#download WMT corpora], including [http://en.wikipedia.org/wiki/Europarl_corpus Europarl], News Commentary, and News Crawl
*[http://www.webcorp.org.uk/guide/ WebCorp]
 
* [http://www.statmt.org/wmt13/translation-task.html#download WMT corpora], including [http://en.wikipedia.org/wiki/Europarl_corpus Europarl], News Commentary, and News Crawl
 
  
===Proprietary===
+
===Proprietary or Require Prior Permission===
 
*[http://ucts.uniba.sk/aranea_about/ Araneum Anglicum], Gigaword English web corpus
 
*[http://ucts.uniba.sk/aranea_about/ Araneum Anglicum], Gigaword English web corpus
 
*[http://ucts.uniba.sk/aranea_about/ Araneum Anglicum Asiaticum], Gigaword Asian English web corpus
 
*[http://ucts.uniba.sk/aranea_about/ Araneum Anglicum Asiaticum], Gigaword Asian English web corpus
Line 41: Line 32:
 
*[http://www-personal.umich.edu/~jlawler/levin.html English Verb Classes And Alternations: A Preliminary Investigation (Index)]
 
*[http://www-personal.umich.edu/~jlawler/levin.html English Verb Classes And Alternations: A Preliminary Investigation (Index)]
 
*[http://ir.dcs.gla.ac.uk/test_collections/gov2-summary.htm GOV2 Corpus] - 426 gigabytes of text
 
*[http://ir.dcs.gla.ac.uk/test_collections/gov2-summary.htm GOV2 Corpus] - 426 gigabytes of text
 
+
*[http://mpqa.cs.pitt.edu Multi-Perspective Question Answering (MPQA)]
 +
*[http://www.askoxford.com/oec/mainpage/?view=uk Oxford English Corpus]
 +
*[http://www.sketchengine.co.uk/ Sketch Engine]
 +
*[http://wacky.sslmit.unibo.it/ WaCky]
 +
*[http://www.webcorp.org.uk/guide/ WebCorp]
  
  
Line 56: Line 51:
 
*[http://usna.edu/LangStudy/BNC/ Exploring Words and Phrases from the British National Corpus]
 
*[http://usna.edu/LangStudy/BNC/ Exploring Words and Phrases from the British National Corpus]
 
*[http://nora.hd.uib.no/icame.html ICAME]
 
*[http://nora.hd.uib.no/icame.html ICAME]
 
+
*[http://pie.usna.edu/ Phrases in English]
 +
*[http://homepages.feis.herts.ac.uk/~comrcml/Lyon-thesis.ps Restricted English Corpus from Dr. Caroline Lyon for PhD]
 +
*[http://clix.to/davidlee00 The BNC Index (for the BNCWorld Edition)]
 
-->
 
-->
  

Revision as of 09:43, 17 June 2015

For languages other than English, see List of resources by language.

Free and Downloadable

Proprietary or Require Prior Permission


Link collections

Corpora tools