Difference between revisions of "Multilingual Corpora"

From ACL Wiki
(add another UN parallel corpus)
(move monolingual corpora to individual language's pages, part 2)
(2 intermediate revisions by the same user not shown)
Line 1: Line 1:
For individual languages, see [[List of resources by language]].
+
This page lists ''multilingual'' corpora. For ''monolingual'' corpora, see [[List of resources by language]].
  
 
See also [[Multilingual resources]].
 
See also [[Multilingual resources]].
Line 6: Line 6:
 
<!-- Please keep this list in alphabetical order -->
 
<!-- Please keep this list in alphabetical order -->
 
*[http://wt.jrc.it/lt/Acquis/ ACQUIS COMMUNAUTAIRE Multilingual Corpus]
 
*[http://wt.jrc.it/lt/Acquis/ ACQUIS COMMUNAUTAIRE Multilingual Corpus]
*[http://spraakbanken.gu.se/ Bank of Swedish]
 
 
*[http://sli.uvigo.es/CLUVI/ CLUVI Corpus (Galician-English-Spanish-French parallel corpus)]
 
*[http://sli.uvigo.es/CLUVI/ CLUVI Corpus (Galician-English-Spanish-French parallel corpus)]
*[http://hnk.ffzg.hr/ Croatian National Corpus (HNK)]
 
*[http://ucnk.ff.cuni.cz/ Czech National Corpus (CNC)]
 
 
*[http://www.kun.nl/celex CELEX - The Dutch Center for Lexical Information]
 
*[http://www.kun.nl/celex CELEX - The Dutch Center for Lexical Information]
 
*[http://www.cdc.gov/ncidod/sars/languages.htm Centre for Disease Control - Chinese, French, Japanese, Spanish info on SARS]
 
*[http://www.cdc.gov/ncidod/sars/languages.htm Centre for Disease Control - Chinese, French, Japanese, Spanish info on SARS]
Line 17: Line 14:
 
*[http://www.statmt.org/europarl/ European Parliament Proceedings Parallel Corpus 1996-2003]
 
*[http://www.statmt.org/europarl/ European Parliament Proceedings Parallel Corpus 1996-2003]
 
*[http://www.illc.uva.nl/EuroWordNet EuroWordNet]
 
*[http://www.illc.uva.nl/EuroWordNet EuroWordNet]
*[http://www.france.diplomatie.fr/label_france/index.html French Foreign Ministry's magazine]
 
 
*[http://glossa.fltr.ucl.ac.be/ GlossaNet]
 
*[http://glossa.fltr.ucl.ac.be/ GlossaNet]
*[http://hometown.aol.com/mit2haiti/JA-HC-kr.htm Haitian Creole corpus - Teknoloji pou lang kreyol]
 
 
*[http://ufal.mff.cuni.cz/hamledt HamleDT], harmonized dependency treebanks of many languages, common annotation style.
 
*[http://ufal.mff.cuni.cz/hamledt HamleDT], harmonized dependency treebanks of many languages, common annotation style.
*[http://corpus.nytud.hu/mnsz/ Hungarian National Corpus]
 
 
*[http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC95T20 Hansard French-English parallel corpus]
 
*[http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC95T20 Hansard French-English parallel corpus]
*[http://www.ucl.ac.uk/english-usage/ice/avail.htm ICE corpora]
 
*[http://korpus.pl/ IPI PAN Corpus of Polish]
 
 
*[http://www.tu-chemnitz.de/phil/InternetGrammar/ Learner Behaviour on the Internet]
 
*[http://www.tu-chemnitz.de/phil/InternetGrammar/ Learner Behaviour on the Internet]
 
*[http://corpora.informatik.uni-leipzig.de Leipzig Corpora Collection: Large monolingual raw corpora for 17+ languages, free downloads]
 
*[http://corpora.informatik.uni-leipzig.de Leipzig Corpora Collection: Large monolingual raw corpora for 17+ languages, free downloads]
Line 35: Line 27:
 
*[http://www.ims.uni-stuttgart.de/info/Newspapers.html Newspapers on the Internet]
 
*[http://www.ims.uni-stuttgart.de/info/Newspapers.html Newspapers on the Internet]
 
*[http://logos.uio.no/opus/ OPUS - an open source parallel corpus]
 
*[http://logos.uio.no/opus/ OPUS - an open source parallel corpus]
*[http://www.tekstlab.uio.no/Bosnian/Corpus.html Oslo Corpus of Bosnian]
 
 
*[http://langbank.engl.polyu.edu.hk/indexl.html PolyU Language Bank]
 
*[http://langbank.engl.polyu.edu.hk/indexl.html PolyU Language Bank]
*[http://www.corpusdoportugues.org/ Portuguese Corpus]
 
 
*[http://register.consilium.eu.int/ Public registry of the Council of the EU]
 
*[http://register.consilium.eu.int/ Public registry of the Council of the EU]
*[http://www.ruscorpora.ru/ Russian National Corpus (RNK)]
 
 
*[http://www.multilingual.com/allen51.htm The Bible as a Resource for Translation Software]
 
*[http://www.multilingual.com/allen51.htm The Bible as a Resource for Translation Software]
 
*[http://www.cogsci.ed.ac.uk/elsnet/eci.html The ECI Multilingual corpus]
 
*[http://www.cogsci.ed.ac.uk/elsnet/eci.html The ECI Multilingual corpus]
 
*[http://www.fida.net/ Slovenian Corpus FIDA] and [http://www.fidaplus.net/ FIDA+]
 
*[http://www.fida.net/ Slovenian Corpus FIDA] and [http://www.fidaplus.net/ FIDA+]
 
*[http://www.ling.su.se/DaLi/research/smultron/index.htm SMULTRON Corpus] parallel treebank of English, German and Swedish
 
*[http://www.ling.su.se/DaLi/research/smultron/index.htm SMULTRON Corpus] parallel treebank of English, German and Swedish
*[http://www.corpusdelespanol.org/ Spanish Corpus]
 
 
*[http://www.unhchr.ch/udhr/index.htm UN declaration of human rights in multiple languages]
 
*[http://www.unhchr.ch/udhr/index.htm UN declaration of human rights in multiple languages]
 
*[http://conferences.unite.un.org/UNCorpus/ UN parallel corpora]
 
*[http://conferences.unite.un.org/UNCorpus/ UN parallel corpora]

Revision as of 08:56, 26 June 2016

This page lists multilingual corpora. For monolingual corpora, see List of resources by language.

See also Multilingual resources.