https://aclweb.org/aclwiki/api.php?action=feedcontributions&user=Jonsafari&feedformat=atom
ACL Wiki - User contributions [en]
2024-03-28T18:34:27Z
User contributions
MediaWiki 1.35.2
https://aclweb.org/aclwiki/index.php?title=Multilingual_resources&diff=11568
Multilingual resources
2016-06-26T14:59:04Z
<p>Jonsafari: rm dead link</p>
<hr />
<div>For individual languages, see [[List of resources by language]].<br />
<br />
* [[Multilingual Dictionaries]] - including Bilingual Dictionaries<br />
* [[Multilingual Corpora]] - including Bilingual Corpora<br />
* [[Multilingual Tools and Software]] - including Bilingual Tools and Software<br />
<br />
== Related resources ==<br />
<br />
<!-- Please keep this list in alphabetical order --><br />
* [http://www.dunglish.nl/ Dunglish (Dutch & English)]<br />
* [http://www.omegawiki.org/ OmegaWiki] - a collaborative project to produce a free (as in free software) multilingual resource in every language, with lexicological, terminological and thesaurus information<br />
* [https://www.mpi-inf.mpg.de/yago-naga/uwn/ UWN]: Multilingual wordnet covering over 200 languages, freely available as a TSV file<br />
<!-- Please keep this list in alphabetical order --><br />
<br />
[[Category:Resources by language|Multilingual]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Multilingual_Corpora&diff=11567
Multilingual Corpora
2016-06-26T14:56:45Z
<p>Jonsafari: move monolingual corpora to individual language's pages, part 2</p>
<hr />
<div>This page lists ''multilingual'' corpora. For ''monolingual'' corpora, see [[List of resources by language]].<br />
<br />
See also [[Multilingual resources]].<br />
<br />
<br />
<!-- Please keep this list in alphabetical order --><br />
*[http://wt.jrc.it/lt/Acquis/ ACQUIS COMMUNAUTAIRE Multilingual Corpus]<br />
*[http://sli.uvigo.es/CLUVI/ CLUVI Corpus (Galician-English-Spanish-French parallel corpus)]<br />
*[http://www.kun.nl/celex CELEX - The Dutch Center for Lexical Information]<br />
*[http://www.cdc.gov/ncidod/sars/languages.htm Centers for Disease Control and Prevention (CDC) - Chinese, French, Japanese, Spanish info on SARS]<br />
*[http://www.linguateca.pt/COMPARA/ COMPARA corpus]<br />
*[http://www.debian.org/international/ Debian free software community]<br />
*[http://www.ling.lancs.ac.uk/corplang/emille EMILLE corpus]<br />
*[http://www.statmt.org/europarl/ European Parliament Proceedings Parallel Corpus 1996-2003]<br />
*[http://www.illc.uva.nl/EuroWordNet EuroWordNet]<br />
*[http://glossa.fltr.ucl.ac.be/ GlossaNet]<br />
*[http://ufal.mff.cuni.cz/hamledt HamleDT], harmonized dependency treebanks for many languages in a common annotation style<br />
*[http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC95T20 Hansard French-English parallel corpus]<br />
*[http://www.tu-chemnitz.de/phil/InternetGrammar/ Learner Behaviour on the Internet]<br />
*[http://corpora.informatik.uni-leipzig.de Leipzig Corpora Collection: Large monolingual raw corpora for 17+ languages, free downloads]<br />
*[http://www.coli.uni-saarland.de/~gparis/LMD-TAZ_corpus/ Le Monde Diplomatique-Die Tageszeitung Translation Corpus] - French-German, aligned (parallel)<br />
*[http://muchmore.dfki.de/resources1.htm MuchMore Springer Bilingual Corpus]<br />
*[http://nl.ijs.si/ME/ MULTEXT-East: Multilingual Corpora for Eastern and Central European Languages]<br />
*[http://tcc.itc.it/people/forner/multilingualcorpora.html Multilingual Corpora: Available Resources]<br />
* [http://www.csse.monash.edu.au/~jwb/tanakacorpus.html Tanaka Corpus: Japanese-English sentence pairs]<br />
*[http://multisemcor.itc.it MultiSemCor]<br />
*[http://www.ims.uni-stuttgart.de/info/Newspapers.html Newspapers on the Internet]<br />
*[http://logos.uio.no/opus/ OPUS - an open source parallel corpus]<br />
*[http://langbank.engl.polyu.edu.hk/indexl.html PolyU Language Bank]<br />
*[http://register.consilium.eu.int/ Public registry of the Council of the EU]<br />
*[http://www.multilingual.com/allen51.htm The Bible as a Resource for Translation Software]<br />
*[http://www.cogsci.ed.ac.uk/elsnet/eci.html The ECI Multilingual corpus]<br />
*[http://www.fida.net/ Slovenian Corpus FIDA] and [http://www.fidaplus.net/ FIDA+]<br />
*[http://www.ling.su.se/DaLi/research/smultron/index.htm SMULTRON Corpus], parallel treebank of English, German and Swedish<br />
*[http://www.unhchr.ch/udhr/index.htm UN declaration of human rights in multiple languages]<br />
*[http://conferences.unite.un.org/UNCorpus/ UN parallel corpora (United Nations Parallel Corpus)]<br />
*[http://www.euromatrixplus.net/multi-un/ UN parallel corpora (MultiUN)]<br />
*[http://www-igm.univ-mlv.fr/~unitex/ UNITEX]<br />
*[http://www.u-grenoble3.fr/kraif/liens.htm Useful links about parallel corpora, by Olivier Kraif]<br />
*[http://wacky.sslmit.unibo.it/ WaCky Project]<br />
*[http://www.wortschatz.uni-leipzig.de/html/wliste.html Wortlisten: spoken German, English, French, and Dutch]<br />
<!-- Please keep this list in alphabetical order --><br />
<br />
<br />
[[Category:Resources by language|Multilingual]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Resources_for_Portugese&diff=11566
Resources for Portugese
2016-06-26T14:53:35Z
<p>Jonsafari: +Corpus do Portugues</p>
<hr />
<div><br />
==Corpora==<br />
* [http://corporavm.uni-koeln.de/colonia/ Colonia], corpus of historical Portuguese.<br />
* [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English<br />
* [http://ufal.mff.cuni.cz/hamledt HamleDT], harmonized dependency treebanks for many languages in a common annotation style<br />
* [http://www.corpusdoportugues.org/ o Corpus do Português] (web-only interface)<br />
<br />
<br />
==Software==<br />
* [http://lael.pucsp.br/corpora/segmentador/ CEPRIL] - Portuguese segmenter<br />
* [http://www.linguateca.pt/corpografo Corpógrafo] - a Web-based environment for corpora research<br />
<br />
==Word Lists==<br />
* [http://www.uni-koeln.de/~mzampier/resources/pawl.txt P-AWL] - the Portuguese academic word list compiled as described in [http://link.springer.com/chapter/10.1007/978-3-642-12320-7_15#page-1 Baptista et al. (2010)]<br />
<br />
[[Category:Resources by language|Portugese]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Corpora_for_English&diff=11565
Corpora for English
2016-06-26T14:48:46Z
<p>Jonsafari: +ICE</p>
<hr />
<div>For languages other than English, see [[List of resources by language]].<br />
<!-- Please keep this list in alphabetical order --><br />
<br />
===Free and Downloadable===<br />
*[http://americannationalcorpus.org/ American National Corpus (ANC)]<br />
*[http://www.cs.cornell.edu/home/llee/data/convote.html Congressional floor-debate transcripts, with support/oppose labels]<br />
*[http://www-rcf.usc.edu/~billmann/diversity/DDivers-site.htm Dialogue Diversity Corpus]<br />
*[http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/naive-bayes/bow-0.8/stopwords.c English stop words (from SMART)]<br />
*[http://gmb.let.rug.nl Groningen Meaning Bank] semantically annotated corpus<br />
*[https://corpling.uis.georgetown.edu/gum/ GUM - Georgetown University Multilayer corpus], multiple parses, coreference, entities, sentence types and RST<br />
*[https://www.gutenberg.org Project Gutenberg]<br />
*[http://www.ucl.ac.uk/english-usage/ice/avail.htm International Corpus of English]<br />
*[http://ufal.mff.cuni.cz/hamledt HamleDT], harmonized dependency treebanks for many languages in a common annotation style<br />
*[http://prize.hutter1.net/ Hutter Prize for Lossless Compression of Human Knowledge: 100 MB sample of Wikipedia]<br />
*[http://www.cs.fit.edu/~mmahoney/compression/text.html Large Text Compression Benchmark: 1 GB sample of Wikipedia]<br />
*[http://www.cs.cornell.edu/People/pabo/movie-review-data/ Movie Review Data]<br />
*[http://mwe.stanford.edu/resources/ Multiword Expression Resources]<br />
*[http://www-2.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/nlp/corpora/susanne/0.html Susanne: Annotated American English Corpus]<br />
*[http://www-users.york.ac.uk/~sp20/corpus.html The Brooklyn-Geneva-Amsterdam-Helsinki Parsed Corpus of Old English]<br />
*[http://www.grsampson.net/LucyDoc.html The LUCY Corpus - Documentation]<br />
*[http://www.cs.rochester.edu/research/cisd/resources/trains.html TRAINS Dialogue Corpus]<br />
*[http://ebiquity.umbc.edu/resource/html/id/351 UMBC Webbase Corpus]<br />
*[http://www.euromatrixplus.net/multi-un/ UN parallel corpora]<br />
*[http://www.let.rug.nl/~bos/vpe/ VP Ellipsis corpus]<br />
* [http://www.statmt.org/wmt15/translation-task.html#download WMT corpora], including [http://en.wikipedia.org/wiki/Europarl_corpus Europarl], News Commentary, and News Crawl<br />
<br />
===Proprietary or Require Prior Permission===<br />
*[http://ucts.uniba.sk/aranea_about/ Araneum Anglicum], Gigaword English web corpus<br />
*[http://ucts.uniba.sk/aranea_about/ Araneum Anglicum Asiaticum], Gigaword Asian English web corpus<br />
*[http://www.natcorp.ox.ac.uk/ British National Corpus (BNC)]<br />
*[http://boston.lti.cs.cmu.edu/Data/clueweb09/ ClueWeb]<br />
*[http://www.athel.com/cpsa.html Corpus of Spoken Professional English]<br />
*[http://www.phon.ox.ac.uk/~esther/ivyweb/ English Intonation in the British Isles - The IViE Corpus]<br />
*[http://www-personal.umich.edu/~jlawler/levin.html English Verb Classes And Alternations: A Preliminary Investigation (Index)]<br />
*[http://ir.dcs.gla.ac.uk/test_collections/gov2-summary.htm GOV2 Corpus] - 426 gigabytes of text<br />
*[http://mpqa.cs.pitt.edu Multi-Perspective Question Answering (MPQA)]<br />
*[http://www.askoxford.com/oec/mainpage/?view=uk Oxford English Corpus]<br />
*[http://www.sketchengine.co.uk/ Sketch Engine]<br />
*[http://wacky.sslmit.unibo.it/ WaCky]<br />
*[http://www.webcorp.org.uk/guide/ WebCorp]<br />
<br />
<br />
<!-- Dead links<br />
*[ftp://ftp.cs.cornell.edu/pub/smart/time/ 1963 Time Magazine corpus]<br />
*[http://www.elda.fr/catalogue/en/speech/S0115.html American English SpeechDat-Car]<br />
*[http://compbio.uchsc.edu/ccp/corpora/index.shtml Biomedical corpora]<br />
*[http://homepage.mac.com/bncweb/ BNCweb a web-based interface to the British National Corpus]<br />
*[http://www.comp.lancs.ac.uk/computing/research/ucrel/bnc.html British National Corpus project page (from UCREL)]<br />
*[http://clwww.essex.ac.uk/w3c/corpus_ling/content/corpora/list/private/brown/brown.html Brown Corpus]<br />
*[http://computing.open.ac.uk/coda/data.html CODA Parallel Annotated Monologue-Dialogue Corpus]<br />
*[http://www.collins.co.uk/books.aspx?group=154 Collins Wordbanks]<br />
*[http://etext.lib.virginia.edu/ Electronic Text Center -- University of Virginia]<br />
*[http://usna.edu/LangStudy/BNC/ Exploring Words and Phrases from the British National Corpus]<br />
*[http://nora.hd.uib.no/icame.html ICAME]<br />
*[http://pie.usna.edu/ Phrases in English]<br />
*[http://homepages.feis.herts.ac.uk/~comrcml/Lyon-thesis.ps Restricted English Corpus from Dr. Caroline Lyon for PhD]<br />
*[http://clix.to/davidlee00 The BNC Index (for the BNCWorld Edition)]<br />
--><br />
<br />
==Link collections==<br />
<!-- Please keep this list in alphabetical order --><br />
<br />
*[http://www.dcs.gla.ac.uk/idom/ir_resources/ Collections of texts and corpora]<br />
*[http://www.bmanuel.org/clr2_mp.html Manuel Barbera: General Corpora and Corpus Linguistics Resources]<br />
*[http://www.sultry.arts.usyd.edu.au/links/statnlp.html Annotated list of resources on statistical NLP and corpus-based CL]<br />
<br />
==Corpora tools==<br />
<!-- Please keep this list in alphabetical order --><br />
<br />
*[http://corpus-tools.org/annis/ ANNIS] - open source search tool for complex multilayer corpora<br />
*[http://www.dcs.gla.ac.uk/idom/ir_resources/linguistic_utils/stop_words List of stop words]<br />
*[http://korpus.pl/index.php?page=poliqarp Poliqarp] - open source XML-aware indexer, search engine and concordancer<br />
*[http://www.sketchengine.co.uk/ The Sketch Engine]<br />
*[http://www.cis.upenn.edu/~treebank/tokenization.html Treebank tokenization scheme]<br />
<br />
<br />
[[Category:Corpora|*]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Multilingual_Corpora&diff=11564
Multilingual Corpora
2016-06-26T14:45:49Z
<p>Jonsafari: move monolingual corpora to individual language's page, part 1</p>
<hr />
<div>This page lists ''multilingual'' corpora. For ''monolingual'' corpora, see [[List of resources by language]].<br />
<br />
See also [[Multilingual resources]].<br />
<br />
<br />
<!-- Please keep this list in alphabetical order --><br />
*[http://wt.jrc.it/lt/Acquis/ ACQUIS COMMUNAUTAIRE Multilingual Corpus]<br />
*[http://sli.uvigo.es/CLUVI/ CLUVI Corpus (Galician-English-Spanish-French parallel corpus)]<br />
*[http://www.kun.nl/celex CELEX - The Dutch Center for Lexical Information]<br />
*[http://www.cdc.gov/ncidod/sars/languages.htm Centers for Disease Control and Prevention (CDC) - Chinese, French, Japanese, Spanish info on SARS]<br />
*[http://www.linguateca.pt/COMPARA/ COMPARA corpus]<br />
*[http://www.debian.org/international/ Debian free software community]<br />
*[http://www.ling.lancs.ac.uk/corplang/emille EMILLE corpus]<br />
*[http://www.statmt.org/europarl/ European Parliament Proceedings Parallel Corpus 1996-2003]<br />
*[http://www.illc.uva.nl/EuroWordNet EuroWordNet]<br />
*[http://glossa.fltr.ucl.ac.be/ GlossaNet]<br />
*[http://ufal.mff.cuni.cz/hamledt HamleDT], harmonized dependency treebanks for many languages in a common annotation style<br />
*[http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC95T20 Hansard French-English parallel corpus]<br />
*[http://www.ucl.ac.uk/english-usage/ice/avail.htm ICE corpora]<br />
*[http://korpus.pl/ IPI PAN Corpus of Polish]<br />
*[http://www.tu-chemnitz.de/phil/InternetGrammar/ Learner Behaviour on the Internet]<br />
*[http://corpora.informatik.uni-leipzig.de Leipzig Corpora Collection: Large monolingual raw corpora for 17+ languages, free downloads]<br />
*[http://www.coli.uni-saarland.de/~gparis/LMD-TAZ_corpus/ Le Monde Diplomatique-Die Tageszeitung Translation Corpus] - French-German, aligned (parallel)<br />
*[http://muchmore.dfki.de/resources1.htm MuchMore Springer Bilingual Corpus]<br />
*[http://nl.ijs.si/ME/ MULTEXT-East: Multilingual Corpora for Eastern and Central European Languages]<br />
*[http://tcc.itc.it/people/forner/multilingualcorpora.html Multilingual Corpora: Available Resources]<br />
* [http://www.csse.monash.edu.au/~jwb/tanakacorpus.html Tanaka Corpus: Japanese-English sentence pairs]<br />
*[http://multisemcor.itc.it MultiSemCor]<br />
*[http://www.ims.uni-stuttgart.de/info/Newspapers.html Newspapers on the Internet]<br />
*[http://logos.uio.no/opus/ OPUS - an open source parallel corpus]<br />
*[http://www.tekstlab.uio.no/Bosnian/Corpus.html Oslo Corpus of Bosnian]<br />
*[http://langbank.engl.polyu.edu.hk/indexl.html PolyU Language Bank]<br />
*[http://www.corpusdoportugues.org/ Portuguese Corpus]<br />
*[http://register.consilium.eu.int/ Public registry of the Council of the EU]<br />
*[http://www.ruscorpora.ru/ Russian National Corpus (RNK)]<br />
*[http://www.multilingual.com/allen51.htm The Bible as a Resource for Translation Software]<br />
*[http://www.cogsci.ed.ac.uk/elsnet/eci.html The ECI Multilingual corpus]<br />
*[http://www.fida.net/ Slovenian Corpus FIDA] and [http://www.fidaplus.net/ FIDA+]<br />
*[http://www.ling.su.se/DaLi/research/smultron/index.htm SMULTRON Corpus], parallel treebank of English, German and Swedish<br />
*[http://www.corpusdelespanol.org/ Spanish Corpus]<br />
*[http://www.unhchr.ch/udhr/index.htm UN declaration of human rights in multiple languages]<br />
*[http://conferences.unite.un.org/UNCorpus/ UN parallel corpora (United Nations Parallel Corpus)]<br />
*[http://www.euromatrixplus.net/multi-un/ UN parallel corpora (MultiUN)]<br />
*[http://www-igm.univ-mlv.fr/~unitex/ UNITEX]<br />
*[http://www.u-grenoble3.fr/kraif/liens.htm Useful links about parallel corpora, by Olivier Kraif]<br />
*[http://wacky.sslmit.unibo.it/ WaCky Project]<br />
*[http://www.wortschatz.uni-leipzig.de/html/wliste.html Wortlisten: spoken German, English, French, and Dutch]<br />
<!-- Please keep this list in alphabetical order --><br />
<br />
<br />
[[Category:Resources by language|Multilingual]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Resources_for_Hungarian&diff=11563
Resources for Hungarian
2016-06-26T14:44:35Z
<p>Jonsafari: +hungarian national corpus</p>
<hr />
<div>==Corpora==<br />
===Free===<br />
* [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English<br />
* [http://mokk.bme.hu/resources/webcorpus/ Hungarian Webcorpus] - 590 million tokens<br />
<br />
===Non-Free===<br />
* [http://ucts.uniba.sk/aranea_about/ Araneum Hungaricum], Gigaword Hungarian web corpus<br />
* Hunglish parallel corpus ([http://mokk.bme.hu/resources/hunglishcorpus download], [http://hunglish.hu/search search])<br />
* [http://ufal.mff.cuni.cz/hamledt HamleDT], harmonized dependency treebanks for many languages in a common annotation style<br />
* [http://corpus.nytud.hu/mnsz/ Hungarian National Corpus]<br />
<br />
<br />
== Tools ==<br />
* [http://code.google.com/p/hunpos/ hunpos] - open-source POS-tagger<br />
* [http://mokk.bme.hu/resources/hunmorph/ hunmorph] - open-source morphological analyzer<br />
<br />
<br />
<br />
[[Category:Resources by language|Hungarian]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Multilingual_Corpora&diff=11562
Multilingual Corpora
2016-06-26T14:35:55Z
<p>Jonsafari: clarification</p>
<hr />
<div>This page lists ''multilingual'' corpora. For ''monolingual'' corpora, see [[List of resources by language]].<br />
<br />
See also [[Multilingual resources]].<br />
<br />
<br />
<!-- Please keep this list in alphabetical order --><br />
*[http://wt.jrc.it/lt/Acquis/ ACQUIS COMMUNAUTAIRE Multilingual Corpus]<br />
*[http://spraakbanken.gu.se/ Bank of Swedish]<br />
*[http://sli.uvigo.es/CLUVI/ CLUVI Corpus (Galician-English-Spanish-French parallel corpus)]<br />
*[http://hnk.ffzg.hr/ Croatian National Corpus (HNK)]<br />
*[http://ucnk.ff.cuni.cz/ Czech National Corpus (CNC)]<br />
*[http://www.kun.nl/celex CELEX - The Dutch Center for Lexical Information]<br />
*[http://www.cdc.gov/ncidod/sars/languages.htm Centers for Disease Control and Prevention (CDC) - Chinese, French, Japanese, Spanish info on SARS]<br />
*[http://www.linguateca.pt/COMPARA/ COMPARA corpus]<br />
*[http://www.debian.org/international/ Debian free software community]<br />
*[http://www.ling.lancs.ac.uk/corplang/emille EMILLE corpus]<br />
*[http://www.statmt.org/europarl/ European Parliament Proceedings Parallel Corpus 1996-2003]<br />
*[http://www.illc.uva.nl/EuroWordNet EuroWordNet]<br />
*[http://www.france.diplomatie.fr/label_france/index.html French Foreign Ministry's magazine]<br />
*[http://glossa.fltr.ucl.ac.be/ GlossaNet]<br />
*[http://hometown.aol.com/mit2haiti/JA-HC-kr.htm Haitian Creole corpus - Teknoloji pou lang kreyol]<br />
*[http://ufal.mff.cuni.cz/hamledt HamleDT], harmonized dependency treebanks for many languages in a common annotation style<br />
*[http://corpus.nytud.hu/mnsz/ Hungarian National Corpus]<br />
*[http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC95T20 Hansard French-English parallel corpus]<br />
*[http://www.ucl.ac.uk/english-usage/ice/avail.htm ICE corpora]<br />
*[http://korpus.pl/ IPI PAN Corpus of Polish]<br />
*[http://www.tu-chemnitz.de/phil/InternetGrammar/ Learner Behaviour on the Internet]<br />
*[http://corpora.informatik.uni-leipzig.de Leipzig Corpora Collection: Large monolingual raw corpora for 17+ languages, free downloads]<br />
*[http://www.coli.uni-saarland.de/~gparis/LMD-TAZ_corpus/ Le Monde Diplomatique-Die Tageszeitung Translation Corpus] - French-German, aligned (parallel)<br />
*[http://muchmore.dfki.de/resources1.htm MuchMore Springer Bilingual Corpus]<br />
*[http://nl.ijs.si/ME/ MULTEXT-East: Multilingual Corpora for Eastern and Central European Languages]<br />
*[http://tcc.itc.it/people/forner/multilingualcorpora.html Multilingual Corpora: Available Resources]<br />
* [http://www.csse.monash.edu.au/~jwb/tanakacorpus.html Tanaka Corpus: Japanese-English sentence pairs]<br />
*[http://multisemcor.itc.it MultiSemCor]<br />
*[http://www.ims.uni-stuttgart.de/info/Newspapers.html Newspapers on the Internet]<br />
*[http://logos.uio.no/opus/ OPUS - an open source parallel corpus]<br />
*[http://www.tekstlab.uio.no/Bosnian/Corpus.html Oslo Corpus of Bosnian]<br />
*[http://langbank.engl.polyu.edu.hk/indexl.html PolyU Language Bank]<br />
*[http://www.corpusdoportugues.org/ Portuguese Corpus]<br />
*[http://register.consilium.eu.int/ Public registry of the Council of the EU]<br />
*[http://www.ruscorpora.ru/ Russian National Corpus (RNK)]<br />
*[http://www.multilingual.com/allen51.htm The Bible as a Resource for Translation Software]<br />
*[http://www.cogsci.ed.ac.uk/elsnet/eci.html The ECI Multilingual corpus]<br />
*[http://www.fida.net/ Slovenian Corpus FIDA] and [http://www.fidaplus.net/ FIDA+]<br />
*[http://www.ling.su.se/DaLi/research/smultron/index.htm SMULTRON Corpus], parallel treebank of English, German and Swedish<br />
*[http://www.corpusdelespanol.org/ Spanish Corpus]<br />
*[http://www.unhchr.ch/udhr/index.htm UN declaration of human rights in multiple languages]<br />
*[http://conferences.unite.un.org/UNCorpus/ UN parallel corpora (United Nations Parallel Corpus)]<br />
*[http://www.euromatrixplus.net/multi-un/ UN parallel corpora (MultiUN)]<br />
*[http://www-igm.univ-mlv.fr/~unitex/ UNITEX]<br />
*[http://www.u-grenoble3.fr/kraif/liens.htm Useful links about parallel corpora, by Olivier Kraif]<br />
*[http://wacky.sslmit.unibo.it/ WaCky Project]<br />
*[http://www.wortschatz.uni-leipzig.de/html/wliste.html Wortlisten: spoken German, English, French, and Dutch]<br />
<!-- Please keep this list in alphabetical order --><br />
<br />
<br />
[[Category:Resources by language|Multilingual]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Multilingual_Corpora&diff=11561
Multilingual Corpora
2016-06-26T14:31:53Z
<p>Jonsafari: add another UN parallel corpus</p>
<hr />
<div>For individual languages, see [[List of resources by language]].<br />
<br />
See also [[Multilingual resources]].<br />
<br />
<br />
<!-- Please keep this list in alphabetical order --><br />
*[http://wt.jrc.it/lt/Acquis/ ACQUIS COMMUNAUTAIRE Multilingual Corpus]<br />
*[http://spraakbanken.gu.se/ Bank of Swedish]<br />
*[http://sli.uvigo.es/CLUVI/ CLUVI Corpus (Galician-English-Spanish-French parallel corpus)]<br />
*[http://hnk.ffzg.hr/ Croatian National Corpus (HNK)]<br />
*[http://ucnk.ff.cuni.cz/ Czech National Corpus (CNC)]<br />
*[http://www.kun.nl/celex CELEX - The Dutch Center for Lexical Information]<br />
*[http://www.cdc.gov/ncidod/sars/languages.htm Centers for Disease Control and Prevention (CDC) - Chinese, French, Japanese, Spanish info on SARS]<br />
*[http://www.linguateca.pt/COMPARA/ COMPARA corpus]<br />
*[http://www.debian.org/international/ Debian free software community]<br />
*[http://www.ling.lancs.ac.uk/corplang/emille EMILLE corpus]<br />
*[http://www.statmt.org/europarl/ European Parliament Proceedings Parallel Corpus 1996-2003]<br />
*[http://www.illc.uva.nl/EuroWordNet EuroWordNet]<br />
*[http://www.france.diplomatie.fr/label_france/index.html French Foreign Ministry's magazine]<br />
*[http://glossa.fltr.ucl.ac.be/ GlossaNet]<br />
*[http://hometown.aol.com/mit2haiti/JA-HC-kr.htm Haitian Creole corpus - Teknoloji pou lang kreyol]<br />
*[http://ufal.mff.cuni.cz/hamledt HamleDT], harmonized dependency treebanks for many languages in a common annotation style<br />
*[http://corpus.nytud.hu/mnsz/ Hungarian National Corpus]<br />
*[http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC95T20 Hansard French-English parallel corpus]<br />
*[http://www.ucl.ac.uk/english-usage/ice/avail.htm ICE corpora]<br />
*[http://korpus.pl/ IPI PAN Corpus of Polish]<br />
*[http://www.tu-chemnitz.de/phil/InternetGrammar/ Learner Behaviour on the Internet]<br />
*[http://corpora.informatik.uni-leipzig.de Leipzig Corpora Collection: Large monolingual raw corpora for 17+ languages, free downloads]<br />
*[http://www.coli.uni-saarland.de/~gparis/LMD-TAZ_corpus/ Le Monde Diplomatique-Die Tageszeitung Translation Corpus] - French-German, aligned (parallel)<br />
*[http://muchmore.dfki.de/resources1.htm MuchMore Springer Bilingual Corpus]<br />
*[http://nl.ijs.si/ME/ MULTEXT-East: Multilingual Corpora for Eastern and Central European Languages]<br />
*[http://tcc.itc.it/people/forner/multilingualcorpora.html Multilingual Corpora: Available Resources]<br />
* [http://www.csse.monash.edu.au/~jwb/tanakacorpus.html Tanaka Corpus: Japanese-English sentence pairs]<br />
*[http://multisemcor.itc.it MultiSemCor]<br />
*[http://www.ims.uni-stuttgart.de/info/Newspapers.html Newspapers on the Internet]<br />
*[http://logos.uio.no/opus/ OPUS - an open source parallel corpus]<br />
*[http://www.tekstlab.uio.no/Bosnian/Corpus.html Oslo Corpus of Bosnian]<br />
*[http://langbank.engl.polyu.edu.hk/indexl.html PolyU Language Bank]<br />
*[http://www.corpusdoportugues.org/ Portuguese Corpus]<br />
*[http://register.consilium.eu.int/ Public registry of the Council of the EU]<br />
*[http://www.ruscorpora.ru/ Russian National Corpus (RNK)]<br />
*[http://www.multilingual.com/allen51.htm The Bible as a Resource for Translation Software]<br />
*[http://www.cogsci.ed.ac.uk/elsnet/eci.html The ECI Multilingual corpus]<br />
*[http://www.fida.net/ Slovenian Corpus FIDA] and [http://www.fidaplus.net/ FIDA+]<br />
*[http://www.ling.su.se/DaLi/research/smultron/index.htm SMULTRON Corpus], parallel treebank of English, German and Swedish<br />
*[http://www.corpusdelespanol.org/ Spanish Corpus]<br />
*[http://www.unhchr.ch/udhr/index.htm UN declaration of human rights in multiple languages]<br />
*[http://conferences.unite.un.org/UNCorpus/ UN parallel corpora (United Nations Parallel Corpus)]<br />
*[http://www.euromatrixplus.net/multi-un/ UN parallel corpora (MultiUN)]<br />
*[http://www-igm.univ-mlv.fr/~unitex/ UNITEX]<br />
*[http://www.u-grenoble3.fr/kraif/liens.htm Useful links about parallel corpora, by Olivier Kraif]<br />
*[http://wacky.sslmit.unibo.it/ WaCky Project]<br />
*[http://www.wortschatz.uni-leipzig.de/html/wliste.html Wortlisten: spoken German, English, French, and Dutch]<br />
<!-- Please keep this list in alphabetical order --><br />
<br />
<br />
[[Category:Resources by language|Multilingual]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Conference_acceptance_rates&diff=11187
Conference acceptance rates
2015-08-12T10:40:19Z
<p>Jonsafari: /* EMNLP */ 2015</p>
<hr />
<div>==[[ACL]]==<br />
<br />
=== Main Session ===<br />
<br />
<table cellspacing="1" cellpadding="1" border="1" width="20%"><br />
<tr><br />
<th>Year</th><br />
<th>Submitted</th><br />
<th>Accepted</th><br />
<th>Rate</th><br />
</tr><br />
<tr><br />
<td>1997</td><br />
<td>264</td><br />
<td>83</td><br />
<td>24%</td><br />
</tr><br />
<tr><br />
<td>1998 (w/COLING)</td><br />
<td>550</td><br />
<td>137</td><br />
<td>25%</td><br />
</tr><br />
<tr><br />
<td>1999</td><br />
<td>320</td><br />
<td>80</td><br />
<td>25%</td><br />
</tr><br />
<tr><br />
<td>2000</td><br />
<td>267</td><br />
<td>70</td><br />
<td>26.2%</td><br />
</tr><br />
<tr><br />
<td>2001</td><br />
<td>260</td><br />
<td>69</td><br />
<td>27%</td><br />
</tr><br />
<tr><br />
<td>2002</td><br />
<td>256</td><br />
<td>66</td><br />
<td>26%</td><br />
</tr><br />
<tr><br />
<td>2003</td><br />
<td>360</td><br />
<td>71</td><br />
<td>20%</td><br />
</tr><br />
<tr><br />
<td>2004</td><br />
<td>348</td><br />
<td>88</td><br />
<td>25%</td><br />
</tr><br />
<tr><br />
<td>2005</td><br />
<td>423</td><br />
<td>77</td><br />
<td>18%</td><br />
</tr><br />
<tr><br />
<td>2006 (w/COLING)</td><br />
<td>630</td><br />
<td>147</td><br />
<td>23%</td><br />
</tr><br />
<tr><br />
<td>2007</td><br />
<td>588</td><br />
<td>131</td><br />
<td>22.3%</td><br />
</tr><br />
<tr><br />
<td>2008</td><br />
<td>470</td><br />
<td>119</td><br />
<td>25%</td><br />
</tr><br />
<tr><br />
<td>2009</td><br />
<td>569</td><br />
<td>121</td><br />
<td>21%</td><br />
</tr><br />
<tr><br />
<td>2010</td><br />
<td>638</td><br />
<td>160</td><br />
<td>25%</td><br />
</tr><br />
<tr><br />
<td>2011</td><br />
<td>634</td><br />
<td>164</td><br />
<td>26%</td><br />
</tr><br />
<tr><br />
<td>2012</td><br />
<td>571</td><br />
<td>111</td><br />
<td>19%</td><br />
</tr><br />
<tr><br />
<td>2013</td><br />
<td>664</td><br />
<td>174</td><br />
<td>26%</td><br />
</tr><br />
<tr><br />
<td>2014</td><br />
<td>572</td><br />
<td>146</td><br />
<td>26.2%</td><br />
</tr><br />
</table><br />
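The Rate column in these tables appears to be the number of accepted papers divided by the number submitted, expressed as a percentage. As a minimal sketch of that arithmetic (the function name `acceptance_rate` and the rounding to one decimal place are assumptions for illustration, not part of the original page):

```python
def acceptance_rate(submitted: int, accepted: int, digits: int = 1) -> float:
    """Return accepted / submitted as a percentage, rounded to `digits` places."""
    if submitted <= 0:
        raise ValueError("submitted must be positive")
    if not 0 <= accepted <= submitted:
        raise ValueError("accepted must be between 0 and submitted")
    return round(100.0 * accepted / submitted, digits)


# (submitted, accepted) pairs copied from the ACL main session table.
rows = {1999: (320, 80), 2000: (267, 70), 2007: (588, 131)}

for year, (submitted, accepted) in sorted(rows.items()):
    print(year, f"{acceptance_rate(submitted, accepted)}%")
```

The same helper can be used to sanity-check a new row before adding it to any of the tables on this page.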
<br />
=== Student Session ===<br />
<br />
<table cellspacing="1" cellpadding="1" border="1" width="20%"><br />
<tr><br />
<th>Year</th><br />
<th>Submitted</th><br />
<th>Accepted</th><br />
<th>Rate</th><br />
</tr><br />
<tr><br />
<td>1992</td><br />
<td>48</td><br />
<td>20</td><br />
<td>42%</td><br />
</tr><br />
<tr><br />
<td>1993</td><br />
<td>30</td><br />
<td>11</td><br />
<td>37%</td><br />
</tr><br />
<tr><br />
<td>1994</td><br />
<td>41</td><br />
<td>10</td><br />
<td>24%</td><br />
</tr><br />
<tr><br />
<td>1995</td><br />
<td>48</td><br />
<td>19</td><br />
<td>24%</td><br />
</tr><br />
<tr><br />
<td>1996</td><br />
<td>32</td><br />
<td>14</td><br />
<td>44%</td><br />
</tr><br />
<tr><br />
<td>1997</td><br />
<td>42</td><br />
<td>10</td><br />
<td>24%</td><br />
</tr><br />
<tr><br />
<td>1998</td><br />
<td>46</td><br />
<td>12</td><br />
<td>26%</td><br />
</tr><br />
<tr><br />
<td>1999</td><br />
<td>30</td><br />
<td>10</td><br />
<td>33%</td><br />
</tr><br />
<tr> <br />
<td>2000</td><br />
<td>36</td><br />
<td>10</td><br />
<td>28%</td><br />
</tr><br />
<tr><br />
<td>2005</td><br />
<td>70</td><br />
<td>26</td><br />
<td>37%</td><br />
</tr><br />
<tr><br />
<td>2006</td><br />
<td>40</td><br />
<td>15</td><br />
<td>38%</td><br />
</tr><br />
<tr><br />
<td>2007</td><br />
<td>52</td><br />
<td>16</td><br />
<td>31%</td><br />
</tr><br />
<tr><br />
<td>2008</td><br />
<td>27</td><br />
<td>12</td><br />
<td>44%</td><br />
</tr><br />
<tr><br />
<td>2009</td><br />
<td>25</td><br />
<td>12</td><br />
<td>48%</td><br />
</tr><br />
<tr><br />
<td>2010</td><br />
<td>37</td><br />
<td>19</td><br />
<td>51.4%</td><br />
</tr><br />
<tr><br />
<td>2011</td><br />
<td>57</td><br />
<td>22</td><br />
<td>38.6%</td><br />
</tr><br />
<tr><br />
<td>2012</td><br />
<td>31</td><br />
<td>14</td><br />
<td>45.2%</td><br />
</tr><br />
<tr><br />
<td>2013</td><br />
<td>52</td><br />
<td>25</td><br />
<td>48.1%</td><br />
</tr><br />
<tr><br />
<td>2014</td><br />
<td>26</td><br />
<td>13</td><br />
<td>50%</td><br />
</tr><br />
</table><br />
<br />
=== Posters/Short Papers ===<br />
<br />
<table cellspacing="1" cellpadding="1" border="1" width="20%"><br />
<tr><br />
<th>Year</th><br />
<th>Submitted</th><br />
<th>Accepted</th><br />
<th>Rate</th><br />
</tr><br />
<tr><br />
<td>2005</td><br />
<td>56</td><br />
<td>31</td><br />
<td>55%</td><br />
</tr><br />
<tr><br />
<td>2006</td><br />
<td>630</td><br />
<td>125</td><br />
<td>20%</td><br />
</tr><br />
<tr><br />
<td>2007</td><br />
<td></td><br />
<td></td><br />
<td></td><br />
</tr><br />
<tr><br />
<td>2008</td><br />
<td>275</td><br />
<td>64</td><br />
<td>23%</td><br />
</tr><br />
<tr><br />
<td>2009</td><br />
<td>356</td><br />
<td>93</td><br />
<td>26%</td><br />
</tr><br />
<tr><br />
<td>2010</td><br />
<td>318</td><br />
<td>70</td><br />
<td>22%</td><br />
</tr><br />
<tr><br />
<td>2011</td><br />
<td>512</td><br />
<td>128</td><br />
<td>25%</td><br />
</tr><br />
<tr><br />
<td>2012</td><br />
<td>369</td><br />
<td>76</td><br />
<td>21%</td><br />
</tr><br />
<tr><br />
<td>2013</td><br />
<td>624</td><br />
<td>154</td><br />
<td>24%</td><br />
</tr><br />
<tr><br />
<td>2014</td><br />
<td>551</td><br />
<td>139</td><br />
<td>26.1%</td><br />
</tr><br />
</table><br />
<br />
==[[CICLing]]==<br />
<br />
<table cellspacing="1" cellpadding="1" border="1" width="20%"><br />
<tr><br />
<th>Year</th><br />
<th>Submitted</th><br />
<th>Accepted</th><br />
<th>Rate</th><br />
</tr><br />
<tr> <br />
<td>2000</td><br />
<td>??</td><br />
<td>29</td><br />
<td>??</td><br />
</tr><br />
<tr><br />
<td>2001</td><br />
<td>72</td><br />
<td>41</td><br />
<td>57%</td><br />
</tr><br />
<tr> <br />
<td>2002</td><br />
<td>67</td><br />
<td>35</td><br />
<td>52%</td><br />
</tr><br />
<tr> <br />
<td>2003</td><br />
<td>92</td><br />
<td>43</td><br />
<td>46%</td><br />
</tr><br />
<tr> <br />
<td>2004</td><br />
<td>129</td><br />
<td>40</td><br />
<td>31%</td><br />
</tr><br />
<tr> <br />
<td>2005</td><br />
<td>151</td><br />
<td>53</td><br />
<td>35%</td><br />
</tr><br />
<tr> <br />
<td>2006</td><br />
<td>176 (141 full + 35 short)</td><br />
<td>59 (43 full + 16 short)</td><br />
<td>30.4% full & 45.7% short</td><br />
</tr><br />
<tr> <br />
<td>2007</td><br />
<td>179</td><br />
<td>53</td><br />
<td>29.6%</td><br />
</tr><br />
<tr><br />
<td>2008</td><br />
<td>204</td><br />
<td>52</td><br />
<td>25.5%</td><br />
</tr><br />
<tr><br />
<td>2009</td><br />
<td>167</td><br />
<td>44</td><br />
<td>26.3%</td><br />
</tr><br />
<tr> <br />
<td>2010</td><br />
<td>271</td><br />
<td>61</td><br />
<td>22.5%</td><br />
</tr><br />
<tr> <br />
<td>2012</td><br />
<td>307</td><br />
<td>88</td><br />
<td>28.6%</td><br />
</tr><br />
</table><br />
<br />
==[[COLING]]==<br />
<br />
<table cellspacing="1" cellpadding="1" border="1" width="20%"><br />
<tr><br />
<th>Year</th><br />
<th>Submitted</th><br />
<th>Accepted</th><br />
<th>Rate</th><br />
</tr><br />
<tr><br />
<td>1998</td><br />
<td>550</td><br />
<td>137</td><br />
<td>25%</td><br />
</tr><br />
<tr><br />
<td>2000</td><br />
<td>323</td><br />
<td>110</td><br />
<td>34%</td><br />
</tr><br />
<tr><br />
<td>2006</td><br />
<td>630</td><br />
<td>147</td><br />
<td>23%</td><br />
</tr><br />
<tr><br />
<td>2008</td><br />
<td>600</td><br />
<td>145</td><br />
<td>24%</td><br />
</tr><br />
<tr><br />
<td>2010</td><br />
<td>815</td><br />
<td>334</td><br />
<td>41%</td><br />
</tr><br />
</table><br />
<br />
==[[CONLL]]==<br />
<br />
<table cellspacing="1" cellpadding="1" border="1" width="20%"><br />
<tr><br />
<th>Year</th><br />
<th>Submitted</th><br />
<th>Accepted</th><br />
<th>Rate</th><br />
</tr><br />
<tr><br />
<td>2003</td><br />
<td>35</td><br />
<td>17</td><br />
<td>48.6%</td><br />
</tr><br />
<tr><br />
<td>2004</td><br />
<td>23</td><br />
<td>11</td><br />
<td>47.8%</td><br />
</tr><br />
<tr><br />
<td>2005</td><br />
<td>70</td><br />
<td>19</td><br />
<td>27%</td><br />
</tr><br />
<tr><br />
<td>2006</td><br />
<td>52</td><br />
<td>18</td><br />
<td>35%</td><br />
</tr><br />
<tr><br />
<td>2008</td><br />
<td>85</td><br />
<td>20</td><br />
<td>24%</td><br />
</tr><br />
<tr><br />
<td>2009</td><br />
<td>59</td><br />
<td>25</td><br />
<td>42%</td><br />
</tr><br />
<tr><br />
<td>2010</td><br />
<td>81</td><br />
<td>25</td><br />
<td>31%</td><br />
</tr><br />
<tr><br />
<td>2011</td><br />
<td>78</td><br />
<td>27</td><br />
<td>35%</td><br />
</tr><br />
<tr><br />
<td>2013</td><br />
<td>100</td><br />
<td>25</td><br />
<td>25%</td><br />
</tr><br />
</table><br />
<br />
==[[EACL]]==<br />
<br />
=== Main Session ===<br />
<br />
<table cellspacing="1" cellpadding="1" border="1" width="20%"><br />
<tr><br />
<th>Year</th><br />
<th>Submitted</th><br />
<th>Accepted</th><br />
<th>Rate</th><br />
</tr><br />
<tr><br />
<td>2003</td><br />
<td>?</td><br />
<td>?</td><br />
<td>26.5%</td><br />
</tr><br />
<tr><br />
<td>2006</td><br />
<td>264</td><br />
<td>52</td><br />
<td>20%</td><br />
</tr><br />
<tr><br />
<td>2009</td><br />
<td>360</td><br />
<td>100</td><br />
<td>28%</td><br />
</tr><br />
<tr><br />
<td>2012</td><br />
<td>316</td><br />
<td>82</td><br />
<td>25.95%</td><br />
</tr><br />
<tr><br />
<td>2014</td><br />
<td>317</td><br />
<td>78</td><br />
<td>24.6%</td><br />
</tr><br />
</table><br />
<br />
=== Short Papers ===<br />
<br />
<table cellspacing="1" cellpadding="1" border="1" width="20%"><br />
<tr><br />
<th>Year</th><br />
<th>Submitted</th><br />
<th>Accepted</th><br />
<th>Rate</th><br />
</tr><br />
<tr><br />
<td>2014</td><br />
<td>199</td><br />
<td>46</td><br />
<td>23.1%</td><br />
</tr><br />
</table><br />
<br />
=== Demonstration Papers ===<br />
<br />
<table cellspacing="1" cellpadding="1" border="1" width="20%"><br />
<tr><br />
<th>Year</th><br />
<th>Submitted</th><br />
<th>Accepted</th><br />
<th>Rate</th><br />
</tr><br />
<tr><br />
<td>2012</td><br />
<td>37</td><br />
<td>21</td><br />
<td>57%</td><br />
</tr><br />
<tr><br />
<td>2014</td><br />
<td></td><br />
<td>26</td><br />
<td></td><br />
</tr><br />
</table><br />
<br />
=== Student Session ===<br />
<br />
<table cellspacing="1" cellpadding="1" border="1" width="20%"><br />
<tr><br />
<th>Year</th><br />
<th>Submitted</th><br />
<th>Accepted</th><br />
<th>Rate</th><br />
</tr><br />
<tr><br />
<td>1993</td><br />
<td>34</td><br />
<td>6</td><br />
<td>18%</td><br />
</tr><br />
<tr><br />
<td>1995</td><br />
<td>37</td><br />
<td>8</td><br />
<td>22%</td><br />
</tr><br />
<tr><br />
<td>1997</td><br />
<td>42</td><br />
<td>10</td><br />
<td>24%</td><br />
</tr><br />
<tr><br />
<td>1999</td><br />
<td>17</td><br />
<td>8</td><br />
<td>47%</td><br />
</tr><br />
<tr> <br />
<td>2003</td><br />
<td>18</td><br />
<td>6</td><br />
<td>33.3%</td><br />
</tr><br />
<tr><br />
<td>2006</td><br />
<td>33</td><br />
<td>9</td><br />
<td>27%</td><br />
</tr><br />
<tr><br />
<td>2009</td><br />
<td>38</td><br />
<td>11</td><br />
<td>29%</td><br />
</tr><br />
<tr><br />
<td>2012</td><br />
<td>38</td><br />
<td>10</td><br />
<td>26%</td><br />
</tr><br />
<tr><br />
<td>2014</td><br />
<td>42</td><br />
<td>13</td><br />
<td>30%</td><br />
</tr><br />
</table><br />
<br />
==[[EMNLP]]==<br />
<br />
=== Main Session ===<br />
<br />
<table cellspacing="1" cellpadding="1" border="1" width="20%"><br />
<tr><br />
<th>Year</th><br />
<th>Submitted</th><br />
<th>Accepted</th><br />
<th>Rate</th><br />
</tr><br />
<tr><br />
<td>1997</td><br />
<td>??</td><br />
<td>??</td><br />
<td>35%</td><br />
</tr><br />
<tr><br />
<td>2002</td><br />
<td>142</td><br />
<td>35</td><br />
<td>25%</td><br />
</tr><br />
<tr><br />
<td>2003</td><br />
<td>??</td><br />
<td>28</td><br />
<td>??</td><br />
</tr><br />
<tr><br />
<td>2004</td><br />
<td>247</td><br />
<td>58</td><br />
<td>24%</td><br />
</tr><br />
<tr><br />
<td>2005</td><br />
<td>402</td><br />
<td>127</td><br />
<td>32%</td><br />
</tr><br />
<tr><br />
<td>2006</td><br />
<td>234</td><br />
<td>73</td><br />
<td>31%</td><br />
</tr><br />
<tr><br />
<td>2007</td><br />
<td>398</td><br />
<td>109</td><br />
<td>27%</td><br />
</tr><br />
<tr><br />
<td>2008</td><br />
<td>385</td><br />
<td>116</td><br />
<td>30%</td><br />
</tr><br />
<tr><br />
<td>2009</td><br />
<td>475</td><br />
<td>163</td><br />
<td>34%</td><br />
</tr><br />
<tr><br />
<td>2010</td><br />
<td>500</td><br />
<td>125</td><br />
<td>25%</td><br />
</tr><br />
<tr><br />
<td>2011</td><br />
<td>628</td><br />
<td>149</td><br />
<td>24%</td><br />
</tr><br />
<tr><br />
<td>2012</td><br />
<td>570</td><br />
<td>139</td><br />
<td>24%</td><br />
</tr><br />
<tr><br />
<td>2013</td><br />
<td>772</td><br />
<td>205</td><br />
<td>27%</td><br />
</tr><br />
<tr><br />
<td>2014</td><br />
<td>510</td><br />
<td>155</td><br />
<td>30%</td><br />
</tr><br />
<tr><br />
<td>2015</td><br />
<td>1315</td><br />
<td>312</td><br />
<td>24%</td><br />
</tr><br />
</table><br />
<br />
=== Short Papers ===<br />
<br />
<table cellspacing="1" cellpadding="1" border="1" width="20%"><br />
<tr><br />
<th>Year</th><br />
<th>Submitted</th><br />
<th>Accepted</th><br />
<th>Rate</th><br />
</tr><br />
<tr><br />
<td>2014</td><br />
<td>252</td><br />
<td>70</td><br />
<td>28%</td><br />
</tr><br />
</table><br />
<br />
==[[NAACL HLT]]==<br />
<br />
=== Main Session - long papers ===<br />
<br />
<table cellspacing="1" cellpadding="1" border="1" width="20%"><br />
<tr><br />
<th>Year</th><br />
<th>Submitted</th><br />
<th>Accepted</th><br />
<th>Rate</th><br />
</tr><br />
<tr> <br />
<td>2000</td><br />
<td>166</td><br />
<td>43</td><br />
<td>26%</td><br />
</tr><br />
<tr><br />
<td>2001</td><br />
<td>110</td><br />
<td>31</td><br />
<td>28%</td><br />
</tr><br />
<tr><br />
<td>2002</td><br />
<td>141</td><br />
<td>28</td><br />
<td>20%</td><br />
</tr><br />
<tr><br />
<td>2003</td><br />
<td>162</td><br />
<td>37</td><br />
<td>23%</td><br />
</tr><br />
<tr><br />
<td>2004</td><br />
<td>168</td><br />
<td>43</td><br />
<td>26%</td><br />
</tr><br />
<tr><br />
<td>2006</td><br />
<td>257</td><br />
<td>62</td><br />
<td>24%</td><br />
</tr><br />
<tr><br />
<td>2007</td><br />
<td>298</td><br />
<td>72</td><br />
<td>24%</td><br />
</tr><br />
<tr><br />
<td>2009</td><br />
<td>260</td><br />
<td>75</td><br />
<td>29%</td><br />
</tr><br />
<tr><br />
<td>2010</td><br />
<td>291</td><br />
<td>90</td><br />
<td>31%</td><br />
</tr><br />
<tr><br />
<td>2012</td><br />
<td>196</td><br />
<td>61</td><br />
<td>31%</td><br />
</tr><br />
<tr><br />
<td>2013</td><br />
<td>293</td><br />
<td>88</td><br />
<td>30%</td><br />
</tr><br />
</table><br />
<br />
=== Short papers / late-breaking results ===<br />
<br />
<table cellspacing="1" cellpadding="1" border="1" width="20%"><br />
<tr><br />
<th>Year</th><br />
<th>Submitted</th><br />
<th>Accepted</th><br />
<th>Rate</th><br />
</tr><br />
<tr><br />
<td>2003</td><br />
<td>80</td><br />
<td>39</td><br />
<td>49%</td><br />
</tr><br />
<tr><br />
<td>2004</td><br />
<td>84</td><br />
<td>40</td><br />
<td>48%</td><br />
</tr><br />
<tr><br />
<td>2006</td><br />
<td>127</td><br />
<td>52</td><br />
<td>41%</td><br />
</tr><br />
<tr><br />
<td>2007</td><br />
<td>150</td><br />
<td>55</td><br />
<td>37%</td><br />
</tr><br />
<tr><br />
<td>2009</td><br />
<td>178</td><br />
<td>71</td><br />
<td>40%</td><br />
</tr><br />
<tr><br />
<td>2010</td><br />
<td>159</td><br />
<td>56</td><br />
<td>35%</td><br />
</tr><br />
<tr><br />
<td>2012</td><br />
<td>105</td><br />
<td>36</td><br />
<td>34%</td><br />
</tr><br />
<tr><br />
<td>2013</td><br />
<td>162</td><br />
<td>51</td><br />
<td>31%</td><br />
</tr><br />
</table><br />
<br />
=== Student Session ===<br />
<br />
<table cellspacing="1" cellpadding="1" border="1" width="20%"><br />
<tr><br />
<th>Year</th><br />
<th>Submitted</th><br />
<th>Accepted</th><br />
<th>Rate</th><br />
</tr><br />
<tr><br />
<td>2009</td><br />
<td>29</td><br />
<td>17</td><br />
<td>59%</td><br />
</tr><br />
</table><br />
<br />
==[[IJCNLP]]==<br />
<br />
<table cellspacing="1" cellpadding="1" border="1" width="20%"><br />
<tr><br />
<th>Year</th><br />
<th>Submitted</th><br />
<th>Accepted</th><br />
<th>Rate</th><br />
</tr><br />
<tr><br />
<td>2004</td><br />
<td>211</td><br />
<td>66</td><br />
<td>31%</td><br />
</tr><br />
<tr> <br />
<td>2005</td><br />
<td>289</td><br />
<td>90</td><br />
<td>31%</td><br />
</tr><br />
<tr><br />
<td>2008</td><br />
<td>270</td><br />
<td>75</td><br />
<td>28%</td><br />
</tr><br />
<tr><br />
<td>2009</td><br />
<td>569</td><br />
<td>121</td><br />
<td>21%</td><br />
</tr><br />
<tr><br />
<td>2011</td><br />
<td>478</td><br />
<td>176</td><br />
<td>36%</td><br />
</tr><br />
</table><br />
<br />
==[[LREC]]==<br />
<br />
<table cellspacing="1" cellpadding="1" border="1" width="20%"><br />
<tr><br />
<th>Year</th><br />
<th>Submitted</th><br />
<th>Accepted</th><br />
<th>Rate</th><br />
</tr><br />
<tr><br />
<td>2010</td><br />
<td>930</td><br />
<td>662</td><br />
<td>71%</td><br />
</tr><br />
<tr><br />
<td>2012</td><br />
<td>1013</td><br />
<td>697</td><br />
<td>69%</td><br />
</tr><br />
</table><br />
<br />
==[[IWCS]]==<br />
<br />
=== Long Papers ===<br />
<br />
<table cellspacing="1" cellpadding="1" border="1" width="20%"><br />
<tr><br />
<th>Year</th><br />
<th>Submitted</th><br />
<th>Accepted</th><br />
<th>Rate</th><br />
</tr><br />
<tr><br />
<td>2011</td><br />
<td>72</td><br />
<td>30</td><br />
<td>42%</td><br />
</tr><br />
<tr><br />
<td>2013</td><br />
<td>60</td><br />
<td>25</td><br />
<td>42%</td><br />
</tr><br />
</table><br />
<br />
=== Short Papers ===<br />
<br />
<table cellspacing="1" cellpadding="1" border="1" width="20%"><br />
<tr><br />
<th>Year</th><br />
<th>Submitted</th><br />
<th>Accepted</th><br />
<th>Rate</th><br />
</tr><br />
<tr><br />
<td>2011</td><br />
<td>38</td><br />
<td>20</td><br />
<td>53%</td><br />
</tr><br />
<tr><br />
<td>2013</td><br />
<td>25</td><br />
<td>16</td><br />
<td>64%</td><br />
</tr><br />
</table><br />
<br />
<br />
[[Category:Conferences]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Resources_for_English&diff=11083
Resources for English
2015-06-17T15:45:07Z
<p>Jonsafari: fix link</p>
<hr />
<div>For other languages, see [[List of resources by language]].<br />
<br />
See also [[Multilingual resources]].<br />
<br />
<!-- Please keep this list in alphabetical order --><br />
* [[Corpora for English|Corpora]]<br />
* [[Dictionaries (English)|Dictionaries]]<br />
* [[Generation grammars]]<br />
* [[Geographical words (English)|Geographical words]]<br />
* [[Knowledge collections and datasets (English)|Knowledge collections and datasets]]<br />
* [[Lexicons (English)|Lexicons]]<br />
* [[Subject specific resources (English)|Subject specific resources]]<br />
* [[Tools and Software for English|Tools and Software]]<br />
* [[Uncategorized resources]] - ''please help in categorizing''<br />
<br />
==Other resource lists==<br />
* [[Lists of resources|Other lists of resources]]<br />
<br />
==Additional information==<br />
<!-- Please keep this list in alphabetical order --><br />
<br />
* [[Anthology Statistics]]<br />
* [[Bibliographies]]<br />
* [[Blogs]]<br />
* [[Books]]<br />
* [[Conferences]]<br />
* [[Courses]]<br />
* [[Journals]]<br />
* [[Newsgroups, mailing lists|Newsgroups and mailing lists]]<br />
* [[Papers]]<br />
<br />
[[Category:Resources by language|English]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Talk:Corpora_(English)&diff=11082
Talk:Corpora (English)
2015-06-17T15:44:26Z
<p>Jonsafari: Jonsafari moved page Talk:Corpora (English) to Talk:Corpora for English: align with other related articles</p>
<hr />
<div>#REDIRECT [[Talk:Corpora for English]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Talk:Corpora_for_English&diff=11081
Talk:Corpora for English
2015-06-17T15:44:26Z
<p>Jonsafari: Jonsafari moved page Talk:Corpora (English) to Talk:Corpora for English: align with other related articles</p>
<hr />
<div>== Organization ==<br />
<br />
This page should be organized on several levels.
The first level of division should be between monolingual and multilingual corpora.
The second level of division should be either by language family or by individual language, preferably sorted alphabetically.
Any comments, suggestions, or opinions are more than welcome.<br />
<br />
::That sounds good to me. Please go ahead. By the way, you can sign your "talk" messages by pressing the button in edit mode that looks like a signature. When you save your edit, your user name and the date will be inserted. --[[User:Pdturney|Pdturney]] 07:24, 8 November 2006 (EST)<br />
<br />
:::Might be a good idea to have two pages? One page for [[List of corpora by language]] (giving just a flat list per language) and another [[List of corpora]] (which will have a breakdown of types [monolingual/bilingual/multilingual/aligned etc.]). - [[User:Francis Tyers|Francis Tyers]] 20:34, 10 November 2006 (EST)</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Corpora_(English)&diff=11080
Corpora (English)
2015-06-17T15:44:26Z
<p>Jonsafari: Jonsafari moved page Corpora (English) to Corpora for English: align with other related articles</p>
<hr />
<div>#REDIRECT [[Corpora for English]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Corpora_for_English&diff=11079
Corpora for English
2015-06-17T15:44:26Z
<p>Jonsafari: Jonsafari moved page Corpora (English) to Corpora for English: align with other related articles</p>
<hr />
<div>For languages other than English, see [[List of resources by language]].<br />
<!-- Please keep this list in alphabetical order --><br />
<br />
===Free and Downloadable===<br />
*[http://americannationalcorpus.org/ American National Corpus (ANC)]<br />
*[http://www.cs.cornell.edu/home/llee/data/convote.html Congressional floor-debate transcripts, with support/oppose labels]<br />
*[http://www-rcf.usc.edu/~billmann/diversity/DDivers-site.htm Dialogue Diversity Corpus]<br />
*[http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/naive-bayes/bow-0.8/stopwords.c English stop words (from SMART)]<br />
*[http://gmb.let.rug.nl Groningen Meaning Bank] semantically annotated corpus<br />
*[https://www.gutenberg.org Project Gutenberg]<br />
*[http://ufal.mff.cuni.cz/hamledt HamleDT], harmonized dependency treebanks for many languages in a common annotation style<br />
*[http://prize.hutter1.net/ Hutter Prize for Lossless Compression of Human Knowledge 100M sample of Wikipedia]<br />
*[http://www.cs.fit.edu/~mmahoney/compression/text.html Large Text Compression Benchmark's 1G sample of Wikipedia]<br />
*[http://www.cs.cornell.edu/People/pabo/movie-review-data/ Movie Review Data]<br />
*[http://mwe.stanford.edu/resources/ Multiword Expression Resources]<br />
*[http://www-2.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/nlp/corpora/susanne/0.html Susanne: Annotated American English Corpus]<br />
*[http://www-users.york.ac.uk/~sp20/corpus.html The Brooklyn-Geneva-Amsterdam-Helsinki Parsed Corpus of Old English]<br />
*[http://www.grsampson.net/LucyDoc.html The LUCY Corpus - Documentation]<br />
*[http://www.cs.rochester.edu/research/cisd/resources/trains.html TRAINS Dialogue Corpus]<br />
*[http://ebiquity.umbc.edu/resource/html/id/351 UMBC Webbase Corpus]<br />
*[http://www.euromatrixplus.net/multi-un/ UN parallel corpora]<br />
*[http://www.let.rug.nl/~bos/vpe/ VP Ellipsis corpus]<br />
* [http://www.statmt.org/wmt15/translation-task.html#download WMT corpora], including [http://en.wikipedia.org/wiki/Europarl_corpus Europarl], News Commentary, and News Crawl<br />
<br />
===Proprietary or Require Prior Permission===<br />
*[http://ucts.uniba.sk/aranea_about/ Araneum Anglicum], Gigaword English web corpus<br />
*[http://ucts.uniba.sk/aranea_about/ Araneum Anglicum Asiaticum], Gigaword Asian English web corpus<br />
*[http://www.natcorp.ox.ac.uk/ British National Corpus (BNC)]<br />
*[http://boston.lti.cs.cmu.edu/Data/clueweb09/ ClueWeb]<br />
*[http://www.athel.com/cpsa.html Corpus of Spoken Professional English]<br />
*[http://www.phon.ox.ac.uk/~esther/ivyweb/ English Intonation in the British Isles - The IViE Corpus]<br />
*[http://www-personal.umich.edu/~jlawler/levin.html English Verb Classes And Alternations: A Preliminary Investigation (Index)]<br />
*[http://ir.dcs.gla.ac.uk/test_collections/gov2-summary.htm GOV2 Corpus] - 426 gigabytes of text<br />
*[http://mpqa.cs.pitt.edu Multi-Perspective Question Answering (MPQA)]<br />
*[http://www.askoxford.com/oec/mainpage/?view=uk Oxford English Corpus]<br />
*[http://www.sketchengine.co.uk/ Sketch Engine]<br />
*[http://wacky.sslmit.unibo.it/ WaCky]<br />
*[http://www.webcorp.org.uk/guide/ WebCorp]<br />
<br />
<br />
<!-- Dead links<br />
*[ftp://ftp.cs.cornell.edu/pub/smart/time/ 1963 Time Magazine corpus]<br />
*[http://www.elda.fr/catalogue/en/speech/S0115.html American English SpeechDat-Car]<br />
*[http://compbio.uchsc.edu/ccp/corpora/index.shtml Biomedical corpora]<br />
*[http://homepage.mac.com/bncweb/ BNCweb a web-based interface to the British National Corpus]<br />
*[http://www.comp.lancs.ac.uk/computing/research/ucrel/bnc.html British National Corpus project page (from UCREL)]<br />
*[http://clwww.essex.ac.uk/w3c/corpus_ling/content/corpora/list/private/brown/brown.html Brown Corpus]<br />
*[http://computing.open.ac.uk/coda/data.html CODA Parallel Annotated Monologue-Dialogue Corpus]<br />
*[http://www.collins.co.uk/books.aspx?group=154 Collins Wordbanks]<br />
*[http://etext.lib.virginia.edu/ Electronic Text Center -- University of Virginia]<br />
*[http://usna.edu/LangStudy/BNC/ Exploring Words and Phrases from the British National Corpus]<br />
*[http://nora.hd.uib.no/icame.html ICAME]<br />
*[http://pie.usna.edu/ Phrases in English]<br />
*[http://homepages.feis.herts.ac.uk/~comrcml/Lyon-thesis.ps Restricted English Corpus from Dr. Caroline Lyon for PhD]<br />
*[http://clix.to/davidlee00 The BNC Index (for the BNCWorld Edition)]<br />
--><br />
<br />
==Link collections==<br />
<!-- Please keep this list in alphabetical order --><br />
<br />
*[http://www.dcs.gla.ac.uk/idom/ir_resources/ Collections of texts and corpora]<br />
*[http://www.bmanuel.org/clr2_mp.html Manuel Barbera: General Corpora and Corpus Linguistics Resources]<br />
*[http://www.sultry.arts.usyd.edu.au/links/statnlp.html Annotated list of resources on statistical NLP and corpus-based CL]<br />
<br />
==Corpora tools==<br />
<!-- Please keep this list in alphabetical order --><br />
<br />
*[http://www.dcs.gla.ac.uk/idom/ir_resources/linguistic_utils/stop_words List of stop words]<br />
*[http://korpus.pl/index.php?page=poliqarp Poliqarp] - open source XML-aware indexer, search engine and concordancer<br />
*[http://www.sketchengine.co.uk/ The Sketch Engine]<br />
*[http://www.cis.upenn.edu/~treebank/tokenization.html Treebank tokenization scheme]<br />
<br />
<br />
[[Category:Corpora|*]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Corpora_for_English&diff=11078
Corpora for English
2015-06-17T15:43:39Z
<p>Jonsafari: more cleanup</p>
<hr />
<div>For languages other than English, see [[List of resources by language]].<br />
<!-- Please keep this list in alphabetical order --><br />
<br />
===Free and Downloadable===<br />
*[http://americannationalcorpus.org/ American National Corpus (ANC)]<br />
*[http://www.cs.cornell.edu/home/llee/data/convote.html Congressional floor-debate transcripts, with support/oppose labels]<br />
*[http://www-rcf.usc.edu/~billmann/diversity/DDivers-site.htm Dialogue Diversity Corpus]<br />
*[http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/naive-bayes/bow-0.8/stopwords.c English stop words (from SMART)]<br />
*[http://gmb.let.rug.nl Groningen Meaning Bank] semantically annotated corpus<br />
*[https://www.gutenberg.org Project Gutenberg]<br />
*[http://ufal.mff.cuni.cz/hamledt HamleDT], harmonized dependency treebanks for many languages in a common annotation style<br />
*[http://prize.hutter1.net/ Hutter Prize for Lossless Compression of Human Knowledge 100M sample of Wikipedia]<br />
*[http://www.cs.fit.edu/~mmahoney/compression/text.html Large Text Compression Benchmark's 1G sample of Wikipedia]<br />
*[http://www.cs.cornell.edu/People/pabo/movie-review-data/ Movie Review Data]<br />
*[http://mwe.stanford.edu/resources/ Multiword Expression Resources]<br />
*[http://www-2.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/nlp/corpora/susanne/0.html Susanne: Annotated American English Corpus]<br />
*[http://www-users.york.ac.uk/~sp20/corpus.html The Brooklyn-Geneva-Amsterdam-Helsinki Parsed Corpus of Old English]<br />
*[http://www.grsampson.net/LucyDoc.html The LUCY Corpus - Documentation]<br />
*[http://www.cs.rochester.edu/research/cisd/resources/trains.html TRAINS Dialogue Corpus]<br />
*[http://ebiquity.umbc.edu/resource/html/id/351 UMBC Webbase Corpus]<br />
*[http://www.euromatrixplus.net/multi-un/ UN parallel corpora]<br />
*[http://www.let.rug.nl/~bos/vpe/ VP Ellipsis corpus]<br />
* [http://www.statmt.org/wmt15/translation-task.html#download WMT corpora], including [http://en.wikipedia.org/wiki/Europarl_corpus Europarl], News Commentary, and News Crawl<br />
<br />
===Proprietary or Require Prior Permission===<br />
*[http://ucts.uniba.sk/aranea_about/ Araneum Anglicum], Gigaword English web corpus<br />
*[http://ucts.uniba.sk/aranea_about/ Araneum Anglicum Asiaticum], Gigaword Asian English web corpus<br />
*[http://www.natcorp.ox.ac.uk/ British National Corpus (BNC)]<br />
*[http://boston.lti.cs.cmu.edu/Data/clueweb09/ ClueWeb]<br />
*[http://www.athel.com/cpsa.html Corpus of Spoken Professional English]<br />
*[http://www.phon.ox.ac.uk/~esther/ivyweb/ English Intonation in the British Isles - The IViE Corpus]<br />
*[http://www-personal.umich.edu/~jlawler/levin.html English Verb Classes And Alternations: A Preliminary Investigation (Index)]<br />
*[http://ir.dcs.gla.ac.uk/test_collections/gov2-summary.htm GOV2 Corpus] - 426 gigabytes of text<br />
*[http://mpqa.cs.pitt.edu Multi-Perspective Question Answering (MPQA)]<br />
*[http://www.askoxford.com/oec/mainpage/?view=uk Oxford English Corpus]<br />
*[http://www.sketchengine.co.uk/ Sketch Engine]<br />
*[http://wacky.sslmit.unibo.it/ WaCky]<br />
*[http://www.webcorp.org.uk/guide/ WebCorp]<br />
<br />
<br />
<!-- Dead links<br />
*[ftp://ftp.cs.cornell.edu/pub/smart/time/ 1963 Time Magazine corpus]<br />
*[http://www.elda.fr/catalogue/en/speech/S0115.html American English SpeechDat-Car]<br />
*[http://compbio.uchsc.edu/ccp/corpora/index.shtml Biomedical corpora]<br />
*[http://homepage.mac.com/bncweb/ BNCweb a web-based interface to the British National Corpus]<br />
*[http://www.comp.lancs.ac.uk/computing/research/ucrel/bnc.html British National Corpus project page (from UCREL)]<br />
*[http://clwww.essex.ac.uk/w3c/corpus_ling/content/corpora/list/private/brown/brown.html Brown Corpus]<br />
*[http://computing.open.ac.uk/coda/data.html CODA Parallel Annotated Monologue-Dialogue Corpus]<br />
*[http://www.collins.co.uk/books.aspx?group=154 Collins Wordbanks]<br />
*[http://etext.lib.virginia.edu/ Electronic Text Center -- University of Virginia]<br />
*[http://usna.edu/LangStudy/BNC/ Exploring Words and Phrases from the British National Corpus]<br />
*[http://nora.hd.uib.no/icame.html ICAME]<br />
*[http://pie.usna.edu/ Phrases in English]<br />
*[http://homepages.feis.herts.ac.uk/~comrcml/Lyon-thesis.ps Restricted English Corpus from Dr. Caroline Lyon for PhD]<br />
*[http://clix.to/davidlee00 The BNC Index (for the BNCWorld Edition)]<br />
--><br />
<br />
==Link collections==<br />
<!-- Please keep this list in alphabetical order --><br />
<br />
*[http://www.dcs.gla.ac.uk/idom/ir_resources/ Collections of texts and corpora]<br />
*[http://www.bmanuel.org/clr2_mp.html Manuel Barbera: General Corpora and Corpus Linguistics Resources]<br />
*[http://www.sultry.arts.usyd.edu.au/links/statnlp.html Annotated list of resources on statistical NLP and corpus-based CL]<br />
<br />
==Corpora tools==<br />
<!-- Please keep this list in alphabetical order --><br />
<br />
*[http://www.dcs.gla.ac.uk/idom/ir_resources/linguistic_utils/stop_words List of stop words]<br />
*[http://korpus.pl/index.php?page=poliqarp Poliqarp] - open source XML-aware indexer, search engine and concordancer<br />
*[http://www.sketchengine.co.uk/ The Sketch Engine]<br />
*[http://www.cis.upenn.edu/~treebank/tokenization.html Treebank tokenization scheme]<br />
<br />
<br />
[[Category:Corpora|*]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Corpora_for_English&diff=11077
Corpora for English
2015-06-17T15:21:28Z
<p>Jonsafari: start work on cleaning up this mess</p>
<hr />
<div>For languages other than English, see [[List of resources by language]].<br />
<!-- Please keep this list in alphabetical order --><br />
<br />
===Free and Downloadable===<br />
*[http://americannationalcorpus.org/ American National Corpus (ANC)]<br />
*[http://www.cs.cornell.edu/home/llee/data/convote.html Congressional floor-debate transcripts, with support/oppose labels]<br />
*[http://www-rcf.usc.edu/~billmann/diversity/DDivers-site.htm Dialogue Diversity Corpus]<br />
*[http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/naive-bayes/bow-0.8/stopwords.c English stop words (from SMART)]<br />
*[http://gmb.let.rug.nl Groningen Meaning Bank] semantically annotated corpus<br />
*[https://www.gutenberg.org Project Gutenberg]<br />
*[http://ufal.mff.cuni.cz/hamledt HamleDT], harmonized dependency treebanks for many languages in a common annotation style<br />
*[http://prize.hutter1.net/ Hutter Prize for Lossless Compression of Human Knowledge 100M sample of Wikipedia]<br />
*[http://www.cs.fit.edu/~mmahoney/compression/text.html Large Text Compression Benchmark's 1G sample of Wikipedia]<br />
*[http://www.cs.cornell.edu/People/pabo/movie-review-data/ Movie Review Data]<br />
*[http://www.cs.pitt.edu/mpqa/ Multi-Perspective Question Answering (MPQA)]<br />
*[http://mwe.stanford.edu/resources/ Multiword Expression Resources]<br />
*[http://www.askoxford.com/oec/mainpage/?view=uk Oxford English Corpus]<br />
*[http://pie.usna.edu/ Phrases in English]<br />
*[http://homepages.feis.herts.ac.uk/~comrcml/Lyon-thesis.ps Restricted English Corpus from Dr. Caroline Lyon for PhD]<br />
*[http://www.sketchengine.co.uk/ Sketch Engine]<br />
*[http://www-2.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/nlp/corpora/susanne/0.html Susanne: Annotated American English Corpus]<br />
*[http://clix.to/davidlee00 The BNC Index (for the BNCWorld Edition)]<br />
*[http://www-users.york.ac.uk/~sp20/corpus.html The Brooklyn-Geneva-Amsterdam-Helsinki Parsed Corpus of Old English]<br />
*[http://www-rcf.usc.edu/~billmann/diversity/DDivers-site.htm The Dialogue Diversity Corpus]<br />
*[http://www.grsampson.net/LucyDoc.html The LUCY Corpus - Documentation]<br />
*[http://www.cs.rochester.edu/research/cisd/resources/trains.html TRAINS Dialogue Corpus]<br />
*[http://ebiquity.umbc.edu/resource/html/id/351 UMBC Webbase Corpus]<br />
*[http://www.euromatrixplus.net/multi-un/ UN parallel corpora]<br />
*[http://www.let.rug.nl/~bos/vpe/ VP Ellipsis corpus]<br />
*[http://wacky.sslmit.unibo.it/ WaCky]<br />
*[http://www.webcorp.org.uk/guide/ WebCorp]<br />
* [http://www.statmt.org/wmt13/translation-task.html#download WMT corpora], including [http://en.wikipedia.org/wiki/Europarl_corpus Europarl], News Commentary, and News Crawl<br />
<br />
===Proprietary===<br />
*[http://ucts.uniba.sk/aranea_about/ Araneum Anglicum], Gigaword English web corpus<br />
*[http://ucts.uniba.sk/aranea_about/ Araneum Anglicum Asiaticum], Gigaword Asian English web corpus<br />
*[http://www.natcorp.ox.ac.uk/ British National Corpus (BNC)]<br />
*[http://boston.lti.cs.cmu.edu/Data/clueweb09/ ClueWeb]<br />
*[http://www.athel.com/cpsa.html Corpus of Spoken Professional English]<br />
*[http://www.phon.ox.ac.uk/~esther/ivyweb/ English Intonation in the British Isles - The IViE Corpus]<br />
*[http://www-personal.umich.edu/~jlawler/levin.html English Verb Classes And Alternations: A Preliminary Investigation (Index)]<br />
*[http://ir.dcs.gla.ac.uk/test_collections/gov2-summary.htm GOV2 Corpus] - 426 gigabytes of text<br />
<br />
<br />
<br />
<br />
==Link collections==<br />
<!-- Please keep this list in alphabetical order --><br />
<br />
*[http://www.sultry.arts.usyd.edu.au/links/statnlp.html Annotated list of resources on statistical NLP and corpus-based CL]<br />
*[http://www.dcs.gla.ac.uk/idom/ir_resources/ Collections of texts and corpora]<br />
*[http://www.bmanuel.org/clr2_mp.html Manuel Barbera: General Corpora and Corpus Linguistics Resources]<br />
<br />
==Corpora tools==<br />
<!-- Please keep this list in alphabetical order --><br />
<br />
*[http://www.dcs.gla.ac.uk/idom/ir_resources/linguistic_utils/stop_words List of stop words]<br />
*[http://korpus.pl/index.php?page=poliqarp Poliqarp] - open source XML-aware indexer, search engine and concordancer<br />
*[http://www.sketchengine.co.uk/ The Sketch Engine]<br />
*[http://www.cis.upenn.edu/~treebank/tokenization.html Treebank tokenization scheme]<br />
<br />
<br />
[[Category:Corpora|*]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Resources_for_English&diff=11076
Resources for English
2015-06-17T14:59:25Z
<p>Jonsafari: rephrase</p>
<hr />
<div>For other languages, see [[List of resources by language]].<br />
<br />
See also [[Multilingual resources]].<br />
<br />
<!-- Please keep this list in alphabetical order --><br />
* [[Corpora (English)|Corpora]]<br />
* [[Dictionaries (English)|Dictionaries]]<br />
* [[Generation grammars]]<br />
* [[Geographical words (English)|Geographical words]]<br />
* [[Knowledge collections and datasets (English)|Knowledge collections and datasets]]<br />
* [[Lexicons (English)|Lexicons]]<br />
* [[Subject specific resources (English)|Subject specific resources]]<br />
* [[Tools and Software for English|Tools and Software]]<br />
* [[Uncategorized resources]] - ''please help in categorizing''<br />
<br />
==Other resource lists==<br />
* [[Lists of resources|Other lists of resources]]<br />
<br />
==Additional information==<br />
<!-- Please keep this list in alphabetical order --><br />
<br />
* [[Anthology Statistics]]<br />
* [[Bibliographies]]<br />
* [[Blogs]]<br />
* [[Books]]<br />
* [[Conferences]]<br />
* [[Courses]]<br />
* [[Journals]]<br />
* [[Newsgroups, mailing lists|Newsgroups and mailing lists]]<br />
* [[Papers]]<br />
<br />
[[Category:Resources by language|English]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Resources_for_French&diff=11075
Resources for French
2015-06-17T14:57:24Z
<p>Jonsafari: /* Corpora */ update WMT link</p>
<hr />
<div>==Corpora==<br />
* [http://www.statmt.org/wmt10/training-giga-fren.tar 10^9 French-English corpus]<br />
* [http://ucts.uniba.sk/aranea_about/ Araneum Francogallicum], Gigaword French web corpus<br />
* [http://atilf.atilf.fr/dmf.htm Base Textuelle de Moyen Francais]<br />
* [http://corpora.informatik.uni-leipzig.de/ French plain text and Co-occurrences at LCC]<br />
* [http://www.up.univ-mrs.fr/veronis/donnees/index.html French Stopword List]<br />
* [http://www.cnrtl.fr/lexiques/morphalou/ Lexique Morphalou]<br />
* [http://w3.univ-tlse2.fr/erss/verbaction/main.html Lexique Verbaction]<br />
* [http://www.coli.uni-saarland.de/~gparis/LMD-TAZ_corpus/ Le Monde Diplomatique-Die Tageszeitung Translation Corpus] - French-German, aligned (parallel)<br />
* [http://www.euromatrixplus.net/multi-un/ UN parallel corpora]<br />
* [http://www.statmt.org/wmt15/translation-task.html#download WMT corpora], including [http://en.wikipedia.org/wiki/Europarl_corpus Europarl], News Commentary, and News Crawl<br />
* [http://88milsms.huma-num.fr/ Large SMS corpus in French (88milSMS)]<br />
<br />
== Grammars/parsers ==<br />
===Free software===<br />
* [http://led.loria.fr/en_outils.php#114 HPSG FroG] (under the LGPLLR according to [http://2009.rmll.info/IMG/pdf/RMLL2009-Sciences-Sebastien_Paumier-LGPLLR.pdf this presentation])<br />
* [http://alpage.inria.fr/~sagot/wolf.html WOLF] - Wordnet Libre du Français (free French WordNet), distributed under the CeCILL-C license (LGPL-compatible)<br />
* [http://alpage.inria.fr/~sagot/lefff.html Lefff] - Lexique des Formes Fléchies du Français, a wide-coverage morphological and syntactic lexicon for French, distributed under the free LGPL-LR license (Lesser General Public License For Linguistic Resources); see also [http://gforge.inria.fr/projects/alexina/ Alexina]<br />
* [http://sites.google.com/site/morfetteweb/ Morfette] - data-driven POS tagger and lemmatizer (New BSD License)<br />
* [http://wiki.apertium.org/wiki/Main_Page Apertium] has analysers/generators in the [[lttoolbox]] format for French, along with statistical disambiguation models, see e.g. the files in [https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-fr-ca fr-ca], [https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-fr-es fr-es] and [https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-br-fr br-fr]<br />
<br />
===Unknown licence===<br />
* [[Generation grammars|KPML generation grammar]]<br />
* [http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/DecisionTreeTagger.html Treetagger] has some French support (gratis for research)<br />
* [https://gforge.inria.fr/frs/download.php/27240/melt-0.6.tar.gz MeLT], data-driven POS tagger<br />
<br />
==Morphology, dictionaries==<br />
===Free software===<br />
* [http://www.dicollecte.org/ Dicollecte] - French lexicon with a list of inflected forms (MPL/GPL/LGPL)<br />
* [http://www.univ-nancy2.fr/pers/namer/Telecharger_Flemm.html Flemmv3.1] - inflectional morphology parser for French, written in Perl (GPL)<br />
<br />
[[Category:Resources by language|French]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Resources_for_German&diff=11074
Resources for German
2015-06-17T14:57:02Z
<p>Jonsafari: /* Free license */ update WMT link</p>
<hr />
<div>==Corpora==<br />
===Free license===<br />
* [http://www.computing.dcu.ie/~ygraham/software.html RIA Open Source Rule Induction Tool] includes an LFG-parsed German-English phrase-aligned parallel corpus, a subset of the EuroParl corpus (4,000 sentences per language; the tool, at least, is LGPL)<br />
* [http://www.euromatrixplus.net/multi-un/ UN parallel corpora]<br />
* [http://www.statmt.org/wmt15/translation-task.html#download WMT corpora], including [http://en.wikipedia.org/wiki/Europarl_corpus Europarl], News Commentary, and News Crawl<br />
<br />
===Unknown license===<br />
<!-- Please keep this list in alphabetical order --><br />
<br />
* [http://ucts.uniba.sk/aranea_about/ Araneum Germanicum], Gigaword German web corpus<br />
* [http://www.phonetik.uni-muenchen.de/Bas/BasKorporaeng.html Bavarian Archive for Speech Signals Corpora]<br />
* [http://corpora.ids-mannheim.de/~cosmas/ COSMAS II]<br />
* [http://www.ims.uni-stuttgart.de/projekte/tc/CQP.html Experimental Corpus Query System (University of Stuttgart, Germany)]<br />
* [http://www.wortschatz.uni-leipzig.de/ German plain text and Co-occurrences at LCC]<br />
* [http://ufal.mff.cuni.cz/hamledt HamleDT], harmonized dependency treebanks of many languages, common annotation style.<br />
* [http://www.coli.uni-saarland.de/~gparis/LMD-TAZ_corpus/ Le Monde Diplomatique-Die Tageszeitung Translation Corpus] - French-German, aligned (parallel)<br />
* [http://www.coli.uni-sb.de/sfb378/negra-corpus/negra-corpus.html NEGRA Corpus]<br />
* [http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERCorpus/ TIGER treebank]<br />
* [http://www.sfs.uni-tuebingen.de/en_tuepp.shtml Tübingen Partially Parsed Corpus of Written German (TüPP-D/Z)]<br />
* [http://www.sfs.uni-tuebingen.de/en_tuebads.shtml Tübingen Treebank of Spoken German (TüBa-D/S, aka Verbmobil treebank)]<br />
* [http://www.sfs.uni-tuebingen.de/en_tuebadz.shtml Tübingen Treebank of Written German (TüBa-D/Z)]<br />
<br />
==Evaluation datasets==<br />
* [http://www.ukp.tu-darmstadt.de/data/semRelDatasets Semantic relatedness evaluation]<br />
<br />
== Grammars ==<br />
* [[Generation grammars|KPML generation grammar]]<br />
<br />
== Morphological analysis ==<br />
=== Free software ===<br />
* [https://code.google.com/p/morphisto/ Morphisto], based on [[SMOR]], is an [[SFST]]-based analyser and generator for German. (The morphology is GPLv2, but the lexicon is proprietary/non-commercial: CC-BY-SA-NC v3)<br />
* [http://www.danielnaber.de/morphologie/index_en.html German morphology data], based on [http://www.wolfganglezius.de/doku.php?id=cl:morphy Morphy], licensed under CC-BY-SA 3.0<br />
<br />
==Lexicons==<br />
===Free software===<br />
* [http://www-user.tu-chemnitz.de/~fri/ding/ DING] - German-English Dictionary with approximately 253,000 entries (GPL 2 or later).<br />
* [http://www.openthesaurus.de/ OpenThesaurus] - German synonyms and associated terms (LGPL)<br />
<br />
===Proprietary/gratis===<br />
* [http://www.ims.uni-stuttgart.de/tcl/RESOURCES/German-Lexicon-en.html Lexical information for German] ("The data is freely available for education, research and other '''non-commercial''' purposes.")<br />
* [http://www.canoo.net/ Canoo.net] - German Dictionaries and Grammars<br />
<br />
===Unknown license===<br />
* [http://www.ims.uni-stuttgart.de/projekte/IMSLex/ IMSLex German Lexicon] (no license information, but only "sample" download)<br />
* [http://www.cl.uzh.ch/CL/siclemat/sprachanalyse/molif/ mOlif morphological analyzer] (broken link)<br />
<br />
==Resource Access==<br />
* [http://wortschatz.uni-leipzig.de/Webservices/ Web service access to German language statistics]<br />
<br />
==Timeline Analysis==<br />
* [http://wortschatz.uni-leipzig.de/wort-des-tages/ German Words of the Day]<br />
* [http://www.sfs.uni-tuebingen.de/~lothar/nw/ Wortwarte (selection of German neologisms for each day)]<br />
<br />
[[Category:Resources by language|German]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Resources_for_Czech&diff=11073
Resources for Czech
2015-06-17T14:56:41Z
<p>Jonsafari: update WMT link</p>
<hr />
<div>==Corpora==<br />
* [http://ucnk.ff.cuni.cz/english/index.html Czech National Corpus]<br />
* [http://ufal.mff.cuni.cz/pdt2.0/ Prague Dependency Treebank]<br />
* [http://ufal.mff.cuni.cz/hamledt HamleDT], harmonized dependency treebanks of many languages, common annotation style.<br />
* [http://www.statmt.org/wmt15/translation-task.html#download WMT corpora], including [http://en.wikipedia.org/wiki/Europarl_corpus Europarl], News Commentary, and News Crawl<br />
<br />
== Grammars ==<br />
* [[Generation grammars|KPML generation grammar]]<br />
<br />
<br />
<br />
[[Category:Resources by language|Czech]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Resources_for_Russian&diff=11072
Resources for Russian
2015-06-17T14:55:48Z
<p>Jonsafari: /* Free open source */ update WMT link</p>
<hr />
<div>==Corpora==<br />
===Free open source===<br />
* [http://www.euromatrixplus.net/multi-un/ MultiUN] "A Multilingual corpus from United Nation Documents". The Russian portion is 876 MB; the other languages in the corpus are English, French, Spanish, Arabic, Chinese and German.<br />
* [http://www.statmt.org/wmt15/translation-task.html#download WMT corpora], including the Yandex 1M corpus, News Commentary, and News Crawl<br />
<br />
===Unknown license===<br />
<!-- Please keep this list in alphabetical order --><br />
<br />
* [http://ucts.uniba.sk/aranea_about/ Araneum Russicum], Gigaword Russian web corpus<br />
* [http://ufal.mff.cuni.cz/hamledt HamleDT], harmonized dependency treebanks of many languages, common annotation style.<br />
* [http://www.helsinki.fi/venaja/english/e-material/hanco/index.htm HANCO: The Helsinki annotated corpus of Russian texts] (searchable, no visible download links)<br />
* [http://www.sfb441.uni-tuebingen.de/b1/korpora.html Russian Corpora (uni-tuebingen.de)] (searchable, no visible download links)<br />
* [http://corpus.leeds.ac.uk/ruscorpora.html Russian Internet Corpus]<br />
* [http://www.ruscorpora.ru/ Russian National Corpus] <br />
* [http://www.philol.msu.ru/~lex/corpus/ Russian Newspaper Corpus]<br />
* [http://lib.ru/ Various texts in Russian (lib.ru)]<br />
<br />
== POS taggers ==<br />
<br />
* [http://www.aot.ru/ AOT, morphological analyser]<br />
* [http://corpus.leeds.ac.uk/mocky/ Mocky, statistical taggers and lemmatiser]<br />
* [http://company.yandex.ru/technology/mystem/ Mystem, morphological analyser]<br />
<br />
== Grammars ==<br />
* [[Generation grammars|KPML generation grammar]]<br />
* [http://abisource.com/projects/link-grammar/ Link Grammar Parser], includes Russian dictionaries.<br />
<br />
==Various resources==<br />
* [http://rykov-cl.narod.ru/r.html Russian Corpora (rykov-cl.narod.ru)]<br />
* [http://corpus.leeds.ac.uk/serge/frqlist/ Russian frequency lists]<br />
* [http://www.philol.msu.ru/rus/galya-1 Russian Phonetics on the Web]<br />
* [http://schools.keldysh.ru/uvk1838/Sciper/volume2/langres/russiclr.htm Russicon Resources]<br />
<br />
<br />
[[Category:Resources by language|Russian]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Resources_for_Hungarian&diff=11071
Resources for Hungarian
2015-06-17T14:53:01Z
<p>Jonsafari: Split corpus section</p>
<hr />
<div>==Corpora==<br />
===Free===<br />
* [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English<br />
* [http://mokk.bme.hu/resources/webcorpus/ Hungarian Webcorpus] - 590 million tokens<br />
<br />
===Non-Free===<br />
* [http://ucts.uniba.sk/aranea_about/ Araneum Hungaricum], Gigaword Hungarian web corpus<br />
* Hunglish parallel corpus ([http://mokk.bme.hu/resources/hunglishcorpus download], [http://hunglish.hu/search search])<br />
* [http://ufal.mff.cuni.cz/hamledt HamleDT], harmonized dependency treebanks of many languages, common annotation style.<br />
<br />
<br />
== Tools ==<br />
* [http://code.google.com/p/hunpos/ hunpos] - open-source POS-tagger<br />
* [http://mokk.bme.hu/resources/hunmorph/ hunmorph] - open-source morphological analyzer<br />
<br />
<br />
<br />
[[Category:Resources by language|Hungarian]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Resources_for_Turkish&diff=11070
Resources for Turkish
2015-06-17T14:40:50Z
<p>Jonsafari: /* Corpora */ TS Corpus requires login</p>
<hr />
<div>==Morphological analysis==<br />
<br />
===Free software===<br />
* [http://www.let.rug.nl/~coltekin/trmorph/ TRMorph] "is a relatively complete morphological analyzer for Turkish. It is implemented using [[SFST]], and uses a lexicon based on (but heavily modified) the wordlist of [[Zemberek]] spell checker. The morphological analyzer is distributed under the [[GPL]]."<br />
===Proprietary===<br />
<br />
<br />
==Lexical resources==<br />
<br />
* [http://www.tdk.gov.tr Turkish Language Association] <br />
<br />
==Corpora==<br />
===Free===<br />
<br />
* [http://www.statmt.org/setimes/ Southeast European Times] (sentence aligned corpus, Albanian, Bulgarian, English, Greek, Macedonian, Romanian, Serbo-Croatian, Turkish &mdash; approximately 4.5 million words per language)<br />
<br />
===Proprietary===<br />
<br />
* [http://ufal.mff.cuni.cz/hamledt HamleDT], harmonized dependency treebanks of many languages, common annotation style.<br />
* [http://tscorpus.com/ TS Corpus] (POS-tagged Turkish corpus that also provides morphological and lemma tags; 491 million tokens)<br />
* [http://www.ii.metu.edu.tr/~corpus/treebank.html METU-Sabanci Turkish treebank]<br />
* [http://corpora.informatik.uni-leipzig.de/ Turkish plain text and Co-occurrences at LCC]<br />
<br />
==Bibliography==<br />
* K. Oflazer, "Two-level Description of Turkish Morphology," Literary and Linguistic Computing, vol. 9, pp. 137-148, 1994. [http://citeseer.ist.psu.edu/rd/28364199%2C121124%2C1%2C0.25%2CDownload/http://citeseer.ist.psu.edu/cache/papers/cs/1850/ftp:zSzzSzftp.cs.bilkent.edu.trzSzpubzSztech-reportszSz1993zSzBU-CEIS-9304.pdf/oflazer93twolevel.pdf Backwards PDF]<br />
<br />
==See also==<br />
<br />
<br />
==External links==<br />
* [http://www.hlst.sabanciuniv.edu Sabancı University Natural Language Processing Tools (Turkish Morphological Analyzer, BalkaNET)]<br />
* [http://ddi.ce.itu.edu.tr Istanbul Technical University Natural Language Processing Research Group]<br />
* [http://nooj4nlp.net/pages/turkish.html NooJ_TR by Mersin University Turkish National Corpus Project Team]<br />
<br />
[[Category:Resources by language|Turkish]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Resources_for_Finnish&diff=11069
Resources for Finnish
2015-06-17T14:32:32Z
<p>Jonsafari: distinguish free vs. non-free corpora; +corpus link; etc.</p>
<hr />
<div>==Corpora==<br />
===Free===<br />
* [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English<br />
* [http://www.statmt.org/wmt15/translation-task.html WMT News Crawl] monolingual corpus. Currently 14M tokens.<br />
* [http://corpora.informatik.uni-leipzig.de/ Finnish plain text and Co-occurrences at LCC]<br />
* [http://ufal.mff.cuni.cz/hamledt HamleDT], harmonized dependency treebanks of many languages, common annotation style.<br />
<br />
===Non-Free===<br />
* [http://ucts.uniba.sk/aranea_about/ Araneum Finnicum], Gigaword Finnish web corpus<br />
* [http://www.kielipankki.fi CSC Kielipankki] Language Bank at the [http://www.csc.fi/ CSC] Scientific Computing Centre, including some 200 million word tokens of Finnish texts.<br />
<br />
==Morphological analysers==<br />
===Free software===<br />
* [https://gna.org/projects/omorfi/ Omorfi] - an open morphology for Finnish, developed in association with the [[voikko]] speller project (LGPL/GPL). See also https://kitwiki.csc.fi/twiki/bin/view/KitWiki/OmorfiHFSTVersion for installing with [[HFST]].<br />
<br />
<br />
[[Category:Resources by language|Finnish]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=POS_Induction_(State_of_the_art)&diff=10578
POS Induction (State of the art)
2014-03-08T00:40:10Z
<p>Jonsafari: /* Software */ fix mkcls link</p>
<hr />
<div>==Evaluation==<br />
<br />
'''Many-to-1:''' Greedily map every induced label to the gold-standard tag it co-occurs with most often (45 labels to 45 tags of the Penn tag set); several induced labels may map to the same tag. Use the mapping to compute tag accuracy on the Wall Street Journal portion of the Penn Treebank.<br />
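<br />
The many-to-1 score can be sketched in a few lines of Python. This is only an illustrative sketch, not the evaluation script used by any of the systems listed here: the function name <code>many_to_one</code> and the toy label and tag sequences are assumptions made for the example, and ties between equally frequent tags are broken arbitrarily.<br />

```python
from collections import Counter, defaultdict


def many_to_one(induced, gold):
    """Return (mapping, accuracy) for the many-to-1 evaluation.

    `induced` and `gold` are parallel sequences with one entry per token.
    The function name and signature are hypothetical, chosen for this sketch.
    """
    if len(induced) != len(gold):
        raise ValueError("induced and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty corpus")

    # Count how often each induced label co-occurs with each gold tag.
    cooccurrence = defaultdict(Counter)
    for label, tag in zip(induced, gold):
        cooccurrence[label][tag] += 1

    # Greedy step: each induced label independently takes its most frequent
    # gold tag, so several labels may end up sharing one tag.
    mapping = {label: counts.most_common(1)[0][0]
               for label, counts in cooccurrence.items()}

    # Tag accuracy after relabelling every token through the mapping.
    correct = sum(mapping[label] == tag for label, tag in zip(induced, gold))
    return mapping, correct / len(gold)


# Toy data (hypothetical): three induced clusters over six tokens.
induced_labels = [0, 0, 1, 1, 1, 2]
gold_tags = ["DT", "DT", "NN", "NN", "VB", "NN"]
mapping, accuracy = many_to_one(induced_labels, gold_tags)
```

On the toy data, clusters 1 and 2 both map to <code>NN</code>, which is what distinguishes many-to-1 from a one-to-one mapping; five of the six tokens are then scored as correct.<br />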
<br />
<br />
==Results==<br />
'''Listed in order of decreasing accuracy'''<br />
<br />
<br />
{| border="1" cellpadding="5" cellspacing="1" width="100%"<br />
|-<br />
! System name<br />
! Short description<br />
! Main publications<br />
! Software<br />
! Many-to-1<br />
|-<br />
| UPOS<br />
| Learning Syntactic Categories Using Paradigmatic Representations of Word Context<br />
| Yatbaz et al. (2012)<br />
| [https://github.com/ai-ku/upos/tree/emnlp2012 upos]<br />
| 80.2%<br />
|-<br />
| Brown+proto<br />
| MRF initialized with Brown prototypes<br />
| Christodoulopoulos, Goldwater and Steedman (2010)<br />
| <br />
| 76.1%<br />
|-<br />
| <br />
| Logistic regression with features and LBFGS<br />
| Berg-Kirkpatrick et al. (2010)<br />
| <br />
| 75.5%<br />
|-<br />
| Clark DMF<br />
| Distributional clustering + morphology + frequency<br />
| Clark (2003)<br />
| [http://www.cs.rhul.ac.uk/home/alexc/pos2.tar.gz alexc]<br />
| 71.2%*<br />
|-<br />
|}<br />
<br />
<nowiki>*</nowiki> according to Christodoulopoulos, Goldwater and Steedman (2010)<br />
<br />
<br />
== References ==<br />
'''Listed alphabetically.'''<br />
<br />
* Berg-Kirkpatrick, Taylor, Alexandre Bouchard-Cote, John DeNero, and Dan Klein. 2010. [http://www.aclweb.org/anthology/N/N10/N10-1083.pdf Painless Unsupervised Learning with Features]. In Proceedings of NAACL 2010.<br />
* Christodoulopoulos, Christos, Sharon Goldwater and Mark Steedman. 2010. [http://www.aclweb.org/anthology/D/D10/D10-1056.pdf Two Decades of Unsupervised POS induction: How far have we come?] In Proceedings of EMNLP 2010.<br />
* Clark, Alexander. 2003. [http://www.aclweb.org/anthology/E/E03/E03-1009.pdf Combining distributional and morphological information for part of speech induction]. In Proceedings of EACL 2003, pages 59–66, Morristown, NJ, USA.<br />
* Yatbaz, Mehmet Ali, Enis Sert and Deniz Yuret. 2012. [http://aclweb.org/anthology//D/D12/D12-1086.pdf Learning Syntactic Categories Using Paradigmatic Representations of Word Context]. In Proceedings of EMNLP 2012, pages 940–951.<br />
<br />
== Software ==<br />
* [http://www.cs.rhul.ac.uk/home/alexc/pos2.tar.gz alexc]<br />
* [https://github.com/percyliang/brown-cluster brown-cluster]<br />
* [https://code.google.com/p/giza-pp/ mkcls]<br />
* [http://wortschatz.uni-leipzig.de/~cbiemann/software/unsupos.html unsupos]<br />
* [https://github.com/ai-ku/upos upos]<br />
<br />
== See also ==<br />
* [[POS Tagging (State of the art)]]<br />
* [[Part-of-speech tagging]]<br />
* [[State of the art]]<br />
<br />
<br />
[[Category:State of the art]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Best_paper_awards&diff=10577
Best paper awards
2014-03-07T23:39:27Z
<p>Jonsafari: start wikification</p>
<hr />
<div>=ACL=<br />
<br />
A few items are still missing. Please help complete this table and link papers to PDFs in the [http://aclweb.org/anthology/ ACL anthology].<br />
<br />
{| border="1" cellspacing="0" cellpadding="5"<br />
|'''Year'''<br />
|'''Author'''<br />
|'''Paper Title'''<br />
|-<br />
|2001<br />
|Eugene Charniak<br />
|Immediate-head parsing for language modeling <br />
|-<br />
|2001<br />
|Ulrich Germann, Michael Jahr, Kevin Knight, Daniel Marcu, and Kenji Yamada<br />
|Fast Decoding and Optimal Decoding for Machine Translation<br />
|-<br />
|2002<br />
|Franz Och and Hermann Ney<br />
|Discriminative Training and Maximum Entropy Models for Statistical Machine Translation<br />
|-<br />
|2003<br />
|Dan Klein and Chris Manning<br />
|Accurate Unlexicalized Parsing<br />
|-<br />
|2003<br />
|Yukiko Nakano, Gabe Reinstein, Tom Stocky, and Justine Cassell<br />
|Towards a Model of Face-to-Face Grounding<br />
|-<br />
|2004<br />
|Diana McCarthy, Rob Koeling, Julie Weeds, and John Carroll<br />
|Finding Predominant Word Senses in Untagged Text<br />
|-<br />
|2005<br />
|David Chiang<br />
|A hierarchical phrase-based model for statistical machine translation<br />
|-<br />
|2006<br />
|Rion Snow, Dan Jurafsky, and Andrew Y. Ng<br />
|Semantic taxonomy induction from heterogenous evidence<br />
|-<br />
|2007<br />
|Y. W. Wong and R. J. Mooney<br />
|Learning synchronous grammars for semantic parsing with lambda calculus<br />
|-<br />
|2008<br />
|Liang Huang<br />
|Forest Reranking: Discriminative Parsing with Non-Local Features<br />
|-<br />
|2008<br />
|Libin Shen, Jinxi Xu and Ralph Weischedel<br />
|A New String-to-Dependency Machine Translation Algorithm with a Target Dependency Language Model<br />
|-<br />
|2009<br />
|Andre Martins, Noah Smith and Eric Xing<br />
|Concise Integer Linear Programming Formulations for Dependency Parsing<br />
|-<br />
|2009<br />
|S.R.K. Branavan, Harr Chen, Luke Zettlemoyer and Regina Barzilay<br />
|Reinforcement Learning for Mapping Instructions to Actions<br />
|-<br />
|2009<br />
|Adam Pauls and Dan Klein<br />
|K-Best A* Parsing <br />
|-<br />
|2010 (Long)<br />
|Matthew Gerber and Joyce Chai<br />
|Beyond NomBank: A Study of Implicit Arguments for Nominal Predicates<br />
|-<br />
|2010 (Short)<br />
|Michael Lamar, Yariv Maron, Mark Johnson and Elie Bienenstock<br />
|SVD and Clustering for Unsupervised POS Tagging<br />
|-<br />
|2010 (Student)<br />
|David Elson, Nicholas Dames and Kathleen McKeown<br />
|Extracting Social Networks from Literary Fiction<br />
|-<br />
|2011<br />
|Dipanjan Das and Slav Petrov<br />
|Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections<br />
|-<br />
|2012 (Long)<br />
|Hiroyuki Shindo, Yusuke Miyao, Akinori Fujino and Masaaki Nagata<br />
|[http://aclweb.org/anthology/P/P12/P12-1046.pdf Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing]<br />
|-<br />
|2012 (Student)<br />
|Fan Bu, Hang Li and Xiaoyan Zhu<br />
|[http://aclweb.org/anthology/P/P12/P12-1047.pdf String Re-writing Kernel]<br />
|-<br />
|2013<br />
|Haonan Yu and Jeffrey Mark Siskind<br />
|[http://aclweb.org/anthology/P/P13/P13-1006.pdf Grounded Language Learning from Video Described with Sentences]<br />
|-<br />
|}<br />
<br />
=NAACL=<br />
<br />
{| border="1" cellspacing="0" cellpadding="5"<br />
|'''Year'''<br />
|'''Author'''<br />
|'''Paper Title'''<br />
|-<br />
|2004<br />
|Regina Barzilay (MIT) and Lillian Lee (Cornell)<br />
|Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization<br />
|-<br />
|2006<br />
|Mehryar Mohri and Brian Roark<br />
|Probabilistic Context-Free Grammar Induction Based on Structural Zeros<br />
|-<br />
|2006<br />
|Aria Haghighi and Dan Klein<br />
|Prototype-Driven Learning for Sequence Models<br />
|-<br />
|2007<br />
|Antti-Veikko Rosti, Bing Xiang, Spyros Matsoukas, Richard Schwartz, Necip Fazil Ayan and Bonnie Dorr<br />
|Combining Outputs from Multiple Machine Translation Systems<br />
|-<br />
|2009<br />
|Hoifung Poon, Colin Cherry and Kristina Toutanova<br />
|Unsupervised Morphological Segmentation with Log-Linear Models<br />
|-<br />
|2009<br />
|David Chiang, Kevin Knight and Wei Wang<br />
|11,001 New Features for Statistical Machine Translation<br />
|-<br />
|2010 (long)<br />
|Aria Haghighi and Dan Klein<br />
|Coreference Resolution in a Modular, Entity-Centered Model<br />
|-<br />
|2010 (short)<br />
|Jennifer Foster<br />
|“cba to check the spelling”: Investigating Parser Performance on Discussion Forum Posts<br />
|-<br />
|2012 (full)<br />
|Alexander Rush and Slav Petrov<br />
|Vine Pruning for Efficient Multi-Pass Dependency Parsing<br />
|-<br />
|2012 (short)<br />
|Jacob Devlin and Spyros Matsoukas<br />
|Trait-Based Hypothesis Selection for Machine Translation<br />
|-<br />
|2012 (student)<br />
|Oscar Taeckstroem, Ryan McDonald and Jakob Uszkoreit<br />
|Cross-lingual Word Clusters for Direct Transfer of Linguistic Structure<br />
|-<br />
|2013 (full)<br />
|no award given<br />
|<br />
|-<br />
|2013 (short)<br />
|Marta Recasens, Marie-Catherine de Marneffe and Christopher Potts<br />
|[http://aclweb.org/anthology/N/N13/N13-1071.pdf The Life and Death of Discourse Entities: Identifying Singleton Mentions]<br />
|-<br />
|2013 (student)<br />
|Bradley Hauer and Greg Kondrak<br />
|[http://aclweb.org/anthology/N/N13/N13-1072.pdf Automatic Generation of English Respellings]<br />
|-<br />
|}<br />
<br />
=EMNLP=<br />
<br />
{| border="1" cellspacing="0" cellpadding="5"<br />
|'''Year'''<br />
|'''Author'''<br />
|'''Paper Title'''<br />
|-<br />
|2002<br />
|Michael Collins<br />
|Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms<br />
|-<br />
|2002<br />
|Frank Keller, Maria Lapata, and Olga Ourioupina<br />
|Using the Web to Overcome Data Sparseness<br />
|-<br />
|2003<br />
|Peng Xu, Ahmad Emami and Frederick Jelinek<br />
|Training Connectionist Models for the Structured Language Model<br />
|-<br />
|2004<br />
|Ben Taskar, Dan Klein, Michael Collins, Daphne Koller, and Christopher Manning<br />
|Max-Margin Parsing<br />
|-<br />
|2005 (best student paper)<br />
|Ryan McDonald, Fernando Pereira, Kiril Ribarov and Jan Hajic<br />
|Non-Projective Dependency Parsing using Spanning Tree Algorithms<br />
|-<br />
|2006<br />
|no award given<br />
|<br />
|-<br />
|2007<br />
|James Clarke and Maria Lapata<br />
|Modelling Compression with Discourse Constraints<br />
|-<br />
|2008<br />
|no award given<br />
|<br />
|-<br />
|2009<br />
|Hoifung Poon and Pedro Domingos<br />
|Unsupervised semantic parsing<br />
|-<br />
|2010<br />
|Terry Koo, Alexander M. Rush, Michael Collins, Tommi Jaakkola, and David Sontag<br />
|Dual Decomposition for Parsing with Non-Projective Head Automata<br />
|-<br />
|2011<br />
|Wei Lu and Hwee Tou Ng<br />
|A Probabilistic Forest-to-String Model for Language Generation from Typed Lambda Calculus Expressions<br />
|-<br />
|2012<br />
|Annie Louis and Ani Nenkova<br />
|[http://aclweb.org/anthology/D/D12/D12-1106.pdf A Coherence Model Based on Syntactic Patterns]<br />
|-<br />
|2013<br />
|Valentin Spitkovsky, Hiyan Alshawi and Daniel Jurafsky<br />
|[http://aclweb.org/anthology/D/D13/D13-1204.pdf Breaking Out of Local Optima with Count Transforms and Model Recombination: A Study in Grammar Induction]<br />
|-<br />
|}<br />
<br />
<br />
=IJCNLP=<br />
{| border="1" cellspacing="0" cellpadding="5"<br />
|'''Year'''<br />
|'''Author'''<br />
|'''Paper Title'''<br />
|-<br />
|2009<br />
|Andre Martins, Noah Smith and Eric Xing<br />
|Concise Integer Linear Programming Formulations for Dependency Parsing<br />
|-<br />
|2009<br />
|S.R.K. Branavan, Harr Chen, Luke Zettlemoyer and Regina Barzilay<br />
|Reinforcement Learning for Mapping Instructions to Actions<br />
|-<br />
|2009<br />
|Adam Pauls and Dan Klein<br />
|K-Best A* Parsing <br />
|-<br />
|2011<br />
|Caecilia Zirn, Mathias Niepert, Heiner Stuckenschmidt, and Michael Strube<br />
|Fine-Grained Sentiment Analysis with Structural Features<br />
|-<br />
|2011<br />
|Siva Reddy, Ioannis Klapaftis, Diana McCarthy and Suresh Manandhar<br />
|[http://aclweb.org/anthology/I/I11/I11-1079.pdf Dynamic and Static Prototype Vectors for Semantic Composition]<br />
|-<br />
|2013<br />
|Houda Bouamor, Behrang Mohit and Kemal Oflazer<br />
|[http://aclweb.org/anthology/I/I13/I13-1031.pdf SuMT: A Framework of Summarization and MT]<br />
|-<br />
|}<br />
[[Category:Awards]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=POS_Induction_(State_of_the_art)&diff=10576
POS Induction (State of the art)
2014-03-07T23:24:21Z
<p>Jonsafari: +software links</p>
<hr />
<div>==Evaluation==<br />
<br />
'''Many-to-1:''' Greedily map every induced label to the gold-standard tag it co-occurs with most often (45 labels to 45 tags of the Penn tag set); several induced labels may map to the same tag. Use the mapping to compute tag accuracy on the Wall Street Journal portion of the Penn Treebank.<br />
<br />
<br />
==Results==<br />
'''Listed in order of decreasing accuracy'''<br />
<br />
<br />
{| border="1" cellpadding="5" cellspacing="1" width="100%"<br />
|-<br />
! System name<br />
! Short description<br />
! Main publications<br />
! Software<br />
! Many-to-1<br />
|-<br />
| UPOS<br />
| Learning Syntactic Categories Using Paradigmatic Representations of Word Context<br />
| Yatbaz et al. (2012)<br />
| [https://github.com/ai-ku/upos/tree/emnlp2012 upos]<br />
| 80.2%<br />
|-<br />
| Brown+proto<br />
| MRF initialized with Brown prototypes<br />
| Christodoulopoulos, Goldwater and Steedman (2010)<br />
| <br />
| 76.1%<br />
|-<br />
| <br />
| Logistic regression with features and LBFGS<br />
| Berg-Kirkpatrick et al. (2010)<br />
| <br />
| 75.5%<br />
|-<br />
| Clark DMF<br />
| Distributional clustering + morphology + frequency<br />
| Clark (2003)<br />
| [http://www.cs.rhul.ac.uk/home/alexc/pos2.tar.gz alexc]<br />
| 71.2%*<br />
|-<br />
|}<br />
<br />
<nowiki>*</nowiki> according to Christodoulopoulos, Goldwater and Steedman (2010)<br />
<br />
<br />
== References ==<br />
'''Listed alphabetically.'''<br />
<br />
* Berg-Kirkpatrick, Taylor, Alexandre Bouchard-Cote, John DeNero, and Dan Klein. 2010. [http://www.aclweb.org/anthology/N/N10/N10-1083.pdf Painless Unsupervised Learning with Features]. In Proceedings of NAACL 2010.<br />
* Christodoulopoulos, Christos, Sharon Goldwater and Mark Steedman. 2010. [http://www.aclweb.org/anthology/D/D10/D10-1056.pdf Two Decades of Unsupervised POS induction: How far have we come?] In Proceedings of EMNLP 2010.<br />
* Clark, Alexander. 2003. [http://www.aclweb.org/anthology/E/E03/E03-1009.pdf Combining distributional and morphological information for part of speech induction]. In Proceedings of EACL 2003, pages 59–66, Morristown, NJ, USA.<br />
* Yatbaz, Mehmet Ali, Enis Sert and Deniz Yuret. 2012. [http://aclweb.org/anthology//D/D12/D12-1086.pdf Learning Syntactic Categories Using Paradigmatic Representations of Word Context]. In Proceedings of EMNLP 2012, pages 940–951.<br />
<br />
== Software ==<br />
* [http://www.cs.rhul.ac.uk/home/alexc/pos2.tar.gz alexc]<br />
* [https://github.com/percyliang/brown-cluster brown-cluster]<br />
* [http://www.statmt.org/moses/giza/mkcls.html mkcls]<br />
* [http://wortschatz.uni-leipzig.de/~cbiemann/software/unsupos.html unsupos]<br />
* [https://github.com/ai-ku/upos upos]<br />
<br />
== See also ==<br />
* [[POS Tagging (State of the art)]]<br />
* [[Part-of-speech tagging]]<br />
* [[State of the art]]<br />
<br />
<br />
[[Category:State of the art]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=POS_Induction_(State_of_the_art)&diff=10575
POS Induction (State of the art)
2014-03-07T23:12:14Z
<p>Jonsafari: clean refs</p>
<hr />
<div>==Evaluation==<br />
<br />
'''Many-to-1:''' Greedily map every induced label to a gold-standard tag (45 induced labels to the 45 tags of the Penn tag set), then use that mapping to compute tag accuracy on the Wall Street Journal portion of the Penn Treebank.<br />
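<br />
The many-to-1 measure can be sketched in a few lines of Python. This is only an illustration under one assumption: that "greedily" means each induced label is independently mapped to the gold tag it co-occurs with most often, so several induced labels may share a gold tag. The function name <code>many_to_one</code> and the toy data are hypothetical and do not come from any of the systems listed on this page.<br />

```python
from collections import Counter, defaultdict


def many_to_one(induced, gold):
    """Return (mapping, accuracy) for a many-to-1 evaluation.

    induced: sequence of induced cluster labels, one per token
    gold:    sequence of gold-standard tags, aligned with `induced`
    """
    if len(induced) != len(gold):
        raise ValueError("induced and gold must have the same length")
    if len(gold) == 0:
        raise ValueError("cannot score an empty sequence")

    # Count how often each induced label co-occurs with each gold tag.
    cooccurrence = defaultdict(Counter)
    for label, tag in zip(induced, gold):
        cooccurrence[label][tag] += 1

    # Assumption: "greedy" means each induced label independently takes
    # its most frequent gold tag, so several labels may share one tag.
    mapping = {label: counts.most_common(1)[0][0]
               for label, counts in cooccurrence.items()}

    # Token-level accuracy after relabelling every token via the mapping.
    correct = sum(mapping[label] == tag for label, tag in zip(induced, gold))
    return mapping, correct / len(gold)


# Toy data (hypothetical, not taken from the Penn Treebank).
induced_labels = [0, 0, 1, 1, 1, 2]
gold_tags = ["DT", "DT", "NN", "NN", "VB", "VB"]
mapping, accuracy = many_to_one(induced_labels, gold_tags)
print(mapping, round(accuracy, 3))
```

Because the mapping is chosen on the same data it is scored on, this score can only rise as the number of induced labels grows, which is presumably why the evaluation fixes the label count at 45.<br />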
<br />
<br />
==Results==<br />
'''Listed in order of decreasing accuracy'''<br />
<br />
<br />
{| border="1" cellpadding="5" cellspacing="1" width="100%"<br />
|-<br />
! System name<br />
! Short description<br />
! Main publications<br />
! Software<br />
! Many-to-1<br />
|-<br />
| UPOS<br />
| Learning Syntactic Categories Using Paradigmatic Representations of Word Context<br />
| Yatbaz et al. (2012)<br />
| [https://github.com/ai-ku/upos/tree/emnlp2012 upos]<br />
| 80.2%<br />
|-<br />
| Brown+proto<br />
| MRF initialized with Brown prototypes<br />
| Christodoulopoulos, Goldwater and Steedman (2010)<br />
| <br />
| 76.1%<br />
|-<br />
| <br />
| Logistic regression with features and LBFGS<br />
| Berg-Kirkpatrick et al. (2010)<br />
| <br />
| 75.5%<br />
|-<br />
| Clark DMF<br />
| Distributional clustering + morphology + frequency<br />
| Clark (2003)<br />
| [http://www.cs.rhul.ac.uk/home/alexc/pos2.tar.gz alexc]<br />
| 71.2%*<br />
|-<br />
|}<br />
<br />
<nowiki>*</nowiki> according to Christodoulopoulos, Goldwater and Steedman (2010)<br />
<br />
<br />
== References ==<br />
'''Listed alphabetically.'''<br />
<br />
* Berg-Kirkpatrick, Taylor, Alexandre Bouchard-Côté, John DeNero, and Dan Klein. 2010. [http://www.aclweb.org/anthology/N/N10/N10-1083.pdf Painless Unsupervised Learning with Features]. In Proceedings of NAACL 2010.<br />
* Christodoulopoulos, Christos, Sharon Goldwater and Mark Steedman. 2010. [http://www.aclweb.org/anthology/D/D10/D10-1056.pdf Two Decades of Unsupervised POS induction: How far have we come?] In Proceedings of EMNLP 2010.<br />
* Clark, Alexander. 2003. [http://www.aclweb.org/anthology/E/E03/E03-1009.pdf Combining distributional and morphological information for part of speech induction]. In Proceedings of EACL 2003, pages 59–66, Morristown, NJ, USA.<br />
* Yatbaz, Mehmet Ali, Enis Sert and Deniz Yuret. 2012. [http://aclweb.org/anthology//D/D12/D12-1086.pdf Learning Syntactic Categories Using Paradigmatic Representations of Word Context]. In Proceedings of EMNLP 2012, pages 940–951.<br />
<br />
<br />
== See also ==<br />
* [[POS Tagging (State of the art)]]<br />
* [[Part-of-speech tagging]]<br />
* [[State of the art]]<br />
<br />
<br />
[[Category:State of the art]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Resources_for_Persian&diff=10490
Resources for Persian
2013-12-28T04:03:55Z
<p>Jonsafari: +UPDT. Thanks Mojgan</p>
<hr />
<div>== Corpora ==<br />
===Free===<br />
*[http://www.ling.ohio-state.edu/~jonsafari/corpora VOA Persian Corpus 2003-2008] (public domain)<br />
<br />
===Proprietary===<br />
<!-- Please keep this list in alphabetical order --><br />
*[http://ece.ut.ac.ir/DBRG/Bijankhan/ Bijankhan corpus] (gratis for research/non-commercial purposes)<br />
*[http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC96S50 CALLFRIEND Farsi (speech)], LDC<br />
*[http://ece.ut.ac.ir/dbrg/hamshahri/ Hamshahri corpus] (gratis for research/non-commercial purposes)<br />
*[http://www.elda.org/catalogue/en/speech/S0112.html Persian speech database Farsdat], ELRA<br />
<br />
<br />
==Lexical resources==<br />
===Free===<br />
*[http://www.ling.ohio-state.edu/~jonsafari/corpora/wikipedia_fa-en_20120217.txt.xz Persian - English dictionary], derived from Wikipedia article names. Retains Wikipedia's CC-BY-SA 3.0 license.<br />
<br />
===Proprietary===<br />
*[http://pwn.ir Persian WordNet]<br />
<br />
<br />
==Machine translation==<br />
===Free===<br />
<br />
===Proprietary===<br />
*[http://crl.nmsu.edu/Research/Projects/shiraz/index.html The Shiraz project] (Persian -> English)<br />
*[http://ece.ut.ac.ir/NLP/resources.htm Tehran English-Persian Parallel Corpus] by Mohammad Taher Pilevar, NLP Lab, University of Tehran. For research or non-commercial use.<br />
<br />
<br />
==Morphology tools==<br />
===Free===<br />
*[http://sourceforge.net/projects/perstem Perstem] - Persian stemmer, light morphological analyzer, and character set converter.<br />
*[http://apertium.svn.sourceforge.net/svnroot/apertium/incubator/apertium-tg-fa/apertium-tg-fa.fa.dix Morphological dictionary] &mdash; compiled using [[lttoolbox]].<br />
<br />
<br />
==Parsing==<br />
===Free===<br />
* [http://www.ling.ohio-state.edu/~jonsafari/persianlg/ Persian dictionaries] for the [http://www.abisource.com/projects/link-grammar/ Link-Grammar parser]. By [http://www.ling.ohio-state.edu/~jonsafari/ Jon Dehdari]. These require the Perstem stemming package, above.<br />
* [http://stp.lingfil.uu.se/~mojgan/UPDT.html Uppsala Persian Dependency Treebank], Creative Commons Attribution 3.0 License<br />
<br />
===Proprietary===<br />
*[http://dadegan.ir/en/persiandependencytreebank Dadegan Dependency Treebank] for research purposes only.<br />
*[http://hpsg.fu-berlin.de/~ghayoomi/PTB.html HPSG Persian Treebank (PerTreeBank)] for academic research purposes only.<br />
<br />
<br />
<br />
==Bibliography==<br />
* Dehdari, Jon, and Deryle Lonsdale. 2008. [http://www.ling.ohio-state.edu/~jonsafari/papers/dehdari_lonsdale_2005.pdf A link grammar parser for Persian]. In Karimi, S., Samiian, V., and Stilo, D., editors, ''Aspects of Iranian Linguistics'', volume 1. Cambridge Scholars Press. ISBN: 978-18-471-8639-3 ([http://www.ling.ohio-state.edu/~jonsafari/bib/dehdarilonsdale2005.bib.txt BIB])<br />
<br />
* Feili, H. and G. Ghassem-Sani (2004) "[http://sharif.edu/~sani/papers/Feili_SaniE2.pdf An Application of Lexicalized Grammars in English-Persian Translation]". ''Proceedings of the 16th European Conference on Artificial Intelligence (ECAI 2004)'', 24-27 Aug. 2004, Universidad Politecnica de Valencia, Valencia, Spain, pp. 596-600.<br />
* Megerdoomian, K. (2000) "[http://crl.nmsu.edu/Research/Projects/shiraz/publications/papers/Cicling.pdf Unification-Based Persian Morphology]". ''Proceedings of CICLing 2000'', Alexander Gelbukh, Center of Investigation on Computation-IPN, Mexico, 2000.<br />
* Megerdoomian, K. (2004) "[http://acl.ldc.upenn.edu/coling2004/W5/pdf/W5-7.pdf Finite-State Morphological Analysis of Persian]". ''COLING 2004 Computational Approaches to Arabic Script-based Languages''. Ali Farghaly and Karine Megerdoomian, editors, Geneva, Switzerland, 2004, pp. 35-41.<br />
* Mohammad Amin Farajian (2011). [http://world-comp.org/p2011/ICA4953.pdf PEN: Parallel English-Persian News Corpus]. Proceedings of 2011 International Conference on Artificial Intelligence (ICAI'11), Nevada, USA.<br />
<br />
==See also==<br />
*[[Resources for Kurdish]]<br />
*[[Resources for Tajik]]<br />
<br />
==External links==<br />
*[http://www.iranianlinguistics.org/wiki/index.php?title=Persian Iranian Linguistics: NLP Resources for Persian]<br />
*[http://www.ling.ohio-state.edu/~jonsafari/persian_nlp.html the Jon safari] (link parser, small lexicon, stemmer, morphological analysis tools)<br />
<br />
<br />
[[Category:Resources by language|Persian]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Resources_for_Persian&diff=10489
Resources for Persian
2013-12-28T03:49:36Z
<p>Jonsafari: /* Free */ +WP FA-EN dict</p>
<hr />
<div>== Corpora ==<br />
===Free===<br />
*[http://www.ling.ohio-state.edu/~jonsafari/corpora VOA Persian Corpus 2003-2008] (public domain)<br />
<br />
===Proprietary===<br />
<!-- Please keep this list in alphabetical order --><br />
*[http://ece.ut.ac.ir/DBRG/Bijankhan/ Bijankhan corpus] (gratis for research/non-commercial purposes)<br />
*[http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC96S50 CALLFRIEND Farsi (speech)], LDC<br />
*[http://ece.ut.ac.ir/dbrg/hamshahri/ Hamshahri corpus] (gratis for research/non-commercial purposes)<br />
*[http://www.elda.org/catalogue/en/speech/S0112.html Persian speech database Farsdat], ELRA<br />
<br />
<br />
==Lexical resources==<br />
===Free===<br />
*[http://www.ling.ohio-state.edu/~jonsafari/corpora/wikipedia_fa-en_20120217.txt.xz Persian - English dictionary], derived from Wikipedia article names<br />
<br />
===Proprietary===<br />
*[http://pwn.ir Persian WordNet]<br />
<br />
<br />
==Machine translation==<br />
===Free===<br />
<br />
===Proprietary===<br />
*[http://crl.nmsu.edu/Research/Projects/shiraz/index.html The Shiraz project] (Persian -> English)<br />
*[http://ece.ut.ac.ir/NLP/resources.htm Tehran English-Persian Parallel Corpus] by Mohammad Taher Pilevar, NLP Lab, University of Tehran. For research or non-commercial use.<br />
<br />
<br />
==Morphology tools==<br />
===Free===<br />
*[http://sourceforge.net/projects/perstem Perstem] - Persian stemmer, light morphological analyzer, and character set converter.<br />
*[http://apertium.svn.sourceforge.net/svnroot/apertium/incubator/apertium-tg-fa/apertium-tg-fa.fa.dix Morphological dictionary] &mdash; compiled using [[lttoolbox]].<br />
<br />
<br />
==Parsing==<br />
===Free===<br />
* [http://www.ling.ohio-state.edu/~jonsafari/persianlg/ Persian dictionaries] for the [http://www.abisource.com/projects/link-grammar/ Link-Grammar parser]. By [http://www.ling.ohio-state.edu/~jonsafari/ Jon Dehdari]. These require the Perstem stemming package, above. <br />
<br />
===Proprietary===<br />
*[http://dadegan.ir/en/persiandependencytreebank Dadegan Dependency Treebank] for research purposes only.<br />
*[http://hpsg.fu-berlin.de/~ghayoomi/PTB.html HPSG Persian Treebank (PerTreeBank)] for academic research purposes only.<br />
*[http://stp.lingfil.uu.se/~mojgan/persian_dependency_treebank.pdf A soon-to-be-released Persian Dependency Treebank], license not specified yet.<br />
<br />
<br />
==Bibliography==<br />
* Dehdari, Jon, and Deryle Lonsdale. 2008. [http://www.ling.ohio-state.edu/~jonsafari/papers/dehdari_lonsdale_2005.pdf A link grammar parser for Persian]. In Karimi, S., Samiian, V., and Stilo, D., editors, ''Aspects of Iranian Linguistics'', volume 1. Cambridge Scholars Press. ISBN: 978-18-471-8639-3 ([http://www.ling.ohio-state.edu/~jonsafari/bib/dehdarilonsdale2005.bib.txt BIB])<br />
<br />
* Feili, H. and G. Ghassem-Sani (2004) "[http://sharif.edu/~sani/papers/Feili_SaniE2.pdf An Application of Lexicalized Grammars in English-Persian Translation]". ''Proceedings of the 16th European Conference on Artificial Intelligence (ECAI 2004)'', 24-27 Aug. 2004, Universidad Politecnica de Valencia, Valencia, Spain, pp. 596-600.<br />
* Megerdoomian, K. (2000) "[http://crl.nmsu.edu/Research/Projects/shiraz/publications/papers/Cicling.pdf Unification-Based Persian Morphology]". ''Proceedings of CICLing 2000'', Alexander Gelbukh, Center of Investigation on Computation-IPN, Mexico, 2000.<br />
* Megerdoomian, K. (2004) "[http://acl.ldc.upenn.edu/coling2004/W5/pdf/W5-7.pdf Finite-State Morphological Analysis of Persian]". ''COLING 2004 Computational Approaches to Arabic Script-based Languages''. Ali Farghaly and Karine Megerdoomian, editors, Geneva, Switzerland, 2004, pp. 35-41.<br />
* Mohammad Amin Farajian (2011). [http://world-comp.org/p2011/ICA4953.pdf PEN: Parallel English-Persian News Corpus]. Proceedings of 2011 International Conference on Artificial Intelligence (ICAI'11), Nevada, USA.<br />
<br />
==See also==<br />
*[[Resources for Kurdish]]<br />
*[[Resources for Tajik]]<br />
<br />
==External links==<br />
*[http://www.iranianlinguistics.org/wiki/index.php?title=Persian Iranian Linguistics: NLP Resources for Persian]<br />
*[http://www.ling.ohio-state.edu/~jonsafari/persian_nlp.html the Jon safari] (link parser, small lexicon, stemmer, morphological analysis tools)<br />
<br />
<br />
[[Category:Resources by language|Persian]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Resources_for_Persian&diff=10488
Resources for Persian
2013-12-28T03:47:01Z
<p>Jonsafari: Alphabetize sections; add lexical resources section; add Persian WordNet entry</p>
<hr />
<div>== Corpora ==<br />
===Free===<br />
*[http://www.ling.ohio-state.edu/~jonsafari/corpora VOA Persian Corpus 2003-2008] (public domain)<br />
<br />
===Proprietary===<br />
<!-- Please keep this list in alphabetical order --><br />
*[http://ece.ut.ac.ir/DBRG/Bijankhan/ Bijankhan corpus] (gratis for research/non-commercial purposes)<br />
*[http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC96S50 CALLFRIEND Farsi (speech)], LDC<br />
*[http://ece.ut.ac.ir/dbrg/hamshahri/ Hamshahri corpus] (gratis for research/non-commercial purposes)<br />
*[http://www.elda.org/catalogue/en/speech/S0112.html Persian speech database Farsdat], ELRA<br />
<br />
<br />
==Lexical resources==<br />
===Free===<br />
<br />
===Proprietary===<br />
*[http://pwn.ir Persian WordNet]<br />
<br />
<br />
==Machine translation==<br />
===Free===<br />
<br />
===Proprietary===<br />
*[http://crl.nmsu.edu/Research/Projects/shiraz/index.html The Shiraz project] (Persian -> English)<br />
*[http://ece.ut.ac.ir/NLP/resources.htm Tehran English-Persian Parallel Corpus] by Mohammad Taher Pilevar, NLP Lab, University of Tehran. For research or non-commercial use.<br />
<br />
<br />
==Morphology tools==<br />
===Free===<br />
*[http://sourceforge.net/projects/perstem Perstem] - Persian stemmer, light morphological analyzer, and character set converter.<br />
*[http://apertium.svn.sourceforge.net/svnroot/apertium/incubator/apertium-tg-fa/apertium-tg-fa.fa.dix Morphological dictionary] &mdash; compiled using [[lttoolbox]].<br />
<br />
<br />
==Parsing==<br />
===Free===<br />
* [http://www.ling.ohio-state.edu/~jonsafari/persianlg/ Persian dictionaries] for the [http://www.abisource.com/projects/link-grammar/ Link-Grammar parser]. By [http://www.ling.ohio-state.edu/~jonsafari/ Jon Dehdari]. These require the Perstem stemming package, above. <br />
<br />
===Proprietary===<br />
*[http://dadegan.ir/en/persiandependencytreebank Dadegan Dependency Treebank] for research purposes only.<br />
*[http://hpsg.fu-berlin.de/~ghayoomi/PTB.html HPSG Persian Treebank (PerTreeBank)] for academic research purposes only.<br />
*[http://stp.lingfil.uu.se/~mojgan/persian_dependency_treebank.pdf A soon-to-be-released Persian Dependency Treebank], license not specified yet.<br />
<br />
<br />
==Bibliography==<br />
* Dehdari, Jon, and Deryle Lonsdale. 2008. [http://www.ling.ohio-state.edu/~jonsafari/papers/dehdari_lonsdale_2005.pdf A link grammar parser for Persian]. In Karimi, S., Samiian, V., and Stilo, D., editors, ''Aspects of Iranian Linguistics'', volume 1. Cambridge Scholars Press. ISBN: 978-18-471-8639-3 ([http://www.ling.ohio-state.edu/~jonsafari/bib/dehdarilonsdale2005.bib.txt BIB])<br />
<br />
* Feili, H. and G. Ghassem-Sani (2004) "[http://sharif.edu/~sani/papers/Feili_SaniE2.pdf An Application of Lexicalized Grammars in English-Persian Translation]". ''Proceedings of the 16th European Conference on Artificial Intelligence (ECAI 2004)'', 24-27 Aug. 2004, Universidad Politecnica de Valencia, Valencia, Spain, pp. 596-600.<br />
* Megerdoomian, K. (2000) "[http://crl.nmsu.edu/Research/Projects/shiraz/publications/papers/Cicling.pdf Unification-Based Persian Morphology]". ''Proceedings of CICLing 2000'', Alexander Gelbukh, Center of Investigation on Computation-IPN, Mexico, 2000.<br />
* Megerdoomian, K. (2004) "[http://acl.ldc.upenn.edu/coling2004/W5/pdf/W5-7.pdf Finite-State Morphological Analysis of Persian]". ''COLING 2004 Computational Approaches to Arabic Script-based Languages''. Ali Farghaly and Karine Megerdoomian, editors, Geneva, Switzerland, 2004, pp. 35-41.<br />
* Mohammad Amin Farajian (2011). [http://world-comp.org/p2011/ICA4953.pdf PEN: Parallel English-Persian News Corpus]. Proceedings of 2011 International Conference on Artificial Intelligence (ICAI'11), Nevada, USA.<br />
<br />
==See also==<br />
*[[Resources for Kurdish]]<br />
*[[Resources for Tajik]]<br />
<br />
==External links==<br />
*[http://www.iranianlinguistics.org/wiki/index.php?title=Persian Iranian Linguistics: NLP Resources for Persian]<br />
*[http://www.ling.ohio-state.edu/~jonsafari/persian_nlp.html the Jon safari] (link parser, small lexicon, stemmer, morphological analysis tools)<br />
<br />
<br />
[[Category:Resources by language|Persian]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Multilingual_Corpora&diff=10484
Multilingual Corpora
2013-12-10T21:50:35Z
<p>Jonsafari: +MultiUN corpora</p>
<hr />
<div>For individual languages, see [[List of resources by language]].<br />
<br />
See also [[Multilingual resources]].<br />
<br />
<br />
<!-- Please keep this list in alphabetical order --><br />
*[http://wt.jrc.it/lt/Acquis/ ACQUIS COMMUNAUTAIRE Multilingual Corpus]<br />
*[http://spraakbanken.gu.se/ Bank of Swedish]<br />
*[http://sli.uvigo.es/CLUVI/ CLUVI Corpus (Galician-English-Spanish-French parallel corpus)]<br />
*[http://hnk.ffzg.hr/ Croatian National Corpus (HNK)]<br />
*[http://ucnk.ff.cuni.cz/ Czech National Corpus (CNC)]<br />
*[http://www.kun.nl/celex CELEX - The Dutch Center for Lexical Information]<br />
*[http://www.cdc.gov/ncidod/sars/languages.htm Centre for Disease Control - Chinese, French, Japanese, Spanish info on SARS]<br />
*[http://www.linguateca.pt/COMPARA/ COMPARA corpus]<br />
*[http://www.debian.org/international/ Debian free software community]<br />
*[http://www.ling.lancs.ac.uk/corplang/emille EMILLE corpus]<br />
*[http://www.statmt.org/europarl/ European Parliament Proceedings Parallel Corpus 1996-2003]<br />
*[http://www.illc.uva.nl/EuroWordNet EuroWordNet]<br />
*[http://www.france.diplomatie.fr/label_france/index.html French Foreign Ministry's magazine]<br />
*[http://glossa.fltr.ucl.ac.be/ GlossaNet]<br />
*[http://hometown.aol.com/mit2haiti/JA-HC-kr.htm Haitian Creole corpus - Teknoloji pou lang kreyol]<br />
*[http://corpus.nytud.hu/mnsz/ Hungarian National Corpus]<br />
*[http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC95T20 Hansard French-English parallel corpus]<br />
*[http://www.ucl.ac.uk/english-usage/ice/avail.htm ICE corpora]<br />
*[http://korpus.pl/ IPI PAN Corpus of Polish]<br />
*[http://www.tu-chemnitz.de/phil/InternetGrammar/ Learner Behaviour on the Internet]<br />
*[http://corpora.informatik.uni-leipzig.de Leipzig Corpora Collection: Large monolingual raw corpora for 17+ languages, free downloads]<br />
*[http://www.coli.uni-saarland.de/~gparis/LMD-TAZ_corpus/ Le Monde Diplomatique-Die Tageszeitung Translation Corpus] - French-German, aligned (parallel)<br />
*[http://muchmore.dfki.de/resources1.htm MuchMore Springer Bilingual Corpus]<br />
*[http://nl.ijs.si/ME/ MULTEXT-East: Multilingual Corpora for Eastern and Central European Languages]<br />
*[http://tcc.itc.it/people/forner/multilingualcorpora.html Multilingual Corpora: Available Resources]<br />
* [http://www.csse.monash.edu.au/~jwb/tanakacorpus.html Tanaka Corpus: Japanese-English sentence pairs]<br />
*[http://multisemcor.itc.it MultiSemCor]<br />
*[http://www.ims.uni-stuttgart.de/info/Newspapers.html Newspapers on the Internet]<br />
*[http://logos.uio.no/opus/ OPUS - an open source parallel corpus]<br />
*[http://www.tekstlab.uio.no/Bosnian/Corpus.html Oslo Corpus of Bosnian]<br />
*[http://langbank.engl.polyu.edu.hk/indexl.html PolyU Language Bank]<br />
*[http://www.corpusdoportugues.org/ Portuguese Corpus]<br />
*[http://register.consilium.eu.int/ Public registry of the Council of the EU]<br />
*[http://www.ruscorpora.ru/ Russian National Corpus (RNK)]<br />
*[http://www.multilingual.com/allen51.htm The Bible as a Resource for Translation Software]<br />
*[http://www.cogsci.ed.ac.uk/elsnet/eci.html The ECI Multilingual corpus]<br />
*[http://www.fida.net/ Slovenian Corpus FIDA] and [http://www.fidaplus.net/ FIDA+]<br />
*[http://www.ling.su.se/DaLi/research/smultron/index.htm SMULTRON Corpus] - parallel treebank of English, German and Swedish<br />
*[http://www.corpusdelespanol.org/ Spanish Corpus]<br />
*[http://www.unhchr.ch/udhr/index.htm UN declaration of human rights in multiple languages]<br />
*[http://www.euromatrixplus.net/multi-un/ UN parallel corpora]<br />
*[http://www-igm.univ-mlv.fr/~unitex/ UNITEX]<br />
*[http://www.u-grenoble3.fr/kraif/liens.htm Useful links about parallel corpora, by Olivier Kraif]<br />
*[http://wacky.sslmit.unibo.it/ WaCky Project]<br />
*[http://www.wortschatz.uni-leipzig.de/html/wliste.html Wortlisten: spoken German, English, French, and Dutch]<br />
<!-- Please keep this list in alphabetical order --><br />
<br />
<br />
[[Category:Resources by language|Multilingual]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Resources_for_French&diff=10483
Resources for French
2013-12-10T21:48:41Z
<p>Jonsafari: /* Corpora */ +MultiUN corpora, 10^9 corpus</p>
<hr />
<div>==Corpora==<br />
* [http://www.statmt.org/wmt10/training-giga-fren.tar 10^9 French-English corpus]<br />
* [http://atilf.atilf.fr/dmf.htm Base Textuelle de Moyen Francais]<br />
* [http://corpora.informatik.uni-leipzig.de/ French plain text and Co-occurrences at LCC]<br />
* [http://www.up.univ-mrs.fr/veronis/donnees/index.html French Stopword List]<br />
* [http://www.cnrtl.fr/lexiques/morphalou/ Lexique Morphalou]<br />
* [http://w3.univ-tlse2.fr/erss/verbaction/main.html Lexique Verbaction]<br />
* [http://www.coli.uni-saarland.de/~gparis/LMD-TAZ_corpus/ Le Monde Diplomatique-Die Tageszeitung Translation Corpus] - French-German, aligned (parallel)<br />
* [http://www.euromatrixplus.net/multi-un/ UN parallel corpora]<br />
* [http://www.statmt.org/wmt13/translation-task.html#download WMT corpora], including [http://en.wikipedia.org/wiki/Europarl_corpus Europarl], News Commentary, and News Crawl<br />
<br />
== Grammars/parsers ==<br />
===Free software===<br />
* [http://led.loria.fr/en_outils.php#114 HPSG FroG] (under the LGPL-LR according to [http://2009.rmll.info/IMG/pdf/RMLL2009-Sciences-Sebastien_Paumier-LGPLLR.pdf this presentation])<br />
* [http://alpage.inria.fr/~sagot/wolf.html WOLF] – Wordnet Libre du Français (free French WordNet), distributed under the CeCILL-C license (LGPL-compatible)<br />
* [http://alpage.inria.fr/~sagot/lefff.html Lefff] – Lexique des Formes Fléchies du Français (lexicon of French inflected forms), a wide-coverage morphological and syntactic lexicon distributed under the free LGPL-LR license (Lesser General Public License For Linguistic Resources); see also [http://gforge.inria.fr/projects/alexina/ Alexina]<br />
* [http://sites.google.com/site/morfetteweb/ Morfette] - data-driven POS tagger and lemmatizer, New BSD License<br />
* [http://wiki.apertium.org/wiki/Main_Page Apertium] has analysers/generators in the [[lttoolbox]] format for French, along with statistical disambiguation models, see e.g. the files in [https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-fr-ca fr-ca], [https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-fr-es fr-es] and [https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-br-fr br-fr]<br />
<br />
===Unknown licence===<br />
* [[Generation grammars|KPML generation grammar]]<br />
* [http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/DecisionTreeTagger.html Treetagger] has some French support (gratis for research)<br />
* [https://gforge.inria.fr/frs/download.php/27240/melt-0.6.tar.gz MeLT], data-driven POS tagger<br />
<br />
==Morphology, dictionaries==<br />
===Free software===<br />
* [http://www.dicollecte.org/ Dicollecte] - French lexicon, list of inflected forms, MPL/GPL/LGPL<br />
* [http://www.univ-nancy2.fr/pers/namer/Telecharger_Flemm.html Flemm v3.1] - inflectional morphology parser for French, written in Perl, GPL license.<br />
<br />
[[Category:Resources by language|French]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Resources_for_Chinese&diff=10482
Resources for Chinese
2013-12-10T21:46:30Z
<p>Jonsafari: +MultiUN corpora</p>
<hr />
<div>==Tools==<br />
===Free software===<br />
* [https://github.com/yzhang/rseg rseg] word segmentation; written in Ruby (no compilation, no hard dependencies apart from Ruby), comes with a model (MIT license)<br />
* [https://code.google.com/p/ctbparser/ ctbparser] word segmentation, POS tagging, NER, dependency parsing, all using Conditional Random Fields; written in C++ (LGPL license)<br />
* [http://www.cl.cam.ac.uk/~yz360/zpar.html ZPar] word segmentation, POS tagging, CFG/dep/CCG parsing of Chinese and English; written in C++ (GPL3 license)<br />
* [http://code.google.com/p/duduplus/ DuDuPlus: a graph-based dependency parser for English and Chinese] ("Other Open Source" license?)<br />
** where is the source code?<br />
<br />
==Corpora==<br />
===Free license===<br />
* [http://corpora.heliohost.org/ HC Corpora] 1,606,811 lines of [http://en.wikipedia.org/wiki/Fair_use Fair Use] excerpts from news, blogs, and Twitter<br />
* [http://www.euromatrixplus.net/multi-un/ UN parallel corpora]<br />
<br />
===Nonfree or Unknown license===<br />
* [http://www.chinesecomputing.com Chinese Computing] <br />
* [http://www.icl.pku.edu.cn/icl_groups/corpus/dwldform1.asp Word Segmented and POS tagged People Daily Corpus at ICL of Peking University]<br />
* [http://corpus.leeds.ac.uk/frqc/i-zh-char.num.html Frequency list of characters in the Internet corpus]<br />
* [http://corpus.leeds.ac.uk/frqc/internet-zh.num Frequency list of lexical items in the Internet corpus]<br />
* [http://www.ling.lancs.ac.uk/corplang/lcmc/ Lancaster Corpus of Mandarin Chinese]<br />
<br />
<br />
[[Category:Resources by language|Chinese]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Resources_for_German&diff=10481
Resources for German
2013-12-10T21:44:29Z
<p>Jonsafari: /* Free license */ +MultiUN corpora</p>
<hr />
<div>==Corpora==<br />
===Free license===<br />
* [http://www.computing.dcu.ie/~ygraham/software.html RIA Open Source Rule Induction Tool] includes an LFG-parsed German-English phrase-aligned parallel corpus, a subset of the EuroParl corpus (4000 sentences for each language; the tool, at least, is LGPL)<br />
* [http://www.euromatrixplus.net/multi-un/ UN parallel corpora]<br />
* [http://www.statmt.org/wmt13/translation-task.html#download WMT corpora], including [http://en.wikipedia.org/wiki/Europarl_corpus Europarl], News Commentary, and News Crawl<br />
<br />
===Unknown license===<br />
<!-- Please keep this list in alphabetical order --><br />
<br />
* [http://www.phonetik.uni-muenchen.de/Bas/BasKorporaeng.html Bavarian Archive for Speech Signals Corpora]<br />
* [http://corpora.ids-mannheim.de/~cosmas/ COSMAS II]<br />
* [http://www.ims.uni-stuttgart.de/projekte/tc/CQP.html Experimental Corpus Query System (University of Stuttgart, Germany)]<br />
* [http://www.wortschatz.uni-leipzig.de/ German plain text and Co-occurrences at LCC]<br />
* [http://www.coli.uni-sb.de/sfb378/negra-corpus/negra-corpus.html NEGRA Corpus]<br />
* [http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERCorpus/ TIGER treebank]<br />
* [http://www.sfs.uni-tuebingen.de/en_tuebadz.shtml Tübingen Treebank of Written German (TüBa-D/Z)]<br />
* [http://www.sfs.uni-tuebingen.de/en_tuebads.shtml Tübingen Treebank of Spoken German (TüBa-D/S, aka Verbmobil treebank)]<br />
* [http://www.sfs.uni-tuebingen.de/en_tuepp.shtml Tübingen Partially Parsed Corpus of Written German (TüPP-D/Z)]<br />
* [http://www.coli.uni-saarland.de/~gparis/LMD-TAZ_corpus/ Le Monde Diplomatique-Die Tageszeitung Translation Corpus] - French-German, aligned (parallel)<br />
<br />
==Evaluation datasets==<br />
* [http://www.ukp.tu-darmstadt.de/data/semRelDatasets Semantic relatedness evaluation]<br />
<br />
== Grammars ==<br />
* [[Generation grammars|KPML generation grammar]]<br />
<br />
== Morphological analysis ==<br />
=== Free software ===<br />
* [https://code.google.com/p/morphisto/ Morphisto], based on [[SMOR]], is an [[SFST]]-based analyser and generator for German. (The morphology is GPLv2, but the lexicon is proprietary/non-commercial: CC-BY-SA-NC v3)<br />
* [http://www.danielnaber.de/morphologie/index_en.html German morphology data], based on [http://www.wolfganglezius.de/doku.php?id=cl:morphy Morphy], licensed under CC-BY-SA 3.0<br />
<br />
==Lexicons==<br />
===Free software===<br />
* [http://www-user.tu-chemnitz.de/~fri/ding/ DING] - German-English Dictionary with approximately 253,000 entries (GPL 2 or later).<br />
* [http://www.openthesaurus.de/ OpenThesaurus] - German synonyms and associated terms (LGPL)<br />
<br />
===Proprietary/gratis===<br />
* [http://www.ims.uni-stuttgart.de/tcl/RESOURCES/German-Lexicon-en.html Lexical information for German] ("The data is freely available for education, research and other '''non-commercial''' purposes.")<br />
* [http://www.canoo.net/ Canoo.net] - German Dictionaries and Grammars<br />
<br />
===Unknown license===<br />
* [http://www.ims.uni-stuttgart.de/projekte/IMSLex/ IMSLex German Lexicon] (no license information, but only "sample" download)<br />
* [http://www.cl.uzh.ch/CL/siclemat/sprachanalyse/molif/ mOlif morphological analyzer] (broken link)<br />
<br />
==Resource Access==<br />
* [http://wortschatz.uni-leipzig.de/Webservices/ Web service access to German language statistics]<br />
<br />
==Timeline Analysis==<br />
* [http://wortschatz.uni-leipzig.de/wort-des-tages/ German Words of the Day]<br />
* [http://www.sfs.uni-tuebingen.de/~lothar/nw/ Wortwarte (selection of German neologisms for each day)]<br />
<br />
[[Category:Resources by language|German]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Resources_for_Arabic&diff=10480
Resources for Arabic
2013-12-10T21:42:06Z
<p>Jonsafari: /* Free/open licence */ +MultiUN</p>
<hr />
<div>==Morphology==<br />
<br />
===Free software===<br />
*[https://sourceforge.net/projects/aramorph/ AraMorph - Perl] - An Arabic morphological analyzer and part-of-speech tagger written in Perl (originally by Tim Buckwalter)<br />
*[http://www.nongnu.org/aramorph/ AraMorph - Java] - An Arabic morphological analyzer and part-of-speech tagger rewritten in Java for [http://lucene.apache.org/ Lucene]<br />
<br />
===Proprietary===<br />
*[http://www.arabic-morphology.com Xerox Arabic Morphological Analyzer and Generator]<br />
<br />
==Parsers==<br />
===Free software===<br />
* [http://www.cis.upenn.edu/~dbikel/software.html#stat-parser Bikel's implementation of Collins Parser] by [http://www.cis.upenn.edu/~dbikel/ Dan Bikel].<br />
* [http://www.ling.ohio-state.edu/~jonsafari/arabiclg/arabiclg.20060829.tar.bz2 Arabic dictionaries], by [http://www.ling.ohio-state.edu/~jonsafari/ Jon Dehdari], for the [http://www.abisource.com/projects/link-grammar/ Link-Grammar parser]. These require the Aramorph stemming package, above. <br />
* [https://sourceforge.net/apps/trac/elixir-fm/wiki ElixirFM] ([http://quest.ms.mff.cuni.cz/cgi-bin/elixir/index.fcgi online interface here]) is a Functional Arabic Morphology written in Haskell and Perl; the lexicon is a "re-processed" version of the Buckwalter analyser.<br />
* [http://sourceforge.net/projects/sarf Sarf] - Arabic Morphology System (all in Java)<br />
<br />
==Corpora==<br />
===Proprietary===<br />
*[http://www.ldc.upenn.edu/Catalog/LDC2001T55.html Arabic Newswire Part 1], 76 million tokens, annotation: paragraphs<br />
<br />
===Free/open licence===<br />
* [http://github.com/anastaw/Meedan-Memory Meedan-Memory], Arabic-English TMX (sentence-aligned), ~467,000 words on the English side, [http://www.opendatacommons.org/licenses/odbl/ Open Database Licence]<br />
* [http://quran.uk.net/ Quranic Arabic Corpus], 77,430 words of Quranic Arabic, with manually verified contextual POS, inflection, derivation; [[dependency grammar]] annotation is planned.<br />
* [http://www1.ccls.columbia.edu/~ybenajiba/downloads.html Arabic NER corpora] by [http://www1.ccls.columbia.edu/~ybenajiba/ Yassine Benajiba], 150,000+ words.<br />
* [http://www.euromatrixplus.net/multi-un/ UN parallel corpora]<br />
<br />
==Bibliography==<br />
<br />
==External links==<br />
*[http://www.elsnet.org/acl2001-arabic.html ACL/EACL 2001 Workshop on Arabic NLP]<br />
*[http://www1.cs.columbia.edu/~mdiab/software/ASVMTools_2.0.tar.gz Basic Arabic Processing Tools]<br />
*[http://acl.ldc.upenn.edu/coling2004/W5/index.html COLING 2004 Workshop on computational approaches to Arabic script-based languages]<br />
<br />
<br />
[[Category:Resources by language|Arabic]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Corpora_for_English&diff=10479
Corpora for English
2013-12-10T21:40:42Z
<p>Jonsafari: +MultiUN corpora</p>
<hr />
<div>For languages other than English, see [[List of resources by language]].<br />
<!-- Please keep this list in alphabetical order --><br />
<br />
*[ftp://ftp.cs.cornell.edu/pub/smart/time/ 1963 Time Magazine corpus]<br />
*[http://www.elda.fr/catalogue/en/speech/S0115.html American English SpeechDat-Car]<br />
*[http://americannationalcorpus.org/ American National Corpus (ANC)]<br />
*[http://americannationalcorpus.org/FirstRelease/ American National Corpus First Release]<br />
*[http://compbio.uchsc.edu/ccp/corpora/index.shtml Biomedical corpora]<br />
*[http://homepage.mac.com/bncweb/ BNCweb, a web-based interface to the British National Corpus]<br />
*[http://devoted.to/corpora Bookmarks for Corpus-based Linguists]<br />
*[http://info.ox.ac.uk/bnc/ British National Corpus (from Oxford University)]<br />
*[http://www.natcorp.ox.ac.uk/ British National Corpus (BNC)]<br />
*[http://www.comp.lancs.ac.uk/computing/research/ucrel/bnc.html British National Corpus project page (from UCREL)]<br />
*[http://clwww.essex.ac.uk/w3c/corpus_ling/content/corpora/list/private/brown/brown.html Brown Corpus]<br />
*[http://boston.lti.cs.cmu.edu/Data/clueweb09/ ClueWeb]<br />
*[http://computing.open.ac.uk/coda/data.html CODA Parallel Annotated Monologue-Dialogue Corpus]<br />
*[http://www.collins.co.uk/books.aspx?group=154 Collins Wordbanks]<br />
*[http://www.cs.cornell.edu/home/llee/data/convote.html Congressional floor-debate transcripts, with support/oppose labels]<br />
*[http://www.athel.com/corpdes.html Corpus of Spoken Professional English]<br />
*[http://www-rcf.usc.edu/~billmann/diversity/DDivers-site.htm Dialogue Diversity Corpus]<br />
*[http://etext.lib.virginia.edu/ Electronic Text Center -- University of Virginia]<br />
*[http://www.phon.ox.ac.uk/~esther/ivyweb/ English Intonation in the British Isles - The IViE Corpus]<br />
*[http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/naive-bayes/bow-0.8/stopwords.c English stop words (from SMART)]<br />
*[http://www-personal.umich.edu/~jlawler/levin.html English Verb Classes And Alternations: A Preliminary Investigation (Index)]<br />
*[http://usna.edu/LangStudy/BNC/ Exploring Words and Phrases from the British National Corpus]<br />
*[http://ir.dcs.gla.ac.uk/test_collections/gov2-summary.htm GOV2 Corpus] - 426 gigabytes of text<br />
*[http://gmb.let.rug.nl Groningen Meaning Bank] semantically annotated corpus<br />
*[http://www.gutenberg.org/wiki/Main_Page Gutenberg]<br />
*[http://prize.hutter1.net/ Hutter Prize for Lossless Compression of Human Knowledge 100M sample of Wikipedia]<br />
*[http://nora.hd.uib.no/icame.html ICAME]<br />
*[http://www.cs.fit.edu/~mmahoney/compression/text.html Large Text Compression Benchmark's 1G sample of Wikipedia]<br />
*[http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/naive-bayes/bow-0.8/stopwords.c List of English stopwords]<br />
*[http://www.cs.cornell.edu/People/pabo/movie-review-data/ Movie Review Data]<br />
*[http://www.cs.pitt.edu/mpqa/ Multi-Perspective Question Answering (MPQA)]<br />
*[http://mwe.stanford.edu/resources/ Multiword Expression Resources]<br />
*[http://www.askoxford.com/oec/mainpage/?view=uk Oxford English Corpus]<br />
*[http://pie.usna.edu/ Phrases in English]<br />
*[http://homepages.feis.herts.ac.uk/~comrcml/Lyon-thesis.ps Restricted English Corpus from Dr. Caroline Lyon for PhD]<br />
*[http://www.sketchengine.co.uk/ Sketch Engine]<br />
*[http://www-2.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/nlp/corpora/susanne/0.html Susanne: Annotated American English Corpus]<br />
*[http://clix.to/davidlee00 The BNC Index (for the BNCWorld Edition)]<br />
*[http://www-users.york.ac.uk/~sp20/corpus.html The Brooklyn-Geneva-Amsterdam-Helsinki Parsed Corpus of Old English]<br />
*[http://www-rcf.usc.edu/~billmann/diversity/DDivers-site.htm The Dialogue Diversity Corpus]<br />
*[http://www.grsampson.net/LucyDoc.html The LUCY Corpus - Documentation]<br />
*[http://www.cs.rochester.edu/research/cisd/resources/trains.html TRAINS Dialogue Corpus]<br />
*[http://ebiquity.umbc.edu/resource/html/id/351 UMBC Webbase Corpus]<br />
*[http://www.euromatrixplus.net/multi-un/ UN parallel corpora]<br />
*[http://www.let.rug.nl/~bos/vpe/ VP Ellipsis corpus]<br />
*[http://wacky.sslmit.unibo.it/ WaCky]<br />
*[http://www.webcorp.org.uk/guide/ WebCorp]<br />
* [http://www.statmt.org/wmt13/translation-task.html#download WMT corpora], including [http://en.wikipedia.org/wiki/Europarl_corpus Europarl], News Commentary, and News Crawl<br />
<br />
<br />
==Link collections==<br />
<!-- Please keep this list in alphabetical order --><br />
<br />
*[http://www.dcs.gla.ac.uk/idom/ir_resources/ Collections of texts and corpora]<br />
*[http://www.bmanuel.org/clr2_mp.html Manuel Barbera: General Corpora and Corpus Linguistics Resources]<br />
*[http://www.sultry.arts.usyd.edu.au/links/statnlp.html Annotated list of resources on statistical NLP and corpus-based CL]<br />
<br />
==Corpora tools==<br />
<!-- Please keep this list in alphabetical order --><br />
<br />
*[http://www.dcs.gla.ac.uk/idom/ir_resources/linguistic_utils/stop_words List of stop words]<br />
*[http://korpus.pl/index.php?page=poliqarp Poliqarp] - open source XML-aware indexer, search engine and concordancer<br />
*[http://www.sketchengine.co.uk/ The Sketch Engine]<br />
*[http://www.cis.upenn.edu/~treebank/tokenization.html Treebank tokenization scheme]<br />
<br />
<br />
[[Category:Corpora|*]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Resources_for_Spanish&diff=10478
Resources for Spanish
2013-12-10T21:39:12Z
<p>Jonsafari: +MultiUN corpora</p>
<hr />
<div>==Corpora==<br />
*[http://www.corpusdelespanol.org/ Corpus del Español] (website only)<br />
* [http://www.lllf.uam.es/~fmarcos/informes/corpus/corpulee.html Corpus de referencia de la lengua Española contemporanea: corpus oral peninsular]<br />
* [http://www.statmt.org/wmt13/translation-task.html#download WMT corpora], including [http://en.wikipedia.org/wiki/Europarl_corpus Europarl], News Commentary, and News Crawl<br />
* [http://www.euromatrixplus.net/multi-un/ UN parallel corpora]<br />
<br />
== Grammars ==<br />
* [[Generation grammars|KPML generation grammar]]<br />
<br />
<br />
[[Category:Resources by language|Spanish]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Corpora_for_English&diff=10427
Corpora for English
2013-11-13T03:18:57Z
<p>Jonsafari: +UMBC Webbase Corpus</p>
<hr />
<div>For languages other than English, see [[List of resources by language]].<br />
<!-- Please keep this list in alphabetical order --><br />
<br />
*[ftp://ftp.cs.cornell.edu/pub/smart/time/ 1963 Time Magazine corpus]<br />
*[http://www.elda.fr/catalogue/en/speech/S0115.html American English SpeechDat-Car]<br />
*[http://americannationalcorpus.org/ American National Corpus (ANC)]<br />
*[http://americannationalcorpus.org/FirstRelease/ American National Corpus First Release]<br />
*[http://compbio.uchsc.edu/ccp/corpora/index.shtml Biomedical corpora]<br />
*[http://homepage.mac.com/bncweb/ BNCweb, a web-based interface to the British National Corpus]<br />
*[http://devoted.to/corpora Bookmarks for Corpus-based Linguists]<br />
*[http://info.ox.ac.uk/bnc/ British National Corpus (from Oxford University)]<br />
*[http://www.natcorp.ox.ac.uk/ British National Corpus (BNC)]<br />
*[http://www.comp.lancs.ac.uk/computing/research/ucrel/bnc.html British National Corpus project page (from UCREL)]<br />
*[http://clwww.essex.ac.uk/w3c/corpus_ling/content/corpora/list/private/brown/brown.html Brown Corpus]<br />
*[http://boston.lti.cs.cmu.edu/Data/clueweb09/ ClueWeb]<br />
*[http://computing.open.ac.uk/coda/data.html CODA Parallel Annotated Monologue-Dialogue Corpus]<br />
*[http://www.collins.co.uk/books.aspx?group=154 Collins Wordbanks]<br />
*[http://www.cs.cornell.edu/home/llee/data/convote.html Congressional floor-debate transcripts, with support/oppose labels]<br />
*[http://www.athel.com/corpdes.html Corpus of Spoken Professional English]<br />
*[http://www-rcf.usc.edu/~billmann/diversity/DDivers-site.htm Dialogue Diversity Corpus]<br />
*[http://etext.lib.virginia.edu/ Electronic Text Center -- University of Virginia]<br />
*[http://www.phon.ox.ac.uk/~esther/ivyweb/ English Intonation in the British Isles - The IViE Corpus]<br />
*[http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/naive-bayes/bow-0.8/stopwords.c English stop words (from SMART)]<br />
*[http://www-personal.umich.edu/~jlawler/levin.html English Verb Classes And Alternations: A Preliminary Investigation (Index)]<br />
*[http://usna.edu/LangStudy/BNC/ Exploring Words and Phrases from the British National Corpus]<br />
*[http://ir.dcs.gla.ac.uk/test_collections/gov2-summary.htm GOV2 Corpus] - 426 gigabytes of text<br />
*[http://gmb.let.rug.nl Groningen Meaning Bank] semantically annotated corpus<br />
*[http://www.gutenberg.org/wiki/Main_Page Gutenberg]<br />
*[http://prize.hutter1.net/ Hutter Prize for Lossless Compression of Human Knowledge 100M sample of Wikipedia]<br />
*[http://nora.hd.uib.no/icame.html ICAME]<br />
*[http://www.cs.fit.edu/~mmahoney/compression/text.html Large Text Compression Benchmark's 1G sample of Wikipedia]<br />
*[http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/naive-bayes/bow-0.8/stopwords.c List of English stopwords]<br />
*[http://www.cs.cornell.edu/People/pabo/movie-review-data/ Movie Review Data]<br />
*[http://www.cs.pitt.edu/mpqa/ Multi-Perspective Question Answering (MPQA)]<br />
*[http://mwe.stanford.edu/resources/ Multiword Expression Resources]<br />
*[http://www.askoxford.com/oec/mainpage/?view=uk Oxford English Corpus]<br />
*[http://pie.usna.edu/ Phrases in English]<br />
*[http://homepages.feis.herts.ac.uk/~comrcml/Lyon-thesis.ps Restricted English Corpus from Dr. Caroline Lyon for PhD]<br />
*[http://www.sketchengine.co.uk/ Sketch Engine]<br />
*[http://www-2.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/nlp/corpora/susanne/0.html Susanne: Annotated American English Corpus]<br />
*[http://clix.to/davidlee00 The BNC Index (for the BNCWorld Edition)]<br />
*[http://www-users.york.ac.uk/~sp20/corpus.html The Brooklyn-Geneva-Amsterdam-Helsinki Parsed Corpus of Old English]<br />
*[http://www-rcf.usc.edu/~billmann/diversity/DDivers-site.htm The Dialogue Diversity Corpus]<br />
*[http://www.grsampson.net/LucyDoc.html The LUCY Corpus - Documentation]<br />
*[http://www.cs.rochester.edu/research/cisd/resources/trains.html TRAINS Dialogue Corpus]<br />
*[http://ebiquity.umbc.edu/resource/html/id/351 UMBC Webbase Corpus]<br />
*[http://www.let.rug.nl/~bos/vpe/ VP Ellipsis corpus]<br />
*[http://wacky.sslmit.unibo.it/ WaCky]<br />
*[http://www.webcorp.org.uk/guide/ WebCorp]<br />
* [http://www.statmt.org/wmt13/translation-task.html#download WMT corpora], including [http://en.wikipedia.org/wiki/Europarl_corpus Europarl], News Commentary, and News Crawl<br />
<br />
<br />
==Link collections==<br />
<!-- Please keep this list in alphabetical order --><br />
<br />
*[http://www.dcs.gla.ac.uk/idom/ir_resources/ Collections of texts and corpora]<br />
*[http://www.bmanuel.org/clr2_mp.html Manuel Barbera: General Corpora and Corpus Linguistics Resources]<br />
*[http://www.sultry.arts.usyd.edu.au/links/statnlp.html Annotated list of resources on statistical NLP and corpus-based CL]<br />
<br />
==Corpora tools==<br />
<!-- Please keep this list in alphabetical order --><br />
<br />
*[http://www.dcs.gla.ac.uk/idom/ir_resources/linguistic_utils/stop_words List of stop words]<br />
*[http://korpus.pl/index.php?page=poliqarp Poliqarp] - open source XML-aware indexer, search engine and concordancer<br />
*[http://www.sketchengine.co.uk/ The Sketch Engine]<br />
*[http://www.cis.upenn.edu/~treebank/tokenization.html Treebank tokenization scheme]<br />
<br />
<br />
[[Category:Corpora|*]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Resources_for_Slovenian&diff=10426
Resources for Slovenian
2013-11-12T05:16:47Z
<p>Jonsafari: +slovenscina.eu corpora</p>
<hr />
<div>==Corpora==<br />
===Free license===<br />
* [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English<br />
* [http://nl.ijs.si/elan/ IJS - ELAN] Slovene-English Parallel Corpus<br />
* [http://langtech.jrc.it/JRC-Acquis.html JRC Acquis] parallel texts. Languages involved: Bulgarian, Czech, Danish, German, Greek, English, Spanish, Estonian, Finnish, French, Hungarian, Italian, Lithuanian, Latvian, Maltese, Dutch, Polish, Portuguese, Romanian, Slovak, Slovene and Swedish. <br />
<br />
===Non-free license===<br />
* [http://eng.slovenscina.eu/korpusi "Communication in Slovene" corpora], includes written, spoken, web, learner's, and tagged corpora, up to 1.2 billion words<br />
* [http://nl.ijs.si/ME/ Multext EAST] lexica, annotated "1984" corpus, parallel and comparable text and speech corpora. Languages involved: Bulgarian, Croatian, Czech, English, Estonian, Hungarian, Lithuanian, Macedonian, Persian, Polish, Resian, Romanian, Russian, Serbian, Slovak, Slovene, and Ukrainian<br />
<br />
<br />
<br />
[[Category:Resources by language|Slovenian]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Resources_for_Swedish&diff=10343
Resources for Swedish
2013-10-12T17:28:40Z
<p>Jonsafari: /* Corpora */ +Europarl corpus</p>
<hr />
<div>==Machine translation systems==<br />
<br />
===Free software===<br />
<br />
* [http://apertium.sourceforge.net Apertium] Pre-alpha Swedish<->Danish material is available from CVS. <br />
<br />
===Proprietary===<br />
<br />
<br />
==Lexical resources==<br />
===Free software===<br />
* [http://www.dsso.se/ Den stora svenska ordlistan], 8.8 MB plaintext, License: [[CC-BY-SA]]<br />
<br />
===Proprietary===<br />
<br />
* [http://w3.msi.vxu.se/~nivre/research/Talbanken05.html Talbanken05] (Dependency Treebank, freely available for research purposes)<br />
<br />
==Corpora==<br />
===Free===<br />
* [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English<br />
* [http://corpora.informatik.uni-leipzig.de/ Swedish plain text and Co-occurrences at LCC]<br />
<br />
===Proprietary===<br />
* [http://www.ling.su.se/staff/sofia/suc/suc.html Stockholm Umeå Corpus] (Tagged Corpus, freely available for research purposes)<br />
<br />
==Bibliography==<br />
<br />
* <br />
<br />
==External links==<br />
<br />
* [http://spraakbanken.gu.se/lb/ The Bank of Swedish - A Linguistic Reference Database of G&ouml;teborg University] <br />
<br />
[[Category:Resources by language|Swedish]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Resources_for_Slovenian&diff=10342
Resources for Slovenian
2013-10-12T17:27:19Z
<p>Jonsafari: /* Corpora */ +Europarl corpus; reorg</p>
<hr />
<div>==Corpora==<br />
===Free license===<br />
* [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English<br />
* [http://nl.ijs.si/elan/ IJS - ELAN] Slovene-English Parallel Corpus<br />
* [http://langtech.jrc.it/JRC-Acquis.html JRC Acquis] parallel texts. Languages involved: Bulgarian, Czech, Danish, German, Greek, English, Spanish, Estonian, Finnish, French, Hungarian, Italian, Lithuanian, Latvian, Maltese, Dutch, Polish, Portuguese, Romanian, Slovak, Slovene and Swedish. <br />
<br />
===Non-free license===<br />
* [http://nl.ijs.si/ME/ Multext EAST] lexica, annotated "1984" corpus, parallel and comparable text and speech corpora. Languages involved: Bulgarian, Croatian, Czech, English, Estonian, Hungarian, Lithuanian, Macedonian, Persian, Polish, Resian, Romanian, Russian, Serbian, Slovak, Slovene, and Ukrainian<br />
<br />
<br />
<br />
[[Category:Resources by language|Slovenian]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Resources_for_Slovak&diff=10341
Resources for Slovak
2013-10-12T17:22:50Z
<p>Jonsafari: fix section header</p>
<hr />
<div>==Corpora==<br />
===Free license===<br />
* [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English<br />
<br />
===Unknown license===<br />
<!-- Please keep this list in alphabetical order --><br />
* [http://korpus.juls.savba.sk/ Slovenský národný korpus / Slovak National Corpus]<br />
<br />
==Lexical resources==<br />
===Free software===<br />
* [http://www.sk-spell.sk.cx/mass-msas Malý Anglicko-Slovenský a Slovensko-Anglický Slovník (mass/msas)] is a Slovak-English-Slovak dictionary, available in the StarDict format, under the [[GNU FDL]].<br />
<br />
===Proprietary===<br />
<br />
<br />
<br />
<br />
[[Category:Resources by language|Slovak]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Resources_for_Slovak&diff=10340
Resources for Slovak
2013-10-12T17:22:08Z
<p>Jonsafari: /* Corpora */ +Europarl corpus</p>
<hr />
<div>==Corpora==<br />
===Free license===<br />
* [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English<br />
<br />
===Unknown license===<br />
<!-- Please keep this list in alphabetical order --><br />
* [http://korpus.juls.savba.sk/ Slovenský národný korpus / Slovak National Corpus]<br />
<br />
==Lexical resources==<br />
===Free software===<br />
* [http://www.sk-spell.sk.cx/mass-msas Malý Anglicko-Slovenský a Slovensko-Anglický Slovník (mass/msas)] is a Slovak-English-Slovak dictionary, available in the StarDict format, under the [[GNU FDL]].<br />
<br />
===Proprietary===<br />
<br />
<br />
<br />
<br />
[[Category:Resources by language|Slovak]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Resources_for_Romanian&diff=10339
Resources for Romanian
2013-10-12T17:21:27Z
<p>Jonsafari: /* Free */ +Europarl corpus</p>
<hr />
<div>==Machine translation systems==<br />
<br />
===Free software===<br />
<br />
===Proprietary===<br />
<br />
<br />
==Lexical resources==<br />
<br />
* [http://nats-www.informatik.uni-hamburg.de/view/Main/GerLexicon German-English-Romanian lexicon] (freeware for academic purposes)<br />
<br />
==Corpora==<br />
<br />
===Free===<br />
* [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English<br />
* [http://www.cs.unt.edu/~rada/downloads.html Romanian NLP]<br />
* [http://www.statmt.org/setimes/ Southeast European Times] (sentence aligned corpus, Albanian, Bulgarian, English, Greek, Macedonian, Romanian, Serbo-Croatian, Turkish &mdash; approximately 4.5 million words per language)<br />
<br />
===Proprietary===<br />
<br />
* [http://consilr.info.uaic.ro/en/index.php?showpage=060103 Corpora] (Monolingual, POS tagged and bilingual English/French<->Romanian).<br />
<br />
==Bibliography==<br />
<br />
<br />
==External links==<br />
<br />
*[http://consilr.info.uaic.ro/en/index.php?showpage=0604 Consortium for the Romanian language: Resources and tools]<br />
<br />
<br />
[[Category:Resources by language|Romanian]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Resources_for_Portugese&diff=10338
Resources for Portugese
2013-10-12T17:20:49Z
<p>Jonsafari: added sections; +Europarl corpus</p>
<hr />
<div><br />
==Corpora==<br />
* [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English<br />
<br />
<br />
==Software==<br />
* [http://lael.pucsp.br/corpora/segmentador/ CEPRIL] - Portuguese Segmenter<br />
* [http://www.linguateca.pt/corpografo Corpógrafo] - a Web-based environment for corpora research<br />
<br />
<br />
[[Category:Resources by language|Portuguese]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Resources_for_Polish&diff=10337
Resources for Polish
2013-10-12T17:19:54Z
<p>Jonsafari: /* Corpora */ +Europarl corpus</p>
<hr />
<div>==Corpora==<br />
* [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English<br />
* [http://korpus.pl/en/ IPI PAN Corpus] - The IPI PAN Corpus is a large (currently over 250 million segments), morphosyntactically annotated, publicly available corpus of Polish, developed by the Linguistic Engineering Group at the Institute of Computer Science, Polish Academy of Sciences (ICS PAS)<br />
* [http://korpus.pwn.pl/index_en.php PWN Corpus] - PWN has prepared and made available an online version of the Corpus of Polish consisting of 40 million words. The samples were taken from 386 books, 977 editions selected from 185 different press publications, 84 transcribed spoken texts, 207 web sites and several hundred advertising leaflets and other ephemera. The full version of the corpus is available on payment for access, while a demonstration version of over 7.5 million words is available free of charge.<br />
<br />
==Taggers, parsers, morphology analysers==<br />
<br />
==Free/Open Source Software==<br />
* [http://morfologik.blogspot.com/ Morfologik] -- morphological dictionary by Marcin Miłkowski (of LanguageTool), licensed under CC-SA / GNU LGPL<br />
** [http://nlp.pwr.wroc.pl/redmine/projects/libpltagger/wiki/Morfologik_converted Morfologik converted to the IKIPI tagset] (the tagset of the IPI PAN Corpus)<br />
* [http://nlp.pwr.wroc.pl/en/tools-and-resources/narzedzia-przetwarzania-morfosyntaktycznego Morphosyntactic Toolchain] by WrocUT Language Technology Group G4.19, licensed under GNU LGPL (some optional addons are GNU GPL). Command-line utilities providing tokenisation, morphological analysis, morphosyntactic tagging, shallow parsing (chunking), WCCL feature vectors for machine learning.<br />
<br />
==Unknown license==<br />
* [http://nlp.ipipan.waw.pl/~wolinski/morfeusz/ "Morfeusz"] - morphological analyser of Polish (Wolinski, 2005), <br />
** [http://www.springerlink.com/content/l101v8823391j568/ main reference] Morfeusz — a Practical Tool for the Morphological Analysis of Polish <br />
* "AMOR" - morphology analyser of Polish (Joanna Rabiega, 2000), <br />
** [http://members.chello.pl/jrw/doc/jr_ma.pdf/ main reference] Podstawy lingwistyczne automatycznego analizatora morfologicznego AMOR <br />
* [http://duch.mimuw.edu.pl/~kszafran/index.php?option=com_docman&task=cat_view&gid=33&Itemid=43 "SAM"] - morphological analyser of Polish (Krzysztof Szafran, 1994)<br />
* [http://sourceforge.net/project/showfiles.php?group_id=166344 Morfologik] - Polish morphological analyzer based on current ispell dictionaries, with Java libraries that interface with it. The first completely open-source, comprehensive set of morphological tools for Polish; it is intended for use in grammar correction tools (to be included in the future).<br />
* [http://nlp.ipipan.waw.pl/Spejd/ Spejd - Shallow Parsing and Disambiguation Engine] <br />
* [http://www.cs.put.poznan.pl/dweiss/xml/projects/lametyzator/index.xml lemmatizer] - Dawid Weiss<br />
<br />
==Lexical resources==<br />
<br />
<br />
==Bibliography==<br />
<br />
<br />
==External links==<br />
* [http://bach.ipipan.waw.pl/mailman/listinfo/ling Polish linguistics mailing list] - mainly in Polish<br />
<br />
<br />
[[Category:Resources by language|Polish]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Resources_for_Lithuanian&diff=10336
Resources for Lithuanian
2013-10-12T17:19:15Z
<p>Jonsafari: /* Corpora */ +Europarl corpus</p>
<hr />
<div>==Corpora==<br />
* [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English<br />
<br />
==Morphological analysis==<br />
<br />
* [http://xixona.dlsi.ua.es/wiki/index.php/Incubator#Lithuanian Lithuanian] morphological dictionary from the Apertium project.<br />
<br />
[[Category:Resources by language|Lithuanian]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Resources_for_Dutch&diff=10335
Resources for Dutch
2013-10-12T17:18:56Z
<p>Jonsafari: /* Corpora */ +Europarl corpus</p>
<hr />
<div>== Corpora ==<br />
* [http://corpora.informatik.uni-leipzig.de/ Dutch Plain text and Co-occurrences at LCC]<br />
* [http://www.let.rug.nl/~vannoord/alp/Alpino/ Dutch HPSG-based parser. Includes the Alpino treebank (7137 sentences, newspaper, manually corrected).]<br />
* [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English<br />
<br />
== Grammars ==<br />
* [[Generation grammars|KPML generation grammar]]<br />
<br />
<br />
<br />
[[Category:Resources by language|Dutch]]</div>
Jonsafari
https://aclweb.org/aclwiki/index.php?title=Resources_for_Italian&diff=10334
Resources for Italian
2013-10-12T17:18:14Z
<p>Jonsafari: /* Corpora */ +Europarl corpus</p>
<hr />
<div>== Tools for Italian ==<br />
<br />
=== Tokenisers ===<br />
* [http://tcc.itc.it/projects/textpro/index.php TextPro] <br />
<br />
=== POS taggers ===<br />
* [http://tcc.itc.it/projects/textpro/index.php TextPro] <br />
* [http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/DecisionTreeTagger.html TreeTagger] <br />
<br />
===Morphology===<br />
====Free software====<br />
* [http://sslmitdev-online.sslmit.unibo.it/linguistics/morph-it.php Morph-It! version 0.47] - a free morphological resource for the Italian language, includes [[SFST]] sources. [[LGPL]] license.<br />
<br />
====Unknown license====<br />
* [http://archivium.biz/ dic_it: il Verbiario] - a morphological analyzer and verb conjugator for Italian verbs (web interface only?)<br />
<br />
=== Named Entity Recognisers ===<br />
* [http://tcc.itc.it/projects/ontotext/entitypro.html EntityPro]<br />
<br />
=== Temporal Expressions ===<br />
* [http://tcc.itc.it/projects/ontotext/ita-chronos.html ITA-Chronos]<br />
<br />
=== Parsers ===<br />
* [http://ai-nlp.info.uniroma2.it/external/chaosproject/ Chaos] - Robust syntactic parser for Italian and for English<br />
<br />
=== Generators ===<br />
* [http://tcc.itc.it/projects/xig/index.html XIG] - Interchange to Italian Generator<br />
<br />
== Resources for Italian ==<br />
<br />
=== Corpora ===<br />
<!-- Please keep this list in alphabetical order --><br />
<br />
* [http://www.istc.cnr.it/material/database/colfis/ ColFIS Corpus e Lessico di Frequenza dell'Italiano Scritto]<br />
* [http://corpus.cilta.unibo.it:8080/coris_ita.html Corpus di Italiano Scritto contemporaneo (CORIS/CODIS)]<br />
* [http://www.statmt.org/europarl Europarl corpus], sentence aligned with English<br />
* [http://corpora.informatik.uni-leipzig.de/ Italian plain text and Co-occurrences at LCC]<br />
* [http://languageserver.uni-graz.at/badip/badip/20_corpusLip.php LIP - Lessico di frequenza dell'Italiano Parlato - Access via BADIP]<br />
* [http://multisemcor.itc.it/ MultiSemCor] - English/Italian parallel corpus<br />
* [http://www.uni-duisburg.de/Fak2/FremdPhil/Romanistik/Personal/Burr/humcomp/ Oxford Text Archive Corpus of Italian Newspapers]<br />
* [http://tlio.ovi.cnr.it/TLIO/ Tesoro della lingua italiana delle origini (TLIO)]<br />
<br />
=== Tagsets ===<br />
* [http://tcc.itc.it/projects/textpro/index.php LemmaPro] - Italian POS tagset for LemmaPro<br />
<br />
=== Treebanks ===<br />
* [http://catalog.elra.info/retd/product_info.phpproducts_id=879&osCsid=0cef41a96779ef79b67c71bbf35e6eaa ISST] - Italian Syntactic-Semantic Treebank<br />
* [http://www.di.unito.it/~tutreeb/ TUT] - Turin University Treebank<br />
* [http://157.138.41.87/HTMLipar/indexparsing_a.htm VIT] - Venice Italian Treebank<br />
<br />
=== WordNets ===<br />
* [http://www.elda.fr/ EuroWordNet]<br />
* [http://multiwordnet.itc.it/english/home.php MultiWordNet] - a multilingual lexical database in which the Italian WordNet is strictly aligned with Princeton WordNet 1.6<br />
<br />
=== Lexicons ===<br />
* [http://www.ilc.cnr.it/clips/PSC_decription.htm PAROLE-SIMPLE-CLIPS] - a four-layered, general purpose computational lexicon<br />
<br />
== Links ==<br />
* [http://evalita.itc.it/ Evalita] - Evaluation of NLP tools for Italian<br />
<br />
[[Category:Resources by language|Italian]]</div>
Jonsafari