Resources for English: Difference between revisions
==CORPORA==
*[ftp://ftp.cs.cornell.edu/pub/smart/time/ 1963 Time Magazine corpus]
*[http://www.ldc.upenn.edu/Catalog/LDC2001S97.html 2000 NIST Speaker Recognition Evaluation Corpus]
*[http://www.coli.uni-sb.de/sfb378/negra-corpus/ A Syntactically Annotated Corpus of German Newspaper Texts]
*[http://ixa.si.ehu.es/Ixa/resources/sensecorpus A Web Corpus and Topic Signatures for All WordNet 1.6 Nominal Senses (v 1.0)]
*[http://odur.let.rug.nl/~vannoord/trees/ Alpino Treebank]
*[http://www.cornelsen.de/international/ An Empirical Grammar of the English Verb System]
*[http://www.sultry.arts.usyd.edu.au/links/statnlp.html Annotated list of resources on statistical NLP and corpus-based CL]
*[http://www.aot.ru/search1.html AOT]
*[http://www.ldc.upenn.edu/Catalog/LDC2001T55.html Arabic Newswire Part 1]
*[http://atilf.atilf.fr/dmf.htm Base Textuelle de Moyen Francais]
*[http://thetis.bl.uk/ BNC Online Service]
*[http://bokrcorpora.narod.ru Bokr Russian Reference Corpus]
*[http://info.ox.ac.uk/bnc/ BRITISH NATIONAL CORPUS - WORLD EDITION]
*[http://www.dcs.gla.ac.uk/idom/ir_resources/ Collections of texts and corpora]
*[http://www.lllf.uam.es/~fmarcos/informes/corpus/corpulee.html Corpus de referencia de la lengua Espanola contemporanea: corpus oral peninsular]
*[http://www.corpusdelespanol.org/ CORPUS DEL ESPANOL]
*[http://www.corpusdelespanol.org/ Corpus del Espanol]
*[http://www.hf.uio.no/easteur-orient/bulg/mat/ Corpus of spoken Bulgarian]
*[http://pioneer.chula.ac.th/~awirote/ling/corpuslst.htm Corpus Resources (Chulalongkorn University, Thailand)]
*[ftp://ftp.cs.cornell.edu/pub/smart/cran/ Cranfield collection]
*[http://corpus.rae.es/creanet.html CREA]
*[http://ucnk.ff.cuni.cz/english/index.html Czech National Corpus]
*[http://korpus.dsl.dk/korpus2000/indgang.php Danish news corpus]
*[http://www.eat.rl.ac.uk/ Edinburgh Associative Thesaurus (EAT)]
*[http://www.hum.uva.nl/~ewn EuroWordNet]
*[http://www.ims.uni-stuttgart.de/projekte/tc/CQP.html Experimental Corpus Query System (University of Stuttgart, Germany)]
*[http://www.csc.fi/kielipankki/ Finnish text bank]
*[http://www-tsujii.is.s.u-tokyo.ac.jp/~genia/ GENIA corpus version 3.0p]
*[http://hometown.aol.com/mit2haiti/Index4.html HAITIAN CREOLE ELECTRONIC TEXTS]
*[http://www-rali.iro.umontreal.ca/TransSearch/TS-simple-uen.cgi Hansards Corpus - Searchable]
*[http://www.hcrc.ed.ac.uk/maptask/ HCRC Map Task Corpus XML annotations]
*[http://www.csc.fi/kielipankki/aineistot/hcs/index.phtml.en Helsinki Corpus of Swahili (HCS)]
*[http://nats-www.informatik.uni-hamburg.de/~ingo/icopost/ ICOPOST]
*[http://www.ims.uni-stuttgart.de/projekte/TC.html IMS Corpus Toolbox, Univ. of Stuttgart]
*[http://www.ims.uni-stuttgart.de/projekte/CorpusWorkbench/ IMS Corpus Workbench (CWB)]
*[http://cecl.fltr.ucl.ac.be/Cecl-Projects/Icle/icle.htm International Corpus of Learner English]
*[http://korpus.pl/en/ IPI PAN Polish Corpus]
*[http://www.ipds.uni-kiel.de/links/datenmaterial.en.html Kiel University's Institute on Phonetics and Speech Processing]
*[http://www.nilc.icmc.usp.br/lacioweb Lacio Web Corpora]
*[http://www.vuw.ac.nz/llc/ LANGUAGE LEARNING CENTER - ACADEMIC CORPUS]
*[http://www-rali.iro.umontreal.ca/arc-a2/BAF/Description.html Le corpus BAF (French and English)]
*[http://www.csse.monash.edu.au/~jwb/afaq/jitadoushi.html List of Japanese transitive-intransitive verb pairs]
*[http://www.dcs.gla.ac.uk/idom/ir_resources/linguistic_utils/stop_words List of stop words]
*[http://www.bmanuel.org/clr2_mp.html Manuel Barbera: General Corpora and Corpus Linguistics Resources]
*[ftp://ftp.cs.cornell.edu/pub/smart/med/ Medlars collection]
*[ftp://ftp.ox.ac.uk/pub/wordlists/ Miscellaneous Word Lists from Oxford University]
*[http://www.lpl.univ-aix.fr/projects/multext/ Multilingual Text Tools and Corpora]
*[http://www.census.gov/genealogy/names Name lists from US census]
*[http://www.di.fc.ul.pt/~ahb/nexing.htm Nexing Corpus]
*[http://www.cs.cmu.edu/web/books.html On-line books at CMU]
*[http://logos.uio.no/opus/ OPUS -- An Open Source Parallel Corpus]
*[http://www.uni-duisburg.de/Fak2/FremdPhil/Romanistik/Personal/Burr/humcomp/ Oxford Text Archive Corpus of Italian Newspapers]
*[http://projects.ldc.upenn.edu/Chinese/hklaws.htm Parallel Texts of Hong Kong Laws]
*[http://elex.amu.edu.pl/~przemka/PICLE_search.php Polish subcorpus of the International Corpus of Learner English]
*[http://www.cirp.es/WXN/wxn/frames/proxectos.html Ramón Piñeiro Center for Research]
*[http://about.reuters.com/researchandstandards/corpus/ Reuters Corpus]
*[http://www.cs.unt.edu/~rada/downloads.html Romanian NLP]
*[http://www.sfb441.uni-tuebingen.de/b1/korpora.html Russian Corpora]
*[http://rykov-cl.narod.ru/r.html Russian Corpora]
*[http://www.ruscorpora.ru/ Russian Corpus Page]
*[http://lib.ru/ Russian Corpus Site]
*[http://www.ng.ru Russian Corpus Site]
*[http://www.philol.msu.ru/~lex/corpus/ Russian Newspaper Corpus]
*[http://schools.keldysh.ru/uvk1838/Sciper/volume2/langres/russiclr.htm Russicon Resources]
*[http://sanskritlibrary.org/ Sanskrit Library]
*[http://nl.ijs.si/elan/#corpus Slovene-English Parallel Corpus]
*[http://www.ldc.upenn.edu/Catalog/LDC2001S97.html Speech in Noisy Environments 1 (SPINE1 CODED) Coded Audio]
*[http://www.ldc.upenn.edu/Catalog/LDC2001S99.html Speech in Noisy Environments 2 (SPINE2 CODED) Coded Audio]
*[http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/nlp/doc/notes/corpora.txt Survey of Electronic Corpora (by Jane A. Edwards, file at CMU)]
*[http://www.ucl.ac.uk/english-usage/ Survey of English Usage, University College, London]
*[http://www.icsi.berkeley.edu/real/stp/index.html Switchboard Transcription Project]
*[http://www.tractor.de/ TELRI Research Archive of Computational Tools and Resources]
*[ftp://ftp.microsoft.com/developr/msdn/newup/glossary/ Terminology for more than 15 languages]
*[http://childes.psy.cmu.edu/ The Childes Corpus - Children's language]
*[http://nora.hd.uib.no/index-e.html The CORPORA DataCenter (Norway)]
*[ftp://ftp.dcs.shef.ac.uk/share/ilash/Moby/ The Moby Corpus]
*[http://www.tekstlab.uio.no/Bosnian/Corpus.html The Oslo Corpus of Bosnian Texts]
*[http://www.sketchengine.co.uk/ The Sketch Engine]
*[http://www.hf.uio.no/tekstlab/prosjekter/SOFIE.htm The Sofie Treebank - A Parallel Treebank of North European Languages]
*[http://www.cis.upenn.edu/~treebank/tokenization.html Treebank tokenization scheme]
===CORPORA - ENGLISH===
*[http://www.elda.fr/catalogue/en/speech/S0115.html American English SpeechDat-Car]
*[http://americannationalcorpus.org/FirstRelease/ AMERICAN NATIONAL CORPUS FIRST RELEASE]
*[http://homepage.mac.com/bncweb/ BNCweb a web-based interface to the British National Corpus]
*[http://devoted.to/corpora Bookmarks for Corpus-based Linguists]
*[http://info.ox.ac.uk/bnc/ British National Corpus (from Oxford University)]
*[http://www.comp.lancs.ac.uk/computing/research/ucrel/bnc.html British National Corpus project page (from UCREL)]
*[http://www.athel.com/corpdes.html Corpus of Spoken Professional English]
*[http://www-rcf.usc.edu/~billmann/diversity/DDivers-site.htm Dialogue Diversity Corpus]
*[http://etext.lib.virginia.edu/ Electronic Text Center -- University of Virginia]
*[http://www.phon.ox.ac.uk/~esther/ivyweb/ English Intonation in the British Isles - The IViE Corpus]
*[http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/naive-bayes/bow-0.8/stopwords.c English stop words (from SMART)]
*[http://www-personal.umich.edu/~jlawler/levin.html English Verb Classes And Alternations: A Preliminary Investigation (Index)]
*[http://usna.edu/LangStudy/BNC/ Exploring Words and Phrases from the British National Corpus]
*[http://www-tsujii.is.s.u-tokyo.ac.jp/~genia/ GENIA Project Home Page]
*[http://nora.hd.uib.no/icame.html ICAME]
*[http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/naive-bayes/bow-0.8/stopwords.c List of English stopwords]
*[http://www.lsi.upc.es/~nlp/tools/mapping.html Mapping WordNet Versions 1.6 and 2.0]
*[http://www.cs.cornell.edu/People/pabo/movie-review-data/ Movie Review Data]
*[http://mwe.stanford.edu/resources/ Multiword Expression Resources]
*[http://pie.usna.edu/ Phrases in English]
*[http://homepages.feis.herts.ac.uk/~comrcml/Lyon-thesis.ps Restricted English Corpus from Dr. Caroline Lyon for PhD]
*[http://www.sketchengine.co.uk/ Sketch Engine]
*[http://www-2.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/nlp/corpora/susanne/0.html Susanne: Annotated American English Corpus]
*[http://clix.to/davidlee00 The BNC Index (for the BNCWorld Edition)]
*[http://www-users.york.ac.uk/~sp20/corpus.html The Brooklyn-Geneva-Amsterdam-Helsinki Parsed Corpus of Old English]
*[http://www-rcf.usc.edu/~billmann/diversity/DDivers-site.htm The Dialogue Diversity Corpus]
*[http://www.grsampson.net/LucyDoc.html The LUCY Corpus - Documentation]
*[http://www.cs.rochester.edu/research/cisd/resources/trains.html TRAINS Dialogue Corpus]
===CORPORA - GERMAN===
*[http://www.phonetik.uni-muenchen.de/Bas/BasKorporaeng.html Bavarian Archive for Speech Signals Corpora]
*[http://corpora.ids-mannheim.de/~cosmas/ COSMAS II]
*[http://www.coli.uni-sb.de/sfb378/negra-corpus/negra-corpus.html NEGRA Corpus]
*[http://www.coli.uni-sb.de/sfb378/negra-corpus/ Saarland University, Computational Linguistics]
*[http://www.coli.uni-sb.de/sfb378/negra-corpus/negra-corpus.html The Negra Corpus - German Syntax annotated]
===CORPORA - MULTILINGUAL===
*[http://wt.jrc.it/lt/Acquis/ ACQUIS COMMUNAUTAIRE Multilingual Corpus]
*[http://www.kun.nl/celex CELEX - The Dutch Center for Lexical Information]
*[http://www.cdc.gov/ncidod/sars/languages.htm Centers for Disease Control - Chinese, French, Japanese, Spanish info on SARS]
*[http://www.linguateca.pt/COMPARA/ COMPARA corpus]
*[http://www.debian.org/international/ Debian free software community]
*[http://www.ling.lancs.ac.uk/corplang/emille EMILLE corpus]
*[http://people.csail.mit.edu/people/koehn/publications/europarl/ European Parliament Proceedings Parallel Corpus 1996-2003]
*[http://www.illc.uva.nl/EuroWordNet EuroWordNet]
*[http://www.france.diplomatie.fr/label_france/index.html French Foreign Ministry's magazine]
*[http://glossa.fltr.ucl.ac.be/ GlossaNet]
*[http://hometown.aol.com/mit2haiti/JA-HC-kr.htm Haitian Creole corpus - Teknoloji pou lang kreyol]
*[http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC95T20 Hansard French-English parallel corpus]
*[http://www.ucl.ac.uk/english-usage/ice/avail.htm ICE corpora]
*[http://www.tu-chemnitz.de/phil/InternetGrammar/ Learner Behaviour on the Internet]
*[http://muchmore.dfki.de/resources1.htm MuchMore Springer Bilingual Corpus]
*[http://nl.ijs.si/ME/ MULTEXT-East: Multilingual Corpora for Eastern and Central European Languages]
*[http://tcc.itc.it/people/forner/multilingualcorpora.html Multilingual Corpora: Available Resources]
*[http://multisemcor.itc.it MultiSemCor]
*[http://www.ims.uni-stuttgart.de/info/Newspapers.html Newspapers on the Internet]
*[http://logos.uio.no/opus/ OPUS - an open source parallel corpus]
*[http://langbank.engl.polyu.edu.hk/indexl.html PolyU Language Bank]
*[http://register.consilium.eu.int/ Public registry of the Council of the EU]
*[http://www.multilingual.com/allen51.htm The Bible as a Resource for Translation Software]
*[http://www.cogsci.ed.ac.uk/elsnet/eci.html The ECI Multilingual corpus]
*[http://www.unhchr.ch/udhr/index.htm UN declaration of human rights in multiple languages]
*[http://www-igm.univ-mlv.fr/~unitex/ UNITEX]
*[http://www.u-grenoble3.fr/kraif/liens.htm Useful links about parallel corpora, by Olivier Kraif]
*[http://wacky.sslmit.unibo.it/ WaCky Project]
*[http://www.wortschatz.uni-leipzig.de/html/wliste.html Wortlisten: spoken German, English, French, and Dutch]
==COURSES==
Revision as of 00:27, 27 October 2006
BIBLIOGRAPHY
BIBLIOGRAPHY - SEARCHABLE
BOOKS
BOOKS - ONLINE