Difference between revisions of "Corpora, datasets, lexicons"

From ACL Wiki
Line 1: Line 1:
 
 
* [[Corpora]]
 
* [[Datasets]]
 
Line 37: Line 36:
 
* [http://devoted.to/corpora David Lee's Bookmarks for Corpus-based Linguists]
 
* [[Resources]]
 
== Datasets ==
 
 
* [http://www.eat.rl.ac.uk/ Edinburgh Associative Thesaurus (EAT)]
 
* [http://www.ldc.upenn.edu/ Linguistic Data Consortium (LDC)]
 
* [http://www.psych.rl.ac.uk/ MRC Psycholinguistic Database]
 
* [http://www.cs.utexas.edu/~mfkb/nn/ Noun Compound Repository]
 
* [http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.html Reuters-21578 Text Categorization Collection]
 
* [http://w3.usf.edu/FreeAssociation/ University of South Florida Free Association Norms]
 
* [http://www.cs.technion.ac.il/~gabr/resources/data/wordsim353/wordsim353.html WordSimilarity-353 Test Collection]
 
 
== Lexicons ==
 
(alphabetical order)
 
* [http://clipdemos.umiacs.umd.edu/catvar/ Catvar 2.0: The Categorial Variation Database] - for example, the ''developing'' cluster: {''develop'' (V), ''developer'' (N), ''developed'' (AJ), ''developing'' (N), ''developing'' (AJ), ''development'' (N)}
 
* [http://www.wjh.harvard.edu/%7Einquirer/spreadsheet_guide.htm General Inquirer]
 
* [http://www.csse.monash.edu.au/~jwb/edict_doc.html JMdict: Japanese-Multilingual Dictionary file]
 
* [http://www.umiacs.umd.edu/~bonnie/LCS_Database_Documentation.html LCS Database: Lexical Conceptual Structures]
 
* [http://www.dcs.shef.ac.uk/research/ilash/Moby/ Moby lexicon project]
 
* [http://www.signiform.com/tt/htm/tt.htm ThoughtTreasure]
 
 
=== WordNet and enhancements ===
 
(alphabetical order)
 
* [http://xwn.hlt.utdallas.edu/ eXtended WordNet] - WordNet glosses are syntactically parsed and transformed into logic forms, and their content words are semantically disambiguated
 
* [http://patty.isti.cnr.it/~esuli/software/SentiWordNet/ SentiWordNet] - assigns three sentiment scores to each WordNet synset: positivity, negativity, and objectivity
 
* [http://wordnet.princeton.edu/ WordNet] - the original
 
* [http://tcc.itc.it/research/textec/topics/disambiguation/wordnetdomains.html WordNet Domains] - WordNet synsets augmented with domain labels such as POLITICS, ECONOMY, and SPORT
 

Revision as of 13:58, 2 November 2006