Knowledge collections and datasets (English): Difference between revisions

From ACL Wiki
Timc (talk | contribs)
added DIRT
Bond (talk | contribs)
Added link to Wordnet Annotated Corpora --- A relatively complete list of wordnet annotated corpora, both in English and other languages
 
(22 intermediate revisions by 9 users not shown)
Line 1: Line 1:
Datasets for Computational Linguistics and Natural Language Processing.
Knowledge collections and datasets for Computational Linguistics and Natural Language Processing.


* [[DIRT Paraphrase Collection]]
For languages other than English, see [[List of resources by language]].
 
<!-- Please keep this list in alphabetical order -->
* [[Clustering by Committee]] - terms clustered and organized using the [[Distributional Hypothesis]]
* [[DIRT Paraphrase Collection]] - Discovery of Inference Rules from Text
* [http://www.eat.rl.ac.uk/ Edinburgh Associative Thesaurus (EAT)]
* [http://framenet.icsi.berkeley.edu/ FrameNet]
* [http://www.psych.rl.ac.uk/ MRC Psycholinguistic Database]
* [http://www.cs.utexas.edu/~mfkb/nn/ Noun Compound Repository]
* [http://www.clres.com/prepositions.html Preposition Project]
* [[Noun compound repository|Noun Compound Repository]]
* [http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.html Reuters-21578 Text Categorization Collection]
* [[SAT Analogy Questions]] - a way of evaluating algorithms for measuring relational similarity
* [[Spam filtering datasets]]
* [[TEASE]] - Acquisition of Entailment Relations from the Web
* [[TOEFL Synonym Questions]] - a way of evaluating algorithms for measuring the degree of similarity between two words
* [[RG-65 Test Collection (State of the art)|RG-65 Test Collection]] - suitable for correlation-based evaluation of algorithms for measuring semantic similarity of word pairs
* [http://w3.usf.edu/FreeAssociation/ University of South Florida Free Association Norms]
* [http://wordnet.princeton.edu/ WordNet]
* [[VerbOcean]] - verbs organized by semantic relation, including temporal precedence and strength
* [[WordNet]]
:* [http://globalwordnet.org/?page_id=241 Wordnet Annotated Corpora] - a relatively complete list of WordNet-annotated corpora, in English and other languages
* [http://www.cs.technion.ac.il/~gabr/resources/data/wordsim353/wordsim353.html WordSimilarity-353 Test Collection]
See also [[NLG:Data sets]] for a collection of data sets used for building natural language generation systems.


== Additional Dataset Collections ==
* [http://www.ldc.upenn.edu/ Linguistic Data Consortium (LDC)]
[[Category:Knowledge Collections and Datasets|*]]

Latest revision as of 02:19, 18 November 2013

Knowledge collections and datasets for Computational Linguistics and Natural Language Processing.

For languages other than English, see List of resources by language.

See also NLG:Data sets for a collection of data sets used for building natural language generation systems.

Additional Dataset Collections