Knowledge collections and datasets (English): Difference between revisions
added DIRT
Added link to Wordnet Annotated Corpora: a relatively complete list of wordnet annotated corpora, both in English and other languages
Knowledge collections and datasets for Computational Linguistics and Natural Language Processing.
For languages other than English, see [[List of resources by language]].
<!-- Please keep this list in alphabetical order -->
* [[Clustering by Committee]] - terms clustered and organized using the [[Distributional Hypothesis]]
* [[DIRT Paraphrase Collection]] - Discovery of Inference Rules from Text
* [http://www.eat.rl.ac.uk/ Edinburgh Associative Thesaurus (EAT)]
* [http://framenet.icsi.berkeley.edu/ FrameNet]
* [http://www.psych.rl.ac.uk/ MRC Psycholinguistic Database]
* [http://www.clres.com/prepositions.html Preposition Project]
* [[Noun compound repository|Noun Compound Repository]]
* [http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.html Reuters-21578 Text Categorization Collection]
* [[SAT Analogy Questions]] - a way of evaluating algorithms for measuring relational similarity
* [[Spam filtering datasets]]
* [[TEASE]] - Acquisition of Entailment Relations from the Web
* [[TOEFL Synonym Questions]] - a way of evaluating algorithms for measuring degree of similarity between 2 words
* [[RG-65 Test Collection (State of the art)|RG-65 Test Collection]] - suitable for correlation-based evaluation of algorithms for measuring semantic similarity of word pairs
* [http://w3.usf.edu/FreeAssociation/ University of South Florida Free Association Norms]
* [[VerbOcean]] - verbs organized by semantic relation, including temporal precedence and strength
* [[WordNet]]
:* [http://globalwordnet.org/?page_id=241 Wordnet Annotated Corpora] A relatively complete list of wordnet annotated corpora, both in English and other languages
* [http://www.cs.technion.ac.il/~gabr/resources/data/wordsim353/wordsim353.html WordSimilarity-353 Test Collection]
See also [[NLG:Data sets]] for a collection of data sets used for building natural language generation systems.
== Additional Dataset Collections ==
* [http://www.ldc.upenn.edu/ Linguistic Data Consortium (LDC)]
[[Category:Knowledge Collections and Datasets|*]]
Latest revision as of 02:19, 18 November 2013
Knowledge collections and datasets for Computational Linguistics and Natural Language Processing.
For languages other than English, see List of resources by language.
- Clustering by Committee - terms clustered and organized using the Distributional Hypothesis
- DIRT Paraphrase Collection - Discovery of Inference Rules from Text
- Edinburgh Associative Thesaurus (EAT)
- FrameNet
- MRC Psycholinguistic Database
- Preposition Project
- Noun Compound Repository
- Reuters-21578 Text Categorization Collection
- SAT Analogy Questions - a way of evaluating algorithms for measuring relational similarity
- Spam filtering datasets
- TEASE - Acquisition of Entailment Relations from the Web
- TOEFL Synonym Questions - a way of evaluating algorithms for measuring the degree of similarity between two words
- RG-65 Test Collection - suitable for correlation-based evaluation of algorithms for measuring semantic similarity of word pairs
- University of South Florida Free Association Norms
- VerbOcean - verbs organized by semantic relation, including temporal precedence and strength
- WordNet
  - Wordnet Annotated Corpora - a relatively complete list of WordNet-annotated corpora, in English and other languages
See also NLG:Data sets for a collection of data sets used for building natural language generation systems.
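The RG-65 entry above mentions correlation-based evaluation. As a rough illustration only, the sketch below compares a system's similarity scores with human ratings using Spearman rank correlation, one common choice for this kind of test. The function names and the four toy score pairs are hypothetical and are not taken from RG-65 or any other collection listed here.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: one human rating and one system score per word pair.
human_scores = [3.9, 3.1, 0.4, 1.2]
system_scores = [0.95, 0.70, 0.10, 0.30]

print(spearman(human_scores, system_scores))
```

Rank correlation only asks whether the system orders the word pairs the way the human raters do, so the two score scales do not need to match.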