Knowledge collections and datasets (English)
Datasets for Computational Linguistics and Natural Language Processing.
- Clustering by Committee - terms clustered and organized using the Distributional Hypothesis
- DIRT Paraphrase Collection - Discovery of Inference Rules from Text
- Edinburgh Associative Thesaurus (EAT)
- FrameNet
- MRC Psycholinguistic Database
- Preposition Project
- Noun Compound Repository
- Reuters-21578 Text Categorization Collection
- SAT Analogy Questions - a way of evaluating algorithms for measuring relational similarity
- Spam filtering datasets
- TEASE - Acquisition of Entailment Relations from the Web
- TOEFL Synonym Questions - a way of evaluating algorithms for measuring degree of similarity between two words
- University of South Florida Free Association Norms
- VerbOcean - verbs organized by semantic relation, including temporal precedence and strength
- WordNet
- WordSimilarity-353 Test Collection
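
The synonym-question style of evaluation mentioned above (as in the TOEFL Synonym Questions) can be sketched as follows: for each question, the algorithm under test scores every candidate against the stem word, the highest-scoring candidate is taken as its answer, and accuracy is the fraction of questions answered correctly. This is only a minimal illustration. The three questions are made-up toy items, not taken from any of the collections listed here, and `toy_similarity` (character-bigram overlap) is a hypothetical placeholder for whatever similarity measure is being evaluated.

```python
from typing import Callable, List, Sequence, Set, Tuple

# (stem word, candidate answers, index of the correct candidate)
Question = Tuple[str, Sequence[str], int]

# Toy, made-up items for illustration only; NOT from the real TOEFL set.
TOY_QUESTIONS: List[Question] = [
    ("quick", ["fast", "slow", "quiet", "green"], 0),
    ("begin", ["start", "beg", "end", "blue"], 0),
    ("automobile", ["auto", "tree", "river", "stone"], 0),
]


def char_bigrams(word: str) -> Set[str]:
    """Return the set of adjacent character pairs in a word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def toy_similarity(a: str, b: str) -> float:
    """Hypothetical placeholder measure: Jaccard overlap of character bigrams."""
    x, y = char_bigrams(a), char_bigrams(b)
    if not x and not y:
        return 0.0
    return len(x & y) / len(x | y)


def answer(question: Question, similarity: Callable[[str, str], float]) -> int:
    """Pick the index of the candidate most similar to the stem word."""
    stem, choices, _ = question
    scores = [similarity(stem, choice) for choice in choices]
    return max(range(len(choices)), key=scores.__getitem__)


def accuracy(questions: Sequence[Question],
             similarity: Callable[[str, str], float]) -> float:
    """Fraction of questions where the top-scoring candidate is correct."""
    correct = sum(answer(q, similarity) == q[2] for q in questions)
    return correct / len(questions)


if __name__ == "__main__":
    print(f"accuracy: {accuracy(TOY_QUESTIONS, toy_similarity):.2f}")
```

A surface measure such as this one is easily misled by spelling (it prefers "quiet" to "fast" for the stem "quick"), which is the kind of weakness a synonym test is meant to expose. To evaluate a real measure, pass it in place of `toy_similarity` and load the questions from the actual collection.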