Knowledge collections and datasets (English)
From ACL Wiki
Revision as of 08:00, 19 November 2006 by Ionandr (Added spam filtering datasets.)
Datasets for Computational Linguistics and Natural Language Processing.
- Clustering by Committee - terms clustered and organized using the Distributional Hypothesis
- DIRT Paraphrase Collection
- Edinburgh Associative Thesaurus (EAT)
- MRC Psycholinguistic Database
- Noun Compound Repository
- Reuters-21578 Text Categorization Collection
- Spam filtering datasets
- University of South Florida Free Association Norms
- VerbOcean - verb pairs organized by semantic relation, such as similarity, strength, antonymy, enablement, and temporal precedence (happens-before)
- WordSimilarity-353 Test Collection