Difference between revisions of "RTE Knowledge Resources"
Revision as of 07:17, 22 April 2009
This page has been created with the purpose of sharing information about knowledge resources used by systems which participated in one or more RTE challenges.
Participants are encouraged to add information about all kinds of knowledge resources, from standard existing resources (e.g. WordNet) to knowledge collections created for specific purposes which can be made available to the community.
The table is sortable by Resource name, type, author and number of users.
Resource | Type | Author | Brief description | RTE Users* | Usage info |
---|---|---|---|---|---|
WordNet | Lexical DB | Princeton University | Lexical database of English nouns, verbs, adjectives and adverbs | 23 | Users |
VerbNet | Lexical DB | University of Colorado Boulder | Lexicon for English verbs organized into classes | 3 | Users |
VerbOcean | Lexical DB | University of Southern California | Broad-coverage semantic network of verbs | 5 | Users |
FrameNet | Lexical DB | ICSI (International Computer Science Institute) - University of California, Berkeley | Lexical resource for English words, based on frame semantics (valences) and supported by corpus evidence | 2 | Users |
NomBank | Lexical DB | New York University | Lexical resource containing syntactic frames for nouns, extracted from annotated corpora | 2 | Users |
PropBank | Lexical DB | University of Colorado Boulder | Lexical resource containing syntactic frames for verbs, extracted from annotated corpora | 2 | Users |
Nomlex Plus | Lexical DB | New York University | Dictionary of English nominalizations: it describes the allowed complements for a nominalization and relates the nominal complements to the arguments of the corresponding verb | 1 | Users |
PARC Polarity Lexicon | Lexical DB | PARC - Palo Alto Research Center | Classification of verbs with respect to semantic polarity | 1 | Users |
Wikipedia | Encyclopedia | | Free encyclopedia. Used for extraction of lexical-semantic rules (from its more structured parts), named entity recognition, geographical information, etc. | 3 | Users |
DIRT Paraphrase Collection | Collection of paraphrases | University of Alberta | Output of the DIRT algorithm | 4 | Users |
TEASE Collection | Collection of Entailment Rules | Bar Ilan University | Output of the TEASE algorithm | 0 | Users |
BADC Acronym and Abbreviation List | Word List | BADC - British Atmospheric Data Centre | Acronym and Abbreviation List | 1 | Users |
Acronym Guide | Word List | Acronym-Guide.com | Acronym and Abbreviation Lists for English, branched in thematic directories | 1 | Users |
Dekang Lin’s Thesaurus | Thesaurus | University of Alberta | Thesaurus automatically constructed using a parsed corpus, based on distributional similarity scores | 1 | Users |
Roget's Thesaurus | Thesaurus | Peter Mark Roget (Electronic version distributed by University of Chicago) | Roget's Thesaurus is a widely-used English thesaurus, created by Dr. Peter Mark Roget in 1805. The original edition had 15,000 words, and each new edition has been larger. The electronic edition (version 1.02) is made available by University of Chicago. | 1 | Users |
Web1T 5-grams | Word list | Google Inc. | Data set containing English word n-grams and their observed frequency counts. The n-gram counts were generated from approximately 1 trillion word tokens of text from publicly accessible Web pages | 1 | Users |
GNIS - Geographic Names Information System | Gazetteer | USGS - United States Geological Survey | Database containing the Federal and national standard toponyms for USA, associated areas and Antarctica | 1 | Users |
Geonames | Gazetteer | | Database containing eight million geographical names. It integrates geographical data, such as place names in various languages, elevation and population, from various sources. | 1 | Users |
Gazetteer from TREC | Gazetteer | NIST - National Institute of Standards and Technology | Cities and other geographical names | 1 | Users |
Geographic Ontology | Ontology | University of West Florida | Hierarchical data structure that allows the storage of natural and man-made feature data for use in a multitude of both manual and computerized Mapping, Charting & Geodesy systems | 1 | Users |
Syntactic rule base | Collection of Entailment Rules | Bar-Ilan University | A manually-composed collection of entailment rules which define parse tree transformations. The rules cover generic syntactic phenomena such as appositions, conjunctions, passive, relative clause, etc. (Bar-Haim et al., AAAI-07) | 1 | Users |
Polarity rule base | Collection of Entailment Rules | Bar-Ilan University | A manually-composed collection of entailment rules which detect predicates whose polarity is negative (e.g. didn't dance) or unknown (e.g. plans to dance). The rules capture diverse phenomena that affect polarity, e.g. verbal negation, modal verbs, conditionals, and certain verbs that induce a negative or "unknown" polarity context. The latter were taken mainly from VerbNet, and also from the PARC Polarity Lexicon. It extends a resource described in (Bar-Haim et al., AAAI-07) | 1 | Users |
OPENU Collection | Collection of Entailment Rules and Patterns | | Collections of rules, patterns, etc. for RTE purposes, extracted from a parsed Reuters corpus. | 1 | Users |
Sekine's Paraphrase Database | Collection of paraphrases | Department of Computer Science, New York University | Database created using Sekine's method, not cleaned up by humans. It includes 19,975 sets of paraphrases with 191,572 phrases. | 0 | Users |
Microsoft Research Paraphrase Corpus | Collection of paraphrases | Microsoft Research | Text file containing 5800 pairs of sentences which have been extracted from news sources on the web, along with human annotations indicating whether each pair captures a paraphrase/semantic equivalence relationship. | 0 | Users |
New resource | | | Participants are encouraged to contribute | | Users |
New resource | | | Participants are encouraged to contribute | | Users |
[*] The number of Users refers to participants in the last two RTE challenges.
RTE-4 data are extracted both from the related proceedings and from the Knowledge Resources Questionnaire whereas RTE-3 data are extracted only from the Knowledge Resources Questionnaire.