Difference between revisions of "RTE Knowledge Resources"

Revision as of 07:17, 22 April 2009

This page was created to share information about knowledge resources used by systems that participated in one or more RTE challenges.

Participants are encouraged to add information about all kinds of knowledge resources, from standard existing resources (e.g. WordNet) to knowledge collections created for specific purposes which can be made available to the community.

The table is sortable by Resource name, type, author and number of users.

Resource	Type	Author	Brief description	RTE Users*	Usage info
WordNet	Lexical DB	Princeton University	Lexical database of English nouns, verbs, adjectives and adverbs	23	Users
VerbNet	Lexical DB	University of Colorado Boulder	Lexicon for English verbs organized into classes	3	Users
VerbOcean	Lexical DB	University of Southern California	Broad-coverage semantic network of verbs	5	Users
FrameNet	Lexical DB	ICSI (International Computer Science Institute) - Berkeley University	Lexical resource for English words, based on frame semantics (valences) and supported by corpus evidence	2	Users
NomBank	Lexical DB	New York University	Lexical resource containing syntactic frames for nouns, extracted from annotated corpora	2	Users
PropBank	Lexical DB	University of Colorado Boulder	Lexical resource containing syntactic frames for verbs, extracted from annotated corpora	2	Users
Nomlex Plus	Lexical DB	New York University	Dictionary of English nominalizations: it describes the allowed complements for a nominalization and relates the nominal complements to the arguments of the corresponding verb	1	Users
Parc Polarity Lexicon	Lexical DB	PARC - Palo Alto Research Center	Classification of verbs with respect to semantic polarity	1	Users
Wikipedia	Encyclopedia		Free encyclopedia. Used for extraction of lexical-semantic rules (from its more structured parts), named entity recognition, geographical information etc.	3	Users
DIRT Paraphrase Collection	Collection of paraphrases	University of Alberta	Output of the DIRT algorithm	4	Users
TEASE Collection	Collection of Entailment Rules	Bar-Ilan University	Output of the TEASE algorithm	0	Users
BADC Acronym and Abbreviation List	Word List	BADC - British Atmospheric Data Centre	Acronym and Abbreviation List	1	Users
Acronym Guide	Word List	Acronym-Guide.com	Acronym and Abbreviation Lists for English, branched in thematic directories	1	Users
Dekang Lin’s Thesaurus	Thesaurus	University of Alberta	Thesaurus automatically constructed using a parsed corpus, based on distributional similarity scores	1	Users
Roget's Thesaurus	Thesaurus	Peter Mark Roget (Electronic version distributed by University of Chicago)	Roget's Thesaurus is a widely-used English thesaurus, created by Dr. Peter Mark Roget in 1805. The original edition had 15,000 words, and each new edition has been larger. The electronic edition (version 1.02) is made available by University of Chicago.	1	Users
Web1T 5-grams	Word List	Google Inc.	Data set containing English word n-grams and their observed frequency counts. The n-gram counts were generated from approximately 1 trillion word tokens of text from publicly accessible Web pages	1	Users
GNIS - Geographic Names Information System	Gazetteer	USGS - United States Geological Survey	Database containing the Federal and national standard toponyms for USA, associated areas and Antarctica	1	Users
Geonames	Gazetteer		Database containing eight million geographical names. It integrates geographical data from various sources, such as names of places in various languages, elevation, population and others.	1	Users
Gazetteer from TREC	Gazetteer	NIST - National Institute of Standards and Technology	Cities and other geographical names	1	Users
Geographic Ontology	Ontology	University of West Florida	Hierarchical data structure that allows the storage of natural and man-made feature data for use in a multitude of both manual and computerized Mapping, Charting & Geodesy systems	1	Users
Syntactic rule base	Collection of Entailment Rules	Bar-Ilan University	A manually-composed collection of entailment rules which define parse tree transformations. The rules cover generic syntactic phenomena such as appositions, conjunctions, passive, relative clause, etc. (Bar-Haim et al., AAAI-07)	1	Users
Polarity rule base	Collection of Entailment Rules	Bar-Ilan University	A manually-composed collection of entailment rules which detect predicates whose polarity is negative (e.g. didn't dance) or unknown (e.g. plans to dance). The rules capture diverse phenomena that affect polarity, e.g. verbal negation, modal verbs, conditionals, and certain verbs that induce negative or "unknown" polarity context. The latter were taken mainly from VerbNet, and also from the PARC polarity lexicon. It extends a resource described in (Bar-Haim et al., AAAI-07)	1	Users
OPENU Collection	Collection of Entailment Rules and Patterns		Collections of rules, patterns, etc. for RTE purposes, extracted from the parsed Reuters corpus.	1	Users
Sekine's Paraphrase Database	Collection of paraphrases	Department of Computer Science, New York University	Database created using Sekine's method, not cleaned up by humans. It includes 19,975 sets of paraphrases with 191,572 phrases.	0	Users
Microsoft Research Paraphrase Corpus	Collection of paraphrases	Microsoft Research	Text file containing 5800 pairs of sentences which have been extracted from news sources on the web, along with human annotations indicating whether each pair captures a paraphrase/semantic equivalence relationship.	0	Users
New resource			Participants are encouraged to contribute		Users
New resource			Participants are encouraged to contribute		Users

[*] The number of Users refers to participants in the last two RTE challenges. RTE-4 data are extracted both from the related proceedings and from the Knowledge Resources Questionnaire whereas RTE-3 data are extracted only from the Knowledge Resources Questionnaire.
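
For readers working with a plain-text copy of the table, where the wiki's sortable columns are not available, the ordering by number of users can be reproduced offline. The following is only a minimal sketch: the `rows` list holds a small sample of rows copied from the table above, and the function names `sort_by_users` and `users_per_type` are hypothetical helpers introduced for illustration, not part of any RTE tool.

```python
# Sample rows (resource, type, RTE users) copied from the table above.
# This is a hand-picked subset for illustration, not the full table.
rows = [
    ("WordNet", "Lexical DB", 23),
    ("VerbOcean", "Lexical DB", 5),
    ("DIRT Paraphrase Collection", "Collection of paraphrases", 4),
    ("TEASE Collection", "Collection of Entailment Rules", 0),
]


def sort_by_users(table, descending=True):
    """Return rows ordered by user count; ties are broken by resource name."""
    sign = -1 if descending else 1
    return sorted(table, key=lambda row: (sign * row[2], row[0].lower()))


def users_per_type(table):
    """Sum the user counts for each resource type."""
    totals = {}
    for _, resource_type, users in table:
        totals[resource_type] = totals.get(resource_type, 0) + users
    return totals


if __name__ == "__main__":
    for name, resource_type, users in sort_by_users(rows):
        print(f"{users:>3}  {name} ({resource_type})")
```

Because the footnote counts users across the last two challenges, a system that took part in both may be counted twice, so the per-type totals are best read as usage counts rather than distinct systems.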

@@ Line 205: / Line 205: @@
 |}
 <br>
-[*] The numbers refer to the Users in RTE4 (data extracted both from related proceedings and from RTE Knowledge Resources Questionnaire) and in RTE3 (data extracted only from RTE Knowledge Resources Questionnaire) challenges.
+[*] The number of Users refers to participants in the last two RTE challenges.
+RTE-4 data are extracted both from the related proceedings and from the Knowledge Resources Questionnaire whereas RTE-3 data are extracted only from the Knowledge Resources Questionnaire.