File:TWSI397.zip

From ACL Wiki
Revision as of 18:05, 1 February 2010 by Biem

TWSI397.zip (file size: 6.44 MB, MIME type: application/zip)


This file describes the data format of the TWSI (Turk bootstrap Word Sense Inventory), version 1.0. For a description of the acquisition process, please consult the paper. In short, three MTurk tasks were used to produce the data provided here:

- "Substitutable words in context": Workers are presented with a sentence containing a target word and supply substitutions.
- "Are these words used with the same meaning?": Workers are presented with a pair of sentences in which the same target word is marked in bold, and decide whether the meanings are identical, similar or different.
- "Match the Meaning": Workers are presented with a sense inventory represented by prototypical sentences and align further sentences containing the same target word to those senses.

The TWSI is organized by target word: the 397 most frequent nouns in the English Wikipedia (dump of January 3rd, 2008) serve as targets, and each target is organized into senses. Each sense is associated with substitutions and with sentences in which the target word was used in that sense.
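The organization described above (target word, then senses, then substitutions and sentences per sense) can be pictured with a minimal Python sketch. This is only an illustration of the hierarchy: the class names, field names, sense identifiers and the toy entry are all hypothetical and say nothing about the actual file layout inside the archive.

```python
from dataclasses import dataclass, field


@dataclass
class Sense:
    """One sense of a target word (all field names are hypothetical)."""
    sense_id: str
    substitutions: list[str] = field(default_factory=list)
    sentences: list[str] = field(default_factory=list)


@dataclass
class TargetEntry:
    """All senses recorded for one target noun."""
    target: str
    senses: list[Sense] = field(default_factory=list)

    def senses_with_substitution(self, word: str) -> list[Sense]:
        """Return the senses that list `word` among their substitutions."""
        return [s for s in self.senses if word in s.substitutions]


# Invented toy data for illustration only; not taken from the TWSI.
bank = TargetEntry(
    target="bank",
    senses=[
        Sense(
            sense_id="bank#1",
            substitutions=["financial institution", "lender"],
            sentences=["She deposited the check at the bank."],
        ),
        Sense(
            sense_id="bank#2",
            substitutions=["shore", "riverside"],
            sentences=["They had a picnic on the bank of the river."],
        ),
    ],
)

# The inventory as a whole is keyed by target word.
inventory = {bank.target: bank}
```

With such a structure, a lookup like `inventory["bank"].senses_with_substitution("shore")` would return the senses for which "shore" was given as a substitution.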

These data have been curated and extracted from the output of a Turk bootstrapping acquisition cycle. The raw data are not included here but are available upon request.
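Because the internal layout of the archive is not spelled out on this page, it may help to look at its member list before extracting anything. The following is a small sketch using Python's standard `zipfile` module; the helper name `list_archive` and the member name in the demonstration are assumptions made for illustration.

```python
import io
import zipfile


def list_archive(source) -> list[tuple[str, int]]:
    """Return (member name, uncompressed size in bytes) pairs without extracting."""
    with zipfile.ZipFile(source) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]


# Demonstration on a tiny in-memory archive with a made-up member name,
# so the sketch runs without the real download being present.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("example.txt", "hello")
buffer.seek(0)
demo_listing = list_archive(buffer)
```

For a locally downloaded copy, the same helper could be called as `list_archive("TWSI397.zip")`, assuming the file sits in the current working directory.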

File history


Date/Time | Dimensions | User | Comment
current: 19:22, 1 February 2010 | 6.44 MB | Biem | The TWSI is organized by target word: For the most frequent 397 nouns in English Wikipedia (dump used from January 3rd, 2008), all targets are organized into senses. With each sense, there are associated substitutions and sentences where the target word w
18:05, 1 February 2010 | 6.06 MB | Biem | This file describes the data format of the TWSI (Turk bootstrap Word Sense Inventory) version 1.0. For the description of the process, please consult the paper for further documentation. In short, three Mturk tasks were used to yield the data provided he
