Difference between revisions of "File:Disco2011-shared-task-complete-dataset.zip"

From ACL Wiki
(uploaded a new version of "File:Disco2011-shared-task-complete-dataset.zip": DISCO 2011 Complete Dataset (Training and Test Data, Eval Scripts))
 

Latest revision as of 11:09, 10 March 2014

This archive contains data sets of compositionality judgments for English and German, as well as the official scoring scripts. The data was collected on Amazon Mechanical Turk. Workers were shown a sentence with the target phrase in bold and asked to rate how literal the phrase was on a scale from 0 to 10. For each phrase, 4-5 different sentences were randomly sampled from the WaCKy corpora for UK English and German, and each sentence was presented to 4 workers.
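As an illustration of how such judgments might be combined into a single score per phrase, here is a minimal sketch. It assumes a plain mean over all worker ratings, which this page does not confirm as the method used for the released scores; the ratings and the function name `phrase_score` are made up for the example.

```python
from statistics import mean

# Hypothetical judgments for one phrase (not taken from the archive):
# one inner list per sampled sentence, each holding the 0-10
# literalness ratings given by 4 workers.
judgments = [
    [8, 9, 7, 8],
    [6, 7, 9, 8],
    [9, 8, 8, 7],
    [7, 8, 9, 9],
]

def phrase_score(per_sentence_scores):
    """Average all worker ratings across all sentences of one phrase.

    Assumption: a simple unweighted mean; the official aggregation
    may differ.
    """
    all_scores = [s for sentence in per_sentence_scores for s in sentence]
    if not all(0 <= s <= 10 for s in all_scores):
        raise ValueError("ratings must lie between 0 and 10")
    return mean(all_scores)

score = phrase_score(judgments)
```

With 4-5 sentences and 4 workers each, every phrase receives 16 to 20 individual ratings, so a mean of this kind is based on a small sample per phrase.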

Phrases consist of two lemmas and come in three grammatical relations:
- ADJ_NN: adjective modifying a noun
- V_SUBJ: noun as a subject of a verb
- V_OBJ: noun as an object of a verb
Passive constructions were resolved to active constructions for relation assignment purposes.
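One possible in-memory representation of a phrase and its relation label is sketched below. Only the three label names come from the description above; the class names `Relation` and `Phrase`, the field layout, and the example lemmas are assumptions for illustration and do not reflect the file format inside the archive.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple

class Relation(Enum):
    """The three grammatical relations named in the description."""
    ADJ_NN = "ADJ_NN"   # adjective modifying a noun
    V_SUBJ = "V_SUBJ"   # noun as a subject of a verb
    V_OBJ = "V_OBJ"     # noun as an object of a verb

@dataclass(frozen=True)
class Phrase:
    """A two-lemma phrase with its relation (hypothetical layout)."""
    relation: Relation
    lemmas: Tuple[str, str]

    def __post_init__(self):
        # The description states that phrases consist of exactly two lemmas.
        if len(self.lemmas) != 2:
            raise ValueError("a phrase must consist of exactly two lemmas")

# Hypothetical example phrase, not taken from the data set.
example = Phrase(Relation("ADJ_NN"), ("red", "tape"))
```

Looking up a label with `Relation("V_OBJ")` raises `ValueError` for any string outside the three listed relations, which makes unexpected labels easy to spot when reading data.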

Phrases were extracted semi-automatically: the relations were assigned by patterns and then manually checked for validity. Phrases were selected so as to balance the data set while controlling for frequency.

File history


Date/Time | Dimensions | User | Comment
11:09, 10 March 2014 (current) | 265 KB | Biem | DISCO 2011 Complete Dataset (Training and Test Data, Eval Scripts)
02:59, 30 June 2011 | 265 KB | Biem | This archive contains data sets for compositionality judgments for English and German as well as the official scoring scripts. The data was collected from Amazon turk. Workers were presented a sentence with a bolded target phrase and were asked to score h
