RTE5 - Ablation Tests
The following table lists the results of the ablation tests submitted by participants, which were introduced as a mandatory track in the RTE5 campaign.
The first column contains the specific resources which have been ablated.
The second column lists the Team Run in the form [name_of_the_Team][number_of_the_submitted_run].[submission_task] (e.g. BIU1.2way, Boeing3.3way).
The third and fourth columns present the normalized difference between the accuracy of the complete system run and the accuracy of the ablation run (i.e. the output of the complete system without the ablated resource), showing the impact of the resource on the performance of the system (see the sketch below). The third column refers to the score obtained in the 2-way task, the fourth to the score obtained in the 3-way task. For all runs submitted as 3-way, the 2-way derived accuracy has also been calculated.
Finally, the fifth column contains a brief description of the specific usage of the resource. It is based on the information provided both in the "readme" files submitted together with the ablation tests and in the system reports published in the RTE5 proceedings.
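As a minimal sketch of how the impact columns can be read, the following Python snippet assumes the score is simply the accuracy difference between the complete run and the ablation run expressed in percentage points (the RTE5 test set contains 600 pairs, so a single changed judgment corresponds to roughly 0.17 points). This is a reader's reconstruction, not the official scoring script; the function name and the 368/360 example counts are illustrative assumptions.

```python
# Minimal sketch (an assumption, not the official RTE5 scoring script):
# the impact score is read as the accuracy of the complete run minus the
# accuracy of the ablation run, expressed in percentage points.

def resource_impact(correct_full: int, correct_ablated: int, n_pairs: int = 600) -> float:
    """Accuracy difference in percentage points over an n_pairs test set."""
    acc_full = correct_full / n_pairs        # accuracy of the complete system run
    acc_ablated = correct_ablated / n_pairs  # accuracy of the run without the resource
    return round(100 * (acc_full - acc_ablated), 2)

# Example: 368 vs. 360 correct judgments over 600 pairs gives an impact of 1.33,
# matching the granularity of the scores in the table (multiples of ~0.17).
print(resource_impact(368, 360))  # 1.33
```

Under this reading, a positive value means the ablated resource was contributing to the system's accuracy, while a negative value means the run without the resource actually scored higher.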
Participants are kindly invited to check if all the inserted information is correct and complete.
Ablated Resource | Team Run[1] | Resource impact - 2way | Resource impact - 3way | Resource Usage Description |
---|---|---|---|---|
Acronym guide | Siel_093.3way | 0 | 0 | The acronyms are expanded using the acronym database, so the acronyms are also matched with the expanded acronyms, and entailment is predicted accordingly |
Acronym guide + UAIC_Acronym_rules | UAIC20091.3way | 0.17 | 0.16 | We start from the acronym guide, but additionally we use a rule that considers, for expressions like Xaaaa Ybbbb Zcccc, the acronym XYZ, regardless of the length of the text with this form. |
DIRT | BIU1.2way | 1.33 | — | Inference rules |
DIRT | Boeing3.3way | -1.17 | 0 | Verb paraphrases |
DIRT | UAIC20091.3way | 0.17 | 0.33 | Text and hypothesis are transformed with MINIPAR into dependency trees; DIRT relations are used to map verbs in T to verbs in H |
FrameNet + WordNet | DLSIUAES1.2way | 1.16 | — | Frame-to-frame similarity metric |
FrameNet + WordNet | DLSIUAES1.3way | -0.17 | -0.17 | Frame-to-frame similarity metric |
FrameNet | UB.dmirg3.2way | 0 | — | If two lexical items are covered in a single FrameNet frame, then the two items are treated as semantically related. |
Grady Ward’s MOBY Thesaurus + Roget's Thesaurus | Venses2.2way | 2.83 | — | Semantic fields are used as semantic similarity matching, in all cases of non-identical lemmas |
MontyLingua Tool | Siel_093.3way | 0 | 0 | For VerbOcean, the verbs have to be in the base form. We used the "MontyLingua" tool to convert the verbs into their base form |
NEGATION_rules by UAIC | UAIC20091.3way | 0 | -1.34 | Negation rules check the branches descending from verbs in the dependency trees to see whether any categories of words that change the meaning are present. |
NER (RASP Parser nertag) | JU_CSE_TAC1.2way | 0 | — | Named Entity match: a measure based on the number of NEs in the hypothesis that match in the corresponding text. For named entity recognition, the nertag component of the RASP Parser (Briscoe et al., 2006) has been used. |
NE component | UI_ccg1.2way | 4.83 | — | Named Entity recognition/comparison |
PropBank | cswhu1.3way | 2 | 3.17 | Syntactic and semantic parsing |
Stanford NER | QUANTA1.2way | 0.67 | — | We use Named Entity similarity as a feature |
Stopword list | FBKirst1.2way | 1.5 | — | A list of the 572 most frequent English words has been collected in order to prevent assigning high costs to the deletion/insertion of terms that are unlikely to bring relevant information for detecting entailment, and to avoid substituting these terms with any content word. |
Training data from RTE1, 2, 3 | PeMoZa3.2way | 0 | — | |
Training data from RTE2 | PeMoZa3.2way | 0.66 | — | |
Training data from RTE2, 3 | PeMoZa3.2way | 0 | — | |
VerbOcean | DFKI1.3way | 0 | 0.17 | VerbOcean relations are used to calculate relatedness between verbs in T and H |
VerbOcean | DFKI2.3way | 0.33 | 0.5 | VerbOcean relations are used to calculate relatedness between verbs in T and H |
VerbOcean | DFKI3.3way | 0.17 | 0.17 | VerbOcean relations are used to calculate relatedness between verbs in T and H |
VerbOcean | FBKirst1.2way | -0.16 | — | Extraction of 18,232 entailment rules for all the English verbs connected by the "stronger-than" relation. For instance, if "kill [stronger-than] injure", then the rule "kill ENTAILS injure" is added to the rules repository. |
VerbOcean | QUANTA1.2way | 0 | — | We use "opposite-of" relation in VerbOcean as a feature |
VerbOcean | Siel_093.3way | 0 | 0 | Similarity/antonymy/unrelatedness between verbs |
WikiPedia | BIU1.2way | -1 | — | Lexical rules extracted from Wikipedia definition sentences, title parenthesis, redirect and hyperlink relations |
WikiPedia | cswhu1.3way | 1.33 | 3.34 | Lexical semantic rules |
WikiPedia | FBKirst1.2way | 1 | — | Rules extracted from WP using Latent Semantic Analysis (LSA) |
WikiPedia | UAIC20091.3way | 1.17 | 1.5 | Relations between named entities |
Wikipedia + NER's (LingPipe, GATE) + Perl patterns | UAIC20091.3way | 6.17 | 5 | NE module: NERs to identify persons, locations, jobs, languages, etc.; Perl patterns built by us for RTE4 to identify numbers and dates; our own resources extracted from Wikipedia to identify a "distance" between a named entity from the hypothesis and the named entities from the text |
WordNet | AUEBNLP1.3way | -2 | -2.67 | Synonyms |
WordNet | BIU1.2way | 2.5 | — | Synonyms, hyponyms (2 levels away from the original term), hyponym_instance and derivations |
WordNet | Boeing3.3way | 4 | 5.67 | WordNet synonym and hypernym relationships between (senses of) words, plus "similar" (SIM), "pertains" (PER), and "derivational" (DER) links, to recognize equivalence between T and H |
WordNet | DFKI1.3way | -0.17 | 0 | Argument alignment between T and H |
WordNet | DFKI2.3way | 0.16 | 0.34 | Argument alignment between T and H |
WordNet | DFKI3.3way | 0.17 | 0.17 | Argument alignment between T and H |
WordNet | DLSIUAES1.2way | 0.83 | — | Similarity between lemmata, computed by WordNet-based metrics |
WordNet | DLSIUAES1.3way | -0.5 | -0.33 | Similarity between lemmata, computed by WordNet-based metrics |
WordNet | JU_CSE_TAC1.2way | 0.34 | — | WordNet-based unigram match: if any synset of the H unigram matches any synset of a word in T, the hypothesis unigram is considered a WordNet-based unigram match. |
WordNet | PeMoZa1.2way | -0.5 | — | Derivational Morphology from WordNet |
WordNet | PeMoZa1.2way | 1.33 | — | Verb entailment from WordNet |
WordNet | PeMoZa2.2way | 1 | — | Derivational Morphology from WordNet |
WordNet | PeMoZa2.2way | -0.33 | — | Verb entailment from WordNet |
WordNet | QUANTA1.2way | -0.17 | — | We use several relations from WordNet, such as synonym, hyponym, and hypernym |
WordNet | Rhodes1.3way | 3.17 | 4 | Lexicon-based match: a very simple metric that matches words in T and H connected by a path of distance at most 2 in the WordNet graph, using any links (hyponymy, hypernymy, meronymy, pertainymy, etc.) |
WordNet | Sagan1.3way | 0 | -0.83 | The system is based on a machine learning approach. The ablation test was obtained with 2 fewer WordNet-based features (namely, string similarity based on Levenshtein distance and semantic similarity) in the training and testing steps. |
WordNet | Siel_093.3way | 0.34 | -0.17 | Similarity between nouns using WN tool |
WordNet | ssl1.3way | 0 | 0.67 | WordNet Analysis |
WordNet | UB.dmirg3.2way | 0 | — | Synonyms, hypernyms (2 levels away from the original term) |
WordNet | UI_ccg1.2way | 4 | — | Word similarity == identity |
WordNet + FrameNet | UB.dmirg3.2way | 0 | — | WN: synonyms, hypernyms (2 levels away from the original term). FN: if two lexical items are covered in a single FrameNet frame, then the two items are treated as semantically related. |
WordNet + VerbOcean | DFKI1.3way | 0 | 0.17 | VerbOcean is used to calculate relatedness between nominal predicates in T and H, after using WordNet to change the nouns into verbs. |
WordNet + VerbOcean | DFKI2.3way | 0.5 | 0.67 | VerbOcean is used to calculate relatedness between nominal predicates in T and H, after using WordNet to change the nouns into verbs. |
WordNet + VerbOcean | DFKI3.3way | 0.17 | 0.17 | VerbOcean is used to calculate relatedness between nominal predicates in T and H, after using WordNet to change the nouns into verbs. |
WordNet + VerbOcean | UAIC20091.3way | 2 | 1.50 | Contradiction identification |
WordNet + VerbOcean + DLSIUAES_negation_list | DLSIUAES1.2way | 0.66 | — | Antonym relations between verbs (VO+WN); polarity based on negation terms (short list constructed by the participants themselves) |
WordNet + VerbOcean + DLSIUAES_negation_list | DLSIUAES1.3way | -1 | -0.5 | Antonym relations between verbs (VO+WN); polarity based on negation terms (short list constructed by the participants themselves) |
WordNet + XWordNet | UAIC20091.3way | 1 | 1.33 | Synonymy, hyponymy and hypernymy, and eXtended WordNet relations |
System component | DirRelCond3.2way | 4.67 | — | The ablation test (abl-1) was meant to test one component of the most complex condition for entailment used in step 3 of the system |
System component | DirRelCond3.2way | -1.5 | — | The ablation test (abl-2) was meant to test one component of the most complex condition for entailment used in step 3 of the system |
System component | DirRelCond3.2way | 0.17 | — | The ablation test (abl-3) was meant to test one component of the most complex condition for entailment used in step 3 of the system |
System component | DirRelCond3.2way | -1.16 | — | The ablation test (abl-4) was meant to test one component of the most complex condition for entailment used in step 3 of the system |
System component | DirRelCond3.2way | 4.17 | — | The ablation test (abl-5) was meant to test one component of the most complex condition for entailment used in step 3 of the system |
Other | UAIC20091.3way | 4.17 | 4 | Pre-processing module, using MINIPAR, the TreeTagger tool, and some transformations, e.g. hasn't > has not |
Other | DLSIUAES1.2way | 1 | — | Everything ablated except lexical-based metrics |
Other | DLSIUAES1.2way | 3.33 | — | Everything ablated except semantic-derived inferences |
Other | DLSIUAES1.3way | -0.17 | -0.33 | Everything ablated except lexical-based metrics |
Other | DLSIUAES1.3way | 2.33 | 3.17 | Everything ablated except semantic-derived inferences |
Other | FBKirst1.2way | 2.84 | — | The automatic estimation of operation costs from the run-1 modules was removed: the set of costs was assigned manually. |
Other | JU_CSE_TAC1.2way | 0 | — | Skip bigram match |
Other | JU_CSE_TAC1.2way | 0 | — | Bigram match |
Other | JU_CSE_TAC1.2way | -0.5 | — | Longest Common Subsequence |
Stemmer | JU_CSE_TAC1.2way | -0.5 | — | Stemming, using WordNet stemmer |
Other | PeMoZa1.2way | -2.5 | — | IDF score |
Other | PeMoZa1.2way | -0.66 | — | Proper noun Levenshtein distance |
Other | PeMoZa1.2way | 0.34 | — | J&C (Jiang and Conrath, 1997) similarity score on nouns, adjectives |
Other | PeMoZa2.2way | 1 | — | IDF score |
Other | PeMoZa2.2way | 0.17 | — | Proper noun Levenshtein distance |
Other | PeMoZa2.2way | 0.5 | — | J&C (Jiang and Conrath, 1997) similarity score on nouns, adjectives |
Other | Rhodes1.3way | -0.17 | -0.17 | Acronym match: we match words in all caps against sequences of capitalized words whose initial characters concatenate to form the acronym (see the sketch after the table) |
Other | Rhodes1.3way | 3.33 | 1.83 | Proper nouns match: exact string match between T and H, for proper nouns |
Other | Rhodes1.3way | 0.33 | 0.17 | Numbers match: exact string match between T and H, for numbers |
Other | Rhodes1.3way | 3.17 | 4 | Edit-distance-based matching: two words match if 80% of the letters of an H word occur in one or more adjacent T words in the same order |
Other | UI_ccg1.2way | 1 | — | Less sophisticated NE similarity metric: mainly Jaro-Winkler-based |
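Some of the shallower heuristics listed above are simple enough to picture in a few lines of code. The sketch below is an illustrative reconstruction of the acronym-match heuristic described for Rhodes1.3way (an all-caps word matches a run of capitalized words whose initials spell it out); it is not the participants' actual code, and the function name and tokenization are assumptions.

```python
# Illustrative reconstruction (not the participants' code) of the Rhodes1.3way
# acronym-match heuristic: a word in all caps matches a sequence of capitalized
# words whose initial characters concatenate to form the acronym.

def acronym_matches(acronym: str, tokens: list[str]) -> bool:
    """Return True if some run of capitalized tokens spells out the acronym."""
    if not acronym.isupper() or len(acronym) < 2:
        return False
    n = len(acronym)
    for start in range(len(tokens) - n + 1):
        window = tokens[start:start + n]
        if all(tok[:1].isupper() for tok in window) and \
           "".join(tok[0] for tok in window) == acronym:
            return True
    return False

print(acronym_matches("WHO", "The World Health Organization said".split()))  # True
```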
Footnotes
1. For further information about participants, see RTE Challenges - Data about participants
Return to RTE Knowledge Resources