Iryna Gurevych


2019

pdf bib
A Richly Annotated Corpus for Different Tasks in Automated Fact-Checking
Andreas Hanselowski | Christian Stab | Claudia Schulz | Zile Li | Iryna Gurevych
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Automated fact-checking based on machine learning is a promising approach to identify false information distributed on the web. In order to achieve satisfactory performance, machine learning methods require a large corpus with reliable annotations for the different tasks in the fact-checking process. Having analyzed existing fact-checking corpora, we found that none of them meets these criteria in full. They are either too small in size, do not provide detailed annotations, or are limited to a single domain. Motivated by this gap, we present a new substantially sized mixed-domain corpus with annotations of good quality for the core fact-checking tasks: document retrieval, evidence extraction, stance detection, and claim validation. To aid future corpus construction, we describe our methodology for corpus creation and annotation, and demonstrate that it results in substantial inter-annotator agreement. As baselines for future research, we perform experiments on our corpus with a number of model architectures that reach high performance in similar problem settings. Finally, to support the development of future models, we provide a detailed error analysis for each of the tasks. Our results show that the realistic, multi-domain setting defined by our data poses new challenges for the existing models, providing opportunities for considerable improvement by future systems.

pdf bib
Classification and Clustering of Arguments with Contextualized Word Embeddings
Nils Reimers | Benjamin Schiller | Tilman Beck | Johannes Daxenberger | Christian Stab | Iryna Gurevych
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We experiment with two recent contextualized word embedding methods (ELMo and BERT) in the context of open-domain argument search. For the first time, we show how to leverage the power of contextualized word embeddings to classify and cluster topic-dependent arguments, achieving impressive results on both tasks and across multiple datasets. For argument classification, we improve the state-of-the-art for the UKP Sentential Argument Mining Corpus by 20.8 percentage points and for the IBM Debater - Evidence Sentences dataset by 7.4 percentage points. For the understudied task of argument clustering, we propose a pre-training step which improves by 7.8 percentage points over strong baselines on a novel dataset, and by 12.3 percentage points for the Argument Facet Similarity (AFS) Corpus.

pdf bib
Ranking Generated Summaries by Correctness: An Interesting but Challenging Application for Natural Language Inference
Tobias Falke | Leonardo F. R. Ribeiro | Prasetya Ajie Utama | Ido Dagan | Iryna Gurevych
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

While recent progress on abstractive summarization has led to remarkably fluent summaries, factual errors in generated summaries still severely limit their use in practice. In this paper, we evaluate summaries produced by state-of-the-art models via crowdsourcing and show that such errors occur frequently, in particular with more abstractive models. We study whether textual entailment predictions can be used to detect such errors and if they can be reduced by reranking alternative predicted summaries. That leads to an interesting downstream application for entailment models. In our experiments, we find that out-of-the-box entailment models trained on NLI datasets do not yet offer the desired performance for the downstream task and we therefore release our annotations as additional test data for future extrinsic evaluations of NLI.

pdf bib
Analysis of Automatic Annotation Suggestions for Hard Discourse-Level Tasks in Expert Domains
Claudia Schulz | Christian M. Meyer | Jan Kiesewetter | Michael Sailer | Elisabeth Bauer | Martin R. Fischer | Frank Fischer | Iryna Gurevych
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Many complex discourse-level tasks can aid domain experts in their work but require costly expert annotations for data creation. To speed up and ease annotations, we investigate the viability of automatically generated annotation suggestions for such tasks. As an example, we choose a task that is particularly hard for both humans and machines: the segmentation and classification of epistemic activities in diagnostic reasoning texts. We create and publish a new dataset covering two domains and carefully analyse the suggested annotations. We find that suggestions have positive effects on annotation speed and performance, while not introducing noteworthy biases. Envisioning suggestion models that improve with newly annotated texts, we contrast methods for continuous model adjustment and suggest the most effective setup for suggestions in future expert tasks.

pdf bib
Predicting Humorousness and Metaphor Novelty with Gaussian Process Preference Learning
Edwin Simpson | Erik-Lân Do Dinh | Tristan Miller | Iryna Gurevych
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

The inability to quantify key aspects of creative language is a frequent obstacle to natural language understanding. To address this, we introduce novel tasks for evaluating the creativeness of language—namely, scoring and ranking text by humorousness and metaphor novelty. To sidestep the difficulty of assigning discrete labels or numeric scores, we learn from pairwise comparisons between texts. We introduce a Bayesian approach for predicting humorousness and metaphor novelty using Gaussian process preference learning (GPPL), which achieves a Spearman’s ρ of 0.56 against gold using word embeddings and linguistic features. Our experiments show that given sparse, crowdsourced annotation data, ranking using GPPL outperforms best–worst scaling. We release a new dataset for evaluating humour containing 28,210 pairwise comparisons of 4,030 texts, and make our software freely available.

pdf bib
A Bayesian Approach for Sequence Tagging with Crowds
Edwin D. Simpson | Iryna Gurevych
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Current methods for sequence tagging, a core task in NLP, are data hungry, which motivates the use of crowdsourcing as a cheap way to obtain labelled data. However, annotators are often unreliable and current aggregation methods cannot capture common types of span annotation error. To address this, we propose a Bayesian method for aggregating sequence tags that reduces errors by modelling sequential dependencies between the annotations as well as the ground-truth labels. By taking a Bayesian approach, we account for uncertainty in the model due to both annotator errors and the lack of data for modelling annotators who complete few tasks. We evaluate our model on crowdsourced data for named entity recognition, information extraction and argument mining, showing that our sequential model outperforms the previous state of the art, and that Bayesian approaches outperform non-Bayesian alternatives. We also find that our approach can reduce crowdsourcing costs through more effective active learning, as it better captures uncertainty in the sequence labels when there are few annotations.

pdf bib
Neural Duplicate Question Detection without Labeled Training Data
Andreas Rücklé | Nafise Sadat Moosavi | Iryna Gurevych
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Supervised training of neural models to duplicate question detection in community Question Answering (CQA) requires large amounts of labeled question pairs, which can be costly to obtain. To minimize this cost, recent works thus often used alternative methods, e.g., adversarial domain adaptation. In this work, we propose two novel methods—weak supervision using the title and body of a question, and the automatic generation of duplicate questions—and show that both can achieve improved performances even though they do not require any labeled data. We provide a comparison of popular training strategies and show that our proposed approaches are more effective in many cases because they can utilize larger amounts of data from the CQA forums. Finally, we show that weak supervision with question title and body information is also an effective method to train CQA answer selection models without direct answer supervision.

pdf bib
Better Rewards Yield Better Summaries: Learning to Summarise Without References
Florian Böhm | Yang Gao | Christian M. Meyer | Ori Shapira | Ido Dagan | Iryna Gurevych
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Reinforcement Learning (RL)based document summarisation systems yield state-of-the-art performance in terms of ROUGE scores, because they directly use ROUGE as the rewards during training. However, summaries with high ROUGE scores often receive low human judgement. To find a better reward function that can guide RL to generate human-appealing summaries, we learn a reward function from human ratings on 2,500 summaries. Our reward function only takes the document and system summary as input. Hence, once trained, it can be used to train RL based summarisation systems without using any reference summaries. We show that our learned rewards have significantly higher correlation with human ratings than previous approaches. Human evaluation experiments show that, compared to the state-of-the-art supervised-learning systems and ROUGE-as-rewards RL summarisation systems, the RL systems using our learned rewards during training generate summaries with higher human ratings. The learned reward function and our source code are available at https://github.com/yg211/summary-reward-no-reference.

pdf bib
Enhancing AMR-to-Text Generation with Dual Graph Representations
Leonardo F. R. Ribeiro | Claire Gardent | Iryna Gurevych
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Generating text from graph-based data, such as Abstract Meaning Representation (AMR), is a challenging task due to the inherent difficulty in how to properly encode the structure of a graph with labeled edges. To address this difficulty, we propose a novel graph-to-sequence model that encodes different but complementary perspectives of the structural information contained in the AMR graph. The model learns parallel top-down and bottom-up representations of nodes capturing contrasting views of the graph. We also investigate the use of different node message passing strategies, employing different state-of-the-art graph encoders to compute node representations based on incoming and outgoing perspectives. In our experiments, we demonstrate that the dual graph representation leads to improvements in AMR-to-text generation, achieving state-of-the-art results on two AMR datasets

pdf bib
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Nils Reimers | Iryna Gurevych
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

BERT (Devlin et al., 2018) and RoBERTa (Liu et al., 2019) has set a new state-of-the-art performance on sentence-pair regression tasks like semantic textual similarity (STS). However, it requires that both sentences are fed into the network, which causes a massive computational overhead: Finding the most similar pair in a collection of 10,000 sentences requires about 50 million inference computations (~65 hours) with BERT. The construction of BERT makes it unsuitable for semantic similarity search as well as for unsupervised tasks like clustering. In this publication, we present Sentence-BERT (SBERT), a modification of the pretrained BERT network that use siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine-similarity. This reduces the effort for finding the most similar pair from 65 hours with BERT / RoBERTa to about 5 seconds with SBERT, while maintaining the accuracy from BERT. We evaluate SBERT and SRoBERTa on common STS tasks and transfer learning tasks, where it outperforms other state-of-the-art sentence embeddings methods.

pdf bib
FAMULUS: Interactive Annotation and Feedback Generation for Teaching Diagnostic Reasoning
Jonas Pfeiffer | Christian M. Meyer | Claudia Schulz | Jan Kiesewetter | Jan Zottmann | Michael Sailer | Elisabeth Bauer | Frank Fischer | Martin R. Fischer | Iryna Gurevych
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations

Our proposed system FAMULUS helps students learn to diagnose based on automatic feedback in virtual patient simulations, and it supports instructors in labeling training data. Diagnosing is an exceptionally difficult skill to obtain but vital for many different professions (e.g., medical doctors, teachers). Previous case simulation systems are limited to multiple-choice questions and thus cannot give constructive individualized feedback on a student’s diagnostic reasoning process. Given initially only limited data, we leverage a (replaceable) NLP model to both support experts in their further data annotation with automatic suggestions, and we provide automatic feedback for students. We argue that because the central model consistently improves, our interactive approach encourages both students and instructors to recurrently use the tool, and thus accelerate the speed of data creation and annotation. We show results from two user studies on diagnostic reasoning in medicine and teacher education and outline how our system can be extended to further use cases.

pdf bib
LINSPECTOR WEB: A Multilingual Probing Suite for Word Representations
Max Eichler | Gözde Gül Şahin | Iryna Gurevych
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations

We present LINSPECTOR WEB , an open source multilingual inspector to analyze word representations. Our system provides researchers working in low-resource settings with an easily accessible web based probing tool to gain quick insights into their word embeddings especially outside of the English language. To do this we employ 16 simple linguistic probing tasks such as gender, case marking, and tense for a diverse set of 28 languages. We support probing of static word embeddings along with pretrained AllenNLP models that are commonly used for NLP downstream tasks such as named entity recognition, natural language inference and dependency parsing. The results are visualized in a polar chart and also provided as a table. LINSPECTOR WEB is available as an offline tool or at https://linspector.ukp.informatik.tu-darmstadt.de.

pdf bib
Pitfalls in the Evaluation of Sentence Embeddings
Steffen Eger | Andreas Rücklé | Iryna Gurevych
Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)

Deep learning models continuously break new records across different NLP tasks. At the same time, their success exposes weaknesses of model evaluation. Here, we compile several key pitfalls of evaluation of sentence embeddings, a currently very popular NLP paradigm. These pitfalls include the comparison of embeddings of different sizes, normalization of embeddings, and the low (and diverging) correlations between transfer and probing tasks. Our motivation is to challenge the current evaluation of sentence embeddings and to provide an easy-to-access reference for future research. Based on our insights, we also recommend better practices for better future evaluations of sentence embeddings.

pdf bib
Revisiting the Binary Linearization Technique for Surface Realization
Yevgeniy Puzikov | Claire Gardent | Ido Dagan | Iryna Gurevych
Proceedings of the 12th International Conference on Natural Language Generation

End-to-end neural approaches have achieved state-of-the-art performance in many natural language processing (NLP) tasks. Yet, they often lack transparency of the underlying decision-making process, hindering error analysis and certain model improvements. In this work, we revisit the binary linearization approach to surface realization, which exhibits more interpretable behavior, but was falling short in terms of prediction accuracy. We show how enriching the training data to better capture word order constraints almost doubles the performance of the system. We further demonstrate that encoding both local and global prediction contexts yields another considerable performance boost. With the proposed modifications, the system which ranked low in the latest shared task on multilingual surface realization now achieves best results in five out of ten languages, while being on par with the state-of-the-art approaches in others.

pdf bib
Fast Concept Mention Grouping for Concept Map-based Multi-Document Summarization
Tobias Falke | Iryna Gurevych
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Concept map-based multi-document summarization has recently been proposed as a variant of the traditional summarization task with graph-structured summaries. As shown by previous work, the grouping of coreferent concept mentions across documents is a crucial subtask of it. However, while the current state-of-the-art method suggested a new grouping method that was shown to improve the summary quality, its use of pairwise comparisons leads to polynomial runtime complexity that prohibits the application to large document collections. In this paper, we propose two alternative grouping techniques based on locality sensitive hashing, approximate nearest neighbor search and a fast clustering algorithm. They exhibit linear and log-linear runtime complexity, making them much more scalable. We report experimental results that confirm the improved runtime behavior while also showing that the quality of the summary concept maps remains comparable.

pdf bib
Does My Rebuttal Matter? Insights from a Major NLP Conference
Yang Gao | Steffen Eger | Ilia Kuznetsov | Iryna Gurevych | Yusuke Miyao
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Peer review is a core element of the scientific process, particularly in conference-centered fields such as ML and NLP. However, only few studies have evaluated its properties empirically. Aiming to fill this gap, we present a corpus that contains over 4k reviews and 1.2k author responses from ACL-2018. We quantitatively and qualitatively assess the corpus. This includes a pilot study on paper weaknesses given by reviewers and on quality of author responses. We then focus on the role of the rebuttal phase, and propose a novel task to predict after-rebuttal (i.e., final) scores from initial reviews and author responses. Although author responses do have a marginal (and statistically significant) influence on the final scores, especially for borderline papers, our results suggest that a reviewer’s final score is largely determined by her initial score and the distance to the other reviewers’ initial scores. In this context, we discuss the conformity bias inherent to peer reviewing, a bias that has largely been overlooked in previous research. We hope our analyses will help better assess the usefulness of the rebuttal phase in NLP conferences.

pdf bib
Text Processing Like Humans Do: Visually Attacking and Shielding NLP Systems
Steffen Eger | Gözde Gül Şahin | Andreas Rücklé | Ji-Ung Lee | Claudia Schulz | Mohsen Mesgar | Krishnkant Swarnkar | Edwin Simpson | Iryna Gurevych
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Visual modifications to text are often used to obfuscate offensive comments in social media (e.g., “!d10t”) or as a writing style (“1337” in “leet speak”), among other scenarios. We consider this as a new type of adversarial attack in NLP, a setting to which humans are very robust, as our experiments with both simple and more difficult visual perturbations demonstrate. We investigate the impact of visual adversarial attacks on current NLP systems on character-, word-, and sentence-level tasks, showing that both neural and non-neural models are, in contrast to humans, extremely sensitive to such attacks, suffering performance decreases of up to 82%. We then explore three shielding methods—visual character embeddings, adversarial training, and rule-based recovery—which substantially improve the robustness of the models. However, the shielding methods still fall behind performances achieved in non-attack scenarios, which demonstrates the difficulty of dealing with visual attacks.

pdf bib
A Streamlined Method for Sourcing Discourse-level Argumentation Annotations from the Crowd
Tristan Miller | Maria Sukhareva | Iryna Gurevych
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

The study of argumentation and the development of argument mining tools depends on the availability of annotated data, which is challenging to obtain in sufficient quantity and quality. We present a method that breaks down a popular but relatively complex discourse-level argument annotation scheme into a simpler, iterative procedure that can be applied even by untrained annotators. We apply this method in a crowdsourcing setup and report on the reliability of the annotations obtained. The source code for a tool implementing our annotation method, as well as the sample data we obtained (4909 gold-standard annotations across 982 documents), are freely released to the research community. These are intended to serve the needs of qualitative research into argumentation, as well as of data-driven approaches to argument mining.

2018

pdf bib
SemEval-2018 Task 12: The Argument Reasoning Comprehension Task
Ivan Habernal | Henning Wachsmuth | Iryna Gurevych | Benno Stein
Proceedings of The 12th International Workshop on Semantic Evaluation

A natural language argument is composed of a claim as well as reasons given as premises for the claim. The warrant explaining the reasoning is usually left implicit, as it is clear from the context and common sense. This makes a comprehension of arguments easy for humans but hard for machines. This paper summarizes the first shared task on argument reasoning comprehension. Given a premise and a claim along with some topic information, the goal was to automatically identify the correct warrant among two candidates that are plausible and lexically close, but in fact imply opposite claims. We describe the dataset with 1970 instances that we built for the task, and we outline the 21 computational approaches that participated, most of which used neural networks. The results reveal the complexity of the task, with many approaches hardly improving over the random accuracy of about 0.5. Still, the best observed accuracy (0.712) underlines the principle feasibility of identifying warrants. Our analysis indicates that an inclusion of external knowledge is key to reasoning comprehension.

pdf bib
Mixing Context Granularities for Improved Entity Linking on Question Answering Data across Entity Categories
Daniil Sorokin | Iryna Gurevych
Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics

The first stage of every knowledge base question answering approach is to link entities in the input question. We investigate entity linking in the context of question answering task and present a jointly optimized neural architecture for entity mention detection and entity disambiguation that models the surrounding context on different levels of granularity. We use the Wikidata knowledge base and available question answering datasets to create benchmarks for entity linking on question answering data. Our approach outperforms the previous state-of-the-art system on this data, resulting in an average 8% improvement of the final score. We further demonstrate that our model delivers a strong performance across different entity categories.

pdf bib
A Multimodal Translation-Based Approach for Knowledge Graph Representation Learning
Hatem Mousselly-Sergieh | Teresa Botschen | Iryna Gurevych | Stefan Roth
Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics

Current methods for knowledge graph (KG) representation learning focus solely on the structure of the KG and do not exploit any kind of external information, such as visual and linguistic information corresponding to the KG entities. In this paper, we propose a multimodal translation-based approach that defines the energy of a KG triple as the sum of sub-energy functions that leverage both multimodal (visual and linguistic) and structural KG representations. Next, a ranking-based loss is minimized using a simple neural network architecture. Moreover, we introduce a new large-scale dataset for multimodal KG representation learning. We compared the performance of our approach to other baselines on two standard tasks, namely knowledge graph completion and triple classification, using our as well as the WN9-IMG dataset. The results demonstrate that our approach outperforms all baselines on both tasks and datasets.

pdf bib
From Text to Lexicon: Bridging the Gap between Word Embeddings and Lexical Resources
Ilia Kuznetsov | Iryna Gurevych
Proceedings of the 27th International Conference on Computational Linguistics

Distributional word representations (often referred to as word embeddings) are omnipresent in modern NLP. Early work has focused on building representations for word types, and recent studies show that lemmatization and part of speech (POS) disambiguation of targets in isolation improve the performance of word embeddings on a range of downstream tasks. However, the reasons behind these improvements, the qualitative effects of these operations and the combined performance of lemmatized and POS disambiguated targets are less studied. This work aims to close this gap and puts previous findings into a general perspective. We examine the effect of lemmatization and POS typing on word embedding performance in a novel resource-based evaluation scenario, as well as on standard similarity benchmarks. We show that these two operations have complimentary qualitative and vocabulary-level effects and are best used in combination. We find that the improvement is more pronounced for verbs and show how lemmatization and POS typing implicitly target some of the verb-specific issues. We claim that the observed improvement is a result of better conceptual alignment between word embeddings and lexical resources, stressing the need for conceptually plausible modeling of word embedding targets.

pdf bib
Cross-lingual Argumentation Mining: Machine Translation (and a bit of Projection) is All You Need!
Steffen Eger | Johannes Daxenberger | Christian Stab | Iryna Gurevych
Proceedings of the 27th International Conference on Computational Linguistics

Argumentation mining (AM) requires the identification of complex discourse structures and has lately been applied with success monolingually. In this work, we show that the existing resources are, however, not adequate for assessing cross-lingual AM, due to their heterogeneity or lack of complexity. We therefore create suitable parallel corpora by (human and machine) translating a popular AM dataset consisting of persuasive student essays into German, French, Spanish, and Chinese. We then compare (i) annotation projection and (ii) bilingual word embeddings based direct transfer strategies for cross-lingual AM, finding that the former performs considerably better and almost eliminates the loss from cross-lingual transfer. Moreover, we find that annotation projection works equally well when using either costly human or cheap machine translations. Our code and data are available at http://github.com/UKPLab/coling2018-xling_argument_mining.

pdf bib
Killing Four Birds with Two Stones: Multi-Task Learning for Non-Literal Language Detection
Erik-Lân Do Dinh | Steffen Eger | Iryna Gurevych
Proceedings of the 27th International Conference on Computational Linguistics

Non-literal language phenomena such as idioms or metaphors are commonly studied in isolation from each other in NLP. However, often similar definitions and features are being used for different phenomena, challenging the distinction. Instead, we propose to view the detection problem as a generalized non-literal language classification problem. In this paper we investigate multi-task learning for related non-literal language phenomena. We show that in contrast to simply joining the data of multiple tasks, multi-task learning consistently improves upon four metaphor and idiom detection tasks in two languages, English and German. Comparing two state-of-the-art multi-task learning architectures, we also investigate when soft parameter sharing and learned information flow can be beneficial for our related tasks. We make our adapted code publicly available.

pdf bib
A Retrospective Analysis of the Fake News Challenge Stance-Detection Task
Andreas Hanselowski | Avinesh PVS | Benjamin Schiller | Felix Caspelherr | Debanjan Chaudhuri | Christian M. Meyer | Iryna Gurevych
Proceedings of the 27th International Conference on Computational Linguistics

The 2017 Fake News Challenge Stage 1 (FNC-1) shared task addressed a stance classification task as a crucial first step towards detecting fake news. To date, there is no in-depth analysis paper to critically discuss FNC-1’s experimental setup, reproduce the results, and draw conclusions for next-generation stance classification methods. In this paper, we provide such an in-depth analysis for the three top-performing systems. We first find that FNC-1’s proposed evaluation metric favors the majority class, which can be easily classified, and thus overestimates the true discriminative power of the methods. Therefore, we propose a new F1-based metric yielding a changed system ranking. Next, we compare the features and architectures used, which leads to a novel feature-rich stacked LSTM model that performs on par with the best systems, but is superior in predicting minority classes. To understand the methods’ ability to generalize, we derive a new dataset and perform both in-domain and cross-domain experiments. Our qualitative and quantitative study helps interpreting the original FNC-1 scores and understand which features help improving performance and why. Our new dataset and all source code used during the reproduction study are publicly available for future research.

pdf bib
Multimodal Grounding for Language Processing
Lisa Beinborn | Teresa Botschen | Iryna Gurevych
Proceedings of the 27th International Conference on Computational Linguistics

This survey discusses how recent developments in multimodal processing facilitate conceptual grounding of language. We categorize the information flow in multimodal processing with respect to cognitive models of human information processing and analyze different methods for combining multimodal representations. Based on this methodological inventory, we discuss the benefit of multimodal grounding for a variety of language processing tasks and the challenges that arise. We particularly focus on multimodal grounding of verbs which play a crucial role for the compositional power of language.

pdf bib
Modeling Semantics with Gated Graph Neural Networks for Knowledge Base Question Answering
Daniil Sorokin | Iryna Gurevych
Proceedings of the 27th International Conference on Computational Linguistics

The most approaches to Knowledge Base Question Answering are based on semantic parsing. In this paper, we address the problem of learning vector representations for complex semantic parses that consist of multiple entities and relations. Previous work largely focused on selecting the correct semantic relations for a question and disregarded the structure of the semantic parse: the connections between entities and the directions of the relations. We propose to use Gated Graph Neural Networks to encode the graph structure of the semantic parse. We show on two data sets that the graph networks outperform all baseline models that do not explicitly model the structure. The error analysis confirms that our approach can successfully process complex semantic parses.

pdf bib
The INCEpTION Platform: Machine-Assisted and Knowledge-Oriented Interactive Annotation
Jan-Christoph Klie | Michael Bugert | Beto Boullosa | Richard Eckart de Castilho | Iryna Gurevych
Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations

We introduce INCEpTION, a new annotation platform for tasks including interactive and semantic annotation (e.g., concept linking, fact linking, knowledge base population, semantic frame annotation). These tasks are very time consuming and demanding for annotators, especially when knowledge bases are used. We address these issues by developing an annotation platform that incorporates machine learning capabilities which actively assist and guide annotators. The platform is both generic and modular. It targets a range of research domains in need of semantic annotation, such as digital humanities, bioinformatics, or linguistics. INCEpTION is publicly available as open-source software.

pdf bib
Weeding out Conventionalized Metaphors: A Corpus of Novel Metaphor Annotations
Erik-Lân Do Dinh | Hannah Wieland | Iryna Gurevych
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We encounter metaphors every day, but only a few jump out on us and make us stumble. However, little effort has been devoted to investigating more novel metaphors in comparison to general metaphor detection efforts. We attribute this gap primarily to the lack of larger datasets that distinguish between conventionalized, i.e., very common, and novel metaphors. The goal of this paper is to alleviate this situation by introducing a crowdsourced novel metaphor annotation layer for an existing metaphor corpus. Further, we analyze our corpus and investigate correlations between novelty and features that are typically used in metaphor detection, such as concreteness ratings and more semantic features like the Potential for Metaphoricity. Finally, we present a baseline approach to assess novelty in metaphors based on our annotations.

pdf bib
Cross-topic Argument Mining from Heterogeneous Sources
Christian Stab | Tristan Miller | Benjamin Schiller | Pranav Rai | Iryna Gurevych
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Argument mining is a core technology for automating argument search in large document collections. Despite its usefulness for this task, most current approaches are designed for use only with specific text types and fall short when applied to heterogeneous texts. In this paper, we propose a new sentential annotation scheme that is reliably applicable by crowd workers to arbitrary Web texts. We source annotations for over 25,000 instances covering eight controversial topics. We show that integrating topic information into bidirectional long short-term memory networks outperforms vanilla BiLSTMs by more than 3 percentage points in F1 in two- and three-label cross-topic settings. We also show that these results can be further improved by leveraging additional data for topic relevance using multi-task learning.

pdf bib
APRIL: Interactively Learning to Summarise by Combining Active Preference Learning and Reinforcement Learning
Yang Gao | Christian M. Meyer | Iryna Gurevych
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We propose a method to perform automatic document summarisation without using reference summaries. Instead, our method interactively learns from users’ preferences. The merit of preference-based interactive summarisation is that preferences are easier for users to provide than reference summaries. Existing preference-based interactive learning methods suffer from high sample complexity, i.e. they need to interact with the oracle for many rounds in order to converge. In this work, we propose a new objective function, which enables us to leverage active learning, preference learning and reinforcement learning techniques in order to reduce the sample complexity. Both simulation and real-user experiments suggest that our method significantly advances the state of the art. Our source code is freely available at https://github.com/UKPLab/emnlp2018-april.

pdf bib
Is it Time to Swish? Comparing Deep Learning Activation Functions Across NLP tasks
Steffen Eger | Paul Youssef | Iryna Gurevych
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Activation functions play a crucial role in neural networks because they are the nonlinearities which have been attributed to the success story of deep learning. One of the currently most popular activation functions is ReLU, but several competitors have recently been proposed or ‘discovered’, including LReLU functions and swish. While most works compare newly proposed activation functions on few tasks (usually from image classification) and against few competitors (usually ReLU), we perform the first largescale comparison of 21 activation functions across eight different NLP tasks. We find that a largely unknown activation function performs most stably across all tasks, the so-called penalized tanh function. We also show that it can successfully replace the sigmoid and tanh gates in LSTM cells, leading to a 2 percentage point (pp) improvement over the standard choices on a challenging NLP task.

pdf bib
Interactive Instance-based Evaluation of Knowledge Base Question Answering
Daniil Sorokin | Iryna Gurevych
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Most approaches to Knowledge Base Question Answering are based on semantic parsing. In this paper, we present a tool that aids in debugging of question answering systems that construct a structured semantic representation for the input question. Previous work has largely focused on building question answering interfaces or evaluation frameworks that unify multiple data sets. The primary objective of our system is to enable interactive debugging of model predictions on individual instances (questions) and to simplify manual error analysis. Our interactive interface helps researchers to understand the shortcomings of a particular model, qualitatively analyze the complete pipeline and compare different models. A set of sit-by sessions was used to validate our interface design.

pdf bib
Integrating Knowledge-Supported Search into the INCEpTION Annotation Platform
Beto Boullosa | Richard Eckart de Castilho | Naveen Kumar | Jan-Christoph Klie | Iryna Gurevych
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Annotating entity mentions and linking them to a knowledge resource are essential tasks in many domains. It disambiguates mentions, introduces cross-document coreferences, and the resources contribute extra information, e.g. taxonomic relations. Such tasks benefit from text annotation tools that integrate a search which covers the text, the annotations, as well as the knowledge resource. However, to the best of our knowledge, no current tools integrate knowledge-supported search as well as entity linking support. We address this gap by introducing knowledge-supported search functionality into the INCEpTION text annotation platform. In our approach, cross-document references are created by linking entity mentions to a knowledge base in the form of a structured hierarchical vocabulary. The resulting annotations are then indexed to enable fast and yet complex queries taking into account the text, the annotations, and the vocabulary structure.

pdf bib
Event Time Extraction with a Decision Tree of Neural Classifiers
Nils Reimers | Nazanin Dehghani | Iryna Gurevych
Transactions of the Association for Computational Linguistics, Volume 6

Extracting the information from text when an event happened is challenging. Documents do not only report on current events, but also on past events as well as on future events. Often, the relevant time information for an event is scattered across the document. In this paper we present a novel method to automatically anchor events in time. To our knowledge it is the first approach that takes temporal information from the complete document into account. We created a decision tree that applies neural network based classifiers at its nodes. We use this tree to incrementally infer, in a stepwise manner, at which time frame an event happened. We evaluate the approach on the TimeBank-EventTime Corpus (Reimers et al., 2016) achieving an accuracy of 42.0% compared to an inter-annotator agreement (IAA) of 56.7%. For events that span over a single day we observe an accuracy improvement of 33.1 points compared to the state-of-the-art CAEVO system (Chambers et al., 2014). Without retraining, we apply this model to the SemEval-2015 Task 4 on automatic timeline generation and achieve an improvement of 4.01 points F1-score compared to the state-of-the-art. Our code is publically available.

pdf bib
Finding Convincing Arguments Using Scalable Bayesian Preference Learning
Edwin Simpson | Iryna Gurevych
Transactions of the Association for Computational Linguistics, Volume 6

We introduce a scalable Bayesian preference learning method for identifying convincing arguments in the absence of gold-standard ratings or rankings. In contrast to previous work, we avoid the need for separate methods to perform quality control on training data, predict rankings and perform pairwise classification. Bayesian approaches are an effective solution when faced with sparse or noisy training data, but have not previously been used to identify convincing arguments. One issue is scalability, which we address by developing a stochastic variational inference method for Gaussian process (GP) preference learning. We show how our method can be applied to predict argument convincingness from crowdsourced data, outperforming the previous state-of-the-art, particularly when trained with small amounts of unreliable data. We demonstrate how the Bayesian approach enables more effective active learning, thereby reducing the amount of data required to identify convincing arguments for new users and domains. While word embeddings are principally used with neural networks, our results show that word embeddings in combination with linguistic features also benefit GPs when predicting argument convincingness.

pdf bib
BinLin: A Simple Method of Dependency Tree Linearization
Yevgeniy Puzikov | Iryna Gurevych
Proceedings of the First Workshop on Multilingual Surface Realisation

Surface Realization Shared Task 2018 is a workshop on generating sentences from lemmatized sets of dependency triples. This paper describes the results of our participation in the challenge. We develop a data-driven pipeline system which first orders the lemmas and then conjugates the words to finish the surface realization process. Our contribution is a novel sequential method of ordering lemmas, which, despite its simplicity, achieves promising results. We demonstrate the effectiveness of the proposed approach, describe its limitations and outline ways to improve it.

pdf bib
One Size Fits All? A simple LSTM for non-literal token and construction-level classification
Erik-Lân Do Dinh | Steffen Eger | Iryna Gurevych
Proceedings of the Second Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

In this paper, we tackle four different tasks of non-literal language classification: token and construction level metaphor detection, classification of idiomatic use of infinitive-verb compounds, and classification of non-literal particle verbs. One of the tasks operates on the token level, while the three other tasks classify constructions such as “hot topic” or “stehen lassen” (“to allow sth. to stand” vs. “to abandon so.”). The two metaphor detection tasks are in English, while the two non-literal language detection tasks are in German. We propose a simple context-encoding LSTM model and show that it outperforms the state-of-the-art on two tasks. Additionally, we experiment with different embeddings for the token level metaphor detection task and find that 1) their performance varies according to the genre, and 2) word2vec embeddings perform best on 3 out of 4 genres, despite being one of the simplest tested model. In summary, we present a large-scale analysis of a neural model for non-literal language classification (i) at different granularities, (ii) in different languages, (iii) over different non-literal language phenomena.

pdf bib
Frame- and Entity-Based Knowledge for Common-Sense Argumentative Reasoning
Teresa Botschen | Daniil Sorokin | Iryna Gurevych
Proceedings of the 5th Workshop on Argument Mining

Common-sense argumentative reasoning is a challenging task that requires holistic understanding of the argumentation where external knowledge about the world is hypothesized to play a key role. We explore the idea of using event knowledge about prototypical situations from FrameNet and fact knowledge about concrete entities from Wikidata to solve the task. We find that both resources can contribute to an improvement over the non-enriched approach and point out two persisting challenges: first, integration of many annotations of the same type, and second, fusion of complementary annotations. After our explorations, we question the key role of external world knowledge with respect to the argumentative reasoning task and rather point towards a logic-based analysis of the chain of reasoning.

pdf bib
PD3: Better Low-Resource Cross-Lingual Transfer By Combining Direct Transfer and Annotation Projection
Steffen Eger | Andreas Rücklé | Iryna Gurevych
Proceedings of the 5th Workshop on Argument Mining

We consider unsupervised cross-lingual transfer on two tasks, viz., sentence-level argumentation mining and standard POS tagging. We combine direct transfer using bilingual embeddings with annotation projection, which projects labels across unlabeled parallel data. We do so by either merging respective source and target language datasets or alternatively by using multi-task learning. Our combination strategy considerably improves upon both direct transfer and projection with few available parallel sentences, the most realistic scenario for many low-resource target languages.

pdf bib
Cross-Lingual Argumentative Relation Identification: from English to Portuguese
Gil Rocha | Christian Stab | Henrique Lopes Cardoso | Iryna Gurevych
Proceedings of the 5th Workshop on Argument Mining

Argument mining aims to detect and identify argument structures from textual resources. In this paper, we aim to address the task of argumentative relation identification, a subtask of argument mining, for which several approaches have been recently proposed in a monolingual setting. To overcome the lack of annotated resources in less-resourced languages, we present the first attempt to address this subtask in a cross-lingual setting. We compare two standard strategies for cross-language learning, namely: projection and direct-transfer. Experimental results show that by using unsupervised language adaptation the proposed approaches perform at a competitive level when compared with fully-supervised in-language learning settings.

pdf bib
UKP-Athene: Multi-Sentence Textual Entailment for Claim Verification
Andreas Hanselowski | Hao Zhang | Zile Li | Daniil Sorokin | Benjamin Schiller | Claudia Schulz | Iryna Gurevych
Proceedings of the First Workshop on Fact Extraction and VERification (FEVER)

The Fact Extraction and VERification (FEVER) shared task was launched to support the development of systems able to verify claims by extracting supporting or refuting facts from raw text. The shared task organizers provide a large-scale dataset for the consecutive steps involved in claim verification, in particular, document retrieval, fact extraction, and claim classification. In this paper, we present our claim verification pipeline approach, which, according to the preliminary results, scored third in the shared task, out of 23 competing systems. For the document retrieval, we implemented a new entity linking approach. In order to be able to rank candidate facts and classify a claim on the basis of several selected facts, we introduce two extensions to the Enhanced LSTM (ESIM).

pdf bib
E2E NLG Challenge: Neural Models vs. Templates
Yevgeniy Puzikov | Iryna Gurevych
Proceedings of the 11th International Conference on Natural Language Generation

E2E NLG Challenge is a shared task on generating restaurant descriptions from sets of key-value pairs. This paper describes the results of our participation in the challenge. We develop a simple, yet effective neural encoder-decoder model which produces fluent restaurant descriptions and outperforms a strong baseline. We further analyze the data provided by the organizers and conclude that the task can also be approached with a template-based model developed in just a few hours.

pdf bib
A Legal Perspective on Training Models for Natural Language Processing
Richard Eckart de Castilho | Giulia Dore | Thomas Margoni | Penny Labropoulou | Iryna Gurevych
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Adapting Serious Game for Fallacious Argumentation to German: Pitfalls, Insights, and Best Practices
Ivan Habernal | Patrick Pauli | Iryna Gurevych
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Corpus-Driven Thematic Hierarchy Induction
Ilia Kuznetsov | Iryna Gurevych
Proceedings of the 22nd Conference on Computational Natural Language Learning

Thematic role hierarchy is a widely used linguistic tool to describe interactions between semantic roles and their syntactic realizations. Despite decades of dedicated research and numerous thematic hierarchy suggestions in the literature, this concept has not been used in NLP so far due to incompatibility and limited scope of existing hierarchies. We introduce an empirical framework for thematic hierarchy induction and evaluate several role ranking strategies on English and German full-text corpus data. We hypothesize that global thematic hierarchy induction is feasible, that a hierarchy can be induced from just fractions of training data and that resulting hierarchies apply cross-lingually. We evaluate these assumptions empirically.

pdf bib
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Iryna Gurevych | Yusuke Miyao
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Iryna Gurevych | Yusuke Miyao
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Before Name-Calling: Dynamics and Triggers of Ad Hominem Fallacies in Web Argumentation
Ivan Habernal | Henning Wachsmuth | Iryna Gurevych | Benno Stein
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Arguing without committing a fallacy is one of the main requirements of an ideal debate. But even when debating rules are strictly enforced and fallacious arguments punished, arguers often lapse into attacking the opponent by an ad hominem argument. As existing research lacks solid empirical investigation of the typology of ad hominem arguments as well as their potential causes, this paper fills this gap by (1) performing several large-scale annotation studies, (2) experimenting with various neural architectures and validating our working hypotheses, such as controversy or reasonableness, and (3) providing linguistic insights into triggers of ad hominem using explainable neural network architectures.

pdf bib
Multimodal Frame Identification with Multilingual Evaluation
Teresa Botschen | Iryna Gurevych | Jan-Christoph Klie | Hatem Mousselly-Sergieh | Stefan Roth
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

An essential step in FrameNet Semantic Role Labeling is the Frame Identification (FrameId) task, which aims at disambiguating a situation around a predicate. Whilst current FrameId methods rely on textual representations only, we hypothesize that FrameId can profit from a richer understanding of the situational context. Such contextual information can be obtained from common sense knowledge, which is more present in images than in text. In this paper, we extend a state-of-the-art FrameId system in order to effectively leverage multimodal representations. We conduct a comprehensive evaluation on the English FrameNet and its German counterpart SALSA. Our analysis shows that for the German data, textual representations are still competitive with multimodal ones. However on the English data, our multimodal FrameId approach outperforms its unimodal counterpart, setting a new state of the art. Its benefits are particularly apparent in dealing with ambiguous and rare instances, the main source of errors of current systems. For research purposes, we release (a) the implementation of our system, (b) our evaluation splits for SALSA 2.0, and (c) the embeddings for synsets and IMAGINED words.

pdf bib
The Argument Reasoning Comprehension Task: Identification and Reconstruction of Implicit Warrants
Ivan Habernal | Henning Wachsmuth | Iryna Gurevych | Benno Stein
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Reasoning is a crucial part of natural language argumentation. To comprehend an argument, one must analyze its warrant, which explains why its claim follows from its premises. As arguments are highly contextualized, warrants are usually presupposed and left implicit. Thus, the comprehension does not only require language understanding and logic skills, but also depends on common sense. In this paper we develop a methodology for reconstructing warrants systematically. We operationalize it in a scalable crowdsourcing process, resulting in a freely licensed dataset with warrants for 2k authentic arguments from news comments. On this basis, we present a new challenging task, the argument reasoning comprehension task. Given an argument with a claim and a premise, the goal is to choose the correct implicit warrant from two options. Both warrants are plausible and lexically close, but lead to contradicting claims. A solution to this task will define a substantial step towards automatic warrant reconstruction. However, experiments with several neural attention and language models reveal that current approaches do not suffice.

pdf bib
Multi-Task Learning for Argumentation Mining in Low-Resource Settings
Claudia Schulz | Steffen Eger | Johannes Daxenberger | Tobias Kahse | Iryna Gurevych
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

We investigate whether and where multi-task learning (MTL) can improve performance on NLP problems related to argumentation mining (AM), in particular argument component identification. Our results show that MTL performs particularly well (and better than single-task learning) when little training data is available for the main task, a common scenario in AM. Our findings challenge previous assumptions that conceptualizations across AM datasets are divergent and that MTL is difficult for semantic or higher-level tasks.

pdf bib
Objective Function Learning to Match Human Judgements for Optimization-Based Summarization
Maxime Peyrard | Iryna Gurevych
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

Supervised summarization systems usually rely on supervision at the sentence or n-gram level provided by automatic metrics like ROUGE, which act as noisy proxies for human judgments. In this work, we learn a summary-level scoring function 𝜃 including human judgments as supervision and automatically generated data as regularization. We extract summaries with a genetic algorithm using 𝜃 as a fitness function. We observe strong and promising performances across datasets in both automatic and manual evaluation.

pdf bib
ArgumenText: Searching for Arguments in Heterogeneous Sources
Christian Stab | Johannes Daxenberger | Chris Stahlhut | Tristan Miller | Benjamin Schiller | Christopher Tauchmann | Steffen Eger | Iryna Gurevych
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

Argument mining is a core technology for enabling argument search in large corpora. However, most current approaches fall short when applied to heterogeneous texts. In this paper, we present an argument retrieval system capable of retrieving sentential arguments for any given controversial topic. By analyzing the highest-ranked results extracted from Web sources, we found that our system covers 89% of arguments found in expert-curated lists of arguments from an online debate portal, and also identifies additional valid arguments.

2017

pdf bib
Assessing SRL Frameworks with Automatic Training Data Expansion
Silvana Hartmann | Éva Mújdricza-Maydt | Ilia Kuznetsov | Iryna Gurevych | Anette Frank
Proceedings of the 11th Linguistic Annotation Workshop

We present the first experiment-based study that explicitly contrasts the three major semantic role labeling frameworks. As a prerequisite, we create a dataset labeled with parallel FrameNet-, PropBank-, and VerbNet-style labels for German. We train a state-of-the-art SRL tool for German for the different annotation styles and provide a comparative analysis across frameworks. We further explore the behavior of the frameworks with automatic training data generation. VerbNet provides larger semantic expressivity than PropBank, and we find that its generalization capacity approaches PropBank in SRL training, but it benefits less from training data expansion than the sparse-data affected FrameNet.

pdf bib
A Consolidated Open Knowledge Representation for Multiple Texts
Rachel Wities | Vered Shwartz | Gabriel Stanovsky | Meni Adler | Ori Shapira | Shyam Upadhyay | Dan Roth | Eugenio Martinez Camara | Iryna Gurevych | Ido Dagan
Proceedings of the 2nd Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics

We propose to move from Open Information Extraction (OIE) ahead to Open Knowledge Representation (OKR), aiming to represent information conveyed jointly in a set of texts in an open text-based manner. We do so by consolidating OIE extractions using entity and predicate coreference, while modeling information containment between coreferring elements via lexical entailment. We suggest that generating OKR structures can be a useful step in the NLP pipeline, to give semantic applications an easy handle on consolidated information across multiple texts.

pdf bib
LSDSem 2017: Exploring Data Generation Methods for the Story Cloze Test
Michael Bugert | Yevgeniy Puzikov | Andreas Rücklé | Judith Eckle-Kohler | Teresa Martin | Eugenio Martínez-Cámara | Daniil Sorokin | Maxime Peyrard | Iryna Gurevych
Proceedings of the 2nd Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics

The Story Cloze test is a recent effort in providing a common test scenario for text understanding systems. As part of the LSDSem 2017 shared task, we present a system based on a deep learning architecture combined with a rich set of manually-crafted linguistic features. The system outperforms all known baselines for the task, suggesting that the chosen approach is promising. We additionally present two methods for generating further training data based on stories from the ROCStories corpus.

pdf bib
Distantly Supervised POS Tagging of Low-Resource Languages under Extreme Data Sparsity: The Case of Hittite
Maria Sukhareva | Francesco Fuscagni | Johannes Daxenberger | Susanne Görke | Doris Prechel | Iryna Gurevych
Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

This paper presents a statistical approach to automatic morphosyntactic annotation of Hittite transcripts. Hittite is an extinct Indo-European language using the cuneiform script. There are currently no morphosyntactic annotations available for Hittite, so we explored methods of distant supervision. The annotations were projected from parallel German translations of the Hittite texts. In order to reduce data sparsity, we applied stemming of German and Hittite texts. As there is no off-the-shelf Hittite stemmer, a stemmer for Hittite was developed for this purpose. The resulting annotation projections were used to train a POS tagger, achieving an accuracy of 69% on a test sample. To our knowledge, this is the first attempt of statistical POS tagging of a cuneiform language.

pdf bib
Prediction of Frame-to-Frame Relations in the FrameNet Hierarchy with Frame Embeddings
Teresa Botschen | Hatem Mousselly-Sergieh | Iryna Gurevych
Proceedings of the 2nd Workshop on Representation Learning for NLP

Automatic completion of frame-to-frame (F2F) relations in the FrameNet (FN) hierarchy has received little attention, although they incorporate meta-level commonsense knowledge and are used in downstream approaches. We address the problem of sparsely annotated F2F relations. First, we examine whether the manually defined F2F relations emerge from text by learning text-based frame embeddings. Our analysis reveals insights about the difficulty of reconstructing F2F relations purely from text. Second, we present different systems for predicting F2F relations; our best-performing one uses the FN hierarchy to train on and to ground embeddings in. A comparison of systems and embeddings exposes the crucial influence of knowledge-based embeddings to a system’s performance in predicting F2F relations.

pdf bib
Learning to Score System Summaries for Better Content Selection Evaluation.
Maxime Peyrard | Teresa Botschen | Iryna Gurevych
Proceedings of the Workshop on New Frontiers in Summarization

The evaluation of summaries is a challenging but crucial task of the summarization field. In this work, we propose to learn an automatic scoring metric based on the human judgements available as part of classical summarization datasets like TAC-2008 and TAC-2009. Any existing automatic scoring metrics can be included as features, the model learns the combination exhibiting the best correlation with human judgments. The reliability of the new metric is tested in a further manual evaluation where we ask humans to evaluate summaries covering the whole scoring spectrum of the metric. We release the trained metric as an open-source tool.

pdf bib
Proceedings of the 4th Workshop on Argument Mining
Ivan Habernal | Iryna Gurevych | Kevin Ashley | Claire Cardie | Nancy Green | Diane Litman | Georgios Petasis | Chris Reed | Noam Slonim | Vern Walker
Proceedings of the 4th Workshop on Argument Mining

pdf bib
Latest News in Computational Argumentation: Surfing on the Deep Learning Wave, Scuba Diving in the Abyss of Fundamental Questions
Iryna Gurevych
Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

Mining arguments from natural language texts, parsing argumentative structures, and assessing argument quality are among the recent challeng-es tackled in computational argumentation. While advanced deep learning models provide state-of-the-art performance in many of these tasks, much attention is also paid to the underly-ing fundamental questions. How are arguments expressed in natural language across genres and domains? What is the essence of an argument’s claim? Can we reliably annotate convincingness of an argument? How can we approach logic and common-sense reasoning in argumentation? This talk highlights some recent advances in computa-tional argumentation and shows why researchers must be both “surfers” and “scuba divers”.

pdf bib
Utilizing Automatic Predicate-Argument Analysis for Concept Map Mining
Tobias Falke | Iryna Gurevych
IWCS 2017 — 12th International Conference on Computational Semantics — Short papers

pdf bib
Neural Disambiguation of Causal Lexical Markers Based on Context
Eugenio Martínez-Cámara | Vered Shwartz | Iryna Gurevych | Ido Dagan
IWCS 2017 — 12th International Conference on Computational Semantics — Short papers

pdf bib
Representation Learning for Answer Selection with LSTM-Based Importance Weighting
Andreas Rücklé | Iryna Gurevych
IWCS 2017 — 12th International Conference on Computational Semantics — Short papers

pdf bib
Argumentation Mining in User-Generated Web Discourse
Ivan Habernal | Iryna Gurevych
Computational Linguistics, Volume 43, Issue 1 - April 2017

The goal of argumentation mining, an evolving research field in computational linguistics, is to design methods capable of analyzing people’s argumentation. In this article, we go beyond the state of the art in several ways. (i) We deal with actual Web data and take up the challenges given by the variety of registers, multiple domains, and unrestricted noisy user-generated Web discourse. (ii) We bridge the gap between normative argumentation theories and argumentation phenomena encountered in actual data by adapting an argumentation model tested in an extensive annotation study. (iii) We create a new gold standard corpus (90k tokens in 340 documents) and experiment with several machine learning methods to identify argument components. We offer the data, source codes, and annotation guidelines to the community under free licenses. Our findings show that argumentation mining in user-generated Web discourse is a feasible but challenging task.

pdf bib
Parsing Argumentation Structures in Persuasive Essays
Christian Stab | Iryna Gurevych
Computational Linguistics, Volume 43, Issue 3 - September 2017

In this article, we present a novel approach for parsing argumentation structures. We identify argument components using sequence labeling at the token level and apply a new joint model for detecting argumentation structures. The proposed model globally optimizes argument component types and argumentative relations using Integer Linear Programming. We show that our model significantly outperforms challenging heuristic baselines on two different types of discourse. Moreover, we introduce a novel corpus of persuasive essays annotated with argumentation structures. We show that our annotation scheme and annotation guidelines successfully guide human annotators to substantial agreement.

pdf bib
Reporting Score Distributions Makes a Difference: Performance Study of LSTM-networks for Sequence Tagging
Nils Reimers | Iryna Gurevych
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

In this paper we show that reporting a single performance score is insufficient to compare non-deterministic approaches. We demonstrate for common sequence tagging tasks that the seed value for the random number generator can result in statistically significant (p < 10-4) differences for state-of-the-art systems. For two recent systems for NER, we observe an absolute difference of one percentage point F₁-score depending on the selected seed value, making these systems perceived either as state-of-the-art or mediocre. Instead of publishing and reporting single performance scores, we propose to compare score distributions based on multiple executions. Based on the evaluation of 50.000 LSTM-networks for five sequence tagging tasks, we present network architectures that produce both superior performance as well as are more stable with respect to the remaining hyperparameters.

pdf bib
Context-Aware Representations for Knowledge Base Relation Extraction
Daniil Sorokin | Iryna Gurevych
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We demonstrate that for sentence-level relation extraction it is beneficial to consider other relations in the sentential context while predicting the target relation. Our architecture uses an LSTM-based encoder to jointly learn representations for all relations in a single sentence. We combine the context representations with an attention mechanism to make the final prediction. We use the Wikidata knowledge base to construct a dataset of multiple relations per sentence and to evaluate our approach. Compared to a baseline system, our method results in an average error reduction of 24 on a held-out set of relations. The code and the dataset to replicate the experiments are made available at https://github.com/ukplab/.

pdf bib
What is the Essence of a Claim? Cross-Domain Claim Identification
Johannes Daxenberger | Steffen Eger | Ivan Habernal | Christian Stab | Iryna Gurevych
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Argument mining has become a popular research area in NLP. It typically includes the identification of argumentative components, e.g. claims, as the central component of an argument. We perform a qualitative analysis across six different datasets and show that these appear to conceptualize claims quite differently. To learn about the consequences of such different conceptualizations of claim for practical applications, we carried out extensive experiments using state-of-the-art feature-rich and deep learning systems, to identify claims in a cross-domain fashion. While the divergent conceptualization of claims in different datasets is indeed harmful to cross-domain classification, we show that there are shared properties on the lexical level as well as system configurations that can help to overcome these gaps.

pdf bib
Bringing Structure into Summaries: Crowdsourcing a Benchmark Corpus of Concept Maps
Tobias Falke | Iryna Gurevych
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Concept maps can be used to concisely represent important information and bring structure into large document collections. Therefore, we study a variant of multi-document summarization that produces summaries in the form of concept maps. However, suitable evaluation datasets for this task are currently missing. To close this gap, we present a newly created corpus of concept maps that summarize heterogeneous collections of web documents on educational topics. It was created using a novel crowdsourcing approach that allows us to efficiently determine important elements in large document collections. We release the corpus along with a baseline system and proposed evaluation protocol to enable further research on this variant of summarization.

pdf bib
Argotario: Computational Argumentation Meets Serious Games
Ivan Habernal | Raffael Hannemann | Christian Pollak | Christopher Klamm | Patrick Pauli | Iryna Gurevych
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

An important skill in critical thinking and argumentation is the ability to spot and recognize fallacies. Fallacious arguments, omnipresent in argumentative discourse, can be deceptive, manipulative, or simply leading to ‘wrong moves’ in a discussion. Despite their importance, argumentation scholars and NLP researchers with focus on argumentation quality have not yet investigated fallacies empirically. The nonexistence of resources dealing with fallacious argumentation calls for scalable approaches to data acquisition and annotation, for which the serious games methodology offers an appealing, yet unexplored, alternative. We present Argotario, a serious game that deals with fallacies in everyday argumentation. Argotario is a multilingual, open-source, platform-independent application with strong educational aspects, accessible at www.argotario.net.

pdf bib
GraphDocExplore: A Framework for the Experimental Comparison of Graph-based Document Exploration Techniques
Tobias Falke | Iryna Gurevych
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Graphs have long been proposed as a tool to browse and navigate in a collection of documents in order to support exploratory search. Many techniques to automatically extract different types of graphs, showing for example entities or concepts and different relationships between them, have been suggested. While experimental evidence that they are indeed helpful exists for some of them, it is largely unknown which type of graph is most helpful for a specific exploratory task. However, carrying out experimental comparisons with human subjects is challenging and time-consuming. Towards this end, we present the GraphDocExplore framework. It provides an intuitive web interface for graph-based document exploration that is optimized for experimental user studies. Through a generic graph interface, different methods to extract graphs from text can be plugged into the system. Hence, they can be compared at minimal implementation effort in an environment that ensures controlled comparisons. The system is publicly available under an open-source license.

pdf bib
Concept-Map-Based Multi-Document Summarization using Concept Coreference Resolution and Global Importance Optimization
Tobias Falke | Christian M. Meyer | Iryna Gurevych
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Concept-map-based multi-document summarization is a variant of traditional summarization that produces structured summaries in the form of concept maps. In this work, we propose a new model for the task that addresses several issues in previous methods. It learns to identify and merge coreferent concepts to reduce redundancy, determines their importance with a strong supervised model and finds an optimal summary concept map via integer linear programming. It is also computationally more efficient than previous methods, allowing us to summarize larger document sets. We evaluate the model on two datasets, finding that it outperforms several approaches from previous work.

pdf bib
Real-Time News Summarization with Adaptation to Media Attention
Andreas Rücklé | Iryna Gurevych
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

Real-time summarization of news events (RTS) allows persons to stay up-to-date on important topics that develop over time. With the occurrence of major sub-events, media attention increases and a large number of news articles are published. We propose a summarization approach that detects such changes and selects a suitable summarization configuration at run-time. In particular, at times with high media attention, our approach exploits the redundancy in content to produce a more precise summary and avoid emitting redundant information. We find that our approach significantly outperforms a strong non-adaptive RTS baseline in terms of the emitted summary updates and achieves the best results on a recent web-scale dataset. It can successfully be applied to a different real-world dataset without requiring additional modifications.

pdf bib
Out-of-domain FrameNet Semantic Role Labeling
Silvana Hartmann | Ilia Kuznetsov | Teresa Martin | Iryna Gurevych
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Domain dependence of NLP systems is one of the major obstacles to their application in large-scale text analysis, also restricting the applicability of FrameNet semantic role labeling (SRL) systems. Yet, current FrameNet SRL systems are still only evaluated on a single in-domain test set. For the first time, we study the domain dependence of FrameNet SRL on a wide range of benchmark sets. We create a novel test set for FrameNet SRL based on user-generated web text and find that the major bottleneck for out-of-domain FrameNet SRL is the frame identification step. To address this problem, we develop a simple, yet efficient system based on distributed word representations. Our system closely approaches the state-of-the-art in-domain while outperforming the best available frame identification system out-of-domain. We publish our system and test data for research purposes.

pdf bib
Metaheuristic Approaches to Lexical Substitution and Simplification
Sallam Abualhaija | Tristan Miller | Judith Eckle-Kohler | Iryna Gurevych | Karl-Heinz Zimmermann
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

In this paper, we propose using metaheuristics—in particular, simulated annealing and the new D-Bees algorithm—to solve word sense disambiguation as an optimization problem within a knowledge-based lexical substitution system. We are the first to perform such an extrinsic evaluation of metaheuristics, for which we use two standard lexical substitution datasets, one English and one German. We find that D-Bees has robust performance for both languages, and performs better than simulated annealing, though both achieve good results. Moreover, the D-Bees–based lexical substitution system outperforms state-of-the-art systems on several evaluation metrics. We also show that D-Bees achieves competitive performance in lexical simplification, a variant of lexical substitution.

pdf bib
Recognizing Insufficiently Supported Arguments in Argumentative Essays
Christian Stab | Iryna Gurevych
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

In this paper, we propose a new task for assessing the quality of natural language arguments. The premises of a well-reasoned argument should provide enough evidence for accepting or rejecting its claim. Although this criterion, known as sufficiency, is widely adopted in argumentation theory, there are no empirical studies on its applicability to real arguments. In this work, we show that human annotators substantially agree on the sufficiency criterion and introduce a novel annotated corpus. Furthermore, we experiment with feature-rich SVMs and Convolutional Neural Networks and achieve 84% accuracy for automatically identifying insufficiently supported arguments. The final corpus as well as the annotation guideline are freely available for encouraging future research on argument quality.

pdf bib
A tool for extracting sense-disambiguated example sentences through user feedback
Beto Boullosa | Richard Eckart de Castilho | Alexander Geyken | Lothar Lemnitzer | Iryna Gurevych
Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics

This paper describes an application system aimed to help lexicographers in the extraction of example sentences for a given headword based on its different senses. The tool uses classification and clustering methods and incorporates user feedback to refine its results.

pdf bib
Neural End-to-End Learning for Computational Argumentation Mining
Steffen Eger | Johannes Daxenberger | Iryna Gurevych
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We investigate neural techniques for end-to-end computational argumentation mining (AM). We frame AM both as a token-based dependency parsing and as a token-based sequence tagging problem, including a multi-task learning setup. Contrary to models that operate on the argument component level, we find that framing AM as dependency parsing leads to subpar performance results. In contrast, less complex (local) tagging models based on BiLSTMs perform robustly across classification scenarios, being able to catch long-range dependencies inherent to the AM problem. Moreover, we find that jointly learning ‘natural’ subtasks, in a multi-task learning setup, improves performance.

pdf bib
Argumentation Quality Assessment: Theory vs. Practice
Henning Wachsmuth | Nona Naderi | Ivan Habernal | Yufang Hou | Graeme Hirst | Iryna Gurevych | Benno Stein
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Argumentation quality is viewed differently in argumentation theory and in practical assessment approaches. This paper studies to what extent the views match empirically. We find that most observations on quality phrased spontaneously are in fact adequately represented by theory. Even more, relative comparisons of arguments in practice correlate with absolute quality ratings based on theory. Our results clarify how the two views can learn from each other.

pdf bib
Integrating Deep Linguistic Features in Factuality Prediction over Unified Datasets
Gabriel Stanovsky | Judith Eckle-Kohler | Yevgeniy Puzikov | Ido Dagan | Iryna Gurevych
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Previous models for the assessment of commitment towards a predicate in a sentence (also known as factuality prediction) were trained and tested against a specific annotated dataset, subsequently limiting the generality of their results. In this work we propose an intuitive method for mapping three previously annotated corpora onto a single factuality scale, thereby enabling models to be tested across these corpora. In addition, we design a novel model for factuality prediction by first extending a previous rule-based factuality prediction system and applying it over an abstraction of dependency trees, and then using the output of this system in a supervised classifier. We show that this model outperforms previous methods on all three datasets. We make both the unified factuality corpus and our new model publicly available.

pdf bib
End-to-End Non-Factoid Question Answering with an Interactive Visualization of Neural Attention Weights
Andreas Rücklé | Iryna Gurevych
Proceedings of ACL 2017, System Demonstrations

pdf bib
SemEval-2017 Task 7: Detection and Interpretation of English Puns
Tristan Miller | Christian Hempelmann | Iryna Gurevych
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

A pun is a form of wordplay in which a word suggests two or more meanings by exploiting polysemy, homonymy, or phonological similarity to another word, for an intended humorous or rhetorical effect. Though a recurrent and expected feature in many discourse types, puns stymie traditional approaches to computational lexical semantics because they violate their one-sense-per-context assumption. This paper describes the first competitive evaluation for the automatic detection, location, and interpretation of puns. We describe the motivation for these tasks, the evaluation methods, and the manually annotated data set. Finally, we present an overview and discussion of the participating systems’ methodologies, resources, and results.

pdf bib
EELECTION at SemEval-2017 Task 10: Ensemble of nEural Learners for kEyphrase ClassificaTION
Steffen Eger | Erik-Lân Do Dinh | Ilia Kuznetsov | Masoud Kiaeeha | Iryna Gurevych
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper describes our approach to the SemEval 2017 Task 10: Extracting Keyphrases and Relations from Scientific Publications, specifically to Subtask (B): Classification of identified keyphrases. We explored three different deep learning approaches: a character-level convolutional neural network (CNN), a stacked learner with an MLP meta-classifier, and an attention based Bi-LSTM. From these approaches, we created an ensemble of differently hyper-parameterized systems, achieving a micro-F1-score of 0.63 on the test data. Our approach ranks 2nd (score of 1st placed system: 0.64) out of four according to this official score. However, we erroneously trained 2 out of 3 neural nets (the stacker and the CNN) on only roughly 15% of the full data, namely, the original development set. When trained on the full data (training+development), our ensemble has a micro-F1-score of 0.69. Our code is available from https://github.com/UKPLab/semeval2017-scienceie.

2016

pdf bib
Which argument is more convincing? Analyzing and predicting convincingness of Web arguments using bidirectional LSTM
Ivan Habernal | Iryna Gurevych
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Supersense Embeddings: A Unified Model for Supersense Interpretation, Prediction, and Utilization
Lucie Flekova | Iryna Gurevych
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Temporal Anchoring of Events for the TimeBank Corpus
Nils Reimers | Nazanin Dehghani | Iryna Gurevych
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
MDSWriter: Annotation Tool for Creating High-Quality Multi-Document Summarization Corpora
Christian M. Meyer | Darina Benikova | Margot Mieskes | Iryna Gurevych
Proceedings of ACL-2016 System Demonstrations

bib
NLP Approaches to Computational Argumentation
Noam Slonim | Iryna Gurevych | Chris Reed | Benno Stein
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

Argumentation and debating represent primary intellectual activities of the human mind. People in all societies argue and debate, not only to convince others of their own opinions but also in order to explore the differences between multiple perspectives and conceptualizations, and to learn from this exploration. The process of reaching a resolution on controversial topics typically does not follow a simple sequence of purely logical steps. Rather it involves a wide variety of complex and interwoven actions. Presumably, pros and cons are identified, considered, and weighed, via cognitive processes that often involve persuasion and emotions, which are inherently harder to formalize from a computational perspective.This wide range of conceptual capabilities and activities, have only in part been studied in fields like CL and NLP, and typically within relatively small sub-communities that overlap the ACL audience. The new field of Computational Argumentation has very recently seen significant expansion within the CL and NLP community as new techniques and datasets start to become available, allowing for the first time investigation of the computational aspects of human argumentation in a holistic manner.The main goal of this tutorial would be to introduce this rapidly evolving field to the CL community. Specifically, we will aim to review recent advances in the field and to outline the challenging research questions - that are most relevant to the ACL audience - that naturally arise when trying to model human argumentation.We will further emphasize the practical value of this line of study, by considering real-world CL and NLP applications that are expected to emerge from this research, and to impact various industries, including legal, finance, healthcare, media, and education, to name just a few examples.The first part of the tutorial will provide introduction to the basics of argumentation and rhetoric. Next, we will cover fundamental analysis tasks in Computational Argumentation, including argumentation mining, revealing argument relations, assessing arguments quality, stance classification, polarity analysis, and more. After the coffee break, we will first review existing resources and recently introduced benchmark data. In the following part we will cover basic synthesis tasks in Computational Argumentation, including the relation to NLG and dialogue systems, and the evolving area of Debate Technologies, defined as technologies developed directly to enhance, support, and engage with human debating. Finally, we will present relevant demos, review potential applications, and discuss the future of this emerging field.

pdf bib
Predicting the Spelling Difficulty of Words for Language Learners
Lisa Beinborn | Torsten Zesch | Iryna Gurevych
Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications

pdf bib
Token-Level Metaphor Detection using Neural Networks
Erik-Lân Do Dinh | Iryna Gurevych
Proceedings of the Fourth Workshop on Metaphor in NLP

pdf bib
Enriching Wikidata with Frame Semantics
Hatem Mousselly-Sergieh | Iryna Gurevych
Proceedings of the 5th Workshop on Automated Knowledge Base Construction

pdf bib
Argumentation: Content, Structure, and Relationship with Essay Quality
Beata Beigman Klebanov | Christian Stab | Jill Burstein | Yi Song | Binod Gyawali | Iryna Gurevych
Proceedings of the Third Workshop on Argument Mining (ArgMining2016)

pdf bib
Recognizing the Absence of Opposing Arguments in Persuasive Essays
Christian Stab | Iryna Gurevych
Proceedings of the Third Workshop on Argument Mining (ArgMining2016)

pdf bib
A Web-based Tool for the Integrated Annotation of Semantic and Syntactic Structures
Richard Eckart de Castilho | Éva Mújdricza-Maydt | Seid Muhie Yimam | Silvana Hartmann | Iryna Gurevych | Anette Frank | Chris Biemann
Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH)

We introduce the third major release of WebAnno, a generic web-based annotation tool for distributed teams. New features in this release focus on semantic annotation tasks (e.g. semantic role labelling or event annotation) and allow the tight integration of semantic annotations with syntactic annotations. In particular, we introduce the concept of slot features, a novel constraint mechanism that allows modelling the interaction between semantic and syntactic annotations, as well as a new annotation user interface. The new features were developed and used in an annotation project for semantic roles on German texts. The paper briefly introduces this project and reports on experiences performing annotations with the new tool. On a comparative evaluation, our tool reaches significant speedups over WebAnno 2 for a semantic annotation task.

pdf bib
A domain-agnostic approach for opinion prediction on speech
Pedro Bispo Santos | Lisa Beinborn | Iryna Gurevych
Proceedings of the Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media (PEOPLES)

We explore a domain-agnostic approach for analyzing speech with the goal of opinion prediction. We represent the speech signal by mel-frequency cepstral coefficients and apply long short-term memory neural networks to automatically learn temporal regularities in speech. In contrast to previous work, our approach does not require complex feature engineering and works without textual transcripts. As a consequence, it can easily be applied on various speech analysis tasks for different languages and the results show that it can nevertheless be competitive to the state-of-the-art in opinion prediction. In a detailed error analysis for opinion mining we find that our approach performs well in identifying speaker-specific characteristics, but should be combined with additional information if subtle differences in the linguistic content need to be identified.

pdf bib
Porting an Open Information Extraction System from English to German
Tobias Falke | Gabriel Stanovsky | Iryna Gurevych | Ido Dagan
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
What makes a convincing argument? Empirical analysis and detecting attributes of convincingness in Web argumentation
Ivan Habernal | Iryna Gurevych
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
Generating Training Data for Semantic Role Labeling based on Label Transfer from Linked Lexical Resources
Silvana Hartmann | Judith Eckle-Kohler | Iryna Gurevych
Transactions of the Association for Computational Linguistics, Volume 4

We present a new approach for generating role-labeled training data using Linked Lexical Resources, i.e., integrated lexical resources that combine several resources (e.g., Word-Net, FrameNet, Wiktionary) by linking them on the sense or on the role level. Unlike resource-based supervision in relation extraction, we focus on complex linguistic annotations, more specifically FrameNet senses and roles. The automatically labeled training data (www.ukp.tu-darmstadt.de/knowledge-based-srl/) are evaluated on four corpora from different domains for the tasks of word sense disambiguation and semantic role classification. Results show that classifiers trained on our generated data equal those resulting from a standard supervised setting.

pdf bib
Sense-annotating a Lexical Substitution Data Set with Ubyline
Tristan Miller | Mohamed Khemakhem | Richard Eckart de Castilho | Iryna Gurevych
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We describe the construction of GLASS, a newly sense-annotated version of the German lexical substitution data set used at the GermEval 2015: LexSub shared task. Using the two annotation layers, we conduct the first known empirical study of the relationship between manually applied word senses and lexical substitutions. We find that synonymy and hypernymy/hyponymy are the only semantic relations directly linking targets to their substitutes, and that substitutes in the target’s hypernymy/hyponymy taxonomy closely align with the synonyms of a single GermaNet synset. Despite this, these substitutes account for a minority of those provided by the annotators. The results of our analysis accord with those of a previous study on English-language data (albeit with automatically induced word senses), leading us to suspect that the sense―substitution relations we discovered may be of a universal nature. We also tentatively conclude that relatively cheap lexical substitution annotations can be used as a knowledge source for automatic WSD. Also introduced in this paper is Ubyline, the web application used to produce the sense annotations. Ubyline presents an intuitive user interface optimized for annotating lexical sample data, and is readily adaptable to sense inventories other than GermaNet.

pdf bib
C4Corpus: Multilingual Web-size Corpus with Free License
Ivan Habernal | Omnia Zayed | Iryna Gurevych
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Large Web corpora containing full documents with permissive licenses are crucial for many NLP tasks. In this article we present the construction of 12 million-pages Web corpus (over 10 billion tokens) licensed under CreativeCommons license family in 50+ languages that has been extracted from CommonCrawl, the largest publicly available general Web crawl to date with about 2 billion crawled URLs. Our highly-scalable Hadoop-based framework is able to process the full CommonCrawl corpus on 2000+ CPU cluster on the Amazon Elastic Map/Reduce infrastructure. The processing pipeline includes license identification, state-of-the-art boilerplate removal, exact duplicate and near-duplicate document removal, and language detection. The construction of the corpus is highly configurable and fully reproducible, and we provide both the framework (DKPro C4CorpusTools) and the resulting data (C4Corpus) to the research community.

pdf bib
Crowdsourcing a Large Dataset of Domain-Specific Context-Sensitive Semantic Verb Relations
Maria Sukhareva | Judith Eckle-Kohler | Ivan Habernal | Iryna Gurevych
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present a new large dataset of 12403 context-sensitive verb relations manually annotated via crowdsourcing. These relations capture fine-grained semantic information between verb-centric propositions, such as temporal or entailment relations. We propose a novel semantic verb relation scheme and design a multi-step annotation approach for scaling-up the annotations using crowdsourcing. We employ several quality measures and report on agreement scores. The resulting dataset is available under a permissive CreativeCommons license at www.ukp.tu-darmstadt.de/data/verb-relations/. It represents a valuable resource for various applications, such as automatic information consolidation or automatic summarization.

pdf bib
Combining Semantic Annotation of Word Sense & Semantic Roles: A Novel Annotation Scheme for VerbNet Roles on German Language Data
Éva Mújdricza-Maydt | Silvana Hartmann | Iryna Gurevych | Anette Frank
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present a VerbNet-based annotation scheme for semantic roles that we explore in an annotation study on German language data that combines word sense and semantic role annotation. We reannotate a substantial portion of the SALSA corpus with GermaNet senses and a revised scheme of VerbNet roles. We provide a detailed evaluation of the interaction between sense and role annotation. The resulting corpus will allow us to compare VerbNet role annotation for German to FrameNet and PropBank annotation by mapping to existing role annotations on the SALSA corpus. We publish the annotated corpus and detailed guidelines for the new role annotation scheme.

pdf bib
Task-Oriented Intrinsic Evaluation of Semantic Textual Similarity
Nils Reimers | Philip Beyer | Iryna Gurevych
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Semantic Textual Similarity (STS) is a foundational NLP task and can be used in a wide range of tasks. To determine the STS of two texts, hundreds of different STS systems exist, however, for an NLP system designer, it is hard to decide which system is the best one. To answer this question, an intrinsic evaluation of the STS systems is conducted by comparing the output of the system to human judgments on semantic similarity. The comparison is usually done using Pearson correlation. In this work, we show that relying on intrinsic evaluations with Pearson correlation can be misleading. In three common STS based tasks we could observe that the Pearson correlation was especially ill-suited to detect the best STS system for the task and other evaluation measures were much better suited. In this work we define how the validity of an intrinsic evaluation can be assessed and compare different intrinsic evaluation methods. Understanding of the properties of the targeted task is crucial and we propose a framework for conducting the intrinsic evaluation which takes the properties of the targeted task into account.

pdf bib
Bridging the gap between extractive and abstractive summaries: Creation and evaluation of coherent extracts from heterogeneous sources
Darina Benikova | Margot Mieskes | Christian M. Meyer | Iryna Gurevych
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Coherent extracts are a novel type of summary combining the advantages of manually created abstractive summaries, which are fluent but difficult to evaluate, and low-quality automatically created extractive summaries, which lack coherence and structure. We use a corpus of heterogeneous documents to address the issue that information seekers usually face – a variety of different types of information sources. We directly extract information from these, but minimally redact and meaningfully order it to form a coherent text. Our qualitative and quantitative evaluations show that quantitative results are not sufficient to judge the quality of a summary and that other quality criteria, such as coherence, should also be taken into account. We find that our manually created corpus is of high quality and that it has the potential to bridge the gap between reference corpora of abstracts and automatic methods producing extracts. Our corpus is available to the research community for further development.

pdf bib
Still not there? Comparing Traditional Sequence-to-Sequence Models to Encoder-Decoder Neural Networks on Monotone String Translation Tasks
Carsten Schnober | Steffen Eger | Erik-Lân Do Dinh | Iryna Gurevych
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

We analyze the performance of encoder-decoder neural models and compare them with well-known established methods. The latter represent different classes of traditional approaches that are applied to the monotone sequence-to-sequence tasks OCR post-correction, spelling correction, grapheme-to-phoneme conversion, and lemmatization. Such tasks are of practical relevance for various higher-level research fields including digital humanities, automatic text correction, and speech recognition. We investigate how well generic deep-learning approaches adapt to these tasks, and how they perform in comparison with established and more specialized methods, including our own adaptation of pruned CRFs.

pdf bib
Semi-automatic Detection of Cross-lingual Marketing Blunders based on Pragmatic Label Propagation in Wiktionary
Christian M. Meyer | Judith Eckle-Kohler | Iryna Gurevych
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

We introduce the task of detecting cross-lingual marketing blunders, which occur if a trade name resembles an inappropriate or negatively connotated word in a target language. To this end, we suggest a formal task definition and a semi-automatic method based the propagation of pragmatic labels from Wiktionary across sense-disambiguated translations. Our final tool assists users by providing clues for problematic names in any language, which we simulate in two experiments on detecting previously occurred marketing blunders and identifying relevant clues for established international brands. We conclude the paper with a suggested research roadmap for this new task. To initiate further research, we publish our online demo along with the source code and data at http://uby.ukp.informatik.tu-darmstadt.de/blunder/.

pdf bib
CNN- and LSTM-based Claim Classification in Online User Comments
Chinnappa Guggilla | Tristan Miller | Iryna Gurevych
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

When processing arguments in online user interactive discourse, it is often necessary to determine their bases of support. In this paper, we describe a supervised approach, based on deep neural networks, for classifying the claims made in online arguments. We conduct experiments using convolutional neural networks (CNNs) and long short-term memory networks (LSTMs) on two claim data sets compiled from online user comments. Using different types of distributional word embeddings, but without incorporating any rich, expensive set of features, we achieve a significant improvement over the state of the art for one data set (which categorizes arguments as factual vs. emotional), and performance comparable to the state of the art on the other data set (which categorizes propositions according to their verifiability). Our approach has the advantages of using a generalized, simple, and effective methodology that works for claim categorization on different data sets and tasks.

pdf bib
Modeling Extractive Sentence Intersection via Subtree Entailment
Omer Levy | Ido Dagan | Gabriel Stanovsky | Judith Eckle-Kohler | Iryna Gurevych
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Sentence intersection captures the semantic overlap of two texts, generalizing over paradigms such as textual entailment and semantic text similarity. Despite its modeling power, it has received little attention because it is difficult for non-experts to annotate. We analyze 200 pairs of similar sentences and identify several underlying properties of sentence intersection. We leverage these insights to design an algorithm that decomposes the sentence intersection task into several simpler annotation tasks, facilitating the construction of a high quality dataset via crowdsourcing. We implement this approach and provide an annotated dataset of 1,764 sentence intersections.

2015

pdf bib
Noise or additional information? Leveraging crowdsource annotation item agreement for natural language tasks.
Emily Jamison | Iryna Gurevych
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Personality Profiling of Fictional Characters using Sense-Level Links between Lexical Resources
Lucie Flekova | Iryna Gurevych
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Exploiting Debate Portals for Semi-Supervised Argumentation Mining in User-Generated Web Discourse
Ivan Habernal | Iryna Gurevych
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
On the Role of Discourse Markers for Discriminating Claims and Premises in Argumentative Discourse
Judith Eckle-Kohler | Roland Kluge | Iryna Gurevych
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Automatic disambiguation of English puns
Tristan Miller | Iryna Gurevych
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf bib
In-tool Learning for Selective Manual Annotation in Large Corpora
Erik-Lân Do Dinh | Richard Eckart de Castilho | Iryna Gurevych
Proceedings of ACL-IJCNLP 2015 System Demonstrations

pdf bib
Linking the Thoughts: Analysis of Argumentation Structures in Scientific Publications
Christian Kirschner | Judith Eckle-Kohler | Iryna Gurevych
Proceedings of the 2nd Workshop on Argumentation Mining

pdf bib
Candidate evaluation strategies for improved difficulty prediction of language tests
Lisa Beinborn | Torsten Zesch | Iryna Gurevych
Proceedings of the Tenth Workshop on Innovative Use of NLP for Building Educational Applications

pdf bib
Counting What Counts: Decompounding for Keyphrase Extraction
Nicolai Erbs | Pedro Bispo Santos | Torsten Zesch | Iryna Gurevych
Proceedings of the ACL 2015 Workshop on Novel Computational Approaches to Keyphrase Extraction

2014

pdf bib
Needle in a Haystack: Reducing the Costs of Annotating Rare-Class Instances in Imbalanced Datasets
Emily Jamison | Iryna Gurevych
Proceedings of the 28th Pacific Asia Conference on Language, Information and Computing

pdf bib
Adjacency Pair Recognition in Wikipedia Discussions using Lexical Pairs
Emily Jamison | Iryna Gurevych
Proceedings of the 28th Pacific Asia Conference on Language, Information and Computing

pdf bib
Automated Verb Sense Labelling Based on Linked Lexical Resources
Kostadin Cholakov | Judith Eckle-Kohler | Iryna Gurevych
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib
Automatically Detecting Corresponding Edit-Turn-Pairs in Wikipedia
Johannes Daxenberger | Iryna Gurevych
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
DKPro Keyphrases: Flexible and Reusable Keyphrase Extraction Experiments
Nicolai Erbs | Pedro Bispo Santos | Iryna Gurevych | Torsten Zesch
Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations

pdf bib
DKPro TC: A Java-based Framework for Supervised Learning Experiments on Textual Data
Johannes Daxenberger | Oliver Ferschke | Iryna Gurevych | Torsten Zesch
Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations

pdf bib
Automatic Annotation Suggestions and Custom Annotation Layers in WebAnno
Seid Muhie Yimam | Chris Biemann | Richard Eckart de Castilho | Iryna Gurevych
Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations

pdf bib
WordNet—Wikipedia—Wiktionary: Construction of a Three-way Alignment
Tristan Miller | Iryna Gurevych
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

pdf bib
Lexical Substitution Dataset for German
Kostadin Cholakov | Chris Biemann | Judith Eckle-Kohler | Iryna Gurevych
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

pdf bib
Sense and Similarity: A Study of Sense-level Similarity Measures
Nicolai Erbs | Iryna Gurevych | Torsten Zesch
Proceedings of the Third Joint Conference on Lexical and Computational Semantics (*SEM 2014)

pdf bib
UKPDIPF: Lexical Semantic Approach to Sentiment Polarity Prediction in Twitter Data
Lucie Flekova | Oliver Ferschke | Iryna Gurevych
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

pdf bib
High Performance Word Sense Alignment by Joint Modeling of Sense Distance and Gloss Similarity
Michael Matuschek | Iryna Gurevych
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

pdf bib
Annotating Argument Components and Relations in Persuasive Essays
Christian Stab | Iryna Gurevych
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

pdf bib
DKPro Agreement: An Open-Source Java Library for Measuring Inter-Rater Agreement
Christian M. Meyer | Margot Mieskes | Christian Stab | Iryna Gurevych
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: System Demonstrations

pdf bib
Predicting the Difficulty of Language Proficiency Tests
Lisa Beinborn | Torsten Zesch | Iryna Gurevych
Transactions of the Association for Computational Linguistics, Volume 2

Language proficiency tests are used to evaluate and compare the progress of language learners. We present an approach for automatic difficulty prediction of C-tests that performs on par with human experts. On the basis of detailed analysis of newly collected data, we develop a model for C-test difficulty introducing four dimensions: solution difficulty, candidate ambiguity, inter-gap dependency, and paragraph difficulty. We show that cues from all four dimensions contribute to C-test difficulty.

pdf bib
Identifying Argumentative Discourse Structures in Persuasive Essays
Christian Stab | Iryna Gurevych
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
A broad-coverage collection of portable NLP components for building shareable analysis pipelines
Richard Eckart de Castilho | Iryna Gurevych
Proceedings of the Workshop on Open Infrastructures and Analysis Frameworks for HLT

2013

pdf bib
UKP-WSI: UKP Lab Semeval-2013 Task 11 System Description
Hans-Peter Zorn | Iryna Gurevych
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

pdf bib
UKP-BIU: Similarity and Entailment Metrics for Student Response Analysis
Omer Levy | Torsten Zesch | Ido Dagan | Iryna Gurevych
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

pdf bib
Automatically Classifying Edit Categories in Wikipedia Revisions
Johannes Daxenberger | Iryna Gurevych
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf bib
The Impact of Topic Bias on Quality Flaw Prediction in Wikipedia
Oliver Ferschke | Iryna Gurevych | Marc Rittberger
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
FrameNet on the Way to Babel: Creating a Bilingual FrameNet Using Wiktionary as Interlingual Connection
Silvana Hartmann | Iryna Gurevych
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Recognizing Partial Textual Entailment
Omer Levy | Torsten Zesch | Ido Dagan | Iryna Gurevych
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
WebAnno: A Flexible, Web-based and Visually Supported System for Distributed Annotations
Seid Muhie Yimam | Iryna Gurevych | Richard Eckart de Castilho | Chris Biemann
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations

pdf bib
DKPro WSD: A Generalized UIMA-based Framework for Word Sense Disambiguation
Tristan Miller | Nicolai Erbs | Hans-Peter Zorn | Torsten Zesch | Iryna Gurevych
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations

pdf bib
DKPro Similarity: An Open Source Framework for Text Similarity
Daniel Bär | Torsten Zesch | Iryna Gurevych
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations

pdf bib
Dijkstra-WSA: A Graph-Based Approach to Word Sense Alignment
Michael Matuschek | Iryna Gurevych
Transactions of the Association for Computational Linguistics, Volume 1

In this paper, we present Dijkstra-WSA, a novel graph-based algorithm for word sense alignment. We evaluate it on four different pairs of lexical-semantic resources with different characteristics (WordNet-OmegaWiki, WordNet-Wiktionary, GermaNet-Wiktionary and WordNet-Wikipedia) and show that it achieves competitive performance on 3 out of 4 datasets. Dijkstra-WSA outperforms the state of the art on every dataset if it is combined with a back-off based on gloss similarity. We also demonstrate that Dijkstra-WSA is not only flexibly applicable to different resources but also highly parameterizable to optimize for precision or recall.

pdf bib
Supervised All-Words Lexical Substitution using Delexicalized Features
György Szarvas | Chris Biemann | Iryna Gurevych
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Hierarchy Identification for Automatically Generating Table-of-Contents
Nicolai Erbs | Iryna Gurevych | Torsten Zesch
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

pdf bib
Headerless, Quoteless, but not Hopeless? Using Pairwise Email Classification to Disentangle Email Threads
Emily Jamison | Iryna Gurevych
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

pdf bib
Cognate Production using Character-based Machine Translation
Lisa Beinborn | Torsten Zesch | Iryna Gurevych
Proceedings of the Sixth International Joint Conference on Natural Language Processing

pdf bib
Uncertainty Detection for Natural Language Watermarking
György Szarvas | Iryna Gurevych
Proceedings of the Sixth International Joint Conference on Natural Language Processing

2012

pdf bib
CSNIPER - Annotation-by-query for Non-canonical Constructions in Large Corpora
Richard Eckart de Castilho | Sabine Bartsch | Iryna Gurevych
Proceedings of the ACL 2012 System Demonstrations

pdf bib
UBY-LMF – A Uniform Model for Standardizing Heterogeneous Lexical-Semantic Resources in ISO-LMF
Judith Eckle-Kohler | Iryna Gurevych | Silvana Hartmann | Michael Matuschek | Christian M. Meyer
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

pdf bib
The Open Linguistics Working Group
Christian Chiarcos | Sebastian Hellmann | Sebastian Nordhoff | Steven Moran | Richard Littauer | Judith Eckle-Kohler | Iryna Gurevych | Silvana Hartmann | Michael Matuschek | Christian M. Meyer
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

pdf bib
Cross-Genre and Cross-Domain Detection of Semantic Uncertainty
György Szarvas | Veronika Vincze | Richárd Farkas | György Móra | Iryna Gurevych
Computational Linguistics, Volume 38, Issue 2 - June 2012

pdf bib
Text Reuse Detection using a Composition of Text Similarity Measures
Daniel Bär | Torsten Zesch | Iryna Gurevych
Proceedings of COLING 2012

pdf bib
A Corpus-Based Study of Edit Categories in Featured and Non-Featured Wikipedia Articles
Johannes Daxenberger | Iryna Gurevych
Proceedings of COLING 2012

pdf bib
To Exhibit is not to Loiter: A Multilingual, Sense-Disambiguated Wiktionary for Measuring Verb Similarity
Christian M. Meyer | Iryna Gurevych
Proceedings of COLING 2012

pdf bib
Using Distributional Similarity for Lexical Expansion in Knowledge-based Word Sense Disambiguation
Tristan Miller | Chris Biemann | Torsten Zesch | Iryna Gurevych
Proceedings of COLING 2012

pdf bib
Learning Semantics with Deep Belief Network for Cross-Language Information Retrieval
Jungi Kim | Jinseok Nam | Iryna Gurevych
Proceedings of COLING 2012: Posters

pdf bib
Proceedings of the 3rd Workshop on the People’s Web Meets NLP: Collaboratively Constructed Semantic Resources and their Applications to NLP
Iryna Gurevych | Nicoletta Calzolari Zamorani | Jungi Kim
Proceedings of the 3rd Workshop on the People’s Web Meets NLP: Collaboratively Constructed Semantic Resources and their Applications to NLP

pdf bib
Subcat-LMF: Fleshing out a standardized format for subcategorization frame interoperability
Judith Eckle-Kohler | Iryna Gurevych
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib
UBY - A Large-Scale Unified Lexical-Semantic Resource Based on LMF
Iryna Gurevych | Judith Eckle-Kohler | Silvana Hartmann | Michael Matuschek | Christian M. Meyer | Christian Wirth
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib
Behind the Article: Recognizing Dialog Acts in Wikipedia Talk Pages
Oliver Ferschke | Iryna Gurevych | Yevgen Chebotar
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib
UKP: Computing Semantic Textual Similarity by Combining Multiple Content Similarity Measures
Daniel Bär | Chris Biemann | Iryna Gurevych | Torsten Zesch
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

2011

pdf bib
Wikulu: An Extensible Architecture for Integrating Natural Language Processing Techniques with Wikis
Daniel Bär | Nicolai Erbs | Torsten Zesch | Iryna Gurevych
Proceedings of the ACL-HLT 2011 System Demonstrations

pdf bib
Wikipedia Revision Toolkit: Efficiently Accessing Wikipedia’s Edit History
Oliver Ferschke | Torsten Zesch | Iryna Gurevych
Proceedings of the ACL-HLT 2011 System Demonstrations

pdf bib
The People’s Web meets Linguistic Knowledge: Automatic Sense Alignment of Wikipedia and WordNet
Elisabeth Niemann | Iryna Gurevych
Proceedings of the Ninth International Conference on Computational Semantics (IWCS 2011)

pdf bib
A Reflective View on Text Similarity
Daniel Bär | Torsten Zesch | Iryna Gurevych
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

pdf bib
A Study of Sense-Disambiguated Networks Induced from Folksonomies
Hans-Peter Zorn | Iryna Gurevych
Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation

pdf bib
What Psycholinguists Know About Chemistry: Aligning Wiktionary and WordNet for Increased Domain Coverage
Christian M. Meyer | Iryna Gurevych
Proceedings of 5th International Joint Conference on Natural Language Processing

2010

pdf bib
Proceedings of the 2nd Workshop on The People’s Web Meets NLP: Collaboratively Constructed Semantic Resources
Iryna Gurevych | Torsten Zesch
Proceedings of the 2nd Workshop on The People’s Web Meets NLP: Collaboratively Constructed Semantic Resources

pdf bib
The More the Better? Assessing the Influence of Wikipedia’s Growth on Semantic Relatedness Measures
Torsten Zesch | Iryna Gurevych
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

pdf bib
Sentence and Expression Level Annotation of Opinions in User-Generated Discourse
Cigdem Toprak | Niklas Jakob | Iryna Gurevych
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

pdf bib
Using Anaphora Resolution to Improve Opinion Target Identification in Movie Reviews
Niklas Jakob | Iryna Gurevych
Proceedings of the ACL 2010 Conference Short Papers

pdf bib
A Monolingual Tree-based Translation Model for Sentence Simplification
Zhemin Zhu | Delphine Bernhard | Iryna Gurevych
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

pdf bib
TUD: Semantic Relatedness for Relation Classification
György Szarvas | Iryna Gurevych
Proceedings of the 5th International Workshop on Semantic Evaluation

pdf bib
Extracting Opinion Targets in a Single and Cross-Domain Setting with Conditional Random Fields
Niklas Jakob | Iryna Gurevych
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

2009

pdf bib
A Study on the Semantic Relatedness of Query and Document Terms in Information Retrieval
Christof Müller | Iryna Gurevych
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
Approximate Matching for Evaluating Keyphrase Extraction
Torsten Zesch | Iryna Gurevych
Proceedings of the International Conference RANLP-2009

pdf bib
Combining Lexical Semantic Resources with Question & Answer Archives for Translation-Based Answer Finding
Delphine Bernhard | Iryna Gurevych
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

pdf bib
Proceedings of the 2009 Workshop on The People’s Web Meets NLP: Collaboratively Constructed Semantic Resources (People’s Web)
Iryna Gurevych | Torsten Zesch
Proceedings of the 2009 Workshop on The People’s Web Meets NLP: Collaboratively Constructed Semantic Resources (People’s Web)

2008

pdf bib
Extracting Lexical Semantic Knowledge from Wikipedia and Wiktionary
Torsten Zesch | Christof Müller | Iryna Gurevych
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

pdf bib
Coling 2008: Educational Natural Language Processing - Tutorial notes
Iryna Gurevych | Delphine Bernhard
Coling 2008: Educational Natural Language Processing - Tutorial notes

pdf bib
ENLP Tutorial Notes – Slides
Iryna Gurevych | Delphine Bernhard
Coling 2008: Educational Natural Language Processing - Tutorial notes

pdf bib
ENLP Tutorial Notes – References
Iryna Gurevych | Delphine Bernhard
Coling 2008: Educational Natural Language Processing - Tutorial notes

pdf bib
Answering Learners’ Questions by Retrieving Question Paraphrases from Social Q&A Sites
Delphine Bernhard | Iryna Gurevych
Proceedings of the Third Workshop on Innovative Use of NLP for Building Educational Applications

2007

pdf bib
What to be? - Electronic Career Guidance Based on Semantic Relatedness
Iryna Gurevych | Christof Müller | Torsten Zesch
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

pdf bib
Automatically Assessing the Post Quality in Online Discussions on Software
Markus Weimer | Iryna Gurevych | Max Mühlhäuser
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions

pdf bib
Analysis of the Wikipedia Category Graph for NLP Applications
Torsten Zesch | Iryna Gurevych
Proceedings of the Second Workshop on TextGraphs: Graph-Based Algorithms for Natural Language Processing

pdf bib
Cross-Lingual Distributional Profiles of Concepts for Measuring Semantic Distance
Saif Mohammad | Iryna Gurevych | Graeme Hirst | Torsten Zesch
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

pdf bib
Comparing Wikipedia and German Wordnet by Evaluating Semantic Relatedness on Multiple Datasets
Torsten Zesch | Iryna Gurevych | Max Mühlhäuser
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers

2006

pdf bib
Automatically Creating Datasets for Measures of Semantic Relatedness
Torsten Zesch | Iryna Gurevych
Proceedings of the Workshop on Linguistic Distances

2005

pdf bib
Using the Structure of a Conceptual Network in Computing Semantic Relatedness
Iryna Gurevych
Second International Joint Conference on Natural Language Processing: Full Papers

pdf bib
Computing Semantic Relatedness in German with Revised Information Content Metrics
Iryna Gurevych | Hendrik Niederlich
Proceedings of OntoLex 2005 - Ontologies and Lexical Resources

pdf bib
Accessing GermaNet Data and Computing Semantic Relatedness
Iryna Gurevych | Hendrik Niederlich
Proceedings of the ACL Interactive Poster and Demonstration Sessions

2004

pdf bib
Semantic Similarity Applied to Spoken Dialogue Summarization
Iryna Gurevych | Michael Strube
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

pdf bib
Assigning Domains to Speech Recognition Hypotheses
Klaus Rüggenmann | Iryna Gurevych
Proceedings of the HLT-NAACL 2004 Workshop on Spoken Language Understanding for Conversational Systems and Higher Level Linguistic Information for Speech Processing

2003

pdf bib
Automatic Creation of Interface Specifications from Ontologies
Iryna Gurevych | Stefan Merten | Robert Porzel
Proceedings of the HLT-NAACL 2003 Workshop on Software Engineering and Architecture of Language Technology Systems (SEALTS)

pdf bib
Less is More: Using a single knowledge representation in dialogue systems
Iryna Gurevych | Robert Porzel | Elena Slinko | Norbert Pfleger | Jan Alexandersson | Stefan Merten
Proceedings of the HLT-NAACL 2003 Workshop on Text Meaning

pdf bib
Ontology-based Contextual Coherence Scoring
Robert Porzel | Iryna Gurevych | Christof E. Müller
Proceedings of the Fourth SIGdial Workshop of Discourse and Dialogue

pdf bib
Semantic Coherence Scoring Using an Ontology
Iryna Gurevych | Rainer Malaka | Robert Porzel | Hans-Peter Zorn
Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics

2002

pdf bib
Annotating the Semantic Consistency of Speech Recognition Hypotheses
Iryna Gurevych | Robert Porzel | Michael Strube
Proceedings of the Third SIGdial Workshop on Discourse and Dialogue

pdf bib
Towards Context-adaptive Utterance Interpretation
Robert Porzel | Iryna Gurevych
Proceedings of the Third SIGdial Workshop on Discourse and Dialogue

Search
Co-authors