Lucie Poláková


2020

GeCzLex: Lexicon of Czech and German Anaphoric Connectives
Lucie Poláková | Kateřina Rysová | Magdaléna Rysová | Jiří Mírovský
Proceedings of the Twelfth Language Resources and Evaluation Conference

We introduce the first version of GeCzLex, an online electronic resource for translation equivalents of Czech and German discourse connectives. The lexicon is one of the outcomes of research on anaphoricity and long-distance relations in discourse. At present, it contains anaphoric connectives (ACs) for Czech and German, together with their possible translations (not necessarily anaphoric) documented in bilingual parallel corpora. As a basis, we use two existing monolingual lexicons of connectives, the Lexicon of Czech Discourse Connectives (CzeDLex) and the Lexicon of Discourse Markers (DiMLex) for German, and interlink their relevant entries via semantic annotation of the connectives (according to the PDTB 3 sense taxonomy) and statistical information on translation possibilities from the Czech and German parallel data of the InterCorp project. The lexicon is, as far as we know, the first bilingual inventory of connectives with linkage on the level of individual entries, and a first attempt to systematically describe devices engaged in long-distance, non-local discourse coherence. The lexicon is freely available under the Creative Commons License.

CzeDLex 0.6 and its Representation in the PML-TQ
Jiří Mírovský | Lucie Poláková | Pavlína Synková
Proceedings of the Twelfth Language Resources and Evaluation Conference

CzeDLex is an electronic lexicon of Czech discourse connectives, with data drawn from a large treebank annotated with discourse relations. The new version, CzeDLex 0.6, contains significantly more manually processed entries than the previous version 0.5, published in 2017. Its structure has also been modified to allow primary connectives to appear with multiple entries for a single discourse sense. The lexicon comes in several formats, both human- and machine-readable, and is available for searching in PML Tree Query, a user-friendly and powerful search tool for all kinds of linguistically annotated treebanks. The main purpose of this paper/demo is to present the new version of the lexicon and to demonstrate the possibilities of mining various types of information from it using PML Tree Query; we present several examples of search queries over the lexicon data along with their results. CzeDLex 0.6 is available online and was officially released in December 2019 under the Creative Commons License.

2019

A Test Suite and Manual Evaluation of Document-Level NMT at WMT19
Kateřina Rysová | Magdaléna Rysová | Tomáš Musil | Lucie Poláková | Ondřej Bojar
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

As the quality of machine translation rises and neural machine translation (NMT) moves from sentence-level to document-level translation, it is becoming increasingly difficult to evaluate the output of translation systems. We provide a test suite for WMT19 aimed at assessing how MT systems participating in the News Translation Task handle discourse phenomena. We have manually checked the outputs and identified types of translation errors that are relevant to document-level translation.

2017

Extracting a Lexicon of Discourse Connectives in Czech from an Annotated Corpus
Pavlína Synková | Magdaléna Rysová | Lucie Poláková | Jiří Mírovský
Proceedings of the 31st Pacific Asia Conference on Language, Information and Computation

2016

Searching in the Penn Discourse Treebank Using the PML-Tree Query
Jiří Mírovský | Lucie Poláková | Jan Štěpánek
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The PML-Tree Query is a general, powerful and user-friendly system for querying richly linguistically annotated treebanks. The paper shows how the PML-Tree Query can be used for searching for discourse relations in the Penn Discourse Treebank 2.0 mapped onto the syntactic annotation of the Penn Treebank.

Designing CzeDLex – A Lexicon of Czech Discourse Connectives
Jiří Mírovský | Pavlína Jínová | Magdaléna Rysová | Lucie Poláková
Proceedings of the 30th Pacific Asia Conference on Language, Information and Computation: Posters

2014

Genres in the Prague Discourse Treebank
Lucie Poláková | Pavlína Jínová | Jiří Mírovský
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We present a project classifying the documents of the Prague Discourse Treebank (Czech journalistic texts) by genre. Our main interest lies in opening the possibility to observe how text coherence is realized in different types (in the genre sense) of language data and, in the future, in exploring ways of using genres as a feature for multi-sentence-level language technologies. In the paper, we first describe the motivation and the concept of the genre annotation, and briefly introduce the Prague Discourse Treebank. Then, we elaborate on the process of manual annotation of genres in the treebank, from the annotators’ manual work to post-annotation checks and inter-annotator agreement measurements. The annotated genres are subsequently analyzed together with the discourse relations already annotated in the treebank: we present distributions of the annotated genres and the results of studying how the distributions of discourse relations differ across the individual genres.

Discourse Relations in the Prague Dependency Treebank 3.0
Jiří Mírovský | Pavlína Jínová | Lucie Poláková
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: System Demonstrations

2013

Introducing the Prague Discourse Treebank 1.0
Lucie Poláková | Jiří Mírovský | Anna Nedoluzhko | Pavlína Jínová | Šárka Zikánová | Eva Hajičová
Proceedings of the Sixth International Joint Conference on Natural Language Processing

Machine Translation with Many Manually Labeled Discourse Connectives
Thomas Meyer | Lucie Poláková
Proceedings of the Workshop on Discourse in Machine Translation

Subordinators with Elaborative Meanings in Czech and English
Pavlína Jínová | Lucie Poláková | Jiří Mírovský
Proceedings of the Second International Conference on Dependency Linguistics (DepLing 2013)

2012

Does Tectogrammatics Help the Annotation of Discourse?
Jiří Mírovský | Pavlína Jínová | Lucie Poláková
Proceedings of COLING 2012: Posters

Proceedings of the Workshop on Advances in Discourse Analysis and its Computational Aspects
Eva Hajičová | Lucie Poláková | Jiří Mírovský
Proceedings of the Workshop on Advances in Discourse Analysis and its Computational Aspects

Semi-Automatic Annotation of Intra-Sentential Discourse Relations in PDT
Pavlína Jínová | Jiří Mírovský | Lucie Poláková
Proceedings of the Workshop on Advances in Discourse Analysis and its Computational Aspects

Interplay of Coreference and Discourse Relations: Discourse Connectives with a Referential Component
Lucie Poláková | Pavlína Jínová | Jiří Mírovský
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This contribution explores the subgroup of text-structuring expressions with the form preposition + demonstrative pronoun; it is thus devoted to one aspect of the interaction between coreference relations and relations signaled by discourse connectives (DCs) in a text. The demonstrative pronoun typically signals a referential link to an antecedent, whereas the whole expression can, but does not have to, carry a discourse meaning in the sense of discourse connectives. We describe the properties of these expressions with regard to their antecedents, their position among text-structuring language means, and the features typical of their “connective function” as compared to their “non-connective function”. The analysis is carried out on Czech data from the approx. 50,000 sentences of the Prague Dependency Treebank 2.0, directly on the syntactic trees. We explore the characteristics of these expressions discovered during two projects: the manual annotation of (1) coreference relations (Nedoluzhko et al. 2011) and (2) discourse connectives, their scopes and meanings (Mladová et al. 2008).