Rule-Based Pronominal Anaphora Treatment for Machine Translation

In this paper we describe the rule-based MT system Its-2 developed at the University of Geneva and submitted for the shared task on pronoun translation organized within the Second DiscoMT Workshop. For improving pronoun translation, an Anaphora Resolution (AR) step based on Chomsky's Binding Theory and Hobbs' algorithm has been implemented. Since this strategy is currently restricted to 3rd person personal pronouns (i.e. they, it translated as elle, elles, il, ils only), absolute performance is affected. However, qualitative differences between the submitted system and a baseline without the AR procedure can be observed.


Introduction
In this paper we describe the system submitted for the shared task on pronoun translation organized in conjunction with the EMNLP 2015 Second Workshop on Discourse in Machine Translation (Hardmeier et al., 2015). We present the rule-based Machine Translation (MT) system Its-2 developed at the University of Geneva. A demo can be found here: http://latlapps.unige.ch/Translate?
Interest in the pronoun translation task stems from a line of research concerned with discourse phenomena and MT. It is now widely acknowledged that many remaining problems in MT can be solved only if discourse knowledge, i.e., the processing of phenomena beyond the sentence level, is taken into account (Webber and Joshi, 2012; Hardmeier, 2012; Joty et al., 2014).
The problem of pronoun translation has its roots in the nature of anaphors. These are words without semantic content of their own, such as third person referential pronouns, which refer back to other words with semantic content to find their meaning. We know which element a pronoun refers to (its antecedent) in part because the two agree in gender or number. For example, in (1a), we are able to link they (pronoun) with bikes (antecedent) because they agree in number. This linking, or resolution, seems trivial for a human, but is not straightforward for a machine, especially if the antecedent and the anaphor are not in the same sentence and the text in question contains several sentences with several potential antecedents. Developing automatic Anaphora Resolution (AR) systems is a research domain in its own right and has been active for decades (Mitkov, 2001; Mitkov, 2002; Strube, 2007; Stoyanov et al., 2009; Ng, 2010).
(1) a. Paul left two bikes in front of the house.
When he came back, they were no longer there.

The Problem of Pronoun Translation for English-French
If sentence (1) is to be translated into French, one has the choice (mainly) between ils and elles for translating the pronoun they. This choice is no longer dependent on the English antecedent bikes, but on its translation in French either as the masculine noun vélos (2a) or as the feminine noun bicyclettes (2b).
The focus of the shared task is on the English third person pronouns it and they. As observed in corpora, these pronouns are not always translated as pronouns, but can correspond to a content noun phrase (NP) or to nothing at all. This is the case in (3), where the English pronoun they in (3a) corresponds to a content NP in French (3b).
(3) a. To conclude, I would just like to say something on the principle of subsidiarity. I believe it to be of vital importance that where Member States allow regions and local authorities to raise taxes, they should continue to be able to do so and not be subject to across-the-board regulation by Europe.
b. Enfin, concernant le principe de subsidiarité, je voudrais dire que j'estime indispensable que les États membres puissent continuer d'autoriser les régions et les communes à percevoir des taxes et que ce domaine ne soit pas uniformément réglé par l'Europe.
Moreover, even in cases where a pronoun is translated as a pronoun, the mapping is not one-to-one. To illustrate this, we compiled a sample of 25,000 instances of it and they taken from the Workshop data (instances from the Europarl, TED and News Commentary files are included) (Hardmeier et al., 2015). The translation distribution of these two pronouns is presented in Table 1 and Figure 1.¹ Table 1 shows that each of these pronouns can be translated with at least 7 other pronouns in different proportions. This emphasizes the fact that agreement must be checked in the target language.
¹ These correspondences were determined using the automatic word alignments provided with the training data for the prediction track of the shared task, which were then corrected by hand. Specifically, 446 instances of pronouns aligned to random words were corrected.
The OTHER category stands for cases such as example (3), where the translation corresponds to something which is not a pronoun. This category amounts to ≈20-25% of the translations. NONE, on the other hand, corresponds to English pronouns which were not translated at all in French (4). 2 Similar proportions were reported by Weiner (2014) for the translation from English to German.
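The tallying behind such a translation distribution can be sketched as follows. This is a minimal illustration, not the procedure actually used for Table 1: the pronoun inventory, the function names and the toy aligned pairs are all assumptions made for the example, and a target of None stands for an English pronoun with no aligned French word.

```python
from collections import Counter

# Illustrative inventory only; the real set of French pronouns is larger.
FRENCH_PRONOUNS = {"il", "ils", "elle", "elles", "ce", "ça", "cela", "on"}


def categorize(target_word):
    """Map an aligned French word (or None) to a distribution category."""
    if target_word is None:
        return "NONE"          # the English pronoun was not translated
    word = target_word.lower()
    return word if word in FRENCH_PRONOUNS else "OTHER"


def translation_distribution(aligned_pairs):
    """Relative frequency of each category, per English source pronoun."""
    counts = {}
    for source, target in aligned_pairs:
        counts.setdefault(source.lower(), Counter())[categorize(target)] += 1
    return {
        source: {cat: n / sum(c.values()) for cat, n in c.items()}
        for source, c in counts.items()
    }


# Hypothetical hand-corrected alignments (source pronoun, aligned French word).
pairs = [
    ("it", "il"), ("it", "elle"), ("it", None), ("it", "domaine"),
    ("they", "ils"), ("They", "Ils"), ("they", "elles"), ("they", "États"),
]
dist = translation_distribution(pairs)
print(dist["it"])
print(dist["they"])
```

On real data the pairs would come from the corrected word alignments; here they only serve to show how the OTHER and NONE categories fall out of the same count.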

Related Work
The AR problem has been extensively addressed since the 1980s, first with rule-based methods and more recently with corpus-based methods. Two algorithms are particularly important both for their foundational character and for their relevance to the system described here: Hobbs' (1978) algorithm and Lappin & Leass' (1994) Resolution of Anaphora Procedure (RAP). Hobbs' algorithm deals with third person pronouns only (he, she, it, they). It traverses the parse trees of the sentences looking for NPs of the same gender and number as the anaphor to resolve. The potential antecedents are prioritized according to their grammatical function, such that a subject is preferred to a direct object, which is in turn preferred to an indirect object. While reporting an accuracy of 88.3%, Hobbs' algorithm has been criticized for its assumption of perfect syntactic analysis, since results are computed using manually built parse trees.
The RAP algorithm, on the other hand, treats third person pronouns, reflexives, reciprocals and pleonastic pronouns. RAP is based on a series of agreement filters, a binding algorithm which, like Hobbs' algorithm, prioritizes arguments according to their function, and salience weighting, a concept of centering theory. It builds on parse trees and identifies referents by analyzing each noun phrase. Each referent has an associated salience value according to a predefined scale, which is updated with every sentence. When the value reaches zero, the potential referent is removed from the list. The authors report 86% accuracy; however, this figure is also computed using perfect syntactic analysis.
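The salience bookkeeping described above can be sketched as follows. This is a simplified illustration rather than the RAP algorithm itself: the weights, the halving of values at each sentence boundary and the class and method names are assumptions made for the example.

```python
# Illustrative salience weights per grammatical function (assumed values).
ROLE_WEIGHTS = {"subject": 80, "direct_object": 50, "indirect_object": 40}


class SalienceList:
    """Tracks potential referents and their salience across sentences."""

    def __init__(self):
        self.salience = {}  # referent -> current salience value

    def add(self, referent, role):
        """Register a mention and add the weight of its grammatical role."""
        self.salience[referent] = self.salience.get(referent, 0) + ROLE_WEIGHTS[role]

    def next_sentence(self):
        """Decay all values (assumed: integer halving) and drop referents at zero."""
        decayed = {ref: value // 2 for ref, value in self.salience.items()}
        self.salience = {ref: value for ref, value in decayed.items() if value > 0}

    def ranked(self):
        """Referents ordered from most to least salient."""
        return sorted(self.salience, key=self.salience.get, reverse=True)


referents = SalienceList()
referents.add("Paul", "subject")
referents.add("bikes", "direct_object")
referents.next_sentence()
print(referents.ranked())  # ['Paul', 'bikes']
```

With these assumed weights, a referent that is never mentioned again drops out of the list after a handful of sentences, which is the behaviour the description attributes to RAP.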
A third system is particularly important in the development of AR: that of Soon, Ng, and Lim (2001), one of the first successful corpus-based systems. Rather than finding antecedents for pronouns, their interest is coreference resolution (CR), i.e., finding all NPs in a text which refer to the same world entity. The system uses a pairwise classification paradigm based on a set of features encoding distance, morphological and semantic agreement, definiteness and type of NPs. It achieves a recall of 58.6% and a precision of 67.3% on the MUC-6 corpus (Grishman and Sundheim, 1995).
The question of pronoun translation, on the other hand, has caught the attention of researchers working on Statistical Machine Translation (SMT) for a few years now, resulting in more or less regular publications on the subject since 2010. The most straightforward methods have already been explored, although with limited performance. The first attempts to improve pronoun MT relied on external AR systems that are difficult to reconcile with the SMT systems themselves, an approach which introduces many errors (Le Nagard and Koehn, 2010; Hardmeier and Federico, 2010; Guillou, 2011; Guillou, 2012).
The latest solution has taken the form of a pronoun predictor, an algorithm able to predict a pronoun in the target language using source language information and easily embeddable in an SMT system. Such a predictor, however, is hard to train and results are not yet satisfactory (Popescu-Belis et al., 2012; Hardmeier et al., 2013; Hardmeier et al., 2014). An automatic post-processing approach has also been reported by Weiner (2014). This method consists of automatically correcting the MT output based on the anaphora-pronoun pairs collected from the source text using an AR system.
Finally, using the coreference annotation of the Prague Dependency Treebank (PDT) (Kučová and Hajičová, 2005), Novák (2011) focuses on the translation of it using a classic transfer system. During the parsing stage, each English it pronoun is assigned a label for its interpretation. These labels are then used for generating the correct translation in Czech, the target language.

Its-2
Its-2 is a rule-based translation system based on the Fips parser (Wehrli, 2007). The translation process follows the three classic steps: analysis, transfer and generation. We start with the analysis module. For a given source language sentence, the parser produces an information-rich phrase-structure representation, along with predicate-argument labels. The grammar implemented in the Fips parser is heavily influenced by Chomsky's Minimalist Program and earlier work (Chomsky, 1995), but also includes concepts from other theories such as LFG (Bresnan, 2001) and Simpler Syntax (Culicover and Jackendoff, 2005). The syntactic structures built by the parser follow the general X-bar schema shown in (5), which yields relatively flat structures, without intermediate nodes.
Each constituent XP is composed of a head, X, along with a (possibly empty) list of left sub-constituents (L) and a (possibly empty) list of right sub-constituents (R), where X stands for the usual lexical categories, N(oun), V(erb), A(djective), Adv(erb), P(reposition), C(onjunction), etc., to which we add T(ense) and F(unctional). The T category stands for tensed phrases, corresponding, roughly, to the traditional S category of standard generative linguistics. As for F, it is used to represent secondary predicates, as in the so-called small clause constructions.
The transfer module maps this source language abstract representation to an equivalent target language representation. The mapping is achieved by a recursive traversal of the source-language structure, starting with the head of a constituent, and then its right and left sub-constituents. Lexical transfer occurs at the head level and yields a target language equivalent term of the same or different category, which becomes the new current head. The target language structure is then projected on the basis of the head. In this way, the final output is generated according to the lexical features of the target language. Argument constituents, on the other hand, are determined by the subcategorization properties of the target language predicate. The necessary information is available in the lexical database. Transformational rules, in the traditional Chomskyan sense, can apply to generate specific structures such as passive or wh-constructions (interrogative, relative, tough-movement³). In addition, the transfer procedure can be augmented with language-pair specific transfer rules, for instance to modify the constituent order.
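The recursive traversal described above can be sketched as follows. This is a toy illustration under strong assumptions, not the Its-2 implementation: the XP class, the four-entry lexicon, the adjective-placement rule and all function names are hypothetical, and the sentence is headed by V rather than T for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class XP:
    """A constituent: a head plus lists of left and right sub-constituents."""
    category: str
    head: str
    left: list = field(default_factory=list)
    right: list = field(default_factory=list)


# Hypothetical English-French lexicon: source head -> (target head, category).
LEXICON = {
    "eats": ("mange", "V"),
    "Paul": ("Paul", "N"),
    "apples": ("pommes", "N"),
    "red": ("rouges", "A"),
}


def apply_pair_rules(node):
    """Assumed language-pair rule: attributive adjectives follow the French noun."""
    if node.category == "N":
        adjectives = [c for c in node.left if c.category == "A"]
        node.left = [c for c in node.left if c.category != "A"]
        node.right = adjectives + node.right
    return node


def transfer(source):
    """Transfer the head first, then the right and left sub-constituents."""
    target_head, target_category = LEXICON[source.head]  # lexical transfer
    target = XP(target_category, target_head)            # project from the new head
    target.right = [transfer(child) for child in source.right]
    target.left = [transfer(child) for child in source.left]
    return apply_pair_rules(target)


def linearize(node):
    """Read the words off a structure from left to right."""
    words = []
    for child in node.left:
        words += linearize(child)
    words.append(node.head)
    for child in node.right:
        words += linearize(child)
    return words


tree = XP("V", "eats",
          left=[XP("N", "Paul")],
          right=[XP("N", "apples", left=[XP("A", "red")])])
print(" ".join(linearize(transfer(tree))))  # Paul mange pommes rouges
```

The sketch leaves out determiners, agreement and subcategorization; it only shows how the target structure is rebuilt around each transferred head and where a language-pair rule can change constituent order.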
Currently, the Its-2 system is available for ten language pairs between English, French, German, Italian and Spanish. For each language pair, there is a bilingual, bidirectional dictionary implemented as a relational table containing the associations between the lexical items of source and target languages. Other specifications such as translation context, semantic descriptors and argument matching for predicates are also contained in the table.
³ Tough-movement refers to subjects of a main verb which are also the object of an embedded infinitive verb. In This book is easy to read, for instance, this book is both the subject of the main verb and the logical object of the verb to read.
In the Its-2 system, pronouns are handled like other lexical heads, that is, they are transferred and translated as heads of phrases, using the bilingual dictionary. This strategy, which works fine for non-anaphoric pronouns, is clearly insufficient for anaphoric pronouns, for which knowledge of the antecedent is mandatory. The following section describes our preliminary attempt to implement an anaphora resolution component in the Its-2 system, as part of the Fips parser. For the time being, this AR component only deals with 3rd person personal pronouns (he, she, it, her, him, etc.). The basic idea underlying our implementation is that the proper form of a target-language pronoun depends on the gender and number features of its (target-language) antecedent. Since we do not perform AR on the target language, this information can be retrieved through the links connecting the source-language pronoun, its antecedent and the target-language correspondence of the antecedent. To illustrate this process, consider the following example:
(6) a. en Paul bought an ice-cream and will eat it later.
b. fr Paul a acheté une glace et la mangera plus tard.
The pronoun it in the source language should be translated as a feminine (clitic) pronoun la in the French sentence, because ice-cream, the antecedent of it, is translated as glace, a feminine noun.
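The chain of links just described (source pronoun, source antecedent, target antecedent, target pronoun) can be sketched as follows. This is a minimal illustration and not the Its-2 generation code: the dictionary entries, the table of French forms and the function name are assumptions, and number is simply carried over from the source pronoun.

```python
# Hypothetical bilingual dictionary entries: English noun -> (French noun, gender).
BILINGUAL = {
    "ice-cream": ("glace", "fem"),
    "bike": ("vélo", "masc"),
    "acceptance": ("acceptation", "fem"),
}

# French 3rd person forms indexed by (function, number, gender).
FRENCH_FORMS = {
    ("subject", "sg", "masc"): "il",
    ("subject", "sg", "fem"): "elle",
    ("subject", "pl", "masc"): "ils",
    ("subject", "pl", "fem"): "elles",
    ("object", "sg", "masc"): "le",
    ("object", "sg", "fem"): "la",
    ("object", "pl", "masc"): "les",
    ("object", "pl", "fem"): "les",
}


def translate_pronoun(function, number, source_antecedent):
    """Pick the French pronoun from the gender of the antecedent's translation."""
    _french_noun, gender = BILINGUAL[source_antecedent]
    return FRENCH_FORMS[(function, number, gender)]


# Example (6): 'it' is the object of 'eat' and refers to 'ice-cream' (glace, feminine).
print(translate_pronoun("object", "sg", "ice-cream"))  # la
```

The key point of the sketch is that the English antecedent is never consulted for gender: only its French correspondence determines the form of the pronoun.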

Binding Theory AR
As indicated above, our AR procedure is part of the Fips parser and currently only deals with 3rd person personal pronouns. It is highly influenced by Chomsky's Binding Theory (1981), which is not an AR method per se, but rather a set of constraints useful to exclude otherwise potential antecedents. These constraints follow two principles: Principle A states that reflexive and reciprocal pronouns find their antecedents within their governing category (the smallest clause that includes them); Principle B states that 3rd person personal pronouns find their antecedents outside of the clause that includes them (Reinhart, 1983; Büring, 2005).⁴ Our strategy for anaphora resolution recalls in several ways the one used by Hobbs (1978) or Lappin and Leass (1994), adapted to the specific structures of the Fips parser.
The algorithm comprises three steps:
1. impersonal pronouns
The impersonal pronoun it in English (il in French) has no antecedent and should be excluded from further consideration by the AR procedure. The identification of impersonal pronouns is achieved on the basis of lexical information (verbs lexically marked as impersonal, for instance meteorological verbs such as to rain or to snow), as well as syntactic information. For instance, adjectives which can take so-called sentential subjects occur with an impersonal subject when the sentence is extraposed, as in:
(7) a. It was obvious that Paul had lied.
b. It is easy to see that.
Similarly, impersonal subject pronouns can be found in passive structures with sentential complements:
(8) It was suggested that Paul would do the job.

2. reflexive or reciprocal pronouns
We assume a simplified interpretation of Principle A in which this type of pronoun always refers to the subject of the sentence that contains it. In cases of embedded infinitive sentences, we assume the presence of an abstract subject pronoun (PRO, unrealized lexically) whose antecedent is determined by control theory and ultimately by lexical information. For example, in the sentence Paul_i promised Mary [PRO to take care of himself_i], himself refers to the subject pronoun PRO, which in turn refers to the noun phrase Paul.
3. referential non-reflexive/reciprocal pronouns
Such pronouns, currently restricted to the non-impersonal it, along with he, him, she, her, they, them, etc., undergo our simplified interpretation of Principle B, which means that they must have an antecedent outside of the clause that contains them. We further restrict possible antecedents to arguments, excluding adjunct noun phrases. The search for antecedents considers all preceding clauses within the sentence as well as within the previous sentence and makes an ordered list of the noun phrases which agree in number and gender with the pronoun.⁵ The order is determined by proximity, as well as by the grammatical function of the antecedent (subject, then grammatical object, then prepositional complements, etc.).
⁵ The number n of preceding sentences searched for an antecedent is variable (Klappholtz and Lockman, 1975). However, the large majority of work in the field uses an n value between 1 and 5. Here we follow Hobbs' estimation of n ≤ 1 for 90% of the cases.
In summary, our AR procedure is based on a simplified interpretation of principles A and B of the Binding Theory. After attempting to eliminate impersonal pronouns, the procedure uses principles A and B, respectively, to handle reflexive/reciprocal pronouns and other 3rd person referential pronouns. Our simplified interpretation of those principles states that reflexive/reciprocal pronouns can only refer to the subject of their clause, while other pronouns can refer to noun phrases outside of their immediate clause. When several noun phrases meet those conditions, priority is given to grammatical function and locality.
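The three steps summarized above can be sketched as follows. This is a minimal illustration over flat lists of mentions, not the procedure implemented in Fips, which works on parse structures. The Mention class, the clause indices and the impersonal flag are assumptions, as is the choice to rank by grammatical function first and proximity second, since the text does not specify how the two criteria are combined.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mention:
    text: str
    number: str             # "sg" or "pl"
    gender: Optional[str]   # "masc", "fem", "neut", or None if unmarked (e.g. they)
    function: str           # "subject", "object", "prep_complement" or "adjunct"
    clause: int             # clause index in textual order


REFLEXIVES = {"himself", "herself", "itself", "themselves"}
# Arguments only; adjuncts are absent and therefore never selected.
FUNCTION_RANK = {"subject": 0, "object": 1, "prep_complement": 2}


def agrees(pronoun, mention):
    """Number must match; gender must match when the pronoun marks it."""
    return pronoun.number == mention.number and pronoun.gender in (None, mention.gender)


def resolve(pronoun, is_impersonal, candidates):
    """Return the selected antecedent, or None if there is none."""
    # Step 1: impersonal pronouns have no antecedent.
    if is_impersonal:
        return None
    agreeing = [m for m in candidates if agrees(pronoun, m)]
    # Step 2 (simplified Principle A): the subject of the pronoun's own clause.
    if pronoun.text in REFLEXIVES:
        for m in agreeing:
            if m.clause == pronoun.clause and m.function == "subject":
                return m
        return None
    # Step 3 (simplified Principle B): arguments in preceding clauses only.
    outside = [m for m in agreeing
               if m.clause < pronoun.clause and m.function in FUNCTION_RANK]
    outside.sort(key=lambda m: (FUNCTION_RANK[m.function], pronoun.clause - m.clause))
    return outside[0] if outside else None


# Example (1): "Paul left two bikes in front of the house. When he came back, they ..."
candidates = [
    Mention("Paul", "sg", "masc", "subject", 0),
    Mention("bikes", "pl", "neut", "object", 0),
    Mention("house", "sg", "neut", "adjunct", 0),
    Mention("he", "sg", "masc", "subject", 1),
]
they = Mention("they", "pl", None, "subject", 2)
print(resolve(they, False, candidates).text)  # bikes
```

The candidates passed in are assumed to come from the current and the previous sentence only, in line with the search window described above.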

Results and Discussion
As expected, the translation of the test set using the AR component has no impact on the BLEU scores (Papineni et al., 2002). When only the translations of pronouns are measured, however, the AR component shows a positive effect compared to a baseline without it, as shown in Table 2. Since these results are computed using exact word-level alignment matching between the candidate translation and a unique reference (Hardmeier et al., 2015), they are only indicative.

For the sake of completeness, a manual evaluation of two documents from the test set, amounting to 405 sentences or 203 pronouns, was completed. Two translations, with and without the AR component, were evaluated. The results are given in Table 3.
It can be seen that the reflexive/reciprocal pronouns did not change between the two outputs. Besides, all observed errors were due to incorrect antecedent identification, leading to incorrect pronoun generation. One such case is (9), where the algorithm turns a pronoun correctly translated by the baseline into an incorrect one. In this example, the word procedures, which is feminine in French, is identified as the antecedent, causing the generation of elles instead of ils.
Table 3: Results obtained from the manual evaluation of 203 pronouns from the test set.
Pronoun       Improved   Unchanged   Degraded
him                  0          17          0
it                  18          86          6
them                 0          21          0
themselves           0           1          0
they                 2          47          5
Total               20         172         11
(9) a. SRC And he spent all this time stuck in the hospital while he was having those procedures, as a result of which he now can walk. And while he was there, they sent tutors around to help him with his school work.
b. W/O AR Et il a passé tout ce temps englué dans l'hôpital tandis qu'il avait ces procédures, comme un résultat de lequel maintenant il peut marcher. Et tandis qu'il était là-bas, ils ont envoyé des professeurs autour pour l'aider avec son école à travailler.
c. W/ AR Et il a passé tout ce temps englué dans l'hôpital tandis qu'il avait ces procédures, comme un résultat de lequel maintenant il peut marcher. Et tandis qu'il était là-bas, elles ont envoyé des professeurs autour pour l'aider avec son école à travailler.
In almost twice as many cases, however, the AR component leads to a better pronoun translation. This is the case in example (10). Here the word acceptance is correctly identified as the antecedent. It translates as the feminine noun acceptation in French; therefore, the pronoun it is translated as elle.
(10) a. SRC But acceptance is something that takes time. It always takes time.
b. W/O AR Mais l'acceptation est quelque chose qui prend le temps. Il prend toujours le temps.
c. W/ AR Mais l'acceptation est quelque chose qui prend le temps. Elle prend toujours le temps.
Our own evaluation notwithstanding, the official manual evaluation of the task yielded an accuracy of 0.419 when translations as OTHER are excluded and 0.339 when they are included. These results are rather low compared with the other submitted systems, but they are not discouraging. The low scores are largely due to the fact that our system does not generate ça, cela, ce or on as possible translations of it and they. This is the case in example (11), where a translation of it as ça or cela would have been preferable. Yet, there is an effect of the AR component, visible in the generation of the pronoun elle.
(11) a. SRC And when I was an adolescent, I thought that I'm gay, and so I probably can't have a family. And when she said it, it made me anxious.
b. W/O AR Et quand j'étais un adolescent, j'ai pensé que je suis gai, probablement et ainsi je ne peux pas avoir une famille. Et quand elle l'a dit il m'a rendu anxieux.
c. W/ AR Et quand j'étais un adolescent, j'ai pensé que je suis gai, probablement et ainsi je ne peux pas avoir une famille. Et quand elle l'a dite elle m'a rendu anxieux.
The manual evaluation also revealed that refining our rules to translate cases such as (7) and (8) as ce instead of il would be a good start for tackling this problem.

Conclusion and Future Work
We have presented an implementation of an AR component within the transfer-based system Its-2. The AR strategy, which applies during parsing, is based on the principles of Chomsky's Binding Theory. Currently, this strategy is restricted to 3rd person personal pronouns they, he, she, it, her, him and does not consider translations as demonstrative pronouns ça, cela or ce. However, given recent evidence from different corpora, rules to include these translation options will be developed in the future.