Referring to what you know and do not know: Making Referring Expression Generation Models Generalize To Unseen Entities

Data-to-text Natural Language Generation (NLG) is the computational process of generating natural language, in the form of text or voice, from non-linguistic data. A core micro-planning task within NLG is referring expression generation (REG), which aims to automatically generate noun phrases that refer to entities mentioned as discourse unfolds. A limitation of recent neural REG models is that they cannot generate referring expressions to entities not encountered during training. To solve this problem, we propose two extensions to NeuralREG, a state-of-the-art encoder-decoder REG model. The first is a copy mechanism; the second represents the gender and type of the referent as inputs to the model. Drawing on the results of automatic and human evaluations as well as an ablation study using the WebNLG corpus, we contend that our proposal contributes to the generation of more meaningful referring expressions to unseen entities than the original system and related work. Code and all produced data are publicly available.


Introduction
Data-to-text Natural Language Generation (NLG) is the computational process of generating natural language in the form of text or voice from non-linguistic data. A traditional micro-planning task within the pipeline data-to-text architecture is referring expression generation (REG) (Krahmer and van Deemter, 2019), which aims to automatically generate appropriate noun phrases (e.g., The mathematician Ada Lovelace) to refer to entities (e.g., Ada Lovelace) mentioned as discourse unfolds (e.g., " was the first to recognise that the machine had applications beyond pure calculation.").
Traditionally, REG systems produce references to discourse entities in two explicit steps. First, they decide on the referential form, i.e., choosing whether a referring expression should be a pronoun (She), a proper name (Ada Lovelace), a description (The mathematician), etc. Once the choice is made, such systems textually realize the referring expression based on the chosen referential form and discourse context. If the first step selects a proper name as the form to refer to Ada Lovelace, for instance, the ensuing step is responsible for deciding which among Ada, Ada Lovelace, or another textual realization of a proper name is the most appropriate referring expression to that entity in the given discourse context.
With the advent of large amounts of data, REG systems have undergone a significant change in their architecture. From rule-based modular pipelines, they have become data-driven end-to-end systems that perform the choice of referential form and surface realization jointly. An example of these more integrated approaches is NeuralREG (Castro Ferreira et al., 2018a), an end-to-end neural REG model that produces referring expressions, deciding on form and content jointly based on representations of the referent and its surrounding context. Although NeuralREG is able to generate adequate referring expressions to discourse entities already seen during the training phase, the model does not generalize to unseen ones, i.e., it cannot generate referring expressions to entities which were not seen during its training. This study aims to fill this gap by proposing two extensions to the model's original architecture. The first is a copy mechanism, which may decide at each decoding timestep whether the next token of the referring expression should be generated from the output vocabulary or copied from the input representation of the target entity. We thereby hypothesize that the model will be able to generate a token from the vocabulary for seen entities and to copy tokens from the input representation in the case of unseen ones. The second extension consists of representing the gender and type of the entity as inputs to the model. Such information can be easily extracted from the Semantic Web and may help the model to generate pronominal (e.g., She) and descriptive (e.g., The country) referring expressions to unseen entities.
To evaluate our approach, we conducted experiments relying on a delexicalized version (Castro Ferreira et al., 2018b) of the WebNLG corpus (Gardent et al., 2017b). We first compare our proposal with the original NeuralREG and other related approaches such as ProfileREG (Cao and Cheung, 2019). Second, to assess the quality of the texts generated by our model, we conducted a supplementary evaluation with human judges. Next, we follow the rationale of ablation studies to analyze the importance of each feature in our model within the process of referring expression generation. Finally, we discuss some advantages of the introduced features and how they interact to improve accuracy, variety, and generalization.

Related work
Given an entity to be referred to in a particular context, traditional REG methods have addressed this task in two steps. The first one concerns the choice of referential form, i.e., deciding whether the target reference is more likely to be a proper name (Belo Horizonte), a description (The city), a pronoun (It), or another referential form. Regarding this step, Reiter and Dale (2000) suggested always choosing a full proper name as the first reference to a particular entity in a given context, whereas pronouns may be used for its subsequent references if there is no other entity with the same person, gender, and number in-between the target reference and its antecedents. More recently, Castro Ferreira et al. (2016) proposed a naive Bayes method, which is able to non-deterministically choose a referential form for a particular reference. The model's choice is conditioned upon discourse features which studies in psycholinguistics have shown to impact this choice, such as grammatical position, givenness, and recency of the target reference.
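The naive Bayes choice of referential form can be sketched as follows. This is a toy illustration, not Castro Ferreira et al.'s (2016) implementation: the feature values, training examples, and helper names are invented, and add-alpha smoothing with a fixed value space of 3 is an arbitrary simplification.

```python
from collections import Counter, defaultdict

# Hypothetical training data: each reference is described by its grammatical
# position, givenness, and recency, and labeled with its referential form.
TRAIN = [
    ({"position": "subject", "givenness": "new", "recency": "far"}, "name"),
    ({"position": "subject", "givenness": "given", "recency": "near"}, "pronoun"),
    ({"position": "object", "givenness": "given", "recency": "near"}, "pronoun"),
    ({"position": "object", "givenness": "new", "recency": "far"}, "name"),
    ({"position": "subject", "givenness": "given", "recency": "far"}, "description"),
]

def train_naive_bayes(data):
    priors = Counter(form for _, form in data)
    likelihoods = defaultdict(Counter)  # (feature, form) -> Counter over values
    for feats, form in data:
        for feat, value in feats.items():
            likelihoods[(feat, form)][value] += 1
    return priors, likelihoods

def form_distribution(feats, priors, likelihoods, alpha=1.0, values_per_feat=3):
    """P(form | features) with add-alpha smoothing, normalized over forms."""
    scores = {}
    total = sum(priors.values())
    for form, count in priors.items():
        p = count / total
        for feat, value in feats.items():
            counts = likelihoods[(feat, form)]
            p *= (counts[value] + alpha) / (sum(counts.values()) + alpha * values_per_feat)
        scores[form] = p
    z = sum(scores.values())
    return {form: p / z for form, p in scores.items()}
```

Because the output is a distribution rather than a single label, the referential form can be chosen non-deterministically by sampling from it (e.g., with random.choices), matching the non-deterministic behavior described above.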
Once the referential form is chosen, the second step of traditional REG models focuses on the surface realization of the reference. Most of the literature on this step focuses on the generation of descriptions (Dale and Reiter, 1995), although some studies have approached the generation of proper names (Siddharthan et al., 2011; van Deemter, 2016; Castro Ferreira et al., 2017).
In contrast to previous proposals that have focused on selecting referential form or referential content, Castro Ferreira et al. (2018a) proposed an end-to-end approach: NeuralREG, a referring expression generator able to perform the choice of referential form and the surface realization in an end-to-end style using a neural encoder-decoder architecture. Given an entity to be referred to in a particular textual context, the approach first encodes the entity identifier and the text prior (pre-context) and subsequent to the reference (post-context) to later decode this representation into an appropriate referring expression using attention (Bahdanau et al., 2015).
Although NeuralREG (Castro Ferreira et al., 2018a) can generate suitable referring expressions to entities seen during training, it presents certain problems when referring to unseen ones. To overcome this limitation, Cao and Cheung (2019) presented a profile-based model. Their solution uses information from both the entity's profile (i.e., information retrieved from its Wikipedia page) and context (pre- and post-contexts jointly) to generate suitable references to unseen entities. The authors conclude that their approach is more successful at determining the most suitable referring expression to a particular entity.
In contrast to Cao and Cheung's (2019) solution, in order to address the limitations of dealing with unseen relations and entities, our proposal uses a copy mechanism combined with representations of the gender and type of the referent as inputs to the model. We expect both extensions to make the model able to produce suitable referring expressions, in particular to unseen entities. We describe our model in more detail in the next sections, based on the data used to investigate and evaluate our approach.

Table 1: Example of a WebNLG text with the intermediate representations obtained in the delexicalization process.

Corresponding text: Adenan Satem was born in Japanese Occupied British Borneo. His successor was Abdul Taib Mahmud, who, resides in Sarawak and is a member of the "Barisan Raáyat Jati Sarawak" party.

Tag        Entity/Constant                          Referring Expression
BRIDGE-1   Adenan Satem                             Adenan Satem
PATIENT-1  Japanese occupation of British Borneo    Japanese Occupied British Borneo
BRIDGE-1   Adenan Satem                             His
AGENT-1    Abdul Taib Mahmud                        Abdul Taib Mahmud
PATIENT-2  Sarawak                                  Sarawak
PATIENT-3  "Barisan Raayat Jati Sarawak"            Barisan Ra'ayat Jati Sarawak

Template: BRIDGE-1 was born in PATIENT-1. BRIDGE-1 successor was AGENT-1, who, resides in PATIENT-2 and is a member of the PATIENT-3 party.

Data
We evaluated our proposal based on an enriched version (Castro Ferreira et al., 2018b) of the WebNLG corpus (Gardent et al., 2017a). The original resource is a parallel corpus with sets of RDF (Resource Description Framework) triples and their corresponding verbalizations. Each RDF triple has the form subject-predicate-object (e.g., Adenan Satem | birthPlace | Japanese occupation of British Borneo), as illustrated in Table 1, and each triple set can be verbalized in different forms. Each subject and object is a Uniform Resource Identifier (URI), which can be represented by a Wikipedia ID (e.g., Adenan Satem, Abdul Taib Mahmud) or a literal value such as a date, number, or constant (e.g., "Barisan Raayat Jati Sarawak"), while the predicate (e.g., birthPlace) expresses a relation between subject and object (Gardent et al., 2017a; Gardent et al., 2017b).
The WebNLG dataset is an NLG benchmark that differs from other datasets (Novikova et al., 2017;Mille et al., 2018) due to its data diversity in terms of attributes, patterns, and shapes (i.e., RDF tree shapes from DBPedia). The corpus contains 25,298 English texts verbalizing sets of 1 to 7 RDF triples in 15 different domains. The dataset has five domains exclusive to the test set, providing adequate means to evaluate our model's performance regarding the generation of referring expressions to unseen entities.
We used an enriched version of the WebNLG corpus obtained by a delexicalization process (i.e., mapping each entity to a generic tag and later replacing their corresponding referring expressions in discourse with these tags) which was created by Castro Ferreira et al. (2018b). Table 1 shows an example of a set of 4 triples and corresponding text, together with the intermediate representations obtained in the delexicalization process, such as general tags, Wikipedia IDs (entity/constant), referring expressions and the delexicalized template.
To train and evaluate our approach, we run a pre-processing stage in which we extract a collection of referring expression entries from the enriched version of WebNLG (Castro Ferreira et al., 2018b). This stage is performed once, using the WebNLG corpus information as the basis to obtain external information, without adding new features or changing the WebNLG structure. Since all entities are available on DBpedia, this avoids a possible impact on the evaluation results. Each entry consists of a Wikipedia ID, i.e., a target entity (Adenan Satem), a truecased tokenized referring expression (Adenan Satem or His), and lowercased tokenized pre- (Adenan Satem was born in) and post-contexts (Adenan Satem successor was Abdul Taib Mahmud, who, resides in Sarawak and is a member of the barisan raayat jati sarawak party.), indicating the surrounding context of the target reference.
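The extraction of one referring expression entry per tag occurrence can be sketched as follows. The helper names are ours, not the paper's, and for brevity the contexts keep any remaining tags, whereas the actual corpus relexicalizes them as described above.

```python
import re

# Tags follow the delexicalization scheme shown in Table 1.
TAG = re.compile(r"(BRIDGE|AGENT|PATIENT)-\d+")

def extract_entries(template_tokens, tag_to_entity, tag_to_refex):
    """Return (entity, referring expression, pre-context, post-context) tuples.

    template_tokens: delexicalized template as a token list;
    tag_to_entity:   maps each tag occurrence index to its Wikipedia ID;
    tag_to_refex:    maps each tag occurrence index to its gold referring expression.
    """
    entries, occurrence = [], 0
    for i, token in enumerate(template_tokens):
        if TAG.fullmatch(token):
            pre = " ".join(template_tokens[:i]).lower()
            post = " ".join(template_tokens[i + 1:]).lower()
            entries.append((tag_to_entity[occurrence],
                            tag_to_refex[occurrence], pre, post))
            occurrence += 1
    return entries
```

For the template in Table 1, this yields one entry per tag, e.g. ("Adenan_Satem", "Adenan Satem", "", "was born in ...") for the first BRIDGE-1 occurrence.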

Model
Our approach is based on NeuralREG (Castro Ferreira et al., 2018a) and aims to generate a referring expression y = {y_1, y_2, ..., y_N} with N tokens to refer to a target entity, given the textual context prior to the reference X^(pre) = {x^(pre)_1, ..., x^(pre)_M} with M tokens (the pre-context) and subsequent to the reference X^(pos) = {x^(pos)_1, ..., x^(pos)_L} with L tokens (the post-context). Unlike Castro Ferreira et al. (2018a), where the target entity is represented by a single token, our approach describes the referent by an identifier X^(wiki) = {x^(wiki)_1, ..., x^(wiki)_T} with T tokens together with its entity type E and gender G. To generate the referring expression given the description of the target entity and its surrounding context, we implemented an encoder-attention-decoder architecture with a copy mechanism, sharing the same input word-embedding matrix V, as explained in the following sections.

Encoder
In order to generate feature representations for the inputs, the model starts by encoding the identifier of the target entity as well as the pre- and post-contexts, using three different bidirectional Long Short-Term Memory (LSTM) layers (Hochreiter and Schmidhuber, 1997). The identifier of the target entity X^(wiki) is represented by the forward and backward hidden states of its bidirectional LSTM. To form its final feature representation, the forward and backward hidden states at each timestep t are concatenated as h^(wiki)_t = [→h^(wiki)_t ; ←h^(wiki)_t]. Using the two remaining bidirectional LSTMs, the same process is repeated for the textual context surrounding the reference, resulting in the final pre- and post-context representations h^(pre) and h^(pos), respectively. Finally, the type and gender of the target entity are encoded into their respective vector representations, V_type and V_gender, by looking up their entries in the shared word-embedding matrix V.
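The per-timestep concatenation of forward and backward states can be sketched as follows. This is an illustration only: a plain tanh RNN stands in for the paper's LSTM, and all dimensions and weights are toy values.

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_pass(inputs, W_x, W_h):
    """One directional pass of a simple tanh RNN over a token sequence."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x in inputs:
        h = np.tanh(W_x @ x + W_h @ h)
        states.append(h)
    return states

def bi_encode(inputs, W_x_f, W_h_f, W_x_b, W_h_b):
    """h_t = [forward state ; backward state] at each timestep t."""
    fwd = rnn_pass(inputs, W_x_f, W_h_f)
    bwd = rnn_pass(inputs[::-1], W_x_b, W_h_b)[::-1]  # realign to forward order
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

emb, hid, T = 8, 4, 3  # embedding size, state size, number of entity tokens
tokens = [rng.normal(size=emb) for _ in range(T)]  # e.g. [abdul, taib, mahmud]
shapes = [(hid, emb), (hid, hid), (hid, emb), (hid, hid)]
params = [rng.normal(size=s) for s in shapes]
H = bi_encode(tokens, *params)  # T vectors of size 2*hid
```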

Decoder
Once the information about the target entity and its surrounding contexts is encoded, the corresponding vector representations are fed into an LSTM decoder, augmented with attention and copy mechanisms, in order to produce a referring expression adequate for the target entity in its context. The process is explained in detail in the following sections.
Attention Mechanism The decoding process starts with the attention mechanism (Bahdanau et al., 2015), which computes a context vector c_t at each timestep t. For each encoded input k ∈ {wiki, pre, pos}, the mechanism first computes the energies e^(k)_{it} = v_a^T tanh(W_a h^(k)_i + U_a s_{t-1}), normalizes them into attention weights α^(k)_{it} = softmax(e^(k)_{it}), and forms an attention vector c^(k)_t = Σ_i α^(k)_{it} h^(k)_i. Finally, in order to obtain the final context vector c_t, we follow the concatenative approach of NeuralREG, where the attention vectors are concatenated: c_t = [c^(wiki)_t ; c^(pre)_t ; c^(pos)_t].

Decoding After attending to the representations of the target entity and its surrounding contexts, the resulting attention vector c_t is concatenated with the previous decoding state s_{t-1}, the word embedding of the previously generated token V_{y_{t-1}}, and the vector representations of the type and gender of the target entity, V_type and V_gender. This concatenation is fed into the decoding layer, which produces its next state s_t. Finally, a softmax layer is applied over the decoding state s_t to generate a probability distribution over the output vocabulary. The following equations summarize this process:

s_t = LSTM([c_t ; V_{y_{t-1}} ; V_type ; V_gender], s_{t-1})
P_vocab(w) = softmax(W_s s_t + b_s)

Copy Mechanism To make the approach able to generate referring expressions to unseen entities, we also implemented a copy mechanism in the decoding process, similar to the one presented by See et al. (2017). This mechanism first computes a generation probability p_gen based on the attention vector of the target entity c^(wiki)_t, the decoding state s_{t-1}, and the word embedding of the previously generated token V_{y_{t-1}}:

p_gen = σ(w_c^T c^(wiki)_t + w_s^T s_{t-1} + w_y^T V_{y_{t-1}} + b_gen)

p_gen is used to decide between (1) choosing the token with the highest probability in the softmax distribution P_vocab(w) or (2) copying the token from the description of the entity X^(wiki) with the highest probability according to the attention weights α^(wiki)_t.

The final probability distribution to choose the next token at each timestep t is given by:

P(w) = p_gen P_vocab(w) + (1 - p_gen) Σ_{i : x^(wiki)_i = w} α^(wiki)_{it}

In this context, we expect the model to learn that p_gen should have a higher value when the target entity was seen during training, and a lower one when a referring expression should be generated for an unseen entity.

Loss During training, the model parameters are updated in order to minimize the negative log-likelihood of the gold-standard referring expressions:

J = -Σ_{t=1}^{N} log P(y_t)


Automatic Evaluation
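The copy step, mixing the vocabulary softmax with the attention weights over the multi-token entity identifier, can be sketched as follows, in the style of See et al. (2017). The vocabulary, probabilities, and attention weights are toy values for illustration.

```python
import numpy as np

vocab = ["the", "his", "abdul", "taib", "mahmud", "<unk>"]
entity_tokens = ["abdul", "taib", "mahmud"]  # X^(wiki)
p_vocab = np.array([0.30, 0.25, 0.20, 0.10, 0.05, 0.10])  # softmax over vocab
alpha_wiki = np.array([0.6, 0.3, 0.1])  # attention weights over X^(wiki)
p_gen = 0.2  # low generation probability: the model prefers to copy

def final_distribution(p_gen, p_vocab, alpha_wiki, vocab, entity_tokens):
    """P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum of attention on w."""
    p = p_gen * p_vocab
    for a, tok in zip(alpha_wiki, entity_tokens):
        p[vocab.index(tok)] += (1.0 - p_gen) * a  # scatter copy probability
    return p

p = final_distribution(p_gen, p_vocab, alpha_wiki, vocab, entity_tokens)
```

With a low p_gen, the copy term dominates and the most attended entity token ("abdul") becomes the most probable output, which is the behavior expected for unseen entities.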

Data
We used the delexicalized version of the WebNLG corpus described in Section 3. In particular, we used version 1.5 of the corpus, which is publicly available. This version of the corpus contains 67,027, 8,278 and 19,210 referring expression instances in the training, development and test sets, respectively. The training and development sets have instances of 10 semantic domains, whereas the test set has instances of those 10 domains plus 5 domains unseen in the former sets. Each instance is formed by the target entity, a referring expression, and pre- and post-contexts. Pre- and post-contexts are represented in their lowercased and tokenized forms, whereas the referring expression is in its truecased and tokenized form. Moreover, references to discourse entities are represented by their Wikipedia IDs, while numbers, dates, and other constants are represented by a one-word ID, replacing white spaces with underscores and eliminating double quotes (Castro Ferreira et al., 2018a; Castro Ferreira et al., 2018b). To represent the target entity X^(wiki) as described in Section 4, we lowercase the Wikipedia ID of the target entity, remove all special characters, and split it into a token list based on underscores (e.g., Abdul Taib Mahmud → [abdul, taib, mahmud]). The gender (female, male, neutral) and type (person, organization, etc.) of all target entities, used by our approach, were automatically retrieved from DBpedia.
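The entity-identifier normalization just described can be sketched as below; the exact normalization rules are our reading of the description above, so treat the details (e.g., which characters count as "special") as assumptions.

```python
import re

def entity_to_tokens(wiki_id):
    """Turn a Wikipedia ID or constant into the multi-token identifier X^(wiki)."""
    lowered = wiki_id.lower()
    cleaned = re.sub(r"[^a-z0-9_ ]", "", lowered)  # drop special characters
    return [tok for tok in cleaned.replace("_", " ").split() if tok]
```

For example, entity_to_tokens("Abdul_Taib_Mahmud") yields [abdul, taib, mahmud], and quoted constants lose their double quotes.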

Model Settings
Regarding the model parameters, we followed most of the settings from NeuralREG's set-up of Castro Ferreira et al. (2019). We trained the model for 60 epochs with a dropout of 0.2. Furthermore, we set the early-stopping patience of the neural networks to 10 and the beam size to 1, and applied a maximum output length of 30 tokens. Moreover, we set the batch, state, and attention sizes to 80, 256, and 256, respectively. Additionally, we set the pre-context, post-context, and entity word embeddings to 128 dimensions each.
Baselines

OnlyNames refers to an entity directly by its Wikipedia ID. This baseline exclusively works with proper names, replacing the underscores in entity IDs with white spaces (e.g., Ada_Lovelace to "Ada Lovelace"). Instead of working exclusively with proper names, our approach realizes other referential forms, such as pronouns and descriptions, consequently yielding a more natural discourse flow in the produced texts.
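As described above, the OnlyNames baseline amounts to a single string substitution; a minimal sketch:

```python
def only_names(wiki_id):
    """Realize every reference as the entity's Wikipedia ID, underscores
    replaced with white spaces (the OnlyNames baseline described above)."""
    return wiki_id.replace("_", " ")
```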
NeuralREG+Catt is an end-to-end deep neural network model that jointly decides on the form and content of references. The model works with a delexicalized version of the WebNLG corpus, first encoding the pre- and post-contexts of a reference. In contrast to our proposal, NeuralREG+Catt does not implement a copy mechanism and does not consider any external knowledge when selecting the best referring expression.
ProfileREG encodes information from a local context and an external profile to generate references to a given entity. This model determines the best reference to an entity by selecting from the existing vocabulary, pronouns, or the entity profile. Contrary to ProfileREG, our model uses selected entity features and a different architecture in order to evaluate the best scenario for generating referring expressions.

Metrics
We calculated accuracy and string edit distance (Levenshtein, 1966) in order to measure the quality of the generated referring expressions in comparison with the gold-standard ones. To evaluate the models' performance in realizing pronouns, we also computed accuracy, precision, recall, and F1-score, based on a direct comparison between the gold-standard referring expressions and the ones produced by the models. Finally, we compared the original texts against the texts lexicalized through the models by computing text accuracy and the BLEU score (Papineni et al., 2002).

Results

Table 2 presents the results of our model in comparison with the baselines for all entities as well as for seen and unseen ones. In terms of referring expression accuracy, string edit distance, text accuracy, and BLEU score, our proposed approach outperforms the three baselines considering all entities and seen ones only. Regarding unseen entities, our model presents higher results for the same metrics in comparison with all models except OnlyNames. Regarding pronouns, ProfileREG introduces the best results, while the OnlyNames model is not considered, since this model is not able to generate this form of reference.

Table 2: (a) Referring expressions' accuracy and string edit distance (SED), (b) BLEU and text accuracy scores of the models, (c) pronoun precision, recall, and F1-score of the models in the automatic evaluation. Best results are boldfaced, whereas the second best are underlined.

Table 3 shows an example of a text lexicalized with referring expressions generated by our proposal and the three baselines. The text was extracted from the test set in the Politician domain, which is not present in the training and development sets. By comparing our approach (NeuralREG+Copy) to the OnlyNames baseline, we can see that our model is able to generate more variation in referring mechanisms, since it makes use of a pronoun as a referential form, while OnlyNames repeats proper names. The outputs for the Adenan Satem text produced by the NeuralREG and ProfileREG models show generation problems, namely entities completely unrelated to the references (e.g., The Boeing light combat and 258.2 Satem, respectively).

Table 3: Example outputs of the models for the Adenan Satem text.

Original: Adenan Satem was born in Japanese Occupied British Borneo. His successor was Abdul Taib Mahmud, who, resides in Sarawak and is a member of the "Barisan Raáyat Jati Sarawak" party.
OnlyNames: Adenan Satem was born in Japanese Occupied British Borneo. Adenan Satem successor was Abdul Taib Mahmud, who, resides in Sarawak and is a member of the "Barisan Raáyat Jati Sarawak" party.
NeuralREG: The Boeing light combat was born in Abilene, in Texas. They successor was the hal of the astronaut, who, resides in the state of Grenada and is a member of the "Barisan Raáyat Jati Sarawak" party.
NeuralREG+Copy: Adenan Satem was born in the Japanese Occupied British Borneo. His successor was Abdul Taib Mahmud, who, resides in Sarawak and is a member of "Barisan Raáyat Jati Sarawak" party.
ProfileREG: 258.2 Satem was born in the Japanese. Its successor was the Taib the Moro, who, resides in the Sarawak and is a member of the "Barisan Raáyat Jati Sarawak" party.
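The two reference-level metrics described in the Metrics section can be sketched as follows: exact-match accuracy and Levenshtein string edit distance between predicted and gold referring expressions (here computed over characters; the unit of comparison is our assumption).

```python
def edit_distance(a, b):
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,        # deletion
                           cur[j - 1] + 1,     # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def evaluate(pairs):
    """pairs: list of (predicted, gold) referring expressions."""
    acc = sum(p == g for p, g in pairs) / len(pairs)
    sed = sum(edit_distance(p, g) for p, g in pairs) / len(pairs)
    return acc, sed
```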

Human Evaluation
To assess the quality of the texts generated by our proposal and the three baselines, we conducted a supplementary evaluation with human judges.
Method Two applied linguists were recruited to rate the texts. They are proficient in English and have over 20 years' expertise as translators and language advisers.
We selected 75 instances of the delexicalized version of the WebNLG corpus, considering a unique occurrence for each combination between the number of triples (ranging from 1 to 7) and domain (10 seen and 5 unseen ones). After selecting the set of triples, we collected the corresponding produced versions of each investigated model introduced in this study (our proposal and three baselines). Finally, we randomly ordered the final trial set of (4 × 75 =) 300 sentences to decrease the bias of having the 4 generated texts together during the evaluation.
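The construction of the randomized trial set can be sketched as below; the instance identifiers are placeholders, but the model names and the 4 × 75 = 300 total come from the setup described above.

```python
import random

random.seed(0)  # any fixed seed; reproducibility of the shuffle is our addition
instances = [f"instance-{i}" for i in range(75)]  # placeholder instance IDs
models = ["OnlyNames", "NeuralREG+Catt", "NeuralREG+Copy", "ProfileREG"]

# One rated item per (instance, model) pair, shuffled so that the four
# outputs for the same instance are not presented together.
trials = [(inst, model) for inst in instances for model in models]
random.shuffle(trials)
```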
The evaluation followed best practices and the guidelines in Novikova et al. (2018) regarding human evaluations of NLG systems. For instance, we chose well-defined criteria to assess text quality and a well-established rating scale. The participants were asked to rate the automatically generated sentences with respect to three criteria: fluency, i.e., whether the text flow is acceptable; grammaticality, i.e., whether grammatical and lexical patterns are close to human language patterns; and semantic adequacy, i.e., whether the information in the output text matches that of the input representation. In addition, a 5-point Likert scale was used (1 - very low, 2 - low, 3 - medium, 4 - high, and 5 - highly/fully adequate).
Results Table 4 summarizes the results of the human evaluation regarding fluency, grammaticality, and semantic adequacy for all, seen, and unseen entities. Our proposed model outperformed the previous version of NeuralREG and presented competitive results compared to the current state of the art in the literature. Regarding grammaticality, our model presents the best results for all, seen, and unseen entities considering the three baselines. Regarding fluency and semantic adequacy, the human evaluation showed similar scores for our proposal and OnlyNames. Despite its limitations in referring expression generation, the OnlyNames baseline performed very well, which can be accounted for by the fact that WebNLG is a corpus made up of texts potentially used to yield encyclopedia entries, which, unlike other types of text, allow for repetition of proper nouns. The hallucination and repetition often present in neural models (Rohrbach et al., 2018; Moryossef et al., 2019; Holtzman et al., 2018) can also account for the good performance of OnlyNames, which does not suffer from this problem. An example can be seen in the RDF triple set [Alfa Romeo 164 | assembly | Milan. Alfa Romeo 164 | relatedMeanOfTransportation | Saab 9000]. The OnlyNames output was "Alfa Romeo 164, which is assembled in Milan, is a related means of transportation to Saab 9000, in that they are both cars", whereas NeuralREG+Copy produced "Romeo Romeo 164, which is assembled in Milan, is a related means of transportation to Saab, in that they are both cars". Despite occasional hallucination problems, the human evaluation showed that our model has more consistent performance, improving overall quality.

Ablation Study
We also performed an ablation study in order to analyze the performance of the different features used by our proposal.
Method We evaluated the copy mechanism, the pre- and post-contexts, and the gender and type embeddings in order to determine which feature most influences the model. The contribution of each feature was analyzed by running the model without it and measuring the resulting drop in referring expression accuracy on the test part of the data, considering all entities as well as only seen and only unseen ones. When the copy mechanism is removed, the model performs similarly to the original NeuralREG, except that the target entity is also represented by the entity embeddings for gender and type.

Results Table 5 depicts the results of our ablation analysis. The removal of the copy mechanism (Ablation 1) causes the highest decrease in referring expression accuracy for all entities as well as for only seen and unseen ones, validating this feature as the most important within the model. In addition, the relevance of the copy mechanism for generalizing to unseen entities is demonstrated by the large accuracy drop in the analysis. Furthermore, removing the entity embeddings for gender and type (Ablation 2) causes a noticeable drop in all scores, particularly when generating referring expressions to unseen entities. Regarding context, removing the pre-context (Ablation 3) causes the second highest decrease, validating it as the second most important feature. Removing the post-context (Ablation 4) does not yield the expected effect on accuracy for seen entities, since the referring expressions produced for this type of entity proved better without this feature. Nevertheless, we can stress the importance of the post-context for unseen entities, since referring expression accuracy for these entities decreases when this feature is removed.
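The ablation protocol above can be sketched as a simple loop over features. Everything here is a placeholder: evaluate_without would wrap actual retraining and evaluation, and the accuracy numbers are invented solely to make the sketch runnable (they are not the paper's results).

```python
FULL_MODEL_ACC = 0.75  # hypothetical full-model referring expression accuracy

def evaluate_without(feature):
    """Placeholder for: retrain the model without `feature`, then compute
    referring expression accuracy on the test set."""
    hypothetical = {"copy": 0.40, "pre-context": 0.55,
                    "post-context": 0.72, "gender+type": 0.70}
    return hypothetical[feature]

features = ["copy", "pre-context", "post-context", "gender+type"]
drops = {f: FULL_MODEL_ACC - evaluate_without(f) for f in features}
most_important = max(drops, key=drops.get)  # feature whose removal hurts most
```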

Discussion
This study set out to address a limitation of NeuralREG, a state-of-the-art encoder-decoder referring expression generation system, namely its failure to generate references to entities not seen during its training. To solve the problem, we proposed two extensions to the original approach: a copy mechanism, and a multi-token representation of the referent together with its gender and type. Considering the pre- and post-contexts in which an entity should be referred to and the information about the entity's gender and type, at each decoding step our model decides whether the next token of the referring expression should be generated from the output vocabulary or copied from the multi-token input representation of the entity.
Although our approach set out to improve the generation of referring expressions to unseen entities only, the automatic evaluation shows that it presents competitive results against all the models when comparing overall performance and also for seen entities. Regarding the generation of pronouns and references to unseen entities, our model outperforms ProfileREG and OnlyNames. Furthermore, the human evaluation, conducted to rate the automatically generated sentences, showed that our model achieved the best results regarding grammaticality. Regarding fluency and semantic adequacy, NeuralREG+Copy and OnlyNames presented similar results, as shown in Table 4. The similarities between both models are striking, which demonstrates that OnlyNames remains a competitive baseline in REG. In order to understand these results and gain deeper insight into the performance of each feature, we also conducted an ablation analysis, which showed different results in referring expression generation for seen and unseen entities.
Surrounding Context Pre-and post-contexts seem to perform different roles when used as input features to generate referring expressions. Based on our ablation analysis, pre-context plays a crucial role, being ranked the most important feature when generating referring expressions to seen entities and third to unseen ones. On the other hand, post-context seems to have a slight contribution only for the generation of references to unseen entities. In fact, when not used, the approach performs better for generation of referring expressions to seen entities.
Copy Mechanism Among the input features, the copy mechanism proved an essential feature of the model. Its importance was supported by the results in the ablation analysis, which pointed to this feature as the most important for the generation of referring expressions to unseen entities. This confirms the copy mechanism to be a productive addition to NeuralREG in order to make it able to work with entities not seen during training.
Gender and Type Entity Representations Besides the copy mechanism, we sought to make Neural-REG generalize to unseen entities by feeding it with embedding representations of the referent's gender and type. Among the motivations to use these features, we considered how easy it is to access this information in the Semantic Web, since entities in WebNLG are represented by their URIs. Second, we hypothesized that such representations would allow the model to generate pronominal and descriptive referring expressions to unseen entities. To some extent, the pronominal reference his to the unseen entity Adenan Satem produced by our approach and depicted in Table 3 shows that the representations may indeed help. Ablation results also showed that they are the second most important feature in the referring expression generation to unseen entities, only behind the copy mechanism (another extension proposed by our study). Such result confirms our hypothesis.
Future Work Although our two proposed extensions allowed NeuralREG to perform better than its previous version and to generate superior referring expressions to unseen entities, OnlyNames performed slightly better than our approach in the generation of referring expressions to this kind of entity and produced similar results to our model in the human evaluation. We assume that part of this result is related to known issues in neural semantic models, such as hallucination, which could influence both automatic and human evaluation results, in particular for unseen entities. To mitigate this issue, we aim to investigate the generation of synthetic referring expression data to augment the training data and better tune our approaches.
Results also show that ProfileREG outperformed our model in the generation of pronouns. We hypothesize that this result stems from incorrect gender and type information for some entities extracted from DBpedia. For instance, the entity BBC in DBpedia is also considered of the type Person, leading to the generation of inaccurate descriptive (The person) and pronominal (She or He) outputs. In future work, we aim to manually inspect all type and gender information extracted from DBpedia in order to avoid such errors. Additionally, to generate better pronominal referring expressions, we will enhance our approach by using the "profile" computed by the ProfileREG model.

Conclusion
We have proposed extensions to the NeuralREG model to overcome its inability to generalize to entities not seen during training when generating referring expressions. We conclude that our proposal contributes to generating more meaningful referring expressions to unseen entities, in addition to seen ones. Furthermore, our study provides a new version of a strong baseline within the NLG area. A future direction for our work is to implement the improvements discussed in this study in order to match OnlyNames' performance for unseen entities.