Designing Algorithms for Referring with Proper Names

Standard algorithms for attribute choice in the generation of referring expressions have little to say about the role of Proper Names in referring expressions. We discuss the implications of letting these algorithms produce Proper Names and expressions that have Proper Names as parts.


Introduction
Reference -the production and comprehension of referring expressions -has been studied intensively throughout the cognitive sciences. Computational Linguists are no exception, often paying particular attention to the generation of referring expressions (REs, (Krahmer and Van Deemter, 2012) for a survey). This area of Natural Language Generation is known as Referring Expressions Generation (REG). An important strand of REG focusses on "one-shot" REs, which do not rely on any linguistic context (precluding anaphoric and other attenuated REs); these are also the primary focus of this paper. 1 One of the classic algorithm coming out or REG is the Incremental Algorithm (IA) (Dale and Reiter, 1996). Simplifying slightly, the IA starts by ordering properties in a sequence known as the Preference Order. The algorithm starts with an empty RE, then examines the first property from the Preference Order. If this property is true of the referent r and rules out one or more distractors, it is added to the RE; otherwise it is not added, and the next property in the Preference Order is examined. The algorithm terminates when properties P i 1 , .., P i k have 1 See, however, section 2.1 on the use of salience. been selected that jointly identify the referent (i.e., . Different Preference Orders tend to generate different REs, so finding a good one is important. Proper Names (PNs) are among the most widely studied REs in cognitive science (see e.g., (van Langendonck, 2007), passim;(van Deemter, 2016), chapters 2 and 7), and a crucial area of applied work in Information Extraction (e.g., (Jurafsky and Martin, 2009) chapter 22 on Named Entities). Yet REG 2 has neglected PNs, presumably because names could easily trivialise REG: suppose the KB contained a set of people. If only one of the people in the KB is named Obama, then it is easy to identify him, by referring to him by his name. Since PNs tend to make excellent REs, REG would become trivial -so the presumed argument goes.
We argue that this line of reasoning misses some important points and that PNs deserve more attention from researchers in REG.

Generating REs that contain a PN
Observe that: -Name are often ambiguous. "Obama", for instance (not to mention "Smith") could refer to many different people.
-A referent can have many names ("Barack", "Obama", "Barack Obama", etc.) or none. -A name can combine with other properties and epithets, as in "Mr Barack Obama, America's current president".
-A name can be part of an expression that refers to another referent. The process is recursive, e.g., "The height of the income of Obama's Secretary of State". So how might PNs be given a place in REG?

Incorporating Proper Names into REG
Received views of REG suggest that the process contains two steps (Reiter and Dale, 2000): Step 1 decides what general syntactic type of RE to use (e.g., a full description, a PN, a pronoun, or some other type); once this decision is taken, Step 2 (discussed in section 1 above) makes more fine-grained decisions, for example, in case of a full description, this step decides what properties should be expressed in the description. The observations of the previous section make this two-step approach problematic, for example because (in some situations) no PN may be available for a given referent, or because PNs and descriptions must be combined (in other situations). In what follows, we explore a radical alternative, showing that if a suitable representation scheme is used, it is possible to incorporate all decisions related to PNs within Step 2.
Suppose each individual in the KB comes not just with a number of descriptive properties but with 0 or more PNs as well, where a PN is regarded as a property that is true of all individuals who bear this name.
-(being named) Joe Klein is a property of all individuals named Joe Klein -(being named) Joe is a property of all those individuals named Joe -(being named) Klein is a property of all those individuals named Klein The idea that a PN can be viewed as a property of its bearer deviates from a long tradition of work in philosophy and logic that regards PNs as rigid designators (Kripke, 1980), yet it enjoys considerable support. (Burge, 1973), for example, observes that PNs can behave like common nouns, as in "There are relatively few Alfreds in P", and "An Alfred joined the club today" (see (Larson and Segal, 1995) and (Elbourne, 2005) for further support).
A simple KB containing PNs as well as ordinary properties could look like this: JOB: political commentator, commentator NATIONALITY: American NAMES: Mr Joe Klein, Joe Klein, Joe, Klein Because longer versions of a person's name are applicable to only some of the individuals to whom a shorter version is applicable, the values of the NAMES attribute often subsume each other: all people who are called Mr Joe Klein are also called Joe Klein, and so on. These properties can be dealt with using the mechanism for subsumption in the Incremental Algorithm (which would also state that all political commentators are commentators, for instance) (Dale and Reiter, 1996).
Of course if Joe Klein is the only Joe in the room, we can refer unambiguously to him saying "Joe". This is accounted for by making the REG algorithm that operates on the KB above salience aware in one of the standard ways, e.g., (Krahmer and Theune, 2002). Salience also suggests a way in which REG can extend beyond one-shot REs to cover reference in extended discourse or dialogue: if x is introduced by means of the PN "Joe Klein" in a text, then if x is the only Joe so far mentioned, then this makes x the most salient of all Joe's, licencing the short RE "Joe".
In short: -Each object has an attribute NAMES.
-The set of values of NAMES can be empty (no name is available), singleton (one name), or neither (several names).
-A subsumption (i.e., subset) relation can be defined among these values.
-Different objects can share some or all of their names.
If names are the "canonical" way of referring to an entity, then standard mechanisms could be invoked to favour names at the expense of other properties. One option is to Dale and Reiter's Preference Order (Dale and Reiter, 1996), making NAMES the most highly preferred attribute in an Incremental Algorithm. Alternatively, a new type of brevity-based algorithm might be used that generates the RE that contains the smallest number of syllables. 3 Assuming that PNs are brief (as they often are), this type of approach would favour PNs, and it would favour shorter PNs over longer ones (e.g., "Klein" over "Joe Klein"). It would also predict that PNs are avoided where large sets are enumerated (compare the RE "the citizens of China" with an enumeration of all the elements of this set).
To see how REG could work in an Incremental Algorithm, consider a simple KB, where each individual has 1 name: This approach generates REs such as: d 1 : "Rover" d 2 : "The dog called Max" w 3 : "Shona, who loves a dog" With the above representation scheme in place, classic REG algorithms can be applied without modifications. However, the scheme does not allow PNs to have properties (e.g., "is a posh name", "has 5 characters" ,"is common in Scotland"). If names are reified, then this becomes possible; what's more, PNs themselves could be referred to (e.g., "the name his friends call him"): a name is just another object linked (on the one hand) to the things it names and (on the other hand) to the ways in which it manifests itself in spelling, pronunciation, etc. For example, n 2 may name both a man and a dog, and it may be written as "Max": Type: woman {w 1 , w 2 , w 3 }, man {m 1 }, dog {d 1 , d 2 }, name {n 1 , n 2 , n 3 , n 4 } Action: feed {(w 1 , d 1 ), (w 2 , d 2 ), (w 2 , d 1 )} Affection: love {(w 1 , d 1 ), (w 3 , d 1 )} Naming: name {(d 1 , n 1 ), (d 2 , n 2 ), (w 1 , n 3 ), (w 2 , n 4 ), (w 3 , n 4 ), (m 1 , n 2 )} Spelling: written {(n 1 , Rover), (n 2 , M ax), (n 3 , M ary), (n 4 , Shona)} Standard REG algorithms can use this KB to generate "The name shared by a man and a dog" (i.e., "Max"). If n 4 is Scottish, we obtain "women with a Scottish name" as well. A slight drawback of this approach, which treats names as objects, is that subsumption can no longer be used to compare names.

Challenges facing this approach
This approach works, but it puts a spotlight on some difficult issues, some of which affect the generation of descriptive REs as well: 1. PNs are not always preferred. For example, if the Director of Taxes is Mrs X, this does not mean that "Contact the Director of Taxes" is always better worded as "Contact Mrs X", since her job title may be relevant. The lack of a computational theory of relevance affects all of REG but becomes very noticeable in the choice between PNs and descriptions. 2. There is no reason for limiting reification to PNs. Colours too could be reified, for example, to generate "the colour of grass". The traditional dichotomy between objects and properties limits the range of REs that these algorithms can generate. 3. REG algorithms are ignorant about social relations between speaker, hearer, and referent. Consider a couple with a son and a daughter. Speaking to his mother, the son could say "my sister", "your daughter", etc., yet in most situations a PN would be better. Titles and epithets like "Dr" and "Aunt(y)", complicate matters further. 4. As elsewhere in REG, questions about overspecification need to be faced. When, for example, is it useful to add an appositive to a PN, as in "Mr Barack Obama, America's current president"? Furthermore, Linguistic Realisation will have to decide about the surface order of the PN and the appositive, perhaps depending on whether the PN and/or the appositive (by itself) refers uniquely. 5. If PNs are properties of the referent, then this leaves room for expressing one and the same PN with a different string. (For example, "Doctor" may be worded as "Doctor", "Dr.", or "Dr".) The desirability of this use of Linguistic Realisation would need to be investigated. 6. It is often difficult for the speaker to assess whether the hearer knows who a given PN refers to. The hearer may never have heard of Joe Klein, for example, and this would cause the RE "Joe Klein" to mis-fire. Lack of shared knowledge is a problem for descriptive REs as well, but it is exacerbated in the case of PNs, because names are highly conventional: once I've learned what "red" means, I can apply the word to any red object, but learning your name does not teach me to apply this name to anyone else.
The last point has important implications. Imagine a programmer wanting to implement the algorithm of section 2.1, aiming to mimic human language use. If she decides to implement an Incremental Algorithm, then how to choose its free pa-33 Figure 1: A trial in the "people" part of the TUNA experiment rameter, the Preference Order? She could learn one via an elicitation experiment, but how does she find a generic REG algorithm that works for all PNs?
Consider a scene from an experiment where speakers referred to stimuli on a screen . Participants called the man in the top right "the man with the white beard", etc. They might have said "Samuel Eilenberg", yet noone did, because participants didn't know his name. Participants could have been trained to be familiar with every individual's name, but this could easily have primed the use of names at the expense of descriptions; the same happens when names are visible as captions, as was done in (de Oliveira et al., 2015) using fictitious names of geographical areas; see also (Anderson et al., 1991). Such an approach does not give reliable information on how REG algorithms should choose between PNs and descriptions. The problem is not just that PNs are conventional, but that their conventional meaning can be entrenched to different degrees, varying from shortlived "conceptual pacts" (Brennan and Clark, 1996) to names that are very widely known and used.

Lessons from situations where PNs are avoided
Suppose someone asks "Who is Joe Klein?" (cf., section 2.2, point 6). Would it make sense to respond "(He is) the author of the bestselling political novel of the 1990s?" It depends on the importance of this fact and how widely it is known.
To model answers to "Who is?" questions (see (Boër and Lycan, 1986) for a theoretical study), (Kutlak et al., 2013) designed a REG algorithm that employs the following Heuristic: Based on the frequency with which a name n co-occurs with a property P , the Heuristic estimates how likely the proposition P (n) is to be known by an arbitrarily chosen hearer. Evaluation studies suggest that this Heuristic goes a long way towards estimating how many people know a fact, and the complete REG algorithm (which involves 2 other heuristics) outperforms its competitors in terms of its ability to generate descriptions that allow hearers to guess correctly the name of the referent. Although the authors focussed on the WWW, the approach can use any corpus that represents the ideas of a community (e.g., a company's intranet).
This approach suggests a promising handle on the conventionality of PNs. It allows us to estimate, for example, the likelihood that a name like "Joe Klein" is known by hearers to refer to the commentator and novelist of that name, and this would allow us to limit the KB of section 2 to names that are well enough known. We hypothesise that PNs have a higher likelihood of being uttered as part of REs by members of a community (e.g., users of the WWW) the more frequently these PNs occur as names of this referent in documents produced by that community. Further experiments could flesh out how the use of PNs depends on a number of factors, including the Knowledge Heuristic. Essentially, PNs would be treated as properties of a referent that may or may not be known to the hearer, analogous to the descriptive properties of (Kutlak et al., 2013).

Conclusion
We have shown how, given appropriate semantic representations, standard attribute algorithms are able to generate REs that contain PNs, thereby solving problems with the standard 2-step perspective on REG that separates choosing the general syntactic type of RE from more fine-grained decisions about the content of the RE. However, our approach raises difficult questions about the choices that a REG algorithm needs to make between PNs and descriptive REs. We argue that some of the trickiest questions in this area may be solved if large corpora are employed as a source of insight into the degree to which a PN is likely to be known by the recipient of the RE. 34