Discontinuous Genitives in Hindi/Urdu

This paper discusses genitive phrases in Hindi/Urdu in general and puts a particular focus on genitive scrambling, a process whereby the basic order of constituents is changed. In Hindi/Urdu, genitive phrases may not only occur at different structural positions within the NP that they modify; under the right circumstances, they can also be found outside of the NP, yielding discontinuous structures. The theoretical challenge is to identify and formalize the linguistic constraints that govern genitive scrambling. Further, a successful computational treatment correctly attaches the genitive phrase to its head NP. I use a Lexical-Functional Grammar to solve both challenges, demonstrating that the constraints can be aptly formulated using a functional uncertainty path. Successful attachment further depends on the morphological agreement of the genitive phrase with its head. On a theoretical level, the present contribution sheds light on the possibilities of NP discontinuities in a morphologically rich language like Hindi/Urdu.


Introduction
Discontinuous constituents pose particular challenges for various NLP applications, such as question-answering, coreference resolution or topic modeling. This paper relates to an application that is further up the NLP toolchain: syntactic parsing. Here, the main challenges lie in:
• adapting the parser to process the discontinuous structures;
• reconstructing the dependencies in the analysis, i.e., attaching the discontinuous parts to their syntactic heads.
Third, from a theoretical linguistic point of view, one would also want to derive generalizations about what kinds of discontinuities are possible, and what kinds do not appear. Investigating such constraints in a given language is helpful since they can provide cross-linguistic insight into the phenomenon of discontinuity, and why it can or cannot take place. This paper presents a study of discontinuous NPs in the morphologically-rich South Asian language Hindi/Urdu. The focus is on genitive NP modifiers, which display a great deal of discontinuity. As will be seen below, in the right configurations, they may be scrambled out of their NP domain, removing them from the heads that they modify. Neither the phenomenon itself nor the configurations that allow for it have been previously discussed in the literature.
The paper contributes to solving all three of the above challenges. It discusses the empirical properties of the Hindi/Urdu genitive in general as well as genitive discontinuity, investigated by collecting data from native speakers and searching the Hindi/Urdu Treebank (Bhatt et al., 2009) (§2, 3, 4). I arrive at a couple of theoretical generalizations, which can be aptly formulated via functional uncertainty within the framework of Lexical-Functional Grammar (LFG, Dalrymple (2001)). I suggest that the ability of the genitive to appear outside its NP is a result of the rich agreement between the genitive case marker and the NP head. Finally, I describe how the Hindi/Urdu ParGram grammar (Butt and King, 2007; Bögel et al., 2009), a computational LFG grammar developed as part of the ParGram project (Sulger et al., 2013; Butt et al., 2002) and implemented in XLE (Crouch et al., 2015), is adapted to parse and correctly attach discontinuous genitives to their NPs (§5). The paper concludes in §6.

General Description
The genitive case in Hindi/Urdu is realized using the clitic k-, which is attached to a possessor NP. Under the analysis of Hindi/Urdu case in Butt and King (2004), which I adapt here, all case clitics functionally head a KP (case phrase). The genitive differs from other case clitics: it agrees in number, gender and morphological form (nominative or oblique) with the head noun, the possessum. For the feminine, there is morphological syncretism in that a single form ki is used throughout the feminine inflectional pattern. For the masculine, there is syncretism between the singular oblique and plural nominative and oblique. Within NPs, the modifying possessor phrase comes first, then the possessum (i.e., the head of the NP); this conforms to the general clausal word order in Hindi/Urdu, which is head-final (Mohanan, 1994; Butt, 1995). The position of the genitive phrase varies with respect to other NP modifiers, such as adjectives or quantifiers; see (4) for an example. NP modifiers occurring after the NP head are judged ungrammatical by the informants; see (4c) for an example. Another example illustrating the variable word order inside the NP is shown in (5). The constraint that NP modifiers have to precede their head inside the NP is corroborated by data such as (6b) (a permutation of (6a)). Here, the genitive occurs after the NP head, beṭe 'sons', which is itself marked with the ergative case. The fact that (6b) is ungrammatical is a clear indication that genitive phrases cannot be right-adjoined to the NP head.
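The agreement pattern just described can be sketched as a small lookup table. This is an illustration only: the masculine singular nominative form ka is supplied from the standard paradigm, and the dictionary layout and the function names (`genitive_marker`, `licensed_head_features`) are assumptions made for this example, not part of the grammar discussed in this paper.

```python
# Illustrative sketch: forms follow the syncretism described in the text
# (feminine ki throughout; masculine ke everywhere except singular nominative).
# The data layout and function names are assumptions made for this example.
GENITIVE_FORMS = {
    ("masc", "sg", "nom"): "ka",
    ("masc", "sg", "obl"): "ke",
    ("masc", "pl", "nom"): "ke",
    ("masc", "pl", "obl"): "ke",
    ("fem", "sg", "nom"): "ki",
    ("fem", "sg", "obl"): "ki",
    ("fem", "pl", "nom"): "ki",
    ("fem", "pl", "obl"): "ki",
}


def genitive_marker(gender, number, form):
    """Return the genitive clitic that agrees with a head noun's features."""
    return GENITIVE_FORMS[(gender, number, form)]


def licensed_head_features(marker):
    """Return all head-noun feature bundles compatible with a given clitic."""
    return sorted(feats for feats, m in GENITIVE_FORMS.items() if m == marker)


if __name__ == "__main__":
    print(genitive_marker("fem", "sg", "nom"))
    print(licensed_head_features("ke"))
```

The second function makes the syncretism visible: a single surface form such as ke is compatible with several feature bundles, which is why agreement alone does not always identify a unique head.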

Genitive Scrambling
In addition to the variable word order inside NPs, there are examples showing that the genitive modifiers can occur outside of the NPs they modify. I will refer to this as Genitive Scrambling. In (8a), the genitive occurs in the canonical position inside the NP to the left of the head noun. In (8b), the genitive is scrambled outside of the subject NP to the end of the clause; still, it must be analyzed as a modifier of the head noun dost 'friend', since it cannot be argued to be an argument of the intransitive verb a 'come'. (Butt and Zinsmeister, 2009) In (9a), the object gaṛi 'car' is modified by the genitive Us=ki 'her/his/its'. The genitive can be scrambled out of the object to the beginning of the clause as in (9b). From the morphosyntax, it is clear that in (9b) the feminine-inflected Us=ki 'her/his/its' modifies gaṛi 'car', since that is the only feminine nominal in the sentence. A very similar example is in (10). Genitives may also be scrambled to the right. In (11a), a permutation of (9a), the object is topicalized to the front of the clause. In (11b), the genitive phrase modifying the object is scrambled to the right and occurs after the subject. A similar example is given in (12), where kIs=ki 'whose' modifies kItab 'book', but is not in the same constituent. (Bögel and Butt (2013), p. 301) Recall that the order within NPs is head-final. As seen in (11)-(12), however, when genitives are scrambled outside of their NP, this order is not necessarily preserved. Using the terminology of Fanselow and Féry (2006), I refer to scrambled genitives that occur before their heads in the sentence as non-inverted scrambled genitives, and to scrambled genitives that occur after their heads as inverted scrambled genitives.
It is reasonable to assume that scrambling of genitive phrases is possible because the genitive displays rich morphology which agrees with its head, enabling speakers to identify the nominal in the sentence modified by the genitive. Fanselow and Féry (2006) identify agreement inside NPs as a main factor influencing the availability of discontinuous NPs across languages, but there are also counterexamples to this generalization; Turkish, for example, has discontinuous NPs, in spite of the absence of agreement inside nominal projections.

Some Preferences and Constraints
The operation of genitive scrambling does not occur without constraints. This section sums up these constraints, which serve as the empirical background for the XLE implementation of genitive scrambling as described in §5. Each of the constraints was verified by intensive consultation with at least three native speakers.

Local Attachments are Preferred
Consider (13a), which involves a topicalized object. The possessor of that object can be scrambled to the right as in (13b). In cases such as (13b), Us=ki is either a scrambled genitive modifying gaṛi 'car' or a canonical genitive locally attached to bag 'park'; the agreement morphology does not rule out either. Where the agreement morphology permits both scrambled as well as locally attached genitives, local attachments are highly preferred. Here, informants judge Us=ki 'his/her' as modifying bag 'park', but acknowledge that it may also modify gaṛi 'car'. The preference for local attachment is reflected in a principle well-known from cognitive science, first discussed by Kimball (1973) as the Right Association principle, and reformulated by Gibson (1991) as the Recency Preference.

Scrambling and Case
The examples above involve genitives that are scrambled out of bare NPs. Genitives may also be scrambled out of NPs that are overtly case-marked; in this case, inverted scrambled genitives are ungrammatical, and the genitive has to precede its head in the clause. Examples are shown in (14). In both sentences, ram=ke 'Ram's' modifies bAcco=ne 'children=ERG', but since the latter is ergative-marked, the former has to precede it. A similar example involving a genitive scrambled from an overtly-marked object NP is given in (15). Recall that genitive KPs modifying nominals in overtly case-marked KPs need to have oblique nominal morphology. One might assume, then, that examples such as (15b) are bad simply because there are several options for the genitive KP to modify a nominal, given the high degree of syncretism in genitive case marking for the oblique; e.g., in (15b) the genitive could modify either bAcco or kUṭṭe. (16) shows that this cannot be the issue. Here, the genitive can modify both nominals, since it linearly precedes both of them; cf. also (14b), which is ungrammatical, even though the agreement morphology clearly rules out any possibility of modification other than bAcco.
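The word-order restriction described above can be stated as a simple check over linear positions. The following is a minimal sketch under simplifying assumptions: positions are plain word indices, the function name `scrambled_genitive_ok` is hypothetical, and the check covers only the case/precedence interaction, not the other constraints discussed in this section.

```python
def scrambled_genitive_ok(genitive_index, head_index, head_is_case_marked):
    """Check the linear-order constraint on a scrambled genitive.

    A genitive scrambled out of a bare NP may precede or follow its head.
    A genitive scrambled out of an overtly case-marked NP must precede it,
    i.e., inverted scrambled genitives are excluded in that configuration.
    """
    if not head_is_case_marked:
        return True
    return genitive_index < head_index


if __name__ == "__main__":
    # Bare head, genitive to the right of the head: permitted.
    print(scrambled_genitive_ok(3, 0, head_is_case_marked=False))
    # Case-marked head, genitive to the right of the head: excluded.
    print(scrambled_genitive_ok(3, 0, head_is_case_marked=True))
```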

Scrambling from Complement Clauses
Another constraint concerns complement clauses. None of my informants judge possessors scrambled out of finite complement clauses as grammatical; cf. the ungrammatical examples in (17). However, a majority of my informants indicate that it is grammatical to scramble genitive phrases from within non-finite complement clauses, e.g., the clause headed by the modal verb sAk 'can' in (18). This is in line with the findings of Mahajan (1990), Kidwai (1999) and Kidwai (2000), who state that scrambling of arguments from within finite complement clauses is generally not accepted, whereas scrambling from non-finite complement clauses is.

No Scrambling out of Adjuncts
The third constraint concerning genitive scrambling is that genitive KPs may not be scrambled from within adjuncts. In (19a), Us=ki 'her/his/its' is a genitive phrase modifying bag 'park', which itself is locative case-marked and an adjunct to the overall clause. The possessor may not be scrambled from its NP to any other position in the clause (19b-c). Island behavior, i.e., the unavailability of constituents for movement/scrambling, is symptomatic of clausal adjuncts and is well-known throughout the literature, first discussed by Ross (1967). It is also a well-known diagnostic for distinguishing arguments from adjuncts, as discussed by, e.g., Needham and Toivonen (2011) in an LFG setting.

No Scrambling from Deep Within
The last constraint to be discussed here indicates that it is not possible to scramble genitive phrases that are selected by nominals further down a path of grammatical functions. Consider the example in (20a). SOhAr 'husband' is modified by a genitive SUBJ orAt=ke 'the woman's'. SOhAr=ki, in turn, is an extrinsic possessor SUBJ modifying the overall object of the clause, gaṛi 'car'. The structure is as indicated by the bracketing in (20b). In the similar example (21), sUrx rAng=ke 'of red color' is an ADJUNCT modifying mAkan 'house'. In (22a-b), orAt=ke 'the woman's', the SUBJ genitive KP modifying SOhAr 'husband', cannot appear outside of the NP it is embedded in, i.e., outside the NP headed by gaṛi 'car', since it is embedded too far down in that NP, its GF path being (↑ OBJ SUBJ SUBJ) (starting from the main clause). (23a-b) show that the same restriction holds for attributive genitives such as sUrx rAng=ke 'of red color', which has the path (↑ OBJ SUBJ ADJUNCT) here.

XLE Implementation
This section describes the implementation of the Hindi/Urdu genitive as well as its scrambling properties and resulting discontinuities. The implementation uses the XLE grammar development platform, which includes an industrial-strength parser and generator for LFG grammars (Crouch et al., 2015).

General Setup
The lexical entry for the feminine genitive case marker ki is given in (24). Recall the agreement pattern of the genitive case marker in Table 1; in XLE, constraining equations can account for the requirements concerning gender, number as well as morphological form. In (24), the constraints are in the form of inside-out constraining equations, since the genitive KP may be embedded in a SUBJ, an ADJUNCT or an OBJ f-structure inside the head noun's f-structure. The last line in (24) states that the case marker needs to be inside an f-structure that has the feature NTYPE; this ensures that the genitive only occurs as a nominal case (i.e., not on verbal arguments/adjuncts). The XLE grammar rules in (25) construct the KP and NP. (25a) states that the KP consists of an NP and an optional case marker K. (25b) states that an NP may consist of a simple pronoun or a modified noun (Nadj). In (25b), the use of the shuffle operator (,) separating the KP, AP and N nodes ensures that these nodes may occur in any order, thereby allowing for different word orders inside the NP. The annotation ! <h ^ (making use of the head precedence operator <h) indicates that the currently annotated c-structure node (here: KP or AP) has to precede the c-structure node of the higher-level f-structure, modeling the fact that genitives and other NP modifiers have to precede their heads. Sample c- and f-structures for (2a)
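The combined effect of the shuffle operator and the head precedence annotation can be illustrated outside of XLE by enumerating orders. The following sketch is not XLE code and makes simplifying assumptions: nodes are plain category labels, and `np_orders` is a hypothetical helper that first permutes all nodes freely (the shuffle) and then keeps only the orders in which every modifier precedes the head (the precedence constraint).

```python
from itertools import permutations


def np_orders(modifiers, head):
    """Enumerate NP-internal orders in which all modifiers precede the head.

    Free permutation of the nodes stands in for the shuffle operator; the
    filter stands in for the head precedence annotation on the modifiers.
    """
    nodes = list(modifiers) + [head]
    licensed = []
    for order in permutations(nodes):
        head_position = order.index(head)
        if all(order.index(m) < head_position for m in modifiers):
            licensed.append(order)
    return licensed


if __name__ == "__main__":
    for order in np_orders(["KP", "AP"], "N"):
        print(" ".join(order))
```

For a genitive KP, an AP and the head N, only the two head-final orders survive, which matches the description that modifier order is free while post-head modifiers are excluded.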

Generalizing and Implementing Genitive Scrambling
The genitive scrambling facts can be formalized via a functional uncertainty path as in (26). 6 The expression is matched by a variety of paths, e.g., SUBJ, OBJ, XCOMP SUBJ, etc. (XCOMP is the grammatical function used for non-finite complement clauses). Thus, (26) describes exactly those paths that scrambled genitives may be extracted from; it does not allow for genitives scrambled from adjuncts, finite complement clauses (which are inside the COMP GF) or from deeper GF paths (e.g., OBJ SUBJ). (27) is the XLE rule template that adds scrambled genitive KPs to the c-structure tree. Functionally, they are annotated as subjects, objects or adjuncts (lines 6-8) inside a path variable instantiated from KP-SCRAMBLE-PATH (line 2). Lines 3-5 check the case feature of the head noun; it is either nominative (i.e., a bare NP), in which case there is no precedence constraint, or it is not nominative (i.e., it is overtly case-marked), in which case the genitive is required to precede its head (again implemented using head precedence, see above). Finally, line 9 adds an O(ptimality)T(heory) mark to the scrambled genitive, called attach, which marks the analysis as non-optimal when it is in direct competition with a local attachment analysis (which does not carry the OT mark).
5 The rules in the grammar are more complicated than shown here; e.g., the Nadj rule includes further nodes such as quantifiers, demonstratives etc. The scheme used by the Hindi/Urdu ParGram grammar for transliterating the Urdu Arabic script is described in Malik et al. (2010) as well as Bögel (2012).
6 Documentation for the implementation of functional uncertainty in XLE is at http://ling.uni-konstanz.de/pages/xle/doc/notations.html#N4.1.5.
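The behavior of such a path expression can be illustrated with an ordinary regular expression over sequences of grammatical functions. The sketch below is an assumption-laden stand-in, not the expression in (26): it assumes that the permitted paths to the head NP consist of any number of XCOMPs followed by SUBJ or OBJ, which is merely consistent with the paths named above; the names `SCRAMBLE_PATH` and `can_scramble_from` are hypothetical.

```python
import re

# Assumed stand-in for the path in (26): zero or more XCOMPs, then SUBJ or OBJ.
# The actual expression in the grammar may admit further grammatical functions.
SCRAMBLE_PATH = re.compile(r"(?:XCOMP )*(?:SUBJ|OBJ)")


def can_scramble_from(path):
    """Return True if a genitive may be scrambled out of the NP at `path`.

    `path` lists the grammatical functions leading from the main clause to
    the head NP, e.g. ["XCOMP", "SUBJ"].
    """
    return SCRAMBLE_PATH.fullmatch(" ".join(path)) is not None


if __name__ == "__main__":
    for path in (["SUBJ"], ["XCOMP", "SUBJ"], ["ADJUNCT"],
                 ["COMP", "OBJ"], ["OBJ", "SUBJ"]):
        print(path, can_scramble_from(path))
```

Under this assumption, adjuncts, finite complement clauses (COMP) and deeper paths such as OBJ SUBJ fail to match, mirroring the three constraints on extraction described in the preceding sections.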

Testsuite Creation
To perform regression tests on the implementation, a separate testsuite file was created with examples of canonical genitives as well as instances of genitive scrambling. The testsuite currently includes 36 grammatical and ungrammatical examples, each between two and eight words long, and has been manually constructed in close collaboration with the native speakers. All grammatical sentences are parsed successfully, while all ungrammatical sentences are ruled out. Given the ambiguity of the genitive discussed in Section 2, all sentences yield ambiguous parse results. As an example, reconsider (13b). The sentence is part of the testsuite and yields two optimal as well as two unoptimal solutions. Under the two optimal readings, Us=ki 'his/her' locally modifies bag 'park' as a subject or an adjunct; under the two unoptimal readings, Us=ki 'his/her' is a scrambled genitive subject or adjunct modifying gaṛi 'car'.
XLE does not display unoptimal solutions by default; the developer/annotator can select the unoptimal solution(s) by clicking the OT mark that controls the (dis)preference. Figure 3 shows the optimal solution where Us=ki 'his/her' is a subject, while Figure 4 shows the corresponding unoptimal solution, i.e., the scrambled genitive analysis. 8
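The split between optimal and unoptimal solutions for (13b) can be mimicked with a toy ranking over candidate analyses. This sketch is only an illustration of the dispreference logic and does not reflect how XLE evaluates OT marks internally; the `Analysis` class, the `rank` function and the use of English glosses as head labels are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    head: str        # gloss of the noun the genitive attaches to
    function: str    # grammatical function of the genitive
    scrambled: bool  # True if the genitive is outside its head's NP

    @property
    def attach_marks(self):
        """Number of dispreference marks: one per scrambled attachment."""
        return 1 if self.scrambled else 0


def rank(analyses):
    """Split analyses into optimal (fewest marks) and unoptimal ones."""
    best = min(a.attach_marks for a in analyses)
    optimal = [a for a in analyses if a.attach_marks == best]
    unoptimal = [a for a in analyses if a.attach_marks != best]
    return optimal, unoptimal


# The four analyses of (13b) as described in the text.
ANALYSES_13B = [
    Analysis("park", "SUBJ", scrambled=False),
    Analysis("park", "ADJUNCT", scrambled=False),
    Analysis("car", "SUBJ", scrambled=True),
    Analysis("car", "ADJUNCT", scrambled=True),
]

if __name__ == "__main__":
    optimal, unoptimal = rank(ANALYSES_13B)
    print("optimal:", optimal)
    print("unoptimal:", unoptimal)
```

Because the mark only matters in competition, a sentence whose sole analysis is a scrambled one would still come out as optimal in this toy model.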

Summary
The paper describes Hindi/Urdu genitives in general and their scrambling properties in particular. I take a detailed look at the empirical distribution of this phenomenon, including its syntactic constraints, and formulate a generalization using LFG. The generalization is implemented in the Hindi/Urdu ParGram grammar using XLE. Future theoretical work includes a comparison with other morphologically-rich languages. An initial investigation has shown that scrambling data in Turkish, as discussed by e.g. Kornfilt (2003), are similar, but display a constraint called the "barrier constraint" by Chomsky (1986), which rules out possessors that occur directly right-adjoined to arguments; the constraint does not exist in Hindi/Urdu. Since ParGram includes a Turkish grammar (Çetinoğlu, 2009), a comparison of the annotations necessary to cover the genitive scrambling facts would be interesting.
8 The c-structures are not shown here due to space limitations. In the c-structure corresponding to the f-structure in Figure 3, the genitive attaches below the NP headed by bag 'park', while in the c-structure for Figure 4, the genitive attaches to the clausal node, resulting in a flat structure.