Multiword expressions and lexicalism: the view from LFG

Multiword expressions (MWEs) pose a problem for lexicalist theories like Lexical Functional Grammar (LFG), since they are prima facie counterexamples to a strong form of the lexical integrity principle, which entails that a lexical item can only be realised as a single, syntactically atomic word. In this paper, I demonstrate some of the problems facing any strongly lexicalist account of MWEs, and argue that the lexical integrity principle must be weakened. I conclude by sketching a formalism which integrates a Tree Adjoining Grammar into the LFG architecture, taking advantage of this relaxation.

This is a very broad definition, covering everything from full-fledged idioms like cut the mustard to mere hackneyed expressions like never tell me the odds. In this paper, my focus is on semantic idiomaticity, this being the prototypical feature of MWEs, but what I say has implications for, and is not incompatible with, the other kinds as well.

LFG
Lexical Functional Grammar (LFG: Kaplan and Bresnan, 1982; Dalrymple, 2001; Bresnan et al., 2015) is a constraint-based, lexicalist approach to the architecture of the grammar. Its primary focus has always been syntax, but with a special interest in the interfaces between this and other components of the grammar, including argument structure (e.g. Kibort, 2007), morphology (e.g. Butt et al., 1996), semantics (e.g. Dalrymple, 1999), information structure (e.g. Dalrymple and Nikolaeva, 2011), and prosody (e.g. Mycock and Lowe, 2013). A syntactic analysis in LFG involves two formally distinct kinds of object: c(onstituent)-structure, which is a phrase-structure tree that represents linear order as well as hierarchical relationships like constituency; and f(unctional)-structure, which is an attribute-value matrix that represents more abstract, functional relations like 'subject-of'. The two are connected via the function φ. An example is given in Figure 1.
The correspondence between c-structure and f-structure is controlled via annotations on the tree, provided either by phrase structure rules or the lexical entries themselves. The convention in writing such annotations is to use ↑ and ↓ as metavariables, representing φ(*̂) and φ(*) respectively, where * is a variable representing the c-structure node where an annotation appears, and *̂ represents the mother of that node. Thus, for example, the canonical English subject and object rules can be written as follows:

(1) IP → DP              I′
         (↑ SUBJ) = ↓    ↑ = ↓

    V′ → V        DP
         ↑ = ↓    (↑ OBJ) = ↓

These say, essentially, that an IP can be made up of a DP which is the subject of the clause, and an I′, while a V′ can be made up of a V, and a DP which is the object of the clause. I omit the annotations on the tree in Figure 1 for reasons of space, but in principle all nodes are annotated. Finding the f-structure is then a matter of finding the minimal f-structure which satisfies all of the equations. In this way, the f-structure constrains the over-generation of the context-free c-structure, expanding the grammar's expressive power.
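To make the idea of a minimal f-structure concrete, the following is a minimal sketch in Python, not the standard LFG solution algorithm. It handles only the two annotation types used in (1): ↑ = ↓ (written "head" here) and (↑ GF) = ↓ (written as the bare function name). The class `Node`, the function `solve`, and the particular feature names and values on the leaves are all assumptions made for illustration; they are not taken from Figure 1.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    # "head" stands for the annotation ↑ = ↓; any other string is read as a
    # grammatical function GF, i.e. the annotation (↑ GF) = ↓.
    annotation: str = "head"
    children: list = field(default_factory=list)
    # Attribute-value pairs contributed by a lexical item (illustrative only).
    features: dict = field(default_factory=dict)


def solve(node, f):
    """Fill f, the f-structure that φ assigns to node, in one top-down pass."""
    for attr, value in node.features.items():
        # setdefault adds the pair if absent; an existing different value is a clash.
        if f.setdefault(attr, value) != value:
            raise ValueError(f"inconsistent values for {attr}")
    for child in node.children:
        if child.annotation == "head":
            solve(child, f)  # ↑ = ↓: mother and daughter share one f-structure
        else:
            solve(child, f.setdefault(child.annotation, {}))  # (↑ GF) = ↓
    return f


# A simplified tree for "The cat is yawning" (hypothetical feature values).
tree = Node("IP", children=[
    Node("DP", "SUBJ", children=[
        Node("D", features={"DEF": "+"}),
        Node("N", features={"PRED": "cat", "NUM": "sg"}),
    ]),
    Node("I'", children=[
        Node("I", features={"TENSE": "pres", "ASPECT": "prog"}),
        Node("VP", children=[Node("V", features={"PRED": "yawn<SUBJ>"})]),
    ]),
])

fstructure = solve(tree, {})
```

Because nothing is added to the dictionary unless some annotation or lexical feature demands it, the result contains no more information than the equations require, which is the sense of 'minimal' relevant here. A full implementation would also need general unification and constraining equations, which this sketch leaves out.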
LFG subscribes to a strong version of the lexical integrity principle (LIP), namely that

    [m]orphologically complete words are leaves of the c-structure tree and each leaf corresponds to one and only one c-structure node. (Bresnan, 2001, 93)

This means that c-structure leaves are words, and that words are c-structure leaves. The original motivation for LIP was to ensure that syntactic rules should be 'blind' to morphology. But, in its strong version, it works in the other direction too. This facet of LIP is what Ackerman et al. (2011) call the principle of unary expression (PUE):

    In syntax, a lexeme is uniformly expressed as a single morphophonologically integrated and syntactically atomic word form.
If we think of MWEs as lexemes, then they are clearly a challenge to this principle. But even if we instead claim they are some kind of 'listeme' (Di Sciullo and Williams, 1987), there remains the question of how a single object, be it in 'the list' or the lexicon, can be realised as multiple potentially disjoint word forms in the syntax. MWEs thus remain at least a prima facie challenge to a strongly lexicalist theory.

Lexical ambiguity approaches
For any strongly lexicalist theory which adheres to (at least the spirit of) PUE, the most obvious way to deal with MWEs is via what we might call the lexical ambiguity approach (LA). In such an approach, MWEs are treated as made up of special words which combine to give the appropriate meaning for the whole expression. Words like pull and strings become ambiguous, meaning either pull′ and strings′ or exploit′ and connections′, and so the semantic idiomaticity is encoded directly in the lexical entries. This sidesteps the PUE issue, since MWEs are not single lexical items, but rather collections of separate lexical items which conspire to create the overall meaning. For this reason, versions of LA have been popular in various lexicalist theories: see, for instance, Sailer (2000) for HPSG, Arnold (2015) for LFG, Kay et al. (2015) for SBCG, and Lichte and Kallmeyer (2016) for LTAG. However, LA has a large number of shortcomings which mean that it is untenable as a general position.
Although LA seems to naturally explain so-called decomposable idioms, where the meaning of the whole can be distributed across the parts (since this is exactly what the approach does), it is not so clear how it should handle non-decomposable idioms, like kick the bucket, blow off steam, shoot the breeze, etc., where there is no obvious way of breaking down the meaning of the idiom such that its parts correspond to the words that make up the expression. Solutions have been proposed: for instance, Lichte and Kallmeyer (2016) argue for what they call 'idiomatic mirroring', whereby each of the parts of the idiom contributes the meaning of the whole expression, so that kick means die′, bucket means die′, and, presumably, the means die′ as well. A similar approach is pursued in work by Sascha Bargmann and Manfred Sailer (Bargmann and Sailer, 2005, in prep.). Both proposals, however, assume a semantics which allows for redundancy, a decision which is crucial for idiomatic mirroring to work. In a strictly resource-sensitive conception of the syntax-semantics interface like LFG+Glue (Dalrymple, 1999; Asudeh, 2012), each contribution to the semantics must contribute something to the meaning, with the result that multiple items cannot contribute the same semantics without a concomitant change in meaning (big, big man means something different from big man, for example).
Without idiomatic mirroring, we are forced to assume that only one of the words in the expression bears the meaning, and that the rest are semantically inert. For example, perhaps there is a kick_id which means die′, and selects for special semantically inert forms the_id and bucket_id. Notice, however, that the choice of where to locate the meaning is ultimately arbitrary. We may as well have bucket_id meaning die′, or even the_id, provided they select for the other inert forms and then pass their meaning up to the whole VP. Such arbitrariness seems undesirable.
It also leads to another formal issue: we now face an explosive proliferation of semantically inert forms throughout the lexicon. What is more, each of these must be restricted so that it does not appear outside of the appropriate expression. But this means that the the in kick the bucket can't be the same the as in shoot the breeze. We need as many thes as there are expressions which include it. Instead of having to expand the lexicon by as many entries as there are MWEs, we have to expand it by as many entries as there are words in MWEs, which is much less appealing, and smacks of redundancy.
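The arithmetic behind this comparison can be sketched for a toy lexicon. The three idioms are the non-decomposable examples mentioned earlier; the variable names are hypothetical, and the sketch simply counts entries rather than modelling any particular LA analysis.

```python
from collections import Counter

# Toy inventory of non-decomposable idioms (examples taken from the text).
idioms = ["kick the bucket", "shoot the breeze", "blow off steam"]

# En bloc storage: one lexical entry per expression.
en_bloc_entries = len(idioms)

# Lexical ambiguity: one specially restricted entry per word of each expression.
la_entries = sum(len(idiom.split()) for idiom in idioms)

# Number of separate restricted copies that each word form needs under LA.
copies = Counter(word for idiom in idioms for word in idiom.split())
```

Here `en_bloc_entries` is 3 while `la_entries` is 9, and `copies["the"]` is 2, since the definite article needs one restricted entry for each expression that contains it.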
One empirical issue facing LA relates to the psycholinguistic findings on processing. Swinney and Cutler (1979) showed that idioms are processed in the same way as regular compositional expressions; i.e. there is no special 'idiom mode' of comprehension. At the same time, others have found that idiomatic meanings are processed faster and in preference to literal ones (Estill and Kemper, 1982; Gibbs, 1986; Cronk, 1992). If both these things are true, then LA is in trouble: in this approach, there is no reason to think idioms or other MWEs should be processed any faster; if anything, we might expect them to be slower, since they involve ambiguity by definition.
Rather, the psycholinguistic findings plead for what seems intuitively appealing anyway: that MWEs are inserted en bloc, being stored in the lexicon as units. But this requires there to be objects in the lexicon which are larger than single words, defined as terminal nodes in a tree, which necessitates abandoning PUE.

TAG-LFG
Really, we want to be able to extend the domain of the lexical entry so that it can naturally include MWEs. This can be readily achieved in Lexicalised Tree Adjoining Grammar (LTAG: Joshi et al., 1975), which has successfully been used to analyse MWEs in the past (e.g. Abeillé, 1995).
One of the key strengths of any TAG-based approach is its extended domain of locality. Since the operation of adjunction allows trees to grow 'from the inside out', as it were, relationships can be encoded locally even when the elements involved may end up arbitrarily far apart. This is precisely the situation which obtains with idioms and other MWEs which allow for syntactic flexibility.
What is more, a TAG-based approach where MWEs are multiply-anchored trees (that is, trees with more than one terminal node specified in the lexicon, so that they contain more than one word) aligns with the psycholinguistic findings. A parse involving an MWE will involve fewer elementary trees: for example, in a parse of John kicked the bucket, instead of the four trees for John, kicked, the, and bucket, it will just involve the two for John and kicked the bucket, explaining the increased processing speed (Abeillé, 1995).
However, I am not advocating that LFG practitioners should abandon LFG in favour of LTAG. Space precludes a full defence of the virtues of LFG here, but I believe it possesses a number of advantageous features we should like to retain. Firstly, there is the separation of abstract grammatical information from the constituency-based syntactic tree. A detailed and dedicated level of representation for functional information is motivated by the fact that it is important in grammatical description and not necessarily determined by phrase structure. For example, functional information is relevant in terms of describing binding domains (Dalrymple, 1993), or for phenomena related to the functional/accessibility hierarchy (Keenan and Comrie, 1977), or in describing argument alternation (Bresnan, 1982).
Secondly, LFG has grown beyond just c- and f-structure, and now has a well-developed grammatical architecture encompassing many different levels of representation, from phonological, to morphological, to semantic and information structure, and the relations and constraints that exist between them. This correspondence architecture (on which see Asudeh, 2012, 49-54) is a powerful tool for describing the multi-modal phenomenon that is natural language, and something we would like to preserve.
With this in mind, then, what we should like to do is to incorporate the advantages of the TAG-style extended domain of locality into the pre-existing LFG architecture. The most obvious way to do this is to replace the context-free grammar of LFG's c-structure with a TAG instead. Let us call this variant TAG-LFG. In the rest of this section I will sketch its key features.
The first thing to note is that such a move does not increase the expressive power of LFG. Of course, a TAG is mildly context-sensitive, which is more powerful than the context-free grammar of LFG's c-structure. However, LFG is not just c-structure, and the presence of f-structure already pushes LFGs beyond the mildly context-sensitive space (Berwick, 1982). Thus, although we are empowering a part of the formalism, we are not increasing the power of the formalism as a whole.
Since c-structure nodes in LFG can be 'decorated' with functional information, another concern is how to handle these during substitution and adjunction. Substitution is straightforward: since no elementary tree will be annotated on its root node, we simply retain the annotation on the substitution target. For adjunction, feature-based TAG standardly makes use of top and bottom features (Vijay-Shanker, 1987). Since in TAG-LFG we are unifying features from the whole tree in one place, the f-structure, rather than locally on each node, we do not need to separate annotations in the same way. Instead, at the top of the adjunction structure, annotations are retained from the target, while at the bottom, they are retained from the foot of the auxiliary tree. This is equivalent to seeing adjunction as two instances of substitution following a dividing up of the tree; in each case the target retains its annotations.
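The treatment of annotations under substitution and adjunction described here can be sketched as follows. This is an illustrative sketch only: the `Node` class, the in-place operations, and the example trees and annotation strings (an XCOMP-annotated VP and an adverbial auxiliary tree) are assumptions of mine, not part of the proposal itself.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    label: str
    annotation: Optional[str] = None  # functional annotation; None on elementary-tree roots
    children: list = field(default_factory=list)
    is_foot: bool = False


def find_foot(node):
    """Return the foot node of an auxiliary tree, or None if there is none."""
    if node.is_foot:
        return node
    for child in node.children:
        found = find_foot(child)
        if found is not None:
            return found
    return None


def substitute(target, initial_root):
    """Fill a substitution site in place; the site keeps its own annotation."""
    assert target.label == initial_root.label and not target.children
    target.children = initial_root.children


def adjoin(target, aux_root):
    """Adjoin an auxiliary tree at target, in place."""
    foot = find_foot(aux_root)
    assert foot is not None and target.label == aux_root.label == foot.label
    # Bottom: the material below the target moves under the foot,
    # and the foot's annotation is the one retained there.
    foot.children, foot.is_foot = target.children, False
    # Top: the target keeps its annotation and takes over the auxiliary
    # tree's daughters (the auxiliary root itself carries no annotation).
    target.children = aux_root.children


# Hypothetical example: adjoining an adverbial tree at a complement VP.
target = Node("VP", "(↑ XCOMP) = ↓", [Node("V", "↑ = ↓")])
aux = Node("VP", None, [
    Node("AdvP", "↓ ∈ (↑ ADJ)"),
    Node("VP", "↑ = ↓", is_foot=True),
])
adjoin(target, aux)
```

After the call, the upper VP still carries the target's annotation, while the lower VP carries the foot's annotation and dominates the original V, mirroring the view of adjunction as two substitutions in which the target retains its annotations each time.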
Let us now turn to the question of representation. In standard LFG, a lexical entry is a triple (W, C, F), where W is a word form, i.e. the terminal node in the phrase-structure tree, C is a c-structure category, i.e. the pre-terminal node, and F is a functional description, i.e. a set of expressions spelling out additional linguistic information via the correspondence architecture. In TAG-LFG, a lexical entry is instead a triple (W, T, F), where W is now a list of word forms, T is a tree, provided by some metagrammar (Crabbé et al., 2013), and F is a functional description. A simple example for a non-decomposable idiom is given in Figure 2.

See also Clément and Kinyon (2003) for a proposal to generate both LFG and TAG grammars from the same set of linguistic descriptions (encoded in a metagrammar).
A reviewer points out potential similarities with LFG-DOP (Bod and Kaplan, 1998; see also references in Arnold and Linardaki, 2007), which combines LFG with Data-Oriented Parsing (Bod, 1992). This also makes use of tree fragments, but it still relies on a lexicon stated in terms of context-free rules to generate these fragments, and thus is still reliant on a version of LA to encode MWEs in the lexicon.

[Figure 2, word-form list: W = ⟨kicked, the, bucket⟩]

The word forms occur as a list because the trees for MWEs will be multiply anchored. For regular lexical entries, this list will be a singleton.
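As a rough illustration of the data structure involved, a TAG-LFG lexical entry can be sketched as a record with a list of word forms, a tree containing numbered anchors, and a functional description. Everything specific in this sketch is assumed for the sake of the example: the class and function names, the encoding of trees as nested (label, children) pairs, the shape of the tree for kicked the bucket, and the placeholder strings standing in for the functional description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Anchor:
    category: str
    index: int  # 1-based position in the entry's list of word forms


@dataclass(frozen=True)
class TagLfgEntry:
    words: tuple  # list of word forms (a singleton for ordinary words)
    tree: tuple   # (label, children); leaves are Anchor objects or site labels
    fdesc: tuple  # functional description, here just placeholder strings


def anchored_yield(entry, node=None):
    """Read off the leaves of the tree, inserting word forms at the anchors."""
    node = entry.tree if node is None else node
    if isinstance(node, Anchor):
        return [entry.words[node.index - 1]]
    if isinstance(node, str):
        return [node]  # an unfilled substitution site
    _label, children = node
    return [leaf for child in children for leaf in anchored_yield(entry, child)]


# Hypothetical multiply-anchored tree for the idiom.
kicked_the_bucket = TagLfgEntry(
    words=("kicked", "the", "bucket"),
    tree=("S", ["DP↓", ("VP", [
        Anchor("V", 1),
        ("DP", [Anchor("D", 2), ("NP", [Anchor("N", 3)])]),
    ])]),
    fdesc=("placeholder: meaning constructor for die′, stated over the whole tree",),
)

# An ordinary intransitive verb: the word-form list is a singleton.
yawned = TagLfgEntry(
    words=("yawned",),
    tree=("S", ["DP↓", ("VP", [Anchor("V", 1)])]),
    fdesc=("placeholder: functional description for yawn′",),
)
```

Calling `anchored_yield(kicked_the_bucket)` gives the subject substitution site followed by the three word forms in order, whereas the ordinary entry yields the site followed by a single word.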
The lexical anchors, marked with ♦s, are numbered according to the list index of the word form that is to be inserted there. The functional description remains the same, although it now allows reference to more remote nodes, and so instead of ↑ or ↓ I use node labels as a shorthand for the nodes in question.

In Figure 2, I have given the semantics in the form of a meaning constructor. This is an object used in Glue Semantics, the theory of the syntax-semantics interface most often coupled with LFG (Dalrymple, 1999; Asudeh, 2012). It consists, on the left-hand side, of a formula in some 'meaning language', in this case a lambda expression, and, on the right-hand side, of an expression in linear logic (Girard, 1987) over s(emantic)-structures (projected from f-structures via the σ function), which controls composition. In this case, it says that the meaning of the whole sentence is obtained by applying λx.die(x) to the meaning of the sentence's subject.
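The way a meaning constructor pairs a meaning with a linear logic formula can be sketched as below. This is a deliberately minimal sketch, assuming only atomic resources and linear implication, with meanings encoded as Python values and functions; the names `Atom`, `Implies`, `MeaningConstructor` and `combine`, and the resource names `subj` and `clause`, are hypothetical and do not come from any Glue implementation.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass(frozen=True)
class Atom:
    name: str  # stands for a semantic structure, e.g. that of the subject


@dataclass(frozen=True)
class Implies:
    antecedent: "Glue"  # the resource that is consumed
    consequent: "Glue"  # the resource that is produced


Glue = Union[Atom, Implies]


@dataclass(frozen=True)
class MeaningConstructor:
    meaning: Any  # left-hand side: an expression of the meaning language
    glue: Glue    # right-hand side: a linear logic formula


def combine(fn, arg):
    """Implication elimination on the glue side, function application on the meaning side."""
    if not isinstance(fn.glue, Implies) or fn.glue.antecedent != arg.glue:
        raise ValueError("resources do not match")
    return MeaningConstructor(fn.meaning(arg.meaning), fn.glue.consequent)


subj, clause = Atom("subj"), Atom("clause")

# λx.die(x), consuming the subject's resource and producing the clause's.
kick_the_bucket = MeaningConstructor(lambda x: f"die({x})", Implies(subj, clause))
john = MeaningConstructor("john", subj)

result = combine(kick_the_bucket, john)
```

The result pairs the meaning die(john) with the resource for the whole clause. A real Glue prover would additionally check that every premise is used exactly once, which is what makes the system resource-sensitive.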
By associating the meaning constructor not with any particular node in the tree, but with the tree as a whole, via the lexical entry, we avoid the arbitrariness of having to choose one word to host the meaning. It remains possible to represent decomposable idioms, too, since we can simply include multiple meaning constructors in the f-description, separating out the meaning and referring to the relevant parts as required. Figure 3 gives an example for pull strings. In this case, two meaning constructors are present, one for each of the decomposable 'parts' of the idiomatic meaning. This allows for internal modification of strings, for example (e.g. pull family strings).
The varying syntactic flexibility of MWEs can be accounted for by the standard TAG approach of associating each lexeme with different families of elementary trees. For example, assuming a more abstract level of lexemic entry, which is used to generate the set of lexical entries associated with each lexeme (or listeme) (Dalrymple, 2015), we can simply say that the lexemic entry for kick the bucket is associated with only the active voice tree, while that for pull strings is associated with many others, including trees for wh-questions, passive, and relative clauses. This results in different sets of potential lexical entries for each expression, and thus different potential syntactic configurations.
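The association between lexemic entries and tree families can be pictured with a small lookup table. The table and the helper functions below are hypothetical; the tree-family names simply follow the constructions mentioned in the text, and a real grammar would generate actual elementary trees rather than labels.

```python
# Hypothetical mapping from lexemic entries to the tree families they allow.
TREE_FAMILIES = {
    "kick the bucket": ["active"],
    "pull strings": ["active", "passive", "wh-question", "relative clause"],
}


def lexical_entries(lexeme):
    """Generate one (lexeme, tree family) pair per licensed elementary tree."""
    return [(lexeme, family) for family in TREE_FAMILIES.get(lexeme, [])]


def licenses(lexeme, family):
    """True if the lexemic entry is associated with the given tree family."""
    return family in TREE_FAMILIES.get(lexeme, [])
```

On this toy encoding, `licenses("pull strings", "passive")` holds but `licenses("kick the bucket", "passive")` does not, so only the former expression has a lexical entry available for passive configurations.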
One other notable property of idioms is that the words they contain are morphologically related to independently existing words: for example, kick in kick the bucket inflects like a regular English verb (such as literal kick), while come in come a cropper inflects irregularly in just the same way as literal come (e.g. he came a cropper). Space precludes a full treatment of this here, but it is straightforward enough to implement, for example by having the idiomatic lexemic entry select its word forms from the 'form' paradigms of existing lexemes (Stump, 2001, 2002). Note that such a relationship, whereby parts of a lexical entry draw from the morphological paradigm of independent words, is not unique to MWEs: for example, the lexeme UNDERSTAND is, in terms of inflection, made up of UNDER+STAND, where the second part is identical in inflectional terms to the independent verb STAND, e.g. it shares the irregular past tense form, as in understood. Thus, such a mechanism is needed independently of the present proposal, and its extension to TAG-LFG should not pose any undue problems.
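One way of picturing this selection of word forms from existing paradigms is sketched below. The paradigm table, the cell names "base" and "past", and the encoding of idiom parts as (lexeme, fixed form) pairs are all assumptions made for illustration; they are not a rendering of Stump's formalism.

```python
# Hypothetical form paradigms for independently existing lexemes.
PARADIGMS = {
    "KICK": {"base": "kick", "past": "kicked"},
    "COME": {"base": "come", "past": "came"},
    "STAND": {"base": "stand", "past": "stood"},
}

# Each part of an idiom either inflects via an existing lexeme's paradigm
# (lexeme name given) or is a fixed form (lexeme is None).
IDIOMS = {
    "kick the bucket": [("KICK", None), (None, "the"), (None, "bucket")],
    "come a cropper": [("COME", None), (None, "a"), (None, "cropper")],
}


def word_forms(idiom, cell):
    """Build the word-form list for an idiom in a given paradigm cell."""
    return [PARADIGMS[lexeme][cell] if lexeme else fixed
            for lexeme, fixed in IDIOMS[idiom]]


def understand(cell):
    """UNDER+STAND: the second part inflects exactly like the verb STAND."""
    return "under" + PARADIGMS["STAND"][cell]
```

With these definitions, `word_forms("come a cropper", "past")` yields came, a, cropper, and `understand("past")` yields understood, so the irregular forms are inherited rather than restated.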

Conclusion
Strongly lexicalist theories which subscribe to the principle of unary expression cannot deal with MWEs. They are forced to adopt some version of the lexical ambiguity approach, which ultimately fails both formally and empirically. Once we abandon PUE, the question then open to us is how to represent MWEs at the interface between the lexicon and syntax. A formalism like (L)TAG offers an elegant and well-tested means of doing just this. And with minimal modifications, and no increase in generative power, it can be integrated into the LFG architecture.

Figure 1: C-structure and f-structure for The cat is yawning.

Figure 2: TAG-LFG lexical entry for kicked the bucket