Efficient Computation of Implicational Universals in Constraint-Based Phonology Through the Hyperplane Separation Theorem

This paper focuses on the most basic implicational universals in phonological theory, called T-orders after Anttila and Andrus (2006). It develops necessary and sufficient constraint characterizations of T-orders within Harmonic Grammar and Optimality Theory. These conditions rest on the rich convex geometry underlying these frameworks. They are phonologically intuitive and have significant algorithmic implications.


Introduction
A typology T is a collection of grammars G1, G2, . . . For instance, T could be the set of syntactic grammars corresponding to all possible combinations of values of a set of parameters (Chomsky, 1981). Or the set of phonological grammars corresponding to all possible orderings of an underlying set of phonological rules (Chomsky and Halle, 1968). Or the set of grammars corresponding to all rankings of an underlying constraint set (Prince and Smolensky, 2004).
The structure induced by a typology T can be investigated through its implicational universals of the form (1). This implication holds provided every grammar in the typology T that satisfies the antecedent property P also satisfies the consequent property P̂ (Greenberg 1963).
To illustrate, suppose that T is the typology of syntactic grammars. Consider the antecedent property P of having VSO as the basic word order. And the consequent property P̂ of having prepositions (as opposed to postpositions). In this case, (1) is Greenberg's implicational universal #3. In this paper, we are interested in typologies of phonological grammars. We assume a representational framework which distinguishes between two representational levels: underlying representations (URs), denoted as x, x̂, . . . ; and surface representations (SRs), denoted as y, ŷ, . . . or z, ẑ, . . . . A phonological grammar G is a function which takes a UR x and returns an SR y. For instance, the phonology of German maps the UR x = /bE:d/ to the SR y = [bE:t] ('bath'). A phonological typology T is a collection of phonological grammars G1, G2, . . . that we assume are all defined over the same set of URs (Richness of the Base assumption; Prince and Smolensky 2004).
Since phonological grammars are functions from URs to SRs, the most basic or atomic antecedent property P of an implicational universal (1) is the property of mapping a certain UR x to a certain SR y. Analogously, the most basic consequent property P̂ is the property of mapping a certain UR x̂ to a certain SR ŷ. We thus focus on implicational universals of the form (2). This implication holds provided every grammar in the typology T that succeeds on the antecedent mapping (i.e., it maps the antecedent UR x to the antecedent SR y) also succeeds on the consequent mapping (i.e., it also maps the consequent UR x̂ to the consequent SR ŷ). This definition makes sense because every grammar in the typology T is defined on every UR, so that every grammar can be applied to the two URs x and x̂.
The relation →T thus defined over mappings turns out to be a partial order (under mild additional assumptions). It is called the T-order induced by the typology T (Anttila and Andrus, 2006).
A familiar example concerns coda cluster simplification in English. Suppose that a coda t/d deletes before vowels in a certain dialect, so that the UR /cost us/ is realized as the SR [cos' us].
Then the coda also deletes before consonants in that same dialect, so that the UR /cost me/ is realized as the SR [cos' me] (Guy, 1991; Kiparsky, 1993; Coetzee, 2004). In other words, the implication (/tV/, [V]) →T (/tC/, [C]) holds relative to the typology T of English dialects.
Two important phonological frameworks explored in the literature are Harmonic Grammar (HG; Legendre et al., 1990; Smolensky and Legendre, 2006; Potts et al., 2010) and Optimality Theory (OT; Prince and Smolensky, 2004). The crucial idea shared by HG and OT is that the relevant properties of phonological mappings are extracted by a set of n phonological constraints that effectively represent discrete phonological mappings as points of R^n. The goal of this paper is to express an implication (x, y) → (x̂, ŷ) in HG and OT in terms of the constraint violations of the two mappings (x, y) and (x̂, ŷ) and their competitors.
Section 2 presents the constraint condition for HG T-orders. It rests on the rich geometry underlying HG, as it follows from a classical result of convex geometry (the Hyperplane Separation Theorem), as detailed in section 3. Section 4 presents the constraint condition for OT T-orders. It rests on an equivalence between OT and HG T-orders established in section 5.
These constraint conditions admit a straightforward interpretation and thus help us better understand the phonological import of T-orders. Furthermore, they allow us to compute T-orders efficiently, circumventing the laborious computation of the entire HG or OT typology (as it is currently done in the literature; see for instance the OT T-order Generator by Anttila and Andrus, 2006).

Constraint Conditions for HG T-orders
HG assumes a relation Gen which pairs each UR x with a set Gen(x) of candidate SRs. It also assumes a set of n phonological constraints C_1, . . . , C_n. Each constraint C_k takes a phonological mapping (x, y) of a UR x and a candidate SR y in Gen(x) and returns the corresponding number of violations C_k(x, y) ∈ N, a nonnegative integer which quantifies the "badness" of that mapping (x, y) from the phonological perspective encoded by that constraint C_k. A weight vector w = (w_1, . . . , w_n) ∈ R^n_+ assigns a nonnegative weight w_k ≥ 0 to each constraint C_k.
The w-harmony of a mapping (x, y) is the weighted sum of the constraint violations multiplied by −1, namely −Σ_{k=1}^n w_k C_k(x, y). Because of the minus sign, mappings with a large harmony have few constraint violations. The HG grammar corresponding to a weight vector w maps a UR x to the candidate SR y in Gen(x) such that the mapping (x, y) has a larger w-harmony than the mapping (x, z) corresponding to any other candidate z in Gen(x) (Legendre et al., 1990; Smolensky and Legendre, 2006; Potts et al., 2010). The HG typology (relative to a candidate relation and a constraint set) consists of the HG grammars corresponding to all weight vectors.
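To make these definitions concrete, here is a minimal Python sketch of an HG grammar. The tableau is a toy illustration (loosely inspired by the /CC/ example discussed later), and the function names are ours, not those of the forthcoming package.

```python
def harmony(weights, violations):
    """w-harmony of a mapping: the weighted sum of constraint violations
    multiplied by -1, so fewer violations means larger harmony."""
    return -sum(w * c for w, c in zip(weights, violations))

def hg_optimum(weights, candidates):
    """The HG grammar for a weight vector w: map the UR to the candidate SR
    whose mapping has the largest w-harmony.

    `candidates` maps each SR in Gen(x) to its violation vector
    (C_1(x, y), ..., C_n(x, y))."""
    return max(candidates, key=lambda y: harmony(weights, candidates[y]))

# Toy tableau for /CC/ with constraints (ONSET, NOCODA, MAX, DEPV, DEPC):
gen_cc = {
    "CV.CV": (0, 0, 0, 2, 0),  # two epenthetic vowels
    "CVC":   (0, 1, 0, 1, 0),  # one epenthetic vowel, one coda
    "C":     (0, 1, 1, 0, 0),  # one consonant deleted
}
print(hg_optimum((1, 3, 3, 1, 1), gen_cc))  # cheap DEPV: epenthesis wins, "CV.CV"
print(hg_optimum((1, 1, 1, 5, 1), gen_cc))  # expensive DEPV: deletion wins, "C"
```

Varying the weights walks through the HG typology: each weight vector picks out one winner per UR.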
We denote by (x, y) →HG (x̂, ŷ) the implication between an antecedent mapping (x, y) and a consequent mapping (x̂, ŷ) relative to the HG typology. We assume that the antecedent UR x comes with only a finite number m of antecedent loser candidates z_1, . . . , z_m besides the antecedent winner candidate y. Analogously, we assume that the consequent UR x̂ comes with only a finite number m̂ of consequent loser candidates ẑ_1, . . . , ẑ_m̂ besides the consequent winner candidate ŷ. This assumption is nonrestrictive. In fact, a UR admits only a finite number of HG optimal candidates (Magri, 2018). Candidate sets can thus be assumed to be finite without loss of generality.
For each antecedent loser z_i, we define the antecedent difference vector C(x, y, z_i) as in (3). It has a component for each constraint C_k, defined as the violation difference C_k(x, y, z_i), namely the number C_k(x, z_i) of violations assigned by C_k to the loser mapping (x, z_i) minus the number C_k(x, y) of violations assigned to the antecedent winner mapping (x, y).
The consequent difference vector C(x̂, ŷ, ẑ_j) is defined analogously, as pitting the consequent winner mapping (x̂, ŷ) against one of its losers (x̂, ẑ_j).
The definition of the HG implication (x, y) →HG (x̂, ŷ) requires every HG grammar which succeeds on the antecedent mapping to also succeed on the consequent mapping. This condition is trivially satisfied if no HG grammar succeeds on the antecedent mapping, namely if the mapping (x, y) is HG infeasible. Thus, let's suppose that this is not the case. The following proposition then provides a complete (both necessary and sufficient) characterization of the HG implication (x, y) →HG (x̂, ŷ) in terms of condition (4), stated entirely in terms of antecedent and consequent difference vectors.
Proposition 1 If the antecedent mapping (x, y) is HG feasible, the HG implication (x, y) →HG (x̂, ŷ) holds if and only if for every consequent loser candidate ẑ_j with j = 1, . . . , m̂, there exist m nonnegative coefficients λ_1, . . . , λ_m ≥ 0 (one for each antecedent loser candidate z_1, . . . , z_m) such that condition (4) holds and furthermore at least one of these coefficients λ_1, . . . , λ_m is different from zero. □

Proposition 1 admits the following phonological interpretation. Condition (4) says that each consequent loser ẑ_j violates the constraints at least as much as (some conic combination of) the antecedent losers z_1, . . . , z_m. In other words, the consequent losers are "worse" than the antecedent losers. The consequent winner ŷ thus has an "easier" time beating its losers than the antecedent winner y, as required by the definition of T-order.
Proposition 1 has important algorithmic implications. Checking the definition of a T-order (in general, of any implicational universal) directly is costly, because it requires computing the entire typology, which can be large. But proposition 1 says that, in the case of HG, T-orders can be determined locally, by only looking at the antecedent and consequent mappings together with their losers. Indeed, this proposition effectively reduces the problem of computing HG T-orders to the problem of finding coefficients λ_i which satisfy the inequality (4). The latter is a polyhedral feasibility problem that can be solved efficiently with standard linear programming technology. A Python package to compute HG T-orders using condition (4) will be released shortly.
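For each consequent loser, condition (4) is precisely such a feasibility problem and can be dispatched by an off-the-shelf LP solver. The sketch below assumes numpy and scipy are available; the difference vectors in the example are synthetic illustrations, not the paper's tableaux. The idea: maximize λ_1 + · · · + λ_m subject to (4); a nonzero solution exists iff the optimum is strictly positive (or the LP is unbounded).

```python
import numpy as np
from scipy.optimize import linprog

def condition_4_holds(antecedent_diffs, consequent_diff, tol=1e-9):
    """Check condition (4) for one consequent difference vector.

    antecedent_diffs: the antecedent difference vectors C(x, y, z_i).
    consequent_diff:  one consequent difference vector C(x^, y^, z^_j).
    Returns True iff some lambda >= 0, not all zero, satisfies
    sum_i lambda_i * antecedent_diffs[i] <= consequent_diff componentwise."""
    A = np.asarray(antecedent_diffs, dtype=float)  # shape (m, n)
    b = np.asarray(consequent_diff, dtype=float)   # shape (n,)
    m = A.shape[0]
    # Maximize sum(lambda) subject to A.T @ lambda <= b and lambda >= 0.
    res = linprog(c=-np.ones(m), A_ub=A.T, b_ub=b, bounds=[(0, None)] * m)
    if res.status == 3:   # unbounded: sum(lambda) can be made arbitrarily large
        return True
    if res.status != 0:   # infeasible: no lambda at all satisfies (4)
        return False
    return bool(-res.fun > tol)  # feasible: need sum(lambda) > 0

# Illustrative vectors (n = 2 constraints, m = 2 antecedent losers):
antecedents = [(1, -1), (0, 2)]
print(condition_4_holds(antecedents, (2, 0)))    # True: lambda = (2, 1) works
print(condition_4_holds(antecedents, (-1, 5)))   # False: would need lambda_1 <= -1
```

Running this check once per consequent loser decides the whole implication, with no enumeration of weight vectors.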
Proposition 1 admits the following geometric interpretation. Suppose there are only n = 2 constraints and m = 4 antecedent difference vectors. The convex cone generated by these antecedent difference vectors is depicted in dark gray in figure 1a. The region in light gray singles out the points which are at least as large (component by component) as some point in this cone. Condition (4) thus says that each consequent difference vector C(x̂, ŷ, ẑ_j) must belong to this light gray region.
Indeed, suppose that some consequent difference vector does not belong to this light gray region, as represented by the white dot in figure 1b. The dashed line leaves the antecedent difference vectors (black dots) and the consequent difference vector (white dot) on two different sides. This means that the HG grammar corresponding to a nonnegative weight vector orthogonal to this line succeeds on the antecedent mapping (x, y) but fails on the consequent mapping (x̂, ŷ), defying the implication (x, y) →HG (x̂, ŷ). The existence of (a weight vector corresponding to) a dashed line such as the one depicted in figure 1b is geometrically obvious in the case with only n = 2 constraints. For an arbitrary number n of constraints, a fundamental result of convex geometry, the Hyperplane Separation Theorem (HST; Rockafellar, 1970, §11; Boyd and Vandenberghe, 2004, §2.5), guarantees the existence of a weight vector which separates the cone generated by the antecedent difference vectors from the outlier consequent difference vector. This is the core of the proof of proposition 1 provided in section 3.
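The separation argument can also be run numerically. The following sketch (again assuming scipy; the vectors are synthetic) searches for a nonnegative weight vector playing the role of the dashed line in figure 1b. Strict inequalities are handled by normalizing the weights to sum to 1 and maximizing a margin t.

```python
import numpy as np
from scipy.optimize import linprog

def separating_weights(antecedent_diffs, consequent_diff, tol=1e-9):
    """Look for a nonnegative weight vector w with a_i . w > 0 for every
    antecedent difference vector a_i but b . w <= 0 for the consequent
    difference vector b.  Returns w, or None if no separator exists."""
    A = np.asarray(antecedent_diffs, dtype=float)
    b = np.asarray(consequent_diff, dtype=float)
    m, n = A.shape
    # Variables (w_1, ..., w_n, t); maximize the margin t.
    c = np.zeros(n + 1)
    c[-1] = -1.0
    # Inequalities: -a_i . w + t <= 0 (i.e. a_i . w >= t) and b . w <= 0.
    A_ub = np.vstack([np.hstack([-A, np.ones((m, 1))]),
                      np.hstack([b, [0.0]])])
    b_ub = np.zeros(m + 1)
    A_eq = np.hstack([np.ones(n), [0.0]]).reshape(1, -1)  # sum(w) = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * n + [(None, None)])
    if res.status == 0 and res.x[-1] > tol:
        return res.x[:n]
    return None

# With antecedent cone spanned by (1, -1) and (0, 2):
w = separating_weights([(1, -1), (0, 2)], (-1, 5))
print(w is None)  # False: a separator exists, roughly w = (5/6, 1/6)
print(separating_weights([(1, -1), (0, 2)], (2, 0)) is None)  # True: (2, 0) is inside
```

A returned weight vector is a direct counterexample to the implication: its HG grammar beats all antecedent losers but loses to the outlier consequent loser.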
Let's finally look at a couple of examples (based on Bane and Riggle 2009). We assume n = 5 constraints: ONSET, which penalizes surface syllables starting with a vowel (V); NOCODA, which penalizes surface syllables ending with a consonant (C); MAX, which penalizes deletion of underlying segments; and DEPV and DEPC, which penalize epenthetic vowels and consonants, respectively. We focus on the two URs /CC/ and /CCC/. We only consider their non-harmonically bounded candidates, listed in table 1 with their constraint violations. Consider first the implication (CC, CV.CV) → (CCC, CV.CV.CV). There are three consequent difference vectors C(x̂, ŷ, ẑ_j), which appear on the left hand side of the three inequalities in table 2. Condition (4) holds: each consequent difference vector is at least as large as a conic combination of the antecedent difference vectors, as shown in table 2. Proposition 1 thus establishes the HG implication (CC, CV.CV) →HG (CCC, CV.CV.CV).

Proposition 1 can also be used to show that an implication fails in HG. To illustrate, we focus on the implication (CC, CVC) → (CCC, CV.CVC). We consider the consequent difference vector C(/CCC/, [CV.CVC], [null]), which appears on the left hand side of (5).
Condition (4) fails: the consequent difference vector is not larger than any conic combination of the two antecedent difference vectors, no matter the choice of the coefficients λ_1, λ_2 ≥ 0. In fact, the inequality (5) for DEPV requires λ_1 ≥ 2, whereby the inequality fails for MAX. Proposition 1 thus establishes that the implication (CC, CVC) →HG (CCC, CV.CVC) fails in HG.

Proof of Proposition 1
The HST has a number of algebraic consequences known as theorems of the alternatives. One of these theorems is the Motzkin Transposition Theorem (MTT; Bertsekas, 2009, proposition 5.6.2), which is particularly suited to our needs. It states that conditions (C1) and (C2) below are mutually exclusive (one and only one of them holds) for any two matrices A ∈ R^{p×n} and B ∈ R^{q×n}.
(C1) There exists a vector w ∈ R^n such that Aw < 0 and Bw ≤ 0.
(C2) There exist two nonnegative vectors ξ ∈ R^q_+ and µ ∈ R^p_+ with µ ≠ 0 such that A^T µ + B^T ξ = 0.

It is useful to specialize the MTT as follows. Consider some vectors a_1, . . . , a_m, b ∈ R^n. Let A be the matrix whose p = m rows are −a_1^T, . . . , −a_m^T. Let B be the matrix whose q = n+1 rows are −e_1^T, . . . , −e_n^T, b^T (where e_i ∈ R^n has all components equal to 0 except for the i-th component, which is equal to 1). The two conditions (C1) and (C2) thus become (C1′) and (C2′).
(C1′) There exists a nonnegative vector w ∈ R^n_+ such that a_i^T w > 0 for every i = 1, . . . , m and b^T w ≤ 0.

(C2′) There exist some nonnegative coefficients µ_1, . . . , µ_m, ξ ≥ 0, with at least one of the coefficients µ_1, . . . , µ_m different from 0, such that ξ b ≥ µ_1 a_1 + · · · + µ_m a_m componentwise.

With these preliminaries in place, we now consider the HG implication (x, y) →HG (x̂, ŷ). Suppose that the HG grammar corresponding to some nonnegative weight vector w ∈ R^n_+ succeeds on the antecedent mapping (x, y). This means that the w-harmony of this mapping (x, y) is larger than that of every antecedent loser mapping (x, z_i). This condition can be stated in terms of the antecedent difference vectors as in (6), taking advantage of the linearity of the HG harmony.
The implication (x, y) →HG (x̂, ŷ) then requires the HG grammar corresponding to that weight vector w to also succeed on the consequent mapping (x̂, ŷ). This means that the w-harmony of this mapping (x̂, ŷ) is larger than that of every consequent loser mapping (x̂, ẑ_j). This condition can be stated in terms of the consequent difference vectors as in (7).
In other words, the HG implication (x, y) →HG (x̂, ŷ) holds if and only if every nonnegative weight vector w which satisfies (6) also satisfies (7). Equivalently, the HG T-order holds if and only if for every j = 1, . . . , m̂, it is false that there exists a nonnegative weight vector w ∈ R^n_+ such that C(x, y, z_i)^T w > 0 for every i = 1, . . . , m but C(x̂, ŷ, ẑ_j)^T w ≤ 0. In other words, for every j = 1, . . . , m̂, condition (C1′) is false, with the positions a_i = C(x, y, z_i) and b = C(x̂, ŷ, ẑ_j). By the MTT, condition (C2′) must therefore be true for every j = 1, . . . , m̂. This means that there exist some nonnegative coefficients µ_1, . . . , µ_m, ξ ≥ 0 such that at least one of the coefficients µ_1, . . . , µ_m is strictly positive and furthermore the inequality (8) holds.
We now show that the coefficient ξ must be strictly positive. Suppose by contradiction that ξ = 0, so that (8) reduces to (9). Consider a weight vector w whose corresponding HG grammar maps the antecedent UR x to the antecedent winner y, which exists by hypothesis. This weight vector w thus satisfies condition (6). Since w is nonnegative, the scalar product of both sides of (9) with w preserves the inequality, yielding (10). But the latter inequality requires µ_1 = · · · = µ_m = 0, contradicting the assumption that at least one of the nonnegative coefficients µ_1, . . . , µ_m ≥ 0 is strictly positive.
Since the coefficient ξ is strictly positive, both sides of (8) can be divided by ξ, yielding the inequality (4) with λ_i = µ_i/ξ.

Constraint Conditions for OT T-orders
This section extends the convex geometric analysis of T-orders developed in the preceding sections from HG to OT. We start by recalling that in OT a constraint C_k is said to prefer a mapping (x, y) to another mapping (x, z) provided C_k assigns fewer violations to the former than to the latter, namely C_k(x, y) < C_k(x, z). A constraint ranking is an arbitrary linear order over the constraint set. A constraint ranking prefers a mapping (x, y) to another mapping (x, z) provided the highest-ranked constraint which distinguishes between the two mappings (x, y) and (x, z) prefers (x, y). The fact that the highest-ranked relevant constraint defines the preference of the entire ranking, irrespective of the preferences of lower-ranked constraints, is captured by saying that the former constraint strictly dominates the latter constraints. The OT grammar corresponding to a ranking maps a UR x to the SR y in Gen(x) such that the ranking prefers the mapping (x, y) to the mapping (x, z) corresponding to any other candidate z in Gen(x) (Prince and Smolensky, 2004). The OT typology (for a given candidate relation and constraint set) consists of the OT grammars corresponding to all rankings.
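In code, strict domination amounts to comparing violation vectors lexicographically in ranking order. A minimal sketch with a toy tableau (the function name is ours):

```python
def ot_optimum(ranking, candidates):
    """OT grammar for a ranking (a list of constraint indices, highest-ranked
    first): the winner minimizes its violation vector read off in ranking
    order, so the highest-ranked constraint that distinguishes two candidates
    decides between them, whatever lower-ranked constraints prefer."""
    return min(candidates,
               key=lambda y: tuple(candidates[y][k] for k in ranking))

# Toy tableau for /CC/, constraints 0=ONSET, 1=NOCODA, 2=MAX, 3=DEPV, 4=DEPC:
gen_cc = {
    "CV.CV": (0, 0, 0, 2, 0),
    "CVC":   (0, 1, 0, 1, 0),
    "C":     (0, 1, 1, 0, 0),
}
print(ot_optimum([2, 1, 0, 3, 4], gen_cc))  # MAX, NOCODA high: epenthesis, "CV.CV"
print(ot_optimum([3, 2, 1, 0, 4], gen_cc))  # DEPV highest: deletion, "C"
```

Lexicographic tuple comparison is exactly strict domination: earlier (higher-ranked) coordinates always trump later ones.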
We denote by (x, y) →OT (x̂, ŷ) the implication between an antecedent mapping (x, y) and a consequent mapping (x̂, ŷ) relative to the OT typology. By definition, this implication holds provided every constraint ranking that succeeds on the antecedent mapping also succeeds on the consequent mapping. Thus, a natural strategy to check the OT implication (x, y) →OT (x̂, ŷ) would be to use Recursive Constraint Demotion (RCD; Tesar and Smolensky, 1998) to check that, for every j = 1, . . . , m̂, no ranking is consistent simultaneously with the two mappings (x, y) and (x̂, ẑ_j). In this section, we develop instead an alternative strategy which uses the HG-to-OT portability result of Magri (2013) to extend to OT the convex geometric characterization of HG T-orders developed in sections 2-3.
To start, we recall that an OT grammar can be construed as an HG grammar (as long as the constraint violations are bounded, which is the case when the set of URs and the candidate sets are finite). In fact, OT's strict domination can be mimicked through HG weights which decrease exponentially. Indeed, if a weight is much larger than every smaller weight, the preferences of the constraint with the larger weight cannot be overcome by the preferences of the constraints with smaller weights (Prince and Smolensky, 2004; Keller, 2006). Since the OT typology is a subset of the HG typology, whenever an implication (x, y) →HG (x̂, ŷ) holds in HG, the implication (x, y) →OT (x̂, ŷ) holds in OT. Lemma 1 slightly strengthens this conclusion. In fact, OT only cares about the constraints' preferences, or equivalently, about the sign of the violation differences. Thus, the HG implication (x, y) →HG (x̂, ŷ) entails not only the corresponding OT implication (x, y) →OT (x̂, ŷ) but also any other OT implication (x*, y*) →OT (x̂*, ŷ*) whose antecedent and consequent mappings (x*, y*) and (x̂*, ŷ*) yield violation differences with the same sign as the original antecedent and consequent mappings (x, y) and (x̂, ŷ). The proof of this lemma simply uses the observation that exponentially decaying HG weights mimic OT strict domination and is therefore omitted.
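This observation is easy to verify computationally. The sketch below (toy tableau; our function names) builds exponentially decaying weights from a ranking, using a base exceeding the largest violation count so that each weight dominates the combined effect of all smaller ones, and checks that the resulting HG grammar agrees with the OT grammar for all 120 rankings of the five constraints.

```python
from itertools import permutations

def exponential_weights(ranking, n, base):
    """Weights decreasing exponentially down the ranking: the constraint at
    position p (0 = highest-ranked) gets weight base**(n - 1 - p)."""
    w = [0.0] * n
    for pos, k in enumerate(ranking):
        w[k] = float(base) ** (n - 1 - pos)
    return w

def hg_optimum(weights, candidates):
    return max(candidates,
               key=lambda y: -sum(w * c for w, c in zip(weights, candidates[y])))

def ot_optimum(ranking, candidates):
    return min(candidates,
               key=lambda y: tuple(candidates[y][k] for k in ranking))

gen_cc = {"CV.CV": (0, 0, 0, 2, 0), "CVC": (0, 1, 0, 1, 0), "C": (0, 1, 1, 0, 0)}
n = 5
# Base exceeding the largest violation count: each weight then dominates the
# sum of all smaller weights, so HG comparison reduces to strict domination.
base = max(v for viols in gen_cc.values() for v in viols) + 1
assert all(hg_optimum(exponential_weights(r, n, base), gen_cc)
           == ot_optimum(r, gen_cc)
           for r in permutations(range(n)))
print("OT reproduced by exponential HG weights on this tableau")
```

The bound on the base is what makes the construction work: with base B and violation differences at most B − 1 in absolute value, no sum of lower-ranked contributions can cancel a higher-ranked one.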
Lemma 1 Given an antecedent mapping (x, y) with its m antecedent loser candidates z_1, . . . , z_m, consider another mapping (x*, y*) with the same number m of loser candidates z*_1, . . . , z*_m such that the m corresponding violation differences have the same sign, in the sense that condition (11) holds for k = 1, . . . , n and i = 1, . . . , m.
Analogously, given the consequent mapping (x̂, ŷ) with its m̂ consequent loser candidates ẑ_1, . . . , ẑ_m̂, consider another mapping (x̂*, ŷ*) with the same number m̂ of loser candidates ẑ*_1, . . . , ẑ*_m̂ such that the m̂ corresponding violation differences have the same sign, in the sense that condition (12) holds for k = 1, . . . , n and j = 1, . . . , m̂.
The HG implication (x, y) →HG (x̂, ŷ) then entails the OT implication (x*, y*) →OT (x̂*, ŷ*). □

The preceding lemma establishes an entailment from HG to OT implications. We now want to investigate the reverse entailment from OT to HG implications. Thus, we suppose that an implication (x, y) →OT (x̂, ŷ) holds in OT. Of course, that does not entail that the implication (x, y) →HG (x̂, ŷ) between the same two mappings also holds in HG. That is because the HG typology is usually a proper superset of the OT typology, and a larger typology yields sparser T-orders. Thus, it makes no sense to try to establish that the OT implication (x, y) →OT (x̂, ŷ) entails the HG implication (x, y) →HG (x̂, ŷ) between the same two mappings. We will try to establish something weaker instead: the OT implication (x, y) →OT (x̂, ŷ) entails an HG implication (x^dif, y^dif) →HG (x̂^easy, ŷ^easy) between an antecedent mapping (x^dif, y^dif) different from (x, y) and a consequent mapping (x̂^easy, ŷ^easy) different from (x̂, ŷ). And we will choose this new antecedent mapping and this new consequent mapping in such a way that the new HG implication (x^dif, y^dif) →HG (x̂^easy, ŷ^easy) is "more likely to hold" than the original implication (x, y) →HG (x̂, ŷ) and thus validates the entailment from OT to HG implications.
What does it mean that an implication is "more likely to hold"? Intuitively, an implication from an antecedent to a consequent mapping is "likely to hold" when the antecedent mapping is "difficult" to obtain, namely it is consistent with very few grammars. In the limit, the implication holds trivially when the antecedent mapping is consistent with no grammars at all. Thus, we want to define the new antecedent mapping (x^dif, y^dif) in such a way that it is "more difficult" to obtain in HG than the original antecedent mapping (x, y), whence the superscript "dif". Analogously, an implication from an antecedent to a consequent mapping is intuitively "likely to hold" when the consequent mapping is "easy" to obtain, namely it is consistent with very many grammars. In the limit, the implication holds trivially when the consequent mapping is consistent with every grammar. Thus, we want to define the new consequent mapping (x̂^easy, ŷ^easy) in such a way that it is "easier" to obtain in HG than the original consequent mapping (x̂, ŷ), whence the superscript "easy".
Let us now turn to the details. As discussed above around (6), it suffices to define the difference vectors corresponding to the new difficult antecedent mapping (x^dif, y^dif). Given the original antecedent mapping (x, y) with its m loser candidates z_1, . . . , z_m, we assume that the new antecedent mapping (x^dif, y^dif) comes with the same number m of loser candidates z^dif_1, . . . , z^dif_m whose violation differences are defined as in (13). Here, Ω_i is the total number of constraints C_k such that C_k prefers the original antecedent winner mapping (x, y) to the original antecedent loser mapping (x, z_i), in the sense that C_k(x, y, z_i) > 0.
The intuition behind this definition (13) is as follows. OT only cares about the sign of the violation differences. Thus, the new violation difference C_k(x^dif, y^dif, z^dif_i) is defined in such a way that it has the same sign as the original violation difference C_k(x, y, z_i): one is positive or negative if and only if the other is as well. HG also cares about the size of the violation differences, not only about their sign. In order for the mapping (x^dif, y^dif) to be "difficult" in HG, we want its positive violation differences to be as small as possible. For this reason, the positive violation differences in (13) have been set equal to 1, which is the smallest positive integer. Analogously, in order for the mapping (x^dif, y^dif) to be "difficult" in HG, we want its negative violation differences to be large (in absolute value) relative to the strength of the positive violation differences they have to "fight off". Since the positive entries are all equal to 1 in (13), the "strength" of the positive entries only depends on their number Ω_i. For this reason, the absolute value of the negative violation differences in (13) has been set equal to Ω_i + 1.
In conclusion, this definition (13) ensures that the mapping (x^dif, y^dif) is "difficult" in HG, because the positive violation differences are small and the negative ones are large (in absolute value).
We now turn to the consequents. Given the original consequent mapping (x̂, ŷ) with its m̂ loser candidates ẑ_1, . . . , ẑ_m̂, we assume that the new consequent mapping (x̂^easy, ŷ^easy) comes with the same number m̂ of loser candidates ẑ^easy_1, . . . , ẑ^easy_m̂ whose violation differences are defined as in (14). Here Λ_j is the total number of constraints C_k such that C_k prefers the original consequent loser mapping (x̂, ẑ_j) to the original consequent winner mapping (x̂, ŷ), in the sense that C_k(x̂, ŷ, ẑ_j) < 0.
The intuition behind this definition (14) is as follows. Whenever the original violation difference C_k(x̂, ŷ, ẑ_j) is positive or negative, the new violation difference C_k(x̂^easy, ŷ^easy, ẑ^easy_j) is positive or negative as well, so that the original and the new violation differences have the same sign. The size of the new violation differences has been chosen as follows. In order for the mapping (x̂^easy, ŷ^easy) to be "easy" in HG, we want its negative violation differences to be as small as possible (in absolute value). For this reason, the negative violation differences in (14) have been set equal to −1, which is the negative integer smallest in absolute value. Analogously, in order for the mapping (x̂^easy, ŷ^easy) to be "easy" in HG, we want its positive violation differences to be large relative to the strength of the negative violation differences they have to "fight off". Since the negative entries are all equal to −1 in (14), the "strength" of the negative entries only depends on their number Λ_j. For this reason, the positive violation differences in (14) have been set equal to Λ_j + 1. In conclusion, this definition (14) ensures that the mapping (x̂^easy, ŷ^easy) is "easy" in HG, because the positive violation differences are large and the negative violation differences are small (in absolute value).
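The rescalings (13) and (14) are straightforward to implement entrywise; a sketch (the function names `difficult` and `easy` are ours):

```python
def difficult(diff):
    """(13): positive entries -> 1; negative entries -> -(Omega + 1), where
    Omega is the number of winner-preferring (positive) entries."""
    omega = sum(1 for d in diff if d > 0)
    return tuple(1 if d > 0 else (-(omega + 1) if d < 0 else 0) for d in diff)

def easy(diff):
    """(14): negative entries -> -1; positive entries -> Lambda + 1, where
    Lambda is the number of loser-preferring (negative) entries."""
    lam = sum(1 for d in diff if d < 0)
    return tuple(lam + 1 if d > 0 else (-1 if d < 0 else 0) for d in diff)

print(difficult((2, -1, 0, 3)))  # (1, -3, 0, 1): Omega = 2
print(easy((2, -1, 0, 3)))       # (2, -1, 0, 2): Lambda = 1
```

Both functions preserve signs entry by entry, which is all that OT can see; only the magnitudes are traded off.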
We are now ready to put the pieces together. As anticipated, the OT implication (x, y) →OT (x̂, ŷ) might not entail the HG implication (x, y) →HG (x̂, ŷ) with the same antecedent and consequent mappings. Nonetheless, the following lemma 2 ensures that the OT implication does entail the HG implication (x^dif, y^dif) →HG (x̂^easy, ŷ^easy). The intuition is that the latter is less demanding than the HG implication (x, y) →HG (x̂, ŷ), because its antecedent is "difficult" (namely, consistent with few HG grammars) and its consequent is "easy" (namely, consistent with many HG grammars). The proof of this lemma is provided in section 5, mimicking a reasoning in Magri (2013).

Lemma 2 The OT implication (x, y) →OT (x̂, ŷ) entails the HG implication (x^dif, y^dif) →HG (x̂^easy, ŷ^easy) between the antecedent mapping (x^dif, y^dif) and the consequent mapping (x̂^easy, ŷ^easy) whose violation differences are defined in (13) and (14). □

As remarked explicitly above, (13) ensures that the original antecedent violation differences C_k(x, y, z_i) and the new antecedent violation differences C_k(x^dif, y^dif, z^dif_i) have the same sign. In other words, condition (11) holds with the positions x* = x^dif, y* = y^dif, and z*_i = z^dif_i. Analogously, (14) ensures that the original consequent violation differences C_k(x̂, ŷ, ẑ_j) and the new consequent violation differences C_k(x̂^easy, ŷ^easy, ẑ^easy_j) have the same sign. In other words, condition (12) holds with the positions x̂* = x̂^easy, ŷ* = ŷ^easy, and ẑ*_j = ẑ^easy_j. The two lemmas 1 and 2 can therefore be combined into the following conclusion: the OT implication (x, y) →OT (x̂, ŷ) holds if and only if the HG implication (x^dif, y^dif) →HG (x̂^easy, ŷ^easy) holds. We can thus extend to OT the characterization of HG T-orders provided by proposition 1 above, obtaining the following:

Proposition 2 If the antecedent mapping (x, y) is OT feasible, the OT implication (x, y) →OT (x̂, ŷ) holds if and only if for every j = 1, . . . , m̂, there exist m nonnegative coefficients λ_1, . . . , λ_m ≥ 0 such that condition (15) holds and furthermore at least one of these coefficients λ_1, . . . , λ_m is different from zero. □

To illustrate, we have seen at the end of section 2 that the implication (CC, CVC) →HG (CCC, CV.CVC) fails in HG because condition (4) fails, as shown in (5). But the implication (CC, CVC) →OT (CCC, CV.CVC) does hold in OT. In fact, the three "easy" consequent difference vectors C(x̂^easy, ŷ^easy, ẑ^easy_j) in this case are listed on the left hand side of the three inequalities in table 3.
The two "difficult" antecedent difference vectors C(x^dif, y^dif, z^dif_i) are repeated on the right hand side of the three inequalities. The table thus shows that condition (15) holds.
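Putting proposition 2 to work computationally: the sketch below (scipy assumed; the difference vectors are synthetic, not those of table 3) rescales the antecedent difference vectors by (13) and each consequent difference vector by (14), then checks condition (15) with the same linear program used for condition (4).

```python
import numpy as np
from scipy.optimize import linprog

def difficult(diff):
    """(13): positives -> 1, negatives -> -(Omega_i + 1)."""
    omega = sum(1 for d in diff if d > 0)
    return tuple(1 if d > 0 else (-(omega + 1) if d < 0 else 0) for d in diff)

def easy(diff):
    """(14): negatives -> -1, positives -> Lambda_j + 1."""
    lam = sum(1 for d in diff if d < 0)
    return tuple(lam + 1 if d > 0 else (-1 if d < 0 else 0) for d in diff)

def conic_condition(antecedent_diffs, consequent_diff, tol=1e-9):
    """True iff some nonzero lambda >= 0 satisfies
    sum_i lambda_i * antecedent_diffs[i] <= consequent_diff componentwise."""
    A = np.asarray(antecedent_diffs, dtype=float)
    b = np.asarray(consequent_diff, dtype=float)
    res = linprog(c=-np.ones(A.shape[0]), A_ub=A.T, b_ub=b,
                  bounds=[(0, None)] * A.shape[0])
    if res.status == 3:
        return True
    return bool(res.status == 0 and -res.fun > tol)

def ot_implication(antecedent_diffs, consequent_diffs):
    """Proposition 2: check the OT T-order via the rescaled vectors."""
    ant = [difficult(d) for d in antecedent_diffs]
    return all(conic_condition(ant, easy(d)) for d in consequent_diffs)

# One antecedent loser preferring C2, one consequent loser with the same
# preference pattern: the OT implication holds.
print(ot_implication([(1, -1)], [(2, -1)]))   # True
# Reversed preference pattern in the consequent: the implication fails.
print(ot_implication([(1, -1)], [(-1, 2)]))   # False
```

In the first example both mappings succeed exactly under the rankings with C1 above C2, so the implication holds; in the second the consequent needs the opposite ranking, and the check correctly rejects it.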

Proof of Lemma 2
We assume that the OT implication (x, y) →OT (x̂, ŷ) holds. We consider an arbitrary nonnegative weight vector w = (w_1, . . . , w_n) which succeeds on the "difficult" antecedent mapping (x^dif, y^dif) and we prove that it also succeeds on the "easy" consequent mapping (x̂^easy, ŷ^easy), thus securing the HG implication (x^dif, y^dif) →HG (x̂^easy, ŷ^easy).
The assumption that the weight vector w succeeds on the "difficult" antecedent mapping (x^dif, y^dif) means that Σ_{k=1}^n w_k C_k(x^dif, y^dif, z^dif_i) > 0 for every i = 1, . . . , m. The latter inequality can be unpacked as in (16). In step (16a), we have used the definition (13). Here W(x, y, z_i) and L(x, y, z_i) are the sets of winner-preferring and loser-preferring constraints relative to the winner (x, y) and the loser (x, z_i). In step (16b), we have upper bounded the sum Σ_{h∈W(x, y, z_i)} w_h with its largest term max_{h∈W(x, y, z_i)} w_h times the number Ω_i of its addenda. In step (16c), we have lower bounded the sum Σ_{k∈L(x, y, z_i)} w_k with one of its terms, as the addenda are all nonnegative.
We now show that the conclusion reached in the last line of (16) entails that the strict inequality (17) holds for every j = 1, . . . , m̂.
In fact, suppose by contradiction that (17) fails for some j = 1, . . . , m̂. Consider a ranking which respects the relative size of the weights, in the sense that conditions [A] and [B] hold for any two constraints C_s, C_t with weights w_s, w_t.
[A] If w_s > w_t, then C_s is ranked above C_t.
[B] If w_s = w_t and C_s ∈ L(x̂, ŷ, ẑ_j) and C_t ∈ W(x̂, ŷ, ẑ_j), then C_s is ranked above C_t.

The ranking succeeds on the antecedent mapping (x, y). In fact, the condition obtained in the last line of (16) says that there exists a constraint which prefers the winner (x, y) to the loser (x, z_i) whose weight is strictly larger than the weight of every constraint which instead prefers the loser (x, z_i) to the winner (x, y). By [A], this means that a constraint which prefers the winner (x, y) is ranked above every constraint that instead prefers the loser (x, z_i). The ranking therefore prefers the winner (x, y) to the loser (x, z_i). Since this conclusion holds for every i = 1, . . . , m, the ranking succeeds on the antecedent mapping (x, y).
On the other hand, the ranking fails on the consequent mapping (x̂, ŷ). In fact, the contradictory assumption that (17) fails means that max_{h∈W(x̂, ŷ, ẑ_j)} w_h ≤ max_{k∈L(x̂, ŷ, ẑ_j)} w_k. In other words, there exists a constraint which prefers the loser (x̂, ẑ_j) to the winner (x̂, ŷ) whose weight is larger than or equal to the weights of all the constraints which instead prefer the winner (x̂, ŷ) to the loser (x̂, ẑ_j). By [A] and [B], the ranking cannot prefer (x̂, ŷ) to (x̂, ẑ_j).
The conclusion that the ranking succeeds on the antecedent (x, y) but fails on the consequent (x̂, ŷ) contradicts the assumption that the implication (x, y) →OT (x̂, ŷ) holds in OT, thus establishing the inequality (17). This inequality can in turn be unpacked as in (18). In step (18a), we have lower bounded Λ_j max_{k∈L(x̂, ŷ, ẑ_j)} w_k with the sum Σ_{k∈L(x̂, ŷ, ẑ_j)} w_k, because Λ_j is the number of addenda in the sum. In step (18b), we have upper bounded the maximum max_{h∈W(x̂, ŷ, ẑ_j)} w_h with the sum Σ_{h∈W(x̂, ŷ, ẑ_j)} w_h, because the weights being summed over are all nonnegative. In step (18c), we have used the definition (14) of the constraint differences C_k(x̂^easy, ŷ^easy, ẑ^easy_j). The inequality Σ_{k=1}^n w_k C_k(x̂^easy, ŷ^easy, ẑ^easy_j) > 0 obtained in the last line of (18) holds for every j = 1, . . . , m̂, ensuring that the weights w succeed on the consequent mapping (x̂^easy, ŷ^easy).

Conclusions
A central task of linguistic theory is to characterize the typological structure predicted by a grammatical formalism in order to match it to linguistic data. A classical strategy to characterize typological structure is to chart the implicational universals predicted by the formalism. In this paper, we have focused on the two constraint-based phonological formalisms of HG and OT. And we have considered the simplest type of implicational universals, namely T-orders. The main result of this paper has been a complete constraint characterization of T-orders in HG and OT. These constraint conditions rely on an elegant underlying convex geometry. These conditions are phonologically intuitive and have important algorithmic implications.