The Computational Complexity of Distinctive Feature Minimization in Phonology

We analyze the complexity of the problem of determining whether a set of phonemes forms a natural class and, if so, that of finding the minimal feature specification for the class. A standard assumption in phonology is that finding a minimal feature specification is an automatic part of acquisition and generalization. We find that the natural class decision problem is tractable (i.e. is in P), while the minimization problem is not; the decision version of the problem, which determines whether a natural class can be defined with k or fewer features, is NP-complete. We also show that, empirically, a greedy algorithm for finding minimal feature specifications will sometimes fail, and thus cannot be assumed to be the basis for human performance in solving the problem.


Introduction
The distinctive feature is held by many phonologists, independently of theoretical orientation, to be the fundamental unit of analysis of sound patterns in language. The underlying working assumption of most phonological approaches is that a single sound or a set of sounds is expressed through a combination of positive or negative features and that these features are in some sense universal across languages (Mielke, 2008a). The exact makeup of the feature set employed has varied over time, ranging from the limited, more acoustic-oriented features of Jakobson et al. (1951), to the richer model presented in Chomsky and Halle (1968), to more complex hierarchically organized features in Clements (1985). The concept of natural class is intimately tied to such feature systems and is taken to be any set of segments that share some number of distinctive features. Furthermore, phonological alternations that do not target natural classes are hypothesized not to occur.
Another assumption found in the phonological literature, though often less explicitly, is that whenever a phonological process targets a group of sounds, that group is to be expressed nonredundantly by the minimum number of features required to do so. In general, one can find a multitude of ways in which a set of phonemes can be specified using positive or negative features. For example, using the feature system shown in Table 1, the set {m,n} has the obvious minimal description [+nas], since m and n are the only nasals. But that set could also be specified non-minimally, using a larger combination of features. The potential complexity of this problem, finding a minimal specification, is not addressed in the phonological literature. Yet finding such a minimal specification is not trivial. Using more realistic feature systems such as the ones given in Hayes (2011), there are, for example, 208 distinct solutions to specifying the set {e,i} in English, only four of which have the minimum length of 3 features. While such modern feature systems work with 25-30 features, one can assume that more features are needed to cover cross-linguistically more exotic contrasts such as those produced by including click consonants (Miller, 2011). The P-base 3 resource (Mielke, 2008b) lists 398 features over 8 feature systems, covering 629 languages.
The general notion of feature economy has a long tradition in phonology (Jakobson, 1942; Martinet, 1955; Clements, 2003), mostly targeting entire feature systems in a language, i.e. advocating that on the grammar level, individual languages make maximal use of the feature inventory available (Fant, 1966). Equally prominent is the assumption that any specification of phonological alternations be made with the minimal number of features necessary: "one should use the minimum number of features required to specify all and only the sounds in the class." (Zsiga, 2012, p. 282). Hayes (2011), among others, argues, following Ockham's Razor, that this is how generalization can take place and that phonological hypotheses are made in precisely this way: witnessing alternations that target a set of sounds, learners find the minimal feature specification that is consistent with the alternating sounds, generalizing from there to other sounds that may enter the language. Halle (1962) also proposes a mechanism of "feature counting" as a methodology to rule out spurious generalizations one might propose, a process which implicitly includes the capability of feature minimization.
Similar arguments for feature minimization are used to optimize an entire phonological grammar. In Radical Underspecification, Archangeli (1984) refers to what is termed the FEATURE MINIMIZATION PRINCIPLE: "A grammar is most highly valued when underlying representations include the minimal number of features necessary to make the different phonemes of the language" (Archangeli, 1984, p. 48).
Given such claims concerning acquisition and phonological analysis, it is of some interest to assess the actual computational complexity of feature minimization. This entails answering how difficult it is in the worst case to determine whether a set of segments represents a natural class, and also how to find the minimal feature specification.

Overview
We will assume a set of phonemes P (the phoneme inventory), another set Q, the target set that we want to express through a combination of features, and a feature system F such as the one shown in Table 1. The first problem we address is that of determining whether a set of phonemes Q forms a natural class, which we call the feature description problem. We show that this is decidable in polynomial time. Further, we will show that a minimization version of the problem, which we call the feature minimization problem, is NP-complete (Garey and Johnson, 1979; Sipser, 2013). We will show this by reduction from the well-known set covering problem (Karp, 1972).
Many phonologists espouse a combination of binary (equipollent) and privative (univalent) features (Trubetzkoy, 1969;Ewen and van der Hulst, 1985;Goldsmith, 1985). The rationale is that, for example, the feature [±labial] has rarely, if ever, been found to play a role in a phonological system as [−labial], i.e. phonological processes that target non-labials seem to be absent. Hence, many phonologists favor the use of a single possibility [LABIAL] in a specification that includes the labials. For this reason, we analyze separately both the complexity of using only such privative features (positive features only), which we call the positive feature description problem (is Q a natural class if we only use positive features?) and the corresponding positive feature minimization problem.

Notation and terms
Throughout, when A is a set, we use ℘(A) to denote the power set of A.
Relative to a set $P$ of phonemes, we define a feature system to be a subset $\mathcal{F} \subseteq \wp(P)$. When $Q \subseteq P$, we define an $\mathcal{F}$-description of $Q$ to be a sequence $G_1, \ldots, G_m \subseteq P$ such that there exist pairwise distinct elements $F_1, \ldots, F_m \in \mathcal{F}$ where:
• each $G_i$ is equal to either $F_i$ or $P \setminus F_i$, and
• $G_1 \cap \cdots \cap G_m = Q$.
We refer to $m$ as the size of the description. We say that such a description is positive if each $G_i$ is an element of $\mathcal{F}$.
We define the feature description problem as follows. An instance consists of a set $P$ of phonemes, a feature system $\mathcal{F}$, and a non-empty subset $Q \subseteq P$; the problem is to decide whether or not there exists an $\mathcal{F}$-description of $Q$.
We define the feature minimization problem as follows. An instance consists of a tuple $(P, \mathcal{F}, Q, k)$ where $(P, \mathcal{F}, Q)$ is an instance of the feature description problem, and $k \geq 1$ is a natural number. The problem is to decide whether or not there exists an $\mathcal{F}$-description of $Q$ having size less than or equal to $k$.
We define the positive feature description problem to have the same instances as the feature description problem, but where the problem is to decide whether or not there exists a positive $\mathcal{F}$-description of $Q$. Analogously, we define the positive feature minimization problem to have the same instances as the feature minimization problem, but where the problem is to decide whether or not there exists a positive $\mathcal{F}$-description of $Q$ obeying the size restriction.

The feature description problems
We will first show that the feature description problems are decidable in polynomial time.
Proposition 1. The feature description problem and the positive feature description problem are each polynomial-time decidable.
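Proof. The algorithm for the feature description problem is as follows. Given an instance $(P, \mathcal{F}, Q)$, compute a set $C \subseteq \wp(P)$ as follows: for each $F \in \mathcal{F}$, place $F$ in $C$ if $Q \subseteq F$, and place $P \setminus F$ in $C$ if $Q \subseteq P \setminus F$. That is,
$$C = \{F \mid F \in \mathcal{F},\ Q \subseteq F\} \cup \{P \setminus F \mid F \in \mathcal{F},\ Q \subseteq P \setminus F\}.$$
Then check whether $\bigcap_{G \in C} G = Q$; if so, accept, otherwise, reject. It suffices to argue that if there exists an $\mathcal{F}$-description of $Q$, then the elements of $C$ constitute such a description. If there exists an $\mathcal{F}$-description of $Q$, say, $G_1, \ldots, G_m$, we have $Q \subseteq G_i$ for each $i$; thus, $G_1, \ldots, G_m \in C$ and we have
$$Q \subseteq \bigcap_{G \in C} G \subseteq G_1 \cap \cdots \cap G_m = Q,$$
implying that the elements of $C$ provide an $\mathcal{F}$-description of $Q$.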
For the positive feature description problem, the algorithm computes $C^+$ to contain each $F \in \mathcal{F}$ such that $Q \subseteq F$, and accepts if and only if $\bigcap_{G \in C^+} G = Q$. The proof of correctness is similar to that given for the general feature description problem.
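The decision procedure from the proof of Proposition 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' released code: the function names, the `positive_only` flag, and the toy inventory below are assumptions made for the example, and Q is assumed to be non-empty, as in the problem definition.

```python
def candidate_sets(P, F, Q, positive_only=False):
    """Return every set G (a feature set or its complement) with Q <= G."""
    C = {}
    for name, members in F.items():
        if Q <= members:
            C["+" + name] = set(members)
        elif not positive_only and Q <= P - members:
            C["-" + name] = P - members
    return C


def is_natural_class(P, F, Q, positive_only=False):
    """Accept iff the intersection of all candidate sets is exactly Q."""
    common = set(P)
    for G in candidate_sets(P, F, Q, positive_only).values():
        common &= G
    return common == Q


# Hypothetical toy inventory and feature system (illustration only,
# not the feature system of Table 1).
P = {"p", "b", "m", "t", "d", "n"}
F = {
    "voi": {"b", "m", "d", "n"},
    "nas": {"m", "n"},
    "lab": {"p", "b", "m"},
}

print(is_natural_class(P, F, {"m", "n"}))                      # True
print(is_natural_class(P, F, {"p", "t"}))                      # True, via [-voi]
print(is_natural_class(P, F, {"p", "t"}, positive_only=True))  # False
print(is_natural_class(P, F, {"p", "n"}))                      # False
```

Each feature is inspected once and each inspection is a pair of subset tests, so the running time is polynomial in the size of the inventory and the feature system.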

The minimization problems
An instance $(U, \mathcal{S}, k)$ of the set cover problem consists of a non-empty set $U$, a subset $\mathcal{S} \subseteq \wp(U)$, and a natural number $k \geq 1$. A set cover $S_1, \ldots, S_m$ is a sequence of sets from $\mathcal{S}$ such that $S_1 \cup \cdots \cup S_m = U$; $m$ is said to be its size. The problem is to decide whether or not there exists a set cover of size less than or equal to $k$ (Karp, 1972). We prove that both the feature minimization problem and the positive feature minimization problem are NP-complete, by reducing from set cover.
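The reduction is the same for both of these problems, and is as follows. Given an instance $(U, \mathcal{S}, k)$ of set cover, let $x$ be a fresh element not in $U$. Define
$$P = U \cup \{x\}, \qquad \mathcal{F} = \{(U \setminus S) \cup \{x\} \mid S \in \mathcal{S}\}, \qquad Q = \{x\}.$$
The resulting instance is $(P, \mathcal{F}, Q, k)$.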
The following establishes the correctness of this reduction.
Proposition 2. The following are equivalent:
1. There exists a size $m$ set cover of $U$.
2. The set $Q = \{x\}$ has a positive $\mathcal{F}$-description of size $m$.
3. The set $Q = \{x\}$ has an $\mathcal{F}$-description of size $m$.
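Proof. 1 ⇒ 2: Let $(S_i)$ be such a set cover, so that $S_1 \cup \cdots \cup S_m = U$. Then, by De Morgan's laws, $(U \setminus S_1) \cap \cdots \cap (U \setminus S_m) = \emptyset$, and so
$$((U \setminus S_1) \cup \{x\}) \cap \cdots \cap ((U \setminus S_m) \cup \{x\}) = \{x\}.$$
2 ⇒ 3: This is immediate, since every positive description is a description.
3 ⇒ 1: Suppose that $G_1, \ldots, G_m$ is such a description. Since $x \in G_i$ for each $i$, and since every element of $\mathcal{F}$ contains $x$ (so that no complement of an element of $\mathcal{F}$ does), it follows that each $G_i$ is an element of $\mathcal{F}$. Define $S_1, \ldots, S_m$ to be the sets such that $G_i = (U \setminus S_i) \cup \{x\}$. By reversing the argumentation in the 1 ⇒ 2 case, we obtain that $S_1, \ldots, S_m$ is a set cover of $U$.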
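The construction behind Proposition 2 can be traced on a small instance. The sketch below is illustrative only: the function name and the choice of the string "x" for the fresh element are assumptions, and the instance is the small one with U = {a,b,c,d,e} and four candidate sets.

```python
def set_cover_to_feature_instance(U, S, k, fresh="x"):
    """Map a set cover instance (U, S, k) to an instance (P, F, Q, k)."""
    assert fresh not in U, "the fresh element must not occur in U"
    P = frozenset(U) | {fresh}
    # Each set S_i becomes the feature (U \ S_i) together with the fresh element.
    F = [frozenset(U - s) | {fresh} for s in S]
    Q = frozenset({fresh})
    return P, F, Q, k


U = {"a", "b", "c", "d", "e"}
S = [{"a", "b", "c"}, {"b", "d"}, {"c", "d"}, {"d", "e"}]
P, F, Q, k = set_cover_to_feature_instance(U, S, 2)

for i, feature in enumerate(F, start=1):
    print(f"F{i} =", sorted(feature))

# The cover {a,b,c}, {d,e} of size 2 corresponds to a positive
# description of Q = {x} of size 2: the two features share only x.
print(S[0] | S[3] == U)   # True
print(F[0] & F[3] == Q)   # True
```

Every feature contains the fresh element, which is why only positive choices can appear in a description of {x} and why the same reduction serves both minimization problems.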
In the just-given reduction, the parameter k is not changed. We show that there in fact exists a reduction in the other direction, from each of the minimization problems to set cover, that likewise does not change the parameter k. This indicates a tight relationship between these minimization problems and the set cover problem.
Proposition 3. Let $(P, \mathcal{F}, Q, k)$ be an instance of the feature minimization problem; let $C$ be the set computed from this instance as in the proof of Proposition 1. The mapping which, upon being given this instance, returns $(U, \mathcal{S}, k)$ where $U = P \setminus Q$ and $\mathcal{S} = \{U \setminus (G \setminus Q) \mid G \in C\}$, is a reduction to the set cover problem; here, the complement is taken with respect to $U$. Likewise, one obtains a reduction from the positive feature minimization problem to the set cover problem, by using $C^+$ in place of $C$.
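Proof. We argue this for the feature minimization problem as follows (the positive minimization case is analogous). It was seen in the proof of Proposition 1 that $(P, \mathcal{F}, Q, k)$ is a 'yes' instance if and only if there exist $G_1, \ldots, G_m \in C$, with $m \leq k$, such that
$$G_1 \cap \cdots \cap G_m = Q.$$
Since each $G_i$ contains $Q$, this equality holds if and only if
$$(G_1 \setminus Q) \cap \cdots \cap (G_m \setminus Q) = \emptyset,$$
which, by De Morgan's laws with complements taken in $U = P \setminus Q$, holds if and only if
$$(U \setminus (G_1 \setminus Q)) \cup \cdots \cup (U \setminus (G_m \setminus Q)) = U,$$
that is, if and only if the corresponding complements form a set cover of $U$ of size $m$.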
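The reverse reduction can be sketched in the same style. This is a minimal illustration under stated assumptions: the function name and the toy inventory are hypothetical, Q is assumed to be a non-empty proper subset of P, and each element of C is mapped to the set of phonemes it excludes, i.e. its complement within U = P \ Q.

```python
def feature_instance_to_set_cover(P, F, Q, k, positive_only=False):
    """Map a feature minimization instance to a set cover instance (U, S, k)."""
    # Candidate sets: features (or complements) that contain all of Q.
    C = []
    for members in F:
        if Q <= members:
            C.append(frozenset(members))
        elif not positive_only and Q <= P - members:
            C.append(frozenset(P - members))
    U = frozenset(P - Q)
    # Each candidate G covers exactly the phonemes outside Q that it rules out.
    S = [U - (G - Q) for G in C]
    return U, S, k


# Hypothetical toy inventory (illustration only).
P = {"p", "b", "m", "t", "d", "n"}
F = [
    {"b", "m", "d", "n"},   # voiced
    {"m", "n"},             # nasal
    {"p", "b", "m"},        # labial
]
Q = {"m"}

U, S, k = feature_instance_to_set_cover(P, F, Q, 2)
print(sorted(U))
for s in S:
    print(sorted(s))

# Nasal and labial together rule out every phoneme other than m,
# so the corresponding two sets cover U.
print(S[1] | S[2] == U)   # True
```

Picking features whose intersection is Q is the same as picking sets of excluded phonemes whose union is everything outside Q, which is why the parameter k carries over unchanged.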
Parameterized complexity
The set cover problem, with the value k taken to be the parameter, is known to be W[2]-complete in parameterized complexity theory (Cygan et al., 2015, Theorem 13.21). As we have given polynomial-time reductions between set cover and each of the minimization problems that do not change the value k, we obtain the following.
Proposition 4. Both the feature minimization problem and the positive feature minimization problem are W[2]-complete, when viewed as parameterized problems with k taken to be the parameter.

Empirical concerns
Many NP-hard problems can in practice often be solved by either a greedy algorithm (Chvatal, 1979) or a branch-and-bound algorithm (Land and Doig, 1960) that recursively explores the search space, terminating search branches as soon as some defined limit is exceeded.

Greedy Search
[Figure: example of the reduction from set cover, with S = {{a,b,c},{b,d},{c,d},{d,e}}, U = {a,b,c,d,e}, k = 2, giving F1 = {d,e,x}, F2 = {a,c,e,x}, F3 = {a,b,e,x}, F4 = {a,b,c…]
We have implemented a greedy search strategy that starts with no features, and then picks a single (±) feature from the known, possibly non-minimal description C computed in the proof of Proposition 1, in such a way as to rule out as many as possible of the remaining phonemes not in the desired set Q. Features are added to the description until only the set Q is described. For example, in describing the set {m} using the example in Table 1, the greedy approach would first pick the feature [+nas] since this cuts down the number of corresponding phonemes to just two (m and n), fewer than any other feature choice. This is essentially an analogue of the well-known greedy approximation algorithm for set cover (Chvatal, 1979).
This greedy algorithm, however, fails in many practical cases to find the correct minimal specification, and is therefore not a viable candidate for efficient feature minimization. A simple example is finding the featural specification for the set {@} in English under the fairly standard featural system provided in Hayes (2011) and van Vugt and Hayes (2012): the algorithm recovers the features [-tense, -back, -front, -coronal] while the minimal specification is [-back, -front, -coronal].
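The following is a minimal sketch of one way such a greedy strategy could be written; it is not the released implementation. The function name, the alphabetical tie-breaking rule, and the abstract inventory (a target phoneme "q" and six others named "1" to "6") are assumptions chosen so that the greedy choice is visibly non-minimal.

```python
def greedy_description(P, F, Q):
    """Greedily pick signed features until only Q remains (None if impossible)."""
    # Candidate sets: features (or complements) that contain all of Q.
    C = {}
    for name, members in F.items():
        if Q <= members:
            C["+" + name] = set(members)
        elif Q <= P - members:
            C["-" + name] = P - members
    remaining, chosen = set(P), []
    while remaining != Q:
        # Pick the candidate that leaves the fewest phonemes; ties go to
        # the alphabetically first name (an arbitrary assumption).
        best = min(sorted(C), key=lambda n: len(remaining & C[n]), default=None)
        if best is None or remaining <= C[best]:
            return None  # no candidate rules anything out: not a natural class
        chosen.append(best)
        remaining &= C.pop(best)
    return chosen


# Abstract, hypothetical instance (not English data).
P = {"q", "1", "2", "3", "4", "5", "6"}
F = {
    "f1": {"q", "4", "5", "6"},   # rules out 1, 2, 3
    "f2": {"q", "1", "2", "3"},   # rules out 4, 5, 6
    "f3": {"q", "1", "6"},        # rules out 2, 3, 4, 5
}

result = greedy_description(P, F, {"q"})
print(result)   # ['+f3', '+f1', '+f2']
```

Here [+f3] is chosen first because it rules out four phonemes at once, but both remaining features are then still needed, giving three features where [+f1, +f2] would suffice.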
Branch & Bound
We also implemented a branch-and-bound algorithm with a recursive component that explores, exhaustively, all combinations of features in the full description C, bounding the search whenever the current branch contains more features than the shortest solution found so far. The branch-and-bound algorithm is efficient in practice in finding minimal feature specifications, but still needs to explore a reasonably large search space (see Table 2).
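A branch-and-bound search of this kind might look as follows. This is a hedged sketch rather than the authors' code: the function name, the fixed alphabetical ordering of candidates, and the abstract instance are assumptions, and the bound used here cuts a branch as soon as it can no longer beat the best description found so far.

```python
def minimal_description(P, F, Q):
    """Return a shortest list of signed features describing Q, or None."""
    # Candidate sets: features (or complements) that contain all of Q.
    C = []
    for name in sorted(F):
        if Q <= F[name]:
            C.append(("+" + name, frozenset(F[name])))
        elif Q <= P - F[name]:
            C.append(("-" + name, frozenset(P - F[name])))
    best = None

    def search(index, chosen, remaining):
        nonlocal best
        if best is not None and len(chosen) >= len(best):
            return  # bound: this branch cannot improve on the best so far
        if remaining == Q:
            best = list(chosen)
            return
        if index == len(C):
            return
        name, G = C[index]
        if not remaining <= G:  # branch only if G rules out something new
            search(index + 1, chosen + [name], remaining & G)
        search(index + 1, chosen, remaining)

    search(0, [], frozenset(P))
    return best


# Abstract, hypothetical instance (not English data).
P = {"q", "1", "2", "3", "4", "5", "6"}
F = {
    "f1": {"q", "4", "5", "6"},
    "f2": {"q", "1", "2", "3"},
    "f3": {"q", "1", "6"},
}

solution = minimal_description(P, F, {"q"})
print(solution)   # ['+f1', '+f2']
```

The search is exhaustive in the worst case, so its cost can grow exponentially with the number of candidate features; the bound only helps once a reasonably short description has been found.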

Discussion
Humans have been noted to outperform simple, low-polynomial time heuristic algorithms for some intractable problems such as the traveling salesman problem (MacGregor and Ormerod, 1996) using intuition and visual inspection of the problem structure. It is, however, unclear if such performance carries over to other problems of a different structure, such as MINPHONFEAT, or whether the hypothesis that a phonological acquisition process should include this type of minimization is too strong.
An important point to address in the complexity analysis is whether we are operating in a bounded domain. With a fixed, finite set of features to choose from and a fixed finite phoneme inventory, the problem can be considered static and solvable in constant time by memorizing or pre-calculating all the possible patterns of feature combinations and their corresponding phoneme sets. Such counter-arguments have been leveled (Kornai, 2006) at other NP-completeness analyses, such as those that have shown that Optimality Theory is potentially intractable (Eisner, 2000; Idsardi, 2006). In the case at hand, however, such an argument has less traction since the domain in question is rather large (potentially hundreds of features) and it seems inevitable that some search method must be used by speakers (or phonologists) to discover the minimum number of features required. The fact that simple greedy algorithms do not always find the minimum specification, and that branch-and-bound algorithms, while efficient for practical computational use, still need to explore a large search space, raises the question of whether some alternative strategy would work particularly well with phonological feature structures proposed in the literature. This could address the problem that this intractability poses to phonological learning.

Conclusion
We have shown that one of the commonly assumed subtasks of acquisition of phonological generalizations, distinctive feature minimization, is computationally intractable. The decision version of the minimization problem is NP-complete, and it follows that the optimization version is NP-hard. This is true even if one limits oneself to using only positive features, i.e. a privative feature system. Furthermore, purported human performance in finding minimal feature specifications cannot be attributed to a simple greedy strategy, since such a strategy will sometimes fail to find a minimal description with commonly proposed feature systems and phonological inventories. By contrast, the problem of simply deciding whether a set of phonemes constitutes a natural class, the feature description problem, is solvable in polynomial time.

Reproducibility
Our feature systems data and code for the feature description and minimization problems are available at https://github.com/mhulden/minphonfeat.