Learning with Partially Ordered Representations

This paper examines the characterization and learning of grammars defined with enriched representational models. Model-theoretic approaches to formal language theory traditionally assume that each position in a string belongs to exactly one unary relation. We consider unconventional string models where positions can have multiple, shared properties, which are arguably useful in many applications. We show that the structures given by these models are partially ordered, and present a learning algorithm that exploits this ordering relation to effectively prune the hypothesis space. We prove that this learning algorithm, which takes positive examples as input, finds the most general grammar which covers the data.


Introduction
Foundational connections between formal languages, finite-state automata, and logic have been known for decades (Büchi, 1960; Thomas, 1997). Logical approaches are advantageous since they flexibly admit different representations. In many domains, such as biological sequencing or linguistics, shared properties of symbols in sequences provide information currently ignored by string-based inference algorithms, which largely focus on learning automata (de la Higuera, 2010). Here we explore the idea that domain-specific knowledge can be encoded representationally via model theory (Libkin, 2004), and show how these representations can facilitate pattern learning. This paper synthesizes results in grammatical inference and model theory to present a novel algorithm which learns classes of formal languages using enriched representations of strings. In fact, our model-theoretic approach immediately generalizes these results to arbitrary data structures.
Here we are concerned with the learning of those formal languages which can be defined via a set of structural constraints, such as the Strictly k-Local and Strictly k-Piecewise languages (Rogers and Pullum, 2011; Rogers et al., 2010). Models of strings in the languages must not contain these forbidden structures (Rogers et al., 2013). Specifically, we define a learner whose hypothesis space is structured as a partial order by the relational signature of the particular model theory. We show how to traverse this space bottom-up from positive data to find a grammar which covers the data with the most general constraints.
The paper is structured as follows: Section 2 provides mathematical preliminaries in model theory. Section 3 characterizes ordering relations over these structures. Section 4 generalizes the grammars employed in string extension and lattice-based learning (Heinz, 2010; Heinz et al., 2012) to show how these model-theoretic structures can define classes of formal languages. Section 5 discusses some entailments our learning algorithm takes advantage of. Section 6 defines a learning problem and criteria for selecting adequate solutions. Section 7 presents a general-to-specific, bottom-up algorithm which provably satisfies the learning criteria. Section 8 concludes the paper.

Elements of Language Theory
The set of all possible finite strings of symbols from a finite alphabet Σ and the set of strings of length ≤ n are Σ* and Σ≤n, respectively. The unique empty string is represented with λ. The length of a string w is |w|, so |λ| = 0. If u and v are two strings then we denote their concatenation with uv. If w is a string and σ is the ith symbol in w, we write w_i = σ.

Elements of Finite Model Theory
Model theory, combined with logic, provides a powerful way to study and understand mathematical objects with structures (Enderton, 2001). In this paper we only consider finite relational models (Libkin, 2004) of strings in Σ*.
Definition 1 (Models). A model signature is a tuple S = ⟨D; R_1, R_2, . . ., R_m⟩ where the domain D is a finite set, and each R_i is an n_i-ary relation over the domain. A model for a set of objects Ω is a total, one-to-one function from Ω to structures whose type is given by a model signature.
For example, a conventional model for strings in Σ* is given by the signature ⟨D; ◁, [R_σ]_{σ∈Σ}⟩, where D^w = {1, . . ., |w|} is the domain, ◁ = {(i, i + 1) | i, i + 1 ∈ D} is the successor relation which orders the elements of the domain, and [R_σ]_{σ∈Σ} is a set of |Σ| unary relations such that for each σ ∈ Σ, R_σ^w = {i ∈ D^w | w_i = σ}. We will usually omit the superscript w since it will be clear from the context.
For example, with Σ = {a, b, c} and the model above for strings, we have M◁(abba) = ⟨{1, 2, 3, 4}; ◁, R_a, R_b, R_c⟩ with R_a = {1, 4}, R_b = {2, 3}, and R_c = ∅. A second conventional model replaces the successor relation with the precedence relation < = {(i, j) ∈ D × D | i < j}, which orders domain elements by general precedence rather than adjacency (Büchi, 1960; McNaughton and Papert, 1971; Rogers et al., 2013). Under these signatures, the string abba has the models visualized in Figure 1.
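As a concrete sketch (our own encoding, not notation from the paper), the successor model of abba can be computed directly from the signature above. For brevity this sketch only builds R_σ for symbols that actually occur in w, whereas the signature includes one (possibly empty) relation per σ ∈ Σ:

```python
# Sketch (our own encoding): the successor model of a string w --
# domain {1, ..., |w|}, the successor relation, and one unary relation
# R_sigma per symbol occurring in w.

def successor_model(w):
    D = set(range(1, len(w) + 1))
    succ = {(i, i + 1) for i in range(1, len(w))}
    R = {sigma: {i for i in D if w[i - 1] == sigma} for sigma in set(w)}
    return D, succ, R

D, succ, R = successor_model("abba")
assert D == {1, 2, 3, 4}
assert succ == {(1, 2), (2, 3), (3, 4)}
assert R == {"a": {1, 4}, "b": {2, 3}}
```

Replacing `succ` with all pairs (i, j) with i < j would give the precedence model of the same string.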
In this view, a string may be called a one-dimensional or unary-branching tree, since it has one axis along which its nodes are ordered. In a standard tree model signature, the set of nodes is ordered by two binary relations, "dominance" and "immediate left-of". Suppose s is the mother of two nodes t and u in some standard tree, and also assume that t precedes u. Then we might say that s dominates the string tu. Standard or two-dimensional trees, then, relate nodes to one-dimensional trees (strings) by immediate dominance. A three-dimensional tree relates nodes to two-dimensional, i.e. standard, trees, corresponding to Tree-Adjoining Grammar derivations. In general, a d-dimensional tree is a set of nodes ordered by d dominance relations such that the n-th dominance relation relates nodes to (n − 1)-dimensional trees (for d = 1, single nodes are zero-dimensional trees).
While a Gorn tree domain as written encodes these dominance and precedence relations implicitly, we may write them out explicitly model-theoretically, so that a signature for a Σ-labeled 2-dimensional tree makes both relations available to logical formulas (Figure 2).

Unconventional Word Models
Whereas Rogers (2003) generalized conventional word models to trees, here we generalize word models in a different way. Conventional string models are the successor and precedence models introduced previously. What makes these models conventional are the unary relations, which essentially label each domain element with a single, mutually exclusive property: the property of being some σ ∈ Σ.
In contrast, unconventional models for strings recognize that distinct alphabetic symbols may share properties, and expand the model signature by including these properties as unary relations (Strother-Garcia et al., 2016; Vu et al., 2018). For example, a conventional model of Σ = {a, . . ., z, A, . . ., Z} would include 52 unary relations, one for each lowercase and capital letter. On the other hand, an unconventional model might include only 27: 26 for the letters, and one unary relation Capital. Then the letters A and a share the 'a' property, and A additionally has the property of being Capital.
In linguistics, speech sounds are commonly decomposed into binary features based on their phonetic properties. So the set of segments {z, Z, d, b, g, . . .} all share the property +Voice, meaning the vocal cords are activated, while the segments {s, S, t, p, k, . . .} share the property -Voice, meaning the vocal cords are not activated. Thus unconventional models may refer to individual features in defining grammatical constraints, rather than each individual segment.
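To illustrate, here is a minimal sketch (our own encoding) of conventional versus unconventional positions, with each position represented as the set of unary relations it satisfies:

```python
# Sketch (our own encoding): each position carries a set of unary
# properties instead of a single mutually exclusive symbol.

# Conventional model of "Apple": one label per position.
conventional = ["A", "p", "p", "l", "e"]

# Unconventional model: letter relations plus one Capital relation.
unconventional = [frozenset({"a", "Capital"}), frozenset({"p"}),
                  frozenset({"p"}), frozenset({"l"}), frozenset({"e"})]

# 'A' and 'a' share the letter property, but only 'A' is also Capital.
assert "a" in unconventional[0] and "Capital" in unconventional[0]

# Phonological features work the same way: z and s share stridency and
# anteriority but differ in voicing.
z = frozenset({"+Voice", "+str", "+ant"})
s = frozenset({"-Voice", "+str", "+ant"})
assert z & s == {"+str", "+ant"}
```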
Different representations of strings and trees provide a unified, model-theoretic and logical perspective on well-known subclasses of the regular languages (Thomas, 1997; Rogers et al., 2013). However, they also open up new doors for grammatical inference by allowing one to consider other models for strings (Strother-Garcia et al., 2016; Vu et al., 2018).

Subfactors, Superfactors, Ideals and Filters
We sometimes refer to the model of a string w as a structure. However, structures are more general in that they correspond to any mathematical structure conforming to the model signature. As such, while a model of a string w will always be a structure, a structure will not always be a model of a string w. The size of a structure S, denoted |S|, coincides with the cardinality of its domain.
We next wish to introduce a partial ordering over structures. To do so, we must define the terms connected, restriction, and factor. For each structure S = ⟨D; ◁, R_1, . . ., R_n⟩, let the binary "connectedness" relation C be defined as follows: (x, y) ∈ C if and only if there is some non-unary relation R_i and some tuple in R_i that contains both x and y.
Informally, domain elements x and y belong to C provided they belong to some non-unary relation. Let C* denote the symmetric transitive closure of C. A structure is connected if C* relates every pair of distinct domain elements.
For example, M◁(abba) above is a connected structure. However, the structure S_{ab,ba} shown below, which is identical to M◁(abba) except that it omits the pair (2, 3) from the order relation, is not connected, since none of (1, 3), (1, 4), (2, 3), nor (2, 4) belong to C*. Note that no string in Σ* has the structure S_{ab,ba} as its model.
Informally, one identifies a subset A of the domain of B and strips B of all elements and relations which are not wholly within A. What is left is a restriction of B to A.

Definition 4 (Subfactors). Structure A is a subfactor of structure B (A ⊑ B) if A is connected, there exists a restriction B′ of B, and there exists h : A → B′ such that for all a_1, . . ., a_m ∈ A and for all R_i in the model signature: if h(a_1), . . ., h(a_m) ∈ B′ and R_i(a_1, . . ., a_m) holds in A, then R_i(h(a_1), . . ., h(a_m)) holds in B′. If A ⊑ B we also say that B is a superfactor of A.
In other words, properties that hold of the connected structure A also hold in a related way within B.
If A ⊑ B and |A| = k then we say A is a k-subfactor of B. For all w ∈ Σ*, and for any model M of Σ*, let the subfactors of w be Subfact(M, w) = {A | A ⊑ M(w)} and the k-subfactors of w be Subfact_k(M, w) = {A ∈ Subfact(M, w) | |A| ≤ k}. We also define Subfact(M, Σ*) to be ⋃_{w∈Σ*} Subfact(M, w) and Subfact_k(M, Σ*) to be ⋃_{w∈Σ*} Subfact_k(M, w). When M is understood from context, we write Subfact(w) instead of Subfact(M, w). We define the sets of superfactors Supfact(M, w) and Supfact(M, Σ*) similarly.
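Definition 4 and the Subfact operator can be sketched concretely for the special case of successor string models. In this simplification (ours, not the paper's general construction over arbitrary structures), a connected structure is a contiguous window of property bundles, and the subfactor order is pointwise set inclusion over some window:

```python
from itertools import combinations, product

def is_subfactor(a, b):
    """A is a subfactor of B (successor-model simplification): some
    contiguous window of B pointwise includes every property A requires."""
    k = len(a)
    if k == 0:
        return True  # the empty structure is a subfactor of everything
    return any(all(a[i] <= b[j + i] for i in range(k))
               for j in range(len(b) - k + 1))

def bundles_below(pos):
    """All subsets of one position's property set (downward closure)."""
    items = sorted(pos)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def subfact_k(word_model, k):
    """Subfact_k(M, w): contiguous windows of length 1..k, closed downward
    by freely dropping properties at each position."""
    out = set()
    for size in range(1, k + 1):
        for j in range(len(word_model) - size + 1):
            window = word_model[j:j + size]
            for combo in product(*(bundles_below(p) for p in window)):
                out.add(combo)
    return out

A = frozenset({"capital", "a"})
assert is_subfactor([frozenset({"a"})], [A])   # {a} is within {capital, a}
assert not is_subfactor([A], [frozenset({"a"})])
# the four 1-subfactors of [capital, a]: [], [a], [capital], [capital, a]
assert len(subfact_k([A], 1)) == 4
```

Note that the single-position structure with no relations ("[]") is included in the downward closure, while the truly empty structure is handled separately by `is_subfactor`.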
Observe that (Subfact(M, w), ⊑) is a partially ordered set (poset). The next definitions establish that models of strings are principal elements of ideals and filters.

Definition 5 (Ideals). A subset I of a poset is an ideal if
• I is non-empty;
• for every x in I, y ≤ x implies that y is in I;
• for every x, y in I, there exists some element z in I such that x ≤ z and y ≤ z.
The dual of an ideal is a filter.
Definition 6 (Filters). A subset F of a poset is a filter iff
• F is non-empty;
• for every x in F, x ≤ y implies that y is in F;
• for every x, y in F, there exists some element z in F such that z ≤ x and z ≤ y.
Definition 7 (Principal Ideals, Filters, and Elements). For any poset ⟨X, ≤⟩, the smallest filter containing x ∈ X is a principal filter and x is the principal element of this filter. Similarly, the smallest ideal containing x ∈ X is a principal ideal and x is the principal element of this ideal.
The next two propositions show how this representational perspective unifies the treatment of substrings and subsequences. They are subfactors under the successor and precedence models, respectively. A string x = x_1 · · · x_n is a substring of y iff there exist l, r ∈ Σ* such that y = lxr. String x is a subsequence of y iff there exist v_0, . . ., v_n ∈ Σ* such that y = v_0 x_1 v_1 · · · x_n v_n.

Proposition 1 (Substrings are subfactors under M◁). For all strings x, y ∈ Σ*, x is a substring of y iff M◁(x) ⊑ M◁(y).
Proof. We leave this proof to the reader since it is similar in nature to the previous one.
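These propositions mean that, on conventional string models, the subfactor order can be checked with two familiar string relations; a minimal sketch:

```python
def is_substring(x, y):
    """x is a substring of y: y = lxr for some strings l, r."""
    return x in y

def is_subsequence(x, y):
    """x is a subsequence of y: the symbols of x occur in y in order."""
    it = iter(y)
    return all(c in it for c in x)

assert is_substring("bb", "abba") and not is_substring("aa", "abba")
assert is_subsequence("aa", "abba") and not is_subsequence("bab", "abba")
```

The subsequence check exploits the fact that `c in it` consumes the iterator `it`, so later symbols are only searched for after earlier ones are found.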

Grammars, Languages, and Language Classes
Factors can define grammars, formal languages, and classes of formal languages. Usually a model signature provides the vocabulary for some logical language. Sentences in this logical language define sets of strings as follows. The language of a sentence φ is all and only those strings whose models satisfy φ. Within the regular languages, many well-known subregular classes can be characterized logically in this way (McNaughton and Papert, 1971; Rogers and Pullum, 2011; Rogers et al., 2013; Thomas, 1997).
Intuitively, the grammars we are interested in consist of a finite list of forbidden subfactors, whose largest size is bounded by k. Strings in the language of such a grammar are those which do not contain any forbidden subfactor. In this way these grammars are like logical expressions which are "conjunctions of negative literals" (Rogers et al., 2013), where the role of the negative literals is played by the forbidden subfactors.
Each forbidden subfactor is a principal element of a filter, and the language is all strings whose models are not in any of these filters. For each k, there is a class of languages including all and only those languages that can be defined in this way. For example, the Strictly k-Local (SL_k) and Strictly k-Piecewise languages can be defined in this way; they are languages which forbid finitely many substrings or subsequences, respectively (Garcia et al., 1990; Rogers et al., 2010). Formally:

Definition 8 (Grammars and their languages). Let k be some positive integer, and M a model of Σ*. A grammar G is a finite subset of Subfact_k(M, Σ*), and its language is L(G) = {w ∈ Σ* | Subfact_k(M, w) ∩ G = ∅}. The class L(M, k) contains all and only the languages L(G) for such grammars G. The elements of G are principal elements of filters, and are called forbidden subfactors.
As an example, let Σ = {a, b, c} and consider G = {M◁(aa), M◁(bb), M◁(c)}. L(G) is the set of strings in which a and b strictly alternate, such as (ab)+ and (ba)+, because the substrings aa, bb, and c are all forbidden. This language belongs to L(M◁, 2).

Proposition 3. For each w ∈ L(G) and each g ∈ G, Subfact(M, w) has an empty intersection with Supfact(g).
Proof. Suppose there exists A ∈ Subfact_k(Σ*) such that A ⊑ M(w) and g ⊑ A. This implies that g ⊑ M(w), and thus that Subfact_k(M, w) ∩ G ≠ ∅, which contradicts Definition 8.
In other words, the principal ideal of M(w) is disjoint from the principal filters of the elements of G.
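For the example grammar G = {M◁(aa), M◁(bb), M◁(c)}, membership in L(G) reduces to a substring check; a minimal sketch (our own code):

```python
# Sketch: under the successor model with conventional labels, forbidding
# a subfactor is forbidding a substring.

FORBIDDEN = ["aa", "bb", "c"]

def in_language(w):
    """w is in L(G) iff no forbidden substring occurs in w."""
    return not any(g in w for g in FORBIDDEN)

assert in_language("abab") and in_language("ba")
assert not in_language("abba")  # contains bb
assert not in_language("cab")   # contains c
```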

Grammatical Entailments
Given a grammar G, we call a subfactor s in Subfact(Σ*) ungrammatical if it belongs to a principal filter of any element of G. Subfactors that are not ungrammatical are called grammatical. Lemma 14 ensures that grammaticality is downward entailing, in the sense that if a model of the word M(w) is not contained in the principal filters of the elements of the grammar, then neither are the subfactors of M(w). But it also ensures that ungrammaticality is upward entailing: if a model of the word M(w) belongs to the principal filters of the elements of the grammar, then all of the superfactors of M(w) in that filter are likewise contained.
In this way, the ideals and filters within a particular model noted above give rise to these entailment properties of grammaticality with respect to the hypothesis space. If the learner constructs filters, then the grammar G will allow structures such that language membership is downward entailing with respect to the grammar G, and language non-membership is upward entailing with respect to the grammar G.

Figure 3: The structure ideals (blue) and filters (red) for a capitalized letter model.

Example: Text Capitalization
As an example, consider capitalized letters as discussed above. In an unconventional word model, each capital letter at some position x is represented as satisfying one of the relations R ∈ {a(x), b(x), . . ., z(x)} as well as the unary relation capital(x). Thus the relation a(x) is true of both lowercase a and uppercase A, but a(x) ∧ capital(x) is only true of uppercase A. Note also that in this model no position x of a structure can satisfy both predicates a(x) and b(x). We return to this point in §7.

Example: Long Distance Linguistic Dependencies
As another example, sequences of speech sounds as mentioned earlier may be decomposed into binary features based on their phonetic properties, like anteriority (±ant, whether the sound occurs in the anterior of the vocal tract), stridency (±str, whether it produces a high-intensity fricative noise), or voicing (±voi, whether it activates the vocal cords), among others (Hayes, 2009). Each sound at some position x is represented as satisfying relations R ∈ {±voi(x), ±str(x), . . ., ±ant(x)}. Thus the relation +str(x) is true of both the sound s, as in the first sound of "sue", and S, as in "shoe", but +str(x) ∧ −ant(x) is only true of S.
Note also that in this model no position x of a structure can satisfy both predicates +str(x) and −str(x). We return to this point in §7 below. To ease the exposition, we will use square brackets to delimit the domain elements and write the unary relations within them instead of specifying the model in mathematical detail. In an unconventional subsequence word model, then, one possible structure of the subsequence s...S is written [+str +ant][+str -ant]. In many languages, the presence of certain segments is dependent on the presence of another segment. In Samala, subsequences like s...s are allowed but s...S are not, so words like hasxintilawas are allowed but words like hasxintilawaS are not (Hansson, 2010). In an unconventional model, banning structures of the form [+str][+str] is insufficient, since all these segments share the stridency property; a structure like [+str +ant][+str -ant] instead distinguishes them, since it disallows only stridents which disagree on the ±ant(x) relation. If the structure [+str +ant][+str +ant] is grammatical, then all of its subfactors are grammatical, and so are their subfactors in turn. Conversely, if the structure [+str +ant][+str -ant] is known to be ungrammatical, then any structure which has it as a subfactor is also ungrammatical (for example, [+voi +str +ant][+str -ant], where the first segment is also voiced), as shown in red in Figure 4.
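The Samala pattern can be sketched directly (our own encoding; the feature bundles below are illustrative, not a full phonological analysis), with the forbidden structure checked as a subsequence of property bundles:

```python
# Sketch: the Samala constraint bans [+str +ant]...[+str -ant] as a
# subsequence of feature bundles, not as a substring.

FORBIDDEN = (frozenset({"+str", "+ant"}), frozenset({"+str", "-ant"}))

def contains_subsequence(word, target):
    """word: list of property bundles; target: required bundles, in order,
    each of which must be included in some later position's bundle."""
    i = 0
    for pos in word:
        if i < len(target) and target[i] <= pos:
            i += 1
    return i == len(target)

s = frozenset({"+str", "+ant", "-voi"})   # s, as in "sue"
S = frozenset({"+str", "-ant", "-voi"})   # S, as in "shoe"
t = frozenset({"-str", "+ant", "-voi"})   # a non-strident filler segment

assert not contains_subsequence([s, t, s], FORBIDDEN)  # s...s is licit
assert contains_subsequence([s, t, S], FORBIDDEN)      # s...S is banned
```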
The structure filters give the learner an advantage when confronting hypothesis spaces under a particular model. In particular, they allow the learner to prune vast swathes of the hypothesis space as it searches for principal elements of filters. If a learner identifies one structure as being grammatical, the learner may infer that all of its subfactors are also grammatical and need not consider them. Alternatively, if the learner knows a structure is ungrammatical, it may infer that all of its superfactors are also ungrammatical.
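A minimal sketch of this pruning, for single-position structures where the subfactor order reduces to inclusion of property bundles (our own simplification):

```python
# Sketch: entailment-based pruning. Anything below a known-grammatical
# structure, or above a known-ungrammatical one, need not be tested.

def prune(candidates, known_grammatical, known_ungrammatical):
    remaining = set()
    for c in candidates:
        if any(c <= g for g in known_grammatical):
            continue  # entailed grammatical: no need to test
        if any(u <= c for u in known_ungrammatical):
            continue  # entailed ungrammatical: no need to test
        remaining.add(c)
    return remaining

cands = {frozenset({"a"}), frozenset({"Capital"}),
         frozenset({"Capital", "a"}), frozenset({"b"})}
left = prune(cands,
             known_grammatical={frozenset({"a"})},    # [a] attested in data
             known_ungrammatical={frozenset({"b"})})  # [b] forbidden
# [a] and [b] need no further testing; the rest must still be examined.
assert left == {frozenset({"Capital"}), frozenset({"Capital", "a"})}
```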
Generally, these reductions can be exponential: an alphabet of size 2^n can be represented with n unary relations in the model signature. However, this exponential reduction does not necessarily make learning any easier. The reason for this is that the size of Subfact_k(M, Σ*) equals ∑_{i=1}^{k} (2^n)^i, where n is the number of unary relations. Since a grammar is defined as a subset of Subfact_k(M, Σ*), the number of candidate grammars is exponential in this already large quantity. Therefore, the problem of how to search this space effectively is paramount.

Returning to the capitalization example, an ordering over the unary relations such as a < b < capital can help eliminate generating the same subfactor in different ways. For example, if NextSupFact is defined to add a unary relation to a position only when the relations already at that position are 'lesser', then it would only output [capital, a] given the subfactor [a] as input (recall that no position can satisfy both a(x) and b(x)). On the other hand, when given the subfactor [capital] as input, it could not add any unary relation to this position.
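Two of these points can be made concrete in a short sketch (our own illustration; `next_supfacts` is a simplified, single-position version of the generator discussed above, with the ordering and mutual exclusivity hard-coded):

```python
# Sketch: the size of Subfact_k, and a single-position NextSupFact that
# respects the ordering a < b < capital and the mutual exclusivity of
# the letter relations.

def num_subfactors(n, k):
    """sum over i = 1..k of (2**n)**i, for n unary relations."""
    return sum((2 ** n) ** i for i in range(1, k + 1))

assert num_subfactors(5, 3) == 32 + 1024 + 32768  # already 33,824 structures

ORDER = {"a": 0, "b": 1, "capital": 2}
LETTERS = {"a", "b"}  # no position satisfies both a(x) and b(x)

def next_supfacts(bundle):
    """Add one relation greater (in ORDER) than everything already present,
    and never a second mutually exclusive letter."""
    top = max((ORDER[r] for r in bundle), default=-1)
    return [bundle | {r} for r, rank in ORDER.items()
            if rank > top and not (r in LETTERS and bundle & LETTERS)]

assert next_supfacts(frozenset({"a"})) == [frozenset({"a", "capital"})]
assert next_supfacts(frozenset({"capital"})) == []
```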

Conclusion
In this paper, we considered the problem of learning formal languages defined as the complement of the union of finitely many principal filters, whose principal elements make up the grammar. This is one way to characterize the Strictly k-Local and Strictly k-Piecewise languages, but the generalization here lets us consider enriched representations of strings where different elements in a string can be said to share properties. It also lets us learn the shortest forbidden substrings in SL_k (Ron et al., 1996). This is useful in many applications where domain-specific knowledge is available and should be taken advantage of. Such enriched representations, however, have a drawback: the number of subfactors is large, which makes identifying the principal elements of the filters difficult. This paper showed that the partial ordering of the subfactors motivates a bottom-up learning algorithm which finds the least subfactors whose filters do not include the positive data.

Figure 1: Visualizations of the successor (left) and precedence (right) models of abba.
Figure 2: A 2-dimensional tree model. Dominance and precedence relations are shown with solid/dashed and dotted lines, respectively.

Figure 3 showcases the relationship among these structures under a model M. The structure for A, [capital, a], contains as subfactors [capital], [a], [], and the empty structure (not shown). The empty structure is a subfactor of [], and [] in turn is a subfactor of [capital] and [a]. The subfactor [a] contains the subfactor [], the domain element with no relations, but has superfactors [capital, a], which has one domain element and two relations, and [a][], which has two domain elements, the first satisfying the property a. Subfactors and superfactors are listed above and below each other, respectively, with lines between them. Members of one ideal are marked with a blue checkmark, and members of a filter are marked with a red asterisk. Applying this to the example in Figure 3, if the structure [capital, a] is grammatical, then all of its subfactors, such as [capital], [a], and [], are grammatical. Since those are grammatical, each of their subfactors is also grammatical, which in this case is just the empty structure, shown in blue in Figure 3. Conversely, if the structure [a][] is known to be ungrammatical, then any structure which has it as a subfactor is also ungrammatical (in this example, [capital, a][], shown in red in Figure 3). To see the importance, consider a string with only lowercase letters. In a conventional model, the grammar would ban 26 forbidden factors (A, B, C, . . .), but the "capital" model bans just one, [capital].
Banning [+ant][-ant] alone, however, is insufficient, since non-strident consonants like p, b, m also carry the ±ant feature, and such a constraint would incorrectly ban acceptable strings. To see the importance, a conventional string model must ban multiple sibilant factors sS, zS, sZ, zZ, while an unconventional model need only ban one, [+str +ant][+str -ant]. Figure 4 showcases the relationship among these structures under a precedence model M<. The structure [+str +ant][+str] contains as subfactors (among others) [+str][+str], [+str][], [], and the empty structure (not shown). The empty structure is a subfactor of [], and [] in turn is a subfactor of [+ant] and [-str], and so on.