Topology of Language Classes

The implications of a specific pseudometric on the collection of languages over a finite alphabet are explored. In distinction from an approach in (Calude et al., 2009) that relates to collections of infinite or bi-infinite sequences, the present work is based on an adaptation of the “Besicovitch” pseudometric introduced by Besicovitch (1932) and elaborated in (Cattaneo et al., 1997) in the context of cellular automata. Using this pseudometric to form a metric quotient space, we study its properties and draw conclusions about the location of certain well-understood families of languages in the language space. We find that topologies, both on the space of formal languages itself and upon quotient spaces derived from pseudometrics on the language space, may offer insights into the relationships, and in particular the distance, between languages over a common alphabet.


Introduction
The question of distance between languages, and comparison of possible definitions, has received relatively little consideration in the literature compared with other language issues, with notable exceptions being (Berstel, 1973) and (Salomaa and Soittola, 1978). This may seem surprising, considering that the current digital climate necessitates the measurement of likeness between texts and languages, for instance in search engine entries and results. Ad hoc measures of differences exist based upon rooted tree distances, but these are more like attempts to incorporate the intuitive notion of differences between words than overall differences between languages. In linguistics, as well, there is as yet no accepted way of measuring the distance between two dialects of a language, with each employing the same vocabulary.
This paper borrows a pseudometric from cellular automata theory to use language density and form a topology on the set of languages (consisting of words of finite length over a fixed alphabet). A similar pseudometric is discussed in (Cattaneo et al., 1997). Our goal is to continue a systematic review and categorization of language distances, with a view to determining what gives rise to the apparent weaknesses and strengths of each. As seen in (Salomaa and Soittola, 1978) and (Yu, 1997), language density is understood as the number of words in a language, conceived of as a function $\varrho(n)$ of word length $n$. This is shown to convey information about the nature of a language. Analysis of language density over finite words may be confined to the treatment of regular languages (Yu, 1997; Kozik, 2005), or seen as a probability density of distance between infinite sequences (Kozik, 2006).
Herein we continue the approach of (Kozik, 2006) of capturing distances between arbitrary languages, specifically by looking at features of the topology generated by each. We consider that languages, natural or formal, are most beneficially understood as potentially or actually infinite objects. As such, language patterns may or may not be adequately defined syntactically. We continue the work of Kozik in grasping language differences as the word density of distinctions at the limit. Since such a limit may not exist, we look at a pseudometric inspired by Besicovitch (1932) that captures, in fact, the upper density limit of language differences.
Next, we consider "where," in the resulting "Besicovitch" topology of the language space, individual languages lie. We also look at how this relates to the Chomsky hierarchy of languages. We find that the pseudometric space of languages is not complete, and look at the lifting of the pseudometric to a quotient metric space. The hope is that this consideration may contribute to a list of relative advantages and disadvantages associated with various candidate language topologies. Our contribution is thus conceived as a part of a broader exploration in search of the most useful topology of language spaces, with eventual application to linguistic problems like measuring the distance between dialects over a common vocabulary. We have tried to study the Besicovitch topology and its quotients in some detail, but some proofs have been condensed to outlines due to space limitations.

Early approaches
Nelson (1980) elaborated work by Walter (1975) which constructed a topological space from a space of rewriting grammars by means of successive divisors of grammatical derivations. The resulting topologies of both languages and grammatical derivations are equivalent to quasiordered sets, and have the property that each point has a smallest open neighborhood. If such a topology is $T_1$ then it is discrete.
An equivalence relation between languages was suggested by Marcus (Marcus, 1966; Marcus, 1967) based on equivalence of word contexts. Improved and elaborated by Dincǎ (1976), this approach treats the space of languages as a semigroup over the alphabet, and a distance in the quotient space (dividing by context equivalence) measures the distance between context classes of strings with respect to some chosen language.
The above described approaches, while not without interest where linguistic applications are in view, do not yield a "sufficiently smooth" topology of a language space. The first approach similar in spirit to our main thread was published by Vianu (1977), who applied the metric proposed earlier by Bodnarchǔk (1965). This approach has a number of variants, but we will point out the most important conclusions to be drawn from them as well as possible limitations of this approach.

Current literature on language topologies and distances
Language spaces allowing infinitary words, on the other hand, can be more easily endowed with adequate topologies arising out of the word topology (Calude et al., 2009), but this will not be a topic of discussion here because there seems to be no application of infinitary word languages to the study of natural human languages.

Preliminaries
In this section we review some basic definitions from formal language theory and then the best-known approach to language distance, namely, what we will call the Cantor metric.

Notation and Definitions
For the most part, we adopt notation common to formal language theory. There are a few modifications in the interest of brevity and, hopefully, clarity of expression. We consider a language as a set of words which are concatenated from symbols in an alphabet $\Sigma$ with finite cardinality $\alpha$. We will deal only with words of finite length (as opposed to the words discussed, for instance, in (Calude et al., 2009)). By a language space we mean the collection of all possible languages, namely, $2^{\Sigma^*}$.
Sets. We frequently employ the symmetric set difference of sets $A$ and $B$, denoted $A \triangle B$.
Words. The length of word $w$ will be denoted $|w|$ and will always be non-negative. The empty word, which is the unique word of length zero, will, as usual, be denoted $\lambda$. When we need to refer to the $i$th symbol of the word $w$, we will denote this by $w_{[i]}$, preserving ordinary subscripts for the enumeration of words. The fundamental operation on symbols is (non-commutative) concatenation, which is represented multiplicatively. We use the Kleene-$*$ (-star) and -$+$ (-plus) operations in the usual way. Moreover, $\Sigma^n$ denotes the set of all words of length $n$, and $\Sigma^{<n}$ denotes the set of all words of lengths up to $n - 1$.
Languages. The empty language is simply the null set, $\emptyset$. Concatenation extends from words to languages. That is, if $L$ and $M$ are languages, then $LM = \{uv : u \in L, v \in M\}$. Suppose $L$ is a language over $\Sigma$ and $n \in \mathbb{N} \cup \{0\}$. Then we denote by $L_n$ (respectively $L_{<n}$, for $n > 0$) the set $L \cap \Sigma^n$ (respectively, $\bigcup_{i=0}^{n-1} L_i$). For example, $L_0$ is either $\emptyset$ or $\{\lambda\}$. The density of language $L$ is the sequence $\{\varrho_n\}_{n \in \mathbb{N}}$ such that $\varrho_i = |L_i|$. Then $|L_{<n}|$ is the $n$th partial sum of the series $\varrho$. Finally, given languages $L$ and $M$, we denote by $L \triangle_n M$ (respectively, $L \triangle_{<n} M$) the symmetric set difference between words of length $n$ (respectively, less than $n$) in the two languages.
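The density and its partial sums can be illustrated with a minimal brute-force sketch. It assumes a language is supplied as a membership predicate; the helper names (`words_of_length`, `density`, `partial_sums`) and the example language are hypothetical and serve only as illustration.

```python
from itertools import product

def words_of_length(alphabet, n):
    """Yield every word of length n over the alphabet (the set Sigma^n)."""
    for letters in product(alphabet, repeat=n):
        yield "".join(letters)

def density(member, alphabet, max_len):
    """Return [|L_0|, |L_1|, ..., |L_max_len|] for the language given by `member`."""
    return [sum(1 for w in words_of_length(alphabet, n) if member(w))
            for n in range(max_len + 1)]

def partial_sums(rho):
    """Running totals of the density: entry n-1 equals |L_{<n}|."""
    totals, running = [], 0
    for count in rho:
        running += count
        totals.append(running)
    return totals

def starts_with_a(w):
    """Hypothetical example language: words over {a, b} that begin with 'a'."""
    return w.startswith("a")

rho = density(starts_with_a, "ab", 4)
cumulative = partial_sums(rho)
```

Enumeration is exponential in the word length, so a sketch of this kind is only practical for small alphabets and short words.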

Language norms, metrics and the Cantor space
In setting out to find ways to adequately express the "distance" between two languages, we consider how to adapt the notions of size and separation into the realm of formal symbols. We first observe that the earliest language distance defined in the literature derives from the density of the symmetric set difference of two languages. The metric mentioned by (Vianu, 1977) is based on the shortest word in $L \triangle M$. Indeed, this leads to a full metric, and a metric topology on $2^{\Sigma^*}$. By analogy to the norm in a normed space representing distance from a zero point, and hence magnitude, a "language norm" can be elaborated from a pseudometric. The reader will recall that a pseudometric $d$ on space $X$ is a function that maps $X \times X$ to $\mathbb{R}_{\ge 0}$, such that $d(x, x) = 0$, $d(x, y) = d(y, x)$ and $d(x, y) + d(y, z) \ge d(x, z)$. We call a pseudometric a language distance just in case it additionally is such that, if $L \cap N = \emptyset$ and $M \subseteq N$, then $d(L, M) \le d(L, N)$.
Then, to every language distance we may associate a function $\|\cdot\|_d : 2^{\Sigma^*} \to \mathbb{R}_{\ge 0}$ by defining $\|L\|_d = d(L, \emptyset)$. Note that $\|\cdot\|_d$ has the following properties:
We define a language norm as any such function on languages.
Lemma 2. To each language norm $\|\cdot\|$ there corresponds a unique language distance $d$ such that $d(L, M) = \|L \triangle M\|$.
The converse of Lemma 2 also holds. That is, for any language distance $d$ on $2^{\Sigma^*}$, the function $\|\cdot\| : 2^{\Sigma^*} \to \mathbb{R}_{\ge 0}$ such that $\|L\| = d(L, \emptyset)$ defines a unique language norm.

The Cantor language metric and topology on $2^{\Sigma^*}$

A Cantor language space
Two languages can be compared by beginning with the shortest word in each language and proceeding to longer words. A first notion of distance is obtained using the word-length of the first distinction between languages so observed. To this end, let the language space then be normed by assigning the norm 0 to $\emptyset$ and by associating each non-empty language to a power of 1/2, as follows.
To this language norm corresponds the following language metric.
Definition 4. The function $d_1 : 2^{\Sigma^*} \times 2^{\Sigma^*} \to \mathbb{R}$ is a metric, where, for $L$ and $M$ in $2^{\Sigma^*}$, $d_1(L, M) = \|L \triangle M\|_1$.
To see that $d_1$ is in fact not only a pseudometric but a metric, consider that $d_1(L, M) = 0$ iff $L \triangle M = \emptyset$, i.e., iff $L = M$. Let $\tau_1$ be the metric topology induced on $2^{\Sigma^*}$ by $d_1$. For reasons to be made clear below, we call $\|\cdot\|_1$, $d_1$, and $\tau_1$ the Cantor norm, distance, and topology, respectively, on a language space.
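For finite languages the Cantor distance can be sketched directly. The exact exponent of the norm is not recoverable from the text here, so the sketch below assumes the norm of a non-empty language is $2^{-k}$, where $k$ is the length of its shortest word; that exponent convention and the function names are assumptions made for illustration.

```python
def cantor_norm(language):
    """Assumed form: 0 for the empty language, else 2**(-k),
    where k is the length of the shortest word (an assumption)."""
    if not language:
        return 0.0
    return 2.0 ** (-min(len(w) for w in language))

def cantor_distance(L, M):
    """d_1(L, M) = norm of the symmetric set difference of L and M."""
    return cantor_norm(set(L) ^ set(M))

# Hypothetical finite languages over {a, b}; they first differ at length 2.
L = {"a", "ab"}
M = {"a", "abb"}
distance = cantor_distance(L, M)
```

Under this reading, two languages are close exactly when they agree on all short words, whatever happens at longer lengths.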
The open neighborhoods of radius $\epsilon > 0$ around some language $L \in 2^{\Sigma^*}$, denoted $B_\epsilon(L) = \{M \in 2^{\Sigma^*} : d_1(L, M) < \epsilon\}$, form the standard basis for $\tau_1$. Since distances between distinct languages are powers of 1/2, it follows that elements of the standard metric basis for $(2^{\Sigma^*}, \tau_1)$ form the collection
Definition 5 ((Vianu, 1977; Genova and Jonoska, 2006)). The language cylinder set $C_{L,k}$ of length $k \in \mathbb{N}$ around language $L \in 2^{\Sigma^*}$ is:
Now let $\mathcal{C}_k \overset{\mathrm{def}}{=} \{C_{L,k} : L \in 2^{\Sigma^*}\}$ be the collection of all language cylinder sets of length $k$. From (6) and (7) it follows that the collection $\mathcal{C} \overset{\mathrm{def}}{=} \bigcup_{k \in \mathbb{N}} \mathcal{C}_k$, comprising all language cylinder sets, is the standard basis for $(2^{\Sigma^*}, \tau_1)$.

Cantor topology on a language space
As it turns out, $\tau_1$ is equivalent to the topology of the Bodnarchǔk metric space discussed in (Vianu, 1977). We quickly recap the properties of this topology on a language space, as proven in (Vianu, 1977) and (Genova and Jonoska, 2006).
Lemma 6. In $(2^{\Sigma^*}, \tau_1)$, every cylinder set is both closed and open.
From this and the fact $L \cap \Sigma^{\le i} \to L$ we also have:
Corollary 8. The finite languages are dense in a space of languages under the $\tau_1$ topology.
Thus the terminology "Cantor language space, topology," etc.
Corollary 10. $(2^{\Sigma^*}, \tau_1)$ is compact, perfect, and totally disconnected.
We next consider an alternative to $d_1$, obtained by exploiting the general philosophy of comparing languages by comparing finite sections of languages. We then show several results, including that neither finite nor locally testable languages are dense in the topology induced. We call this alternative pseudometric the Besicovitch distance, denoted by $d_\zeta$. Under the topology induced, a language space is not compact. Rather, it has a geometry which becomes apparent from the vantage point of a metric quotient space.
The original Besicovitch pseudometric expressed the distance between two almost-periodic real-valued functions (Besicovitch, 1932). Because this pseudometric depends on the evaluation of the two functions only at discrete intervals, it is naturally adaptable to expressing distances between objects with a bounded proportion of differences, as with the distance between cellular automata (Cattaneo et al., 1997); our adaptation to languages is in some sense a generalization thereof.

A Besicovitch pseudometric on language spaces
We begin by defining a Besicovitch-style language norm. Rather than halting at a particular term of the density of the symmetric set difference between two languages, this norm considers the derived infinite series $|L \triangle_{<n} M|$ in ratio to the total possible words over $\Sigma^*$ (given by (1)) as $n$ goes to infinity.
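Definition 11. Let $\|\cdot\|_\zeta$ for fixed alphabet $\Sigma$ be the function defined:
$$\|L\|_\zeta = \limsup_{n \to \infty} \frac{|L_{<n}|}{|\Sigma^{<n}|}.$$
We call $\|\cdot\|_\zeta$ the Besicovitch language norm. Then let $d_\zeta$ be the function mapping $(L, M)$ to $\|L \triangle M\|_\zeta$, and call it the Besicovitch language distance.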
By Lemma 2, distance d ζ is a language pseudometric.
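The finite-$n$ ratios behind this distance can be sketched by brute force. The sketch reads the Besicovitch norm as the upper density described in the text, so that $d_\zeta(L, M)$ is the limsup of $|L \triangle_{<n} M| / |\Sigma^{<n}|$; this reading, the function name `sym_diff_ratio`, and the example language are assumptions. The code only evaluates individual terms of the sequence and does not compute the limsup itself.

```python
from itertools import product

def sym_diff_ratio(in_L, in_M, alphabet, n):
    """Return |L symdiff_{<n} M| / |Sigma^{<n}| for n >= 1 by enumeration.

    in_L and in_M are membership predicates for the two languages.
    """
    differing = total = 0
    for length in range(n):
        for letters in product(alphabet, repeat=length):
            w = "".join(letters)
            total += 1
            if in_L(w) != in_M(w):
                differing += 1
    return differing / total

def starts_with_a(w):
    """Hypothetical example language: words over {a, b} that begin with 'a'."""
    return w.startswith("a")

def never(w):
    """The empty language."""
    return False

def complement_of_starts_with_a(w):
    return not starts_with_a(w)

ratio_to_empty = sym_diff_ratio(starts_with_a, never, "ab", 5)
ratio_to_complement = sym_diff_ratio(starts_with_a, complement_of_starts_with_a, "ab", 4)
```

For the example language the ratios against the empty language approach 1/2 as $n$ grows, while a language and its complement differ on every word at every length.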
Remark 12. The Besicovitch distance $d_\zeta$ between languages
1. can be described as the upper density of their set-difference;
2. turns out to constitute (like the Besicovitch language norm) a continuous, surjective mapping of $2^{\Sigma^*}$ into the unit interval $[0, 1]$;
3. for a language and its complement is 1, since
5. given languages $L, M \in 2^{\Sigma^*}$, can be written as follows:
We present the following without proof.
The conclusion here is that $\|\cdot\|_\zeta$ is truly "normlike." For the remainder of this section, we drop most subscripts $\zeta$.
To establish surjectivity, we first need a way to construct a language with a specified arbitrary norm.
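Definition 16. Given $0 \le r \le 1$, consider the sequence $\check{r}_\alpha \overset{\mathrm{def}}{=} \{\lfloor r\alpha^k \rfloor\}_{k \in \mathbb{N}}$. Then we call $\mathcal{L}_r$ the set of $r$-simple languages in $2^{\Sigma^*}$, defined as follows:
$$\mathcal{L}_r = \{L \in 2^{\Sigma^*} : |L \cap \Sigma^k| = \check{r}_k \text{ for all } k \in \mathbb{N}\}.$$
Lemma 17. For $r \in [0, 1]$ there is at least one $r$-simple language; moreover for each particular value $r$, every $r$-simple language has norm $r$.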
Proof. By construction, for each $r \in [0, 1]$ the sequence $\check{r}_\alpha$ exists. We can select $\check{r}_k$ words in $\Sigma^k$ for all $k$. This amounts to the construction of a language $L$ in $\mathcal{L}_r$ for each $r \in [0, 1]$. But then $\|L\| = r$, which establishes the claim.
Now we have established our hoped-for result.
In addition, it is relatively easy to see that diagonalization yields that there are uncountably many r-simple languages for each r ∈ [0, 1].
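The construction in the proof of Lemma 17 can be sketched for finitely many word lengths. The sketch assumes that an $r$-simple language has exactly $\lfloor r\alpha^k \rfloor$ words of each length $k \ge 1$ and that the norm is the upper density of the language; both readings, the choice of the lexicographically first words, and the function names are assumptions. Exact rationals are used so that the floors are not disturbed by rounding.

```python
from fractions import Fraction
from itertools import islice, product
from math import floor

def r_simple_sections(r, alphabet, max_len):
    """Return {k: set of words}, with floor(r * alpha**k) words of each length k = 1..max_len.

    The first words in lexicographic order are chosen; any other choice
    of the same size would serve equally well.
    """
    alpha = len(alphabet)
    sections = {}
    for k in range(1, max_len + 1):
        quota = floor(r * alpha ** k)
        words = ("".join(t) for t in product(alphabet, repeat=k))
        sections[k] = set(islice(words, quota))
    return sections

def partial_ratio(sections, alphabet, n):
    """|L_{<n}| / |Sigma^{<n}| as an exact Fraction, counting word lengths 0..n-1."""
    alpha = len(alphabet)
    have = sum(len(sections.get(k, ())) for k in range(n))
    total = sum(alpha ** k for k in range(n))
    return Fraction(have, total)

sections = r_simple_sections(Fraction(1, 3), "ab", 5)
sizes = [len(sections[k]) for k in range(1, 6)]
ratio = partial_ratio(sections, "ab", 6)
```

With $r = 1/3$ over a two-letter alphabet the section sizes are 0, 1, 2, 5, 10, and the partial ratio for $n = 6$ is already 2/7, within 1/21 of $r$.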

Besicovitch distance quotient space
The Besicovitch distance equivalence induces a quotient space on $2^{\Sigma^*}$
We next form collections of languages at distance zero from each other and map each such collection to a point in a quotient space, which can then be metrized. So, given $L$,
Proof. Reflexivity and symmetry are apparent and,
The collection of $\sim$ equivalence classes will be called the Besicovitch quotient space over $2^{\Sigma^*}$, denoted $Q^\Sigma_\zeta$. Here we will drop the $\zeta$ subscript for notational clarity and assume the language space $2^{\Sigma^*}$ unless otherwise noted. Elements of the quotient space (points in $Q$) will be denoted with sans-serif letters $\mathsf{L}, \mathsf{M}, \mathsf{N}, \ldots$, while collections of such points will be denoted with corresponding bold letters $\mathbf{L}, \mathbf{M}, \mathbf{N}, \ldots$. Let $\eta : 2^{\Sigma^*} \to Q^\Sigma_\zeta$ denote the quotient mapping which takes a language to its $\sim$ equivalence class.
Remark 20. As a partition of $2^{\Sigma^*}$, the mapping $\eta$ is well-defined and surjective, but not injective since it is a quotient mapping. The set operations of union, intersection and complementation are preserved by mappings from collections of points in $Q$ to the sets of languages of which they are equivalence classes. In particular, every topology on $Q^\Sigma_\zeta$ is the quotient of a topology on $2^{\Sigma^*}$.
When language $L$ is a member of language family $\mathcal{L}$, and every member of $\mathcal{L}$ is contained in an equivalence class in the collection of points $\mathbf{L} \subseteq Q_\zeta$, we will write $L \in \mathsf{L}$ and $\mathcal{L} \subseteq \mathbf{L}$ instead of the more tedious $\eta(L) = \mathsf{L}$ and $\eta(\mathcal{L}) \subseteq \mathbf{L}$.
Proof. ($\Rightarrow$) Suppose there is no such $m$. That would mean that, for each $m \in \mathbb{N}$, there is a word length $n_m$ such that $k > n_m$ implies $|L \triangle_{<k} M| < |\Sigma^{k-m}|$. We can then construct an increasing sequence. But, if this were true, then the Besicovitch distance between the two languages would be 0, since a straightforward calculation shows that, for each $m \in \mathbb{N}$, $d_\zeta(L, M)$ is bounded above by $\alpha^{-m}$.
($\Leftarrow$) Assume that, for some $m \in \mathbb{N}$ and for $n$ sufficiently large, $|L \triangle_{<n} M| \ge |\Sigma^{n-m}|$. Then a similarly straightforward calculation shows that
Note that when two languages are similar, the sequence used in the first part of the proof is finite.
Definition 22. Given languages $L, M \in 2^{\Sigma^*}$, we will denote by $K_\zeta(L, M)$ the (possibly finite) increasing integer sequence $\{k_i\}_{i \in \mathbb{N}}$ in accord with the above lemma. Indeed, $K_\zeta(L, M)$ is infinite precisely when $L \sim M$.
We note that if $K_\zeta(L, M)$ has at least $i$ terms, then $k_i > i$ and, by considering the words in $L \triangle M$ of length greater than $m_i$, we have a first estimate of the distance between two languages, namely, $\alpha^{-i}$.
By Lemma 21, the unique sequence $K_\zeta(L, M)$ expresses the relative location of languages in the quotient space $Q_\zeta$.

The quotient space has a natural metric quotient topology
We define the metric $d_\zeta$ on the Besicovitch quotient space as the lifting of Besicovitch distance $d$.
Proof. Only the implication left to right requires a proof. Suppose that $d(\mathsf{L}, \mathsf{M}) = 0$. Now suppose, contrary to the claim, that there is some language $L \in \mathsf{L}$ but not in $\mathsf{M}$. We conclude from the preceding Definition 23 that there exists $M \in \mathsf{M}$ such that $d(L, M) = \epsilon > 0$. But then, for arbitrary languages $L' \in \mathsf{L}$ and $M' \in \mathsf{M}$, the triangle inequality provides us that
Thus $d(\mathsf{L}, \mathsf{M}) \ge \epsilon/2 > 0$ by the preceding definition, Q.E.A.
Corollary 25. For L, M ∈ 2 Σ * the diagram below commutes, showing the isometry between Besicovitch language space and quotient space.
It is by now evident that the Besicovitch quotient space is a metric space under distance $d$. This just means that the $d$ metric topology on $Q_\zeta$ is the quotient of the pseudometric topology induced by $d$ on $2^{\Sigma^*}$. Let $\bar{\tau}_\zeta$ denote the collection of open sets in $Q_\zeta$ under the $d$ metric topology, and let $\tau_\zeta$ denote the collection of language sets in $2^{\Sigma^*}$ such that $\eta(\tau_\zeta) = \bar{\tau}_\zeta$. We will call $\tau_\zeta$ the Besicovitch language topology.

Convergence has a novel interpretation in the quotient space
From Remark 12, the Besicovitch language topology is not $T_1$, and so convergence to a language is not well-defined in $(2^{\Sigma^*}, \tau_\zeta)$. But there is no such difficulty with the quotient space.
Lemma 27. A sequence $\{\mathsf{L}_i\}_{i \in \mathbb{N}}$ in $Q_\zeta$ converges to the point $\mathsf{L} \in Q_\zeta$ iff the following holds: for every $m \in \mathbb{N}$ there exists $k_m \in \mathbb{N}$ such that, whenever $i > k_m$, $L_i \in \mathsf{L}_i$ and $L \in \mathsf{L}$, there exists an integer $N_i$ for which $k > N_i$ implies $|L \triangle_k L_i| < \alpha^{k-m}$.
Note that, unlike the case of the Cantor space, in the Besicovitch language (quotient) topology, a convergent sequence of points converges to ∼ equivalence with the (languages in the) limit point.

The quotient space is perfect but not compact
We can next address the compactness question for the Besicovitch quotient space, since it is a metric space, by determining whether every infinite sequence of points has a convergent subsequence. We ultimately show here that neither $Q_\zeta$ nor $2^{\Sigma^*}$ is compact, although $Q_\zeta$ is a perfect set.
We first establish the latter property using the following fact.
It then follows immediately that $Q_\zeta$ is perfect. To make progress on the compactness question, we construct a family of two-sided word ideals in $\Sigma^*$ which, when split into non-disjoint right ideals, yields an infinite sequence in the quotient space with no convergent subsequence. We will call $I$ a right, left, or two-sided word ideal of the monoid $\Sigma^*$ just in case there is a word $w \in \Sigma^*$ such that $I = w\Sigma^*$, $I = \Sigma^* w$, or $I = \Sigma^* w \Sigma^*$ respectively. Note this is just like the definition of ideal in a monoid (Howie, 1995) except we are restricting the reference to a singleton set containing particular word $w$. Now let $J_w$ denote the two-sided word ideal $\Sigma^* w \Sigma^*$. Then for $k \in \mathbb{N}$, the $k$th section of $J_w$ is $\Sigma^k w \Sigma^* \subseteq J_w$, which is denoted $J_{w,k}$.
Lemma 29. For $i, j \in \mathbb{N}$ and where $|w| = l$,
We can also compute the norm of $J_{w,i}$ when $|w| = l$:
From Lemma 29 and the above calculation, taking $\mathsf{J}_{w,i} \overset{\mathrm{def}}{=} \eta(J_{w,i})$, the sequence $\{\mathsf{J}_{w,i}\}_{i \in \mathbb{N}}$ is such that no subsequence can converge, yet every language in each point of the sequence has the same norm.
Lemma 30. The Besicovitch quotient space Q ζ is not compact.
Proof. It is sufficient to display an infinite sequence of languages belonging to distinct $\sim$ equivalence classes separated from each other by a distance greater than some fixed $\epsilon > 0$. Then the $\eta$-images of these languages will form an infinite sequence in $Q_\zeta$ which has no convergent subsequence.
To this end, consider the language sequence $\mathcal{J}_a \overset{\mathrm{def}}{=} \{J_{a,i}\}_{i \in \mathbb{N}}$ where $a \in \Sigma$. Two distinct terms $J_{a,i}$ and $J_{a,j}$ are at distance $2\alpha^{-1}$, from the previous lemma, so consider the sequence $\mathbf{L} = \{\mathsf{L}_i\}_{i \in \mathbb{N}}$, where $J_{a,i} \in \mathsf{L}_i$ for all $i \in \mathbb{N}$. By Corollary 25, there is no convergent subsequence of $\mathbf{L}$.
Since sequential compactness is not defined in the pseudometric language space, we exhibit the following result to clear up any remaining doubts about compactness there.
Corollary 31. The metric d is not complete.
Proof. (Outline) It suffices to exhibit a sequence of points which are Cauchy convergent in Q ζ , but which do not converge to any point in Q ζ . We then produce a sequence, Cauchy in Q, but containing the non-convergent sequence L from the proof of Lemma 30 as a subsequence. We then have by Lemma 30 that any finite subset of O contains at most finitely many languages in J a . Therefore O has no finite subcover.
While establishing noncompactness has been important, it will also be useful to establish a relation to a known compact space. This is the subject of the next subsection.

A second lifting of the quotient space
To obtain a compact space for exploring the most general features of the Besicovitch topology on language spaces, we define the language norm $\|\cdot\|_\zeta$ as a quotient map from $Q_\zeta$ into $[0, 1]$. This will result in a total of three spaces: the non-$T_1$ language space under the topology induced by Besicovitch distance, the quotient space topologized by the metric quotient topology, and a compact upper quotient space with a well-known topology. We proceed as with the definition of $Q_\zeta$ by defining an equivalence relation, the equivalence classes, and the quotient map which takes points in $Q_\zeta$ to their equivalence classes. We call the collection of equivalence classes the upper Besicovitch quotient space, denoted $N_\zeta$. We ultimately show that the topological space $N_\zeta$ under the quotient topology is homeomorphic to the unit interval.
Let $\kappa$ be the map from $Q_\zeta$ to $N_\zeta$ which takes $\mathsf{L}$ to its equivalence class $\|\mathsf{L}\|_\zeta$. Finally, for $r \in [0, 1]$, let $\mathbf{r}_\zeta$ denote $\{\mathsf{L} \in Q_\zeta : \|L\|_\zeta = r \ \forall L \in \mathsf{L}\}$.
Remark 33. It is obvious that ≡ is an equivalence relation. Moreover, the quotient map κ is welldefined, by Corollary 26. Since r ζ = M ζ for each M ∈ r ζ , this implies by Remark 12 that r ζ = M for precisely one M ∈ N ζ .
We next equip the upper quotient space with a metric. Let the distance function $\rho : N_\zeta \times N_\zeta \to [0, 1]$ be defined such that, if $\mathbf{L} = \mathbf{r}_\zeta$ and $\mathbf{M} = \mathbf{s}_\zeta$ for some $r, s \in [0, 1]$, then $\rho(\mathbf{L}, \mathbf{M}) = |r - s|$ as a metric on $N_\zeta$. The collection $\mathcal{U}$ of basis sets under the induced topology equals:
Remark 34. The set $\mathcal{U}$ is apparently equivalent to the subset topology on the unit interval. To wit, there is a homeomorphism between $N_\zeta$ and $[0, 1]$ if the function $\rho$ induces the quotient topology on $N_\zeta$.
We continue to abuse the notation as was done with languages and the quotient space, and write L ∈ r ζ or equivalently L ∈ L to mean language L is found in points of the equivalence class r ζ . We write L ⊆ r ζ to mean that each language in the class L is in a point (not necessarily all in the same point) in the equivalence class r ζ . We write L ⊆ L to mean that the image κ(η(L)) is a subset of the collection of elements L ⊆ N ζ . We will next show that, with exactly two exceptions, r ζ is always an uncountable subset of Q ζ . Since 1 is a singleton, given a point L there is exactly one point in Q ζ at distance 1. If L, M ∈ Q ζ and d(L, M) = 1, then points L, M will be called antipodes, which we denote as L = ¬M.
Lemma 36. Every point L ∈ Q ζ has a unique antipode in the Besicovitch quotient space.
Proof. From Corollary 25 this is the same as claiming that, if two languages are at distance 1 from the same language L ∈ L, then they are ∼-equivalent. But this is a consequence of the identity We can show this because if d(L, M 1 ) = 1 and d(L, M 2 ) = 1, it follows that L △ M 1 and L △ M 2 are in 1 (from Def. 11), implying by Lemma 35 that d(L △ M 1 , L △ M 2 ) = 0, requiring M 1 ∼ M 2 .
In addition we note that $\mathsf{L} \in \mathbf{0}$ iff $\neg\mathsf{L} \in \mathbf{1}$, and also that $\neg\mathsf{L} = \mathsf{L}$ if and only if $\|L\| = \frac{1}{2}$ for any language $L \in \mathsf{L}$.
For each point L ∈ Q ζ , the L-rotation of point M ∈ Q ζ , denoted η L (M), is defined as the point η(L △ M ) for some language L ∈ L. The L-rotation of the Besicovitch quotient space, denoted Q Σ,L ζ , is then the collection {η L (M) : M ∈ Q ζ }. The Lrotation of the ≡-equivalence class r, denoted r L , is defined as the set {M ∈ Q ζ : d(M, L) = r}. The L-rotation of the upper Besicovitch quotient space, meaning the collection {r L : r ∈ [0, 1]}, will be denoted N ζ,L .
Lemma 38. Q Σ,L ζ is equivalent as a set to Q ζ , and L-rotation is a bijection of the quotient space onto itself. Moreover, N ζ,L is a bijection with N ζ .
There are uncountably many $\equiv$ equivalence classes, because the norm $\|\cdot\|$ is surjective onto the unit interval. In addition, we now proceed to show that no open set in $Q_\zeta$ is contained in a single $\equiv$ equivalence class. This is the essential condition for the proof that $\rho$ is the quotient of $d$. We begin with a straightforward proposition.
Proof. For $s = 0$, let $M = \emptyset$, Q.E.D. For $s = r$, let $M = L$, Q.E.D. Now assume $s \in (0, r)$. Note that $s/r > 0$; form language sequence $\{L \cap \Sigma^i\}_{i \in \mathbb{N}}$, and using this define the integer sequence $\{m_i\}_{i \in \mathbb{N}}$ such that
There exists a language sequence $\{M_i\}_{i \in \mathbb{N}}$ such that $M_i \subseteq L \cap \Sigma^i$ and $|M_i| = m_i$. Then we can calculate that $0 \le (s/r)|L \cap \Sigma^{<k}| - |M \cap \Sigma^{<k}| < k$, so $\|M\| = (s/r)\|L\| = s$, and $M \subseteq L$; Q.E.D.
Remark 40. The above result can be reversed, in that if 0 ≤ r ≤ s ≤ 1, then for any language L ∈ r there exists language M ⊇ L such that M ∈ s. The target language is L in case 0 = r = s, Σ * in case s = 1, and in case s ∈ (0, 1) may be constructed as in Proposition 39 by inverting the fractions in (14) et seq.

Lemma 41.
No open set in Q ζ is a subset of a ≡ equivalence class.
Proof. Since $Q_\zeta$ is perfect, this follows for the classes $\mathbf{0}_\zeta$ and $\mathbf{1}_\zeta$ directly from Lemmas 35 and 28. Otherwise, suppose $L \in 2^{\Sigma^*}$ and $\mathsf{L} \in Q_\zeta$ such that $L \in \mathsf{L} \in \mathbf{r}$. For any open set $\mathbf{L} \subset Q_\zeta$ containing $\mathsf{L}$, there is a number
It is sufficient to exhibit a language $M \in \mathsf{M}$ such that $\|M\| \ne \|L\|$ and $d(L, M) < \epsilon'$. Let $\epsilon = \min\{r/2, \epsilon'/2\}$. Note that $\epsilon' > \epsilon > 0$. Our selection of $\epsilon$ guarantees the following: $0 < \epsilon < r \le 1$, which implies that $0 < r - \epsilon < r$
This corollary states that, under the Besicovitch topology, representatives of some continuous interval of norm values can be found in every open set in the language space. This means that, as was claimed in Remark 12(1), the language norm $\|\cdot\|_\zeta$ is a continuous map from $2^{\Sigma^*}$ onto $[0, 1]$.
Theorem 43. The upper quotient space N ζ is homeomorphic to (and so essentially is) the unit interval [0, 1].

Ideals simplify exploration of the elements of the upper quotient space
Earlier we defined the (word) ideals of Σ * . To elaborate on this, recall the earlier discussion of r-simple languages (v. Def. 16), and consider the monoid ideals of Σ * .
Lemma 44. If real number r ∈ [0, 1], there exists a right ideal of Σ * in L r .
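Proof. If $r = 1$ then $w = \lambda$ trivially satisfies the lemma. So we assume $r \in (0, 1)$. Since by Def. 16 $0 \le \check{r}_1 < \alpha$, there is a subset $I_1$ of $\Sigma$ (actually, at least $\alpha$ subsets) such that $|I_1| = \check{r}_1$. Note from the definition of $\mathcal{L}_r$ that $\check{r}_k \le r\alpha^k < \check{r}_k + 1$ for all $k \in \mathbb{N}$. Multiplying through by $\alpha$ gives the inequality
$$\alpha\check{r}_k \le r\alpha^{k+1} < \alpha\check{r}_k + \alpha.$$
But for $k + 1$ we have
$$\check{r}_{k+1} \le r\alpha^{k+1} < \check{r}_{k+1} + 1.$$
Since all values are non-negative integers we can combine the preceding two equations to yield
$$\alpha\check{r}_k \le \check{r}_{k+1} < \alpha\check{r}_k + \alpha.$$
It follows that $\check{r}_{k+1} = \check{r}_k\alpha + t_k$ for some integer $t_k$ such that $0 \le t_k < \alpha$. Therefore for all $k \in \mathbb{N}$,
Thus there exists language $T_1 \subseteq \Sigma^2 \setminus I_1\Sigma$ such that $|T_1| = t_1$, so that $|I_1\Sigma \cup T_1| = \check{r}_2$. Set $I_2 = I_1\Sigma \cup T_1$. Continuing in this fashion, let $T_k$ for each $k \in \mathbb{N}$ be a language such that $T_k \subseteq \Sigma^{k+1} \setminus I_k\Sigma$ and $|T_k| = t_k$. Finally, for $k \in \mathbb{N}$ define language $I \in 2^{\Sigma^*}$ such that $I \cap \Sigma^k = I_k$, which is to say let $I = \bigcup_{i \in \mathbb{N}} I_i$. Then by construction, $I \in \mathcal{L}_r$, and $w\Sigma^j \subseteq I$ for all $w \in I$ and every $j \in \mathbb{N}$. Thus $I\Sigma^* \subseteq I$.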
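The key arithmetic step of the proof, that consecutive quotas satisfy $\check{r}_{k+1} = \alpha\check{r}_k + t_k$ with $0 \le t_k < \alpha$, can be checked numerically. The sketch below uses exact rationals; the function name `quota_increments` and the sample values $r = 2/5$, $\alpha = 3$ are chosen only for illustration.

```python
from fractions import Fraction
from math import floor

def quota_increments(r, alpha, max_k):
    """Return [(k, r_k, t_k)] for k = 1..max_k, where
    r_k = floor(r * alpha**k) and t_k = r_{k+1} - alpha * r_k."""
    rows = []
    for k in range(1, max_k + 1):
        r_k = floor(r * alpha ** k)
        r_next = floor(r * alpha ** (k + 1))
        rows.append((k, r_k, r_next - alpha * r_k))
    return rows

rows = quota_increments(Fraction(2, 5), 3, 4)
increments_in_range = all(0 <= t < 3 for _, _, t in rows)
```

Each $t_k$ is the number of fresh words of length $k + 1$ that must be added to $I_k\Sigma$, which is why the resulting language is closed under right concatenation.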
The preceding result provides further evidence that right ideals are ubiquitous in the Besicovitch topological space. We now develop our understanding of the ideals to comprehend the elements of the upper quotient space. We begin by extending the notion of "sections of a word ideal."
Definition 45. An $n$-word ideal in the monoid $\Sigma^*$ is a language $J_F$ such that $J_F = \Sigma^* w_1 \Sigma^* w_2 \cdots \Sigma^* w_n \Sigma^*$ for some finite language $F = \{w_1, w_2, \ldots, w_n\}$ over $\Sigma^*$. Then $f_F = \sum_{i=1}^{n} |w_i|$ is the length of $F$. If $v = (v_1, \ldots, v_n)$ is a vector over $\mathbb{N}^{1 \times n}$, then the $v$-section of $J_F$ is denoted $J_{F,v}$ and is the right ideal defined as:
Lemma 46. For every vector $v$ over $\mathbb{N}^{1 \times n}$ and every language $F$ such that $|F| = n$,
In addition to the above result, it is possible to extend the proofs of Lemmas 44 and 30 to the $n$-word ideals by induction. Taken together, these results tell us that points in the upper quotient space contain languages that "closely resemble" unions of sections of ideals of $\Sigma^*$, in the following sense: cardinality of sections of these languages (as word length increases) must approximate the cardinality of the unions of (sections of) ideals.
We conclude this section by showing that all $\equiv$ classes except $\mathbf{0}_\zeta$ and $\mathbf{1}_\zeta$ are uncountable.
Proof. From Lemma 44, there is an $r$-simple language $L$. In fact, there exist at least two $r$-simple languages, since for each $r \in (0, 1)$,
This means that for $k \in \mathbb{N}$ there exists a subset of $\Sigma^k \setminus L = \neg(L \cap \Sigma^k)$ consisting of the lesser of either $\lfloor r\alpha^k \rfloor$ or $\frac{1-r}{2}\alpha^k$ words, and there exists a subset of $L \cap \Sigma^k$ consisting of the same number of words. This means there exists an $r$-simple language at distance $s = \min\{2r, 1 - r\}$ from $L$. We now construct this language in the following way: let $t_k = \min\{\lfloor r\alpha^k \rfloor, \frac{1-r}{2}\alpha^k\}$; let $T_k$ be a language such that $|T_k| = t_k$ and $T_k \subseteq \neg(L \cap \Sigma^k)$, which is possible since $|\neg(L \cap \Sigma^k)| \ge 2t_k$; and let $F_k \subseteq L$ be such that $|F_k| = t_k$, which is possible since $t_k \le |L \cap \Sigma^k|$. Let $T = \bigcup_{i \in \mathbb{N}} T_i$, $F = \bigcup_{i \in \mathbb{N}} F_i$, and let $N = L \setminus F$. Then language $L' \overset{\mathrm{def}}{=} N \cup T$ is the language formed by exchanging $t_k$ words in $L$ for $t_k$ words in $\neg L$. Thus the number of words in $L \triangle_k L'$ is $2t_k = s\alpha^k$ for each $k \in \mathbb{N}$. Hence, $d(L, L') = s$ and, since $L$ and $L'$ contain the same number of words of each length, they have the same norm. Since $L$ is $r$-simple, so is $L'$.
For all $t \in \mathbb{R}$ such that $0 \le t \le s$, since $s \le r$ there exists language $F' \subseteq F \subseteq L$ such that $\|F'\| = t/2$, and there exists language $T' \subseteq T = L' \cap \neg L$ such that $\|T'\| = t/2$ (by Proposition 39). Then it

The Chomsky hierarchy
In this final section we show a few results which relate our Besicovitch topologies to the classical language classes.
The finite and locally testable languages are not dense A major inadequacy of the Cantor topology was the density of the finite languages. By contrast, these are confined to a single ∼-equivalence class in the Besicovitch topology.
Lemma 48. The finite languages are all in 0 ζ .
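Proof. If language $L$ is finite, there exists $N \in \mathbb{N}$ such that $n > N$ implies $L \cap \Sigma^n = \emptyset$, and hence also that $|L \triangle_n \emptyset| = 0$. Consequently $L$ is at distance zero from $\emptyset$ and lies in $0_\zeta$.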
This naturally raises the question, addressed next, of what happens when the description of an infinite language is entirely finitary.
We first remind the reader that a language $L$ is locally testable just in case there is a fixed integer $k$ (called a window length) and a proper subset $F \subsetneq \Sigma^k$ such that, if every factor of word $w$ of length $k$ is in $F$, then $w \in L$. The important thing about the locally testable family is that the membership question "Is $w \in L$?" is decidable by inspecting successive k-length factors of $w$. We next define a larger class of "generally testable" languages with the property that every locally testable language is a subset of some generally testable language.
Definition 49. A language $L$ is generally testable if there exists a window length $n \in \mathbb{N}$ and a set of permitted factors $S \subseteq \Sigma^n$, where $L = S^* \Sigma^{<n}$.
From this definition we see that word $w \in L$ if and only if $w \in \Sigma^{<n}$ or $w$ can be written $u_1 u_2 \cdots u_t v$, where $u_i \in S$ for all $i \in \mathbb{N}_t$ and $v \in \Sigma^{<n}$. Although the size of a generally testable language is not obviously limited, we have the following result.
Lemma 50. Every generally testable language in $2^{\Sigma^*}$ is in $0_\zeta$ with the exception of $\Sigma^*$, which is in $1_\zeta$.
Proof. (Outline) Let the permitted factors of a word in $L$ be $S \subseteq \Sigma^n$, and let $s = |S|$. Suppose first that $s = \alpha^n$. Then $S = \Sigma^n$, $L = \Sigma^*$, and therefore $L \in 1_\zeta$.
On the other hand, if $s < \alpha^n$ and word $w \in L$, there exist unique non-negative integers $q$ and $r$ such that $|w| = nq + r$ and $0 \le r < n$, words $u_1, u_2, \ldots, u_q$ in $S$, and a word $v$ in $\Sigma^r$ such that $w = u_1 u_2 \cdots u_q v$. We deduce that $|L \cap \Sigma^{|w|}| = s^q \alpha^r$.
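The count $s^q \alpha^r$ can be checked by exhaustive enumeration on small cases. The sketch below uses hypothetical function names and assumes that $S$ is given as a set of strings of length exactly $n$ over the supplied alphabet.

```python
from itertools import product

def in_generally_testable(word, S, n):
    """w is in S* Sigma^{<n}: each full length-n block, read left to
    right from position 0, lies in S; the remainder is unconstrained."""
    q = len(word) // n
    return all(word[i * n:(i + 1) * n] in S for i in range(q))

def count_slice(S, n, alphabet, length):
    """Brute-force size of L ∩ Sigma^length."""
    return sum(
        in_generally_testable("".join(t), S, n)
        for t in product(alphabet, repeat=length)
    )

def predicted_count(S, n, alphabet, length):
    """The closed form s^q * alpha^r with length = n*q + r."""
    q, r = divmod(length, n)
    return len(S) ** q * len(alphabet) ** r
```

Because every block has length exactly $n$, the decomposition of a word is unique, which is why a single left-to-right pass suffices for the membership test.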
We can therefore see that the proportion of the number of words in $L^{<i}$ to those in $\Sigma^{<i}$ is maximized at word lengths where $r = n - 1$, i.e., where $i = nk + n - 1$. We conclude the following:
By our assumption, $s \le \alpha^n - 1$. Straightforward calculation shows that the right side of the above equation tends to zero, because it is bounded above by $\limsup_{k \to \infty} \frac{1}{2} \frac{(\alpha^n - 1)^k}{(\alpha^n)^k}$.
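The decay of this proportion can be observed numerically. The following sketch is an illustration only: it assumes the cumulative proportion $|L^{<i}| / |\Sigma^{<i}|$ is the quantity of interest and evaluates it from the per-length count $s^q \alpha^r$; the function name is hypothetical.

```python
def cumulative_ratio(s, n, alpha, i):
    """|L^{<i}| / |Sigma^{<i}| for a generally testable language with
    s permitted factors of length n over an alphabet of size alpha."""
    in_L = 0
    total = 0
    for m in range(i):
        q, r = divmod(m, n)          # m = n*q + r with 0 <= r < n
        in_L += s**q * alpha**r      # words of length m in L
        total += alpha**m            # all words of length m
    return in_L / total
```

With `s == alpha**n` the ratio is identically 1, corresponding to $L = \Sigma^*$; with any smaller `s` the ratio shrinks roughly like $(s/\alpha^n)^{i/n}$.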
Proof. Suppose $L$ is a locally testable language over $\Sigma$ with window length $n$ and permitted factors $S \subsetneq \Sigma^n$. Consider the generally testable language $L'$ with the same window length and the same permitted factors as $L$. Then $L \subseteq L'$ and, by the properties of a language norm, $\|L\| \le \|L'\|$. Since $S$ is a proper subset of $\Sigma^n$, we have $L' \ne \Sigma^*$, so $\|L'\| = 0$ by the preceding lemma. Hence $\|L\| = 0$.
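The inclusion $L \subseteq L'$ used in this proof can be checked on small examples. The sketch below assumes the locally testable language consists of exactly those words all of whose length-$n$ factors lie in $S$ (the text states this only as a sufficient condition); the function names are hypothetical.

```python
from itertools import product

def locally_testable(word, S, n):
    """Every length-n factor of word is in S (vacuous if len(word) < n)."""
    return all(word[i:i + n] in S for i in range(len(word) - n + 1))

def generally_testable(word, S, n):
    """Only the aligned blocks at offsets 0, n, 2n, ... must be in S."""
    return all(word[i:i + n] in S for i in range(0, len(word) - n + 1, n))

def inclusion_holds(S, n, alphabet, max_len):
    """Check L ⊆ L' on all words of length at most max_len."""
    for k in range(max_len + 1):
        for t in product(alphabet, repeat=k):
            w = "".join(t)
            if locally_testable(w, S, n) and not generally_testable(w, S, n):
                return False
    return True
```

The inclusion is typically strict: the generally testable condition inspects only aligned blocks, so it ignores factors that straddle a block boundary.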

Regular languages are dense in the upper quotient space
We have now seen that all finite and locally testable languages belong to $0_\zeta$. On the other hand:
Lemma 52. Regular languages are dense in the upper quotient space $N_\zeta$.
Let language $S_\varepsilon \subseteq \Sigma^n$ have cardinality $q$. Consider the right ideal $S_\varepsilon \Sigma^*$, which is a disjoint union of the $q$ right word ideals $w\Sigma^*$ with $w \in S_\varepsilon$. Note that each of these is a 1-word ideal section $J_{F,v}$, where $F = \{w\}$ for $w \in S_\varepsilon$ and $v = (0)$. Therefore by Lemmas 13 and 46,
From (20) this means that $|\,\|S_\varepsilon \Sigma^*\| - r\,| < \varepsilon$ as required. Finally, by the Myhill-Nerode Theorem (Nerode, 1958), $S_\varepsilon \Sigma^*$ is a regular language, since all but finitely many words in $S_\varepsilon \Sigma^*$ can be followed by $\Sigma^*$.
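The approximation idea can be illustrated numerically. The sketch below is not the paper's construction, whose opening steps are not reproduced here: it assumes one simply picks $n$ with $\alpha^{-n} < \varepsilon$ and $q = \lfloor r\alpha^n \rfloor$, and it uses the fact that a set $S \subseteq \Sigma^n$ of $q$ words gives $|S\Sigma^* \cap \Sigma^k| = q\,\alpha^{k-n}$ for $k \ge n$. The function names are hypothetical.

```python
from math import floor

def approximating_prefix_set(r, eps, alpha):
    """Pick a window length n and a cardinality q such that
    q / alpha**n lies within eps of the target density r."""
    n = 1
    while alpha ** (-n) >= eps:
        n += 1
    q = floor(r * alpha**n)
    return n, q

def slice_density(q, n, alpha, k):
    """|S Sigma* ∩ Sigma^k| / |Sigma^k| for |S| = q, S ⊆ Sigma^n, k >= n."""
    return (q * alpha ** (k - n)) / alpha**k
```

For example, with `r = 0.3`, `eps = 0.01`, and a binary alphabet, the sketch selects `n = 7` and `q = 38`, and every slice of length at least 7 then has density `38 / 128`.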
Since each of these families contains the regular languages, this means that the linear, context-free, context-sensitive, and recursively enumerable languages are all dense in the upper quotient space. We still do not know where all these families lie in the lower Besicovitch topological spaces, but we conjecture that the regular languages are indeed also dense in the Besicovitch topology $(2^{\Sigma^*}, \tau_\zeta)$.

Non-r.e. languages are dense in both quotient spaces
We can show fairly simply that the non-recursively enumerable languages are ubiquitous in the Besicovitch topological spaces. Because $d_\zeta$ is a strict pseudometric, the ∼ equivalence classes are uncountable. We present the following without proof; the proofs are uncomplicated and are omitted for lack of space.
Lemma 53. The single element of the class $0_\zeta$ is uncountable in $2^{\Sigma^*}$ and contains a non-r.e. language.

Conclusion
We have attempted to improve upon previous definitions of distance between languages in a language space. After considering previous work by Vianu (1977), which defined a language distance using the density of the symmetric difference of two languages, we progressed to a new adaptation of a pseudometric inspired by Besicovitch (1932). The resulting Besicovitch pseudometric on a language space is essentially the upper density of the symmetric difference between languages. By lifting to the quotient space $Q_\zeta$ using Besicovitch equivalence, a natural metric topology was obtained and shown to be perfect but not compact. A further lifting step yielded a compact "upper" quotient space $N_\zeta$ homeomorphic to the unit interval. The ideals of this upper space were studied, using the notion of word ideal defined herein. In the last section it was shown that neither the finite nor the locally testable languages are dense in $N_\zeta$. Finally, the regular languages were shown to be dense in $N_\zeta$, and the non-r.e. languages were shown to be dense in both $Q_\zeta$ and $N_\zeta$.