Invertible Tree Embeddings using a Cryptographic Role Embedding Scheme

We present a novel method for embedding trees in a vector space based on Tensor-Product Representations (TPRs) which allows for inversion: the retrieval of the original tree structure and nodes from the vectorial embedding. Unlike previous attempts, this does not come at the cost of intractable representation size; we utilize a method for non-exact inversion, showing that it works well when there is sufficient randomness in the representation scheme for simple data and providing an upper bound on its error. To handle the huge number of possible tree positions without memoizing position representation vectors, we present a method (Cryptographic Role Embedding) using cryptographic hashing algorithms that allows for the representation of unboundedly many positions. Through experiments on parse tree data, we show a 30,000-dimensional Cryptographic Role Embedding of trees can provide invertibility with error < 1% that previous methods would require $8.6 \times 10^{57}$ dimensions to represent.


(each neuron contributes to the encoding of each constituent) and the function computation is fully parallel (for many complex functions, a single step).
TPRs introduce bona fide structure to neural representations via the neural embedding of roles that define a particular type of compositional structure. For binary trees, one such role could be left-child-of-right-child-of-root. The fillers of these roles, whether they are atomic or structures themselves, are bound to their respective roles by taking the tensor product of the neural embedding of each filler and the embedding of its structural role. The sum of these tensor-product filler-role bindings is the TPR for the whole structure.
When the embeddings of the roles are orthonormal vectors, the filler of any given role $r$ with embedding $\mathbf{r}$ can be exactly recovered from the embedding $\mathbf{S}$ of the structure as a whole simply by taking the inner product $\mathbf{S} \cdot \mathbf{r}$ (see Sec. 2.2, which also shows that the same invertibility property holds if the role embeddings are merely linearly independent). This unbinding process can lead to an error if another role $r'$ has an embedding $\mathbf{r}'$ that is not orthogonal to $\mathbf{r}$: unbinding $r$ will induce an intrusion of the filler of $r'$ with a magnitude proportional to the inner product $\mathbf{r} \cdot \mathbf{r}'$.
Orthogonality of the role embeddings is of course possible only if their dimension is at least as large as the number of possible roles. Thus for a k-branching tree of depth δ, the size of the TPR is proportional to $k^{\delta}$. This contrasts with the size of a corresponding symbolic representation, which is proportional to the occupancy of the tree: the number of roles with non-null fillers. In NLP, where parse trees are often relatively sparse, this can be a large difference.
The source of this disparity is that, unlike symbolic representations, distributed neural representations must pre-allocate space for all possible items that may need to be encoded: here, a dimension of the TPR for each possible atomic-filler/role pair, which is required for exact invertibility.
However, we know that, in high-dimensional vector spaces, randomly chosen vectors are most likely to be approximately orthogonal (see Sec. 2.3). The work presented here investigates the Occupancy-Scaling Hypothesis: in high-dimensional embedding spaces, TPRs for trees can be invertible to a good approximation provided the occupancy k of the tree is smaller than γ times the role-embedding dimension n, k < γn, where γ = O(1). The hypothesis will hold if, under such conditions, role vectors can be assigned so that, with sufficiently high probability, pairs of occupied roles have embedding vectors that are sufficiently close to orthogonal: then intrusion will not lead to unbinding errors. If true, this means that TPRs scale like their symbolic counterparts: the number of possible roles is irrelevant; only the number of occupied roles matters.
The contributions of the paper consist in a sequence of tests of the Occupancy-Scaling Hypothesis in settings of varying difficulty. In Sec. 3, we first investigate the hypothesis in the case of random symbol strings, with random embeddings both of the roles and of the symbols that fill them. We next examine the case of natural-language sentences, considered as strings of word tokens: now the fillers are not random, although the filled roles are still predictable (positions 1 through N = length of the sentence). We then present experimental verification of a new theoretical worst-case bound on unbinding error (Appendix B).
The main contributions of the paper lie in Sec. 4, where we consider encodings of constituency-parsed sentences in which the roles filled in the parse tree are variable across sentences and the number of possible roles (tree positions) is essentially infinite. The primary innovation of the paper, the Cryptographic Role Embedding technique, is shown to effectively encode a huge number of role vectors in a fixed embedding dimension.
Our results show that, even when embeddings of symbols are highly compressed and an unbounded set of structural positions is embedded in a modest-sized space, the Occupancy-Scaling Hypothesis holds with a scaling coefficient particular to the type of data being represented. For example, in Sec. 4.3 we show that where exact invertibility would require an embedding dimension of $8.6 \times 10^{57}$, a 30,000-dimensional Cryptographic Role Embedding of trees can provide invertibility with error < 1%.

Tensor Product Representations (TPRs)
TPRs provide a principled way of representing information with compositional structure in vector spaces, such as those used as the input and output domains of neural networks (Smolensky, 1990). Developing a tensor-product-based representational scheme begins by decomposing a compositional structure into structural roles, which define a structural type (Newell, 1980); a string can be defined by roles first-position, second-position, etc., and a tree by root, first-child-of-root, etc. A particular instance of the structural type is defined by assigning fillers to (some of) these roles. For a specific string, first-position might be filled by Kim; for a tree, first-child-of-first-child-of-root might be filled by the. A compositional structure can then be represented as the bindings of fillers to roles. Once decomposed, roles and fillers are embedded into their respective representational vector spaces. Let some information (e.g., a sentence) be encoded as a particular instance $S$ of a structural type defined by a set of indexed roles $\{r_j\}$, and let the possible fillers constitute an indexed set $\{f_i\}$. Now let $b_S$ be a list of ordered pairs $(i, j)$ representing that in $S$, the filler with index $i$ (embedded as vector $\mathbf{f}_i$) is bound to the role with index $j$ (embedded as vector $\mathbf{r}_j$). The tensor product representation (TPR) $T$ of $S$ is then given by
$$T = \sum_{(i,j) \in b_S} \mathbf{f}_i \otimes \mathbf{r}_j \qquad (1)$$
In certain settings, this TPR may itself be used as a filler and subsequently be bound to another role vector (Legendre et al., 1991). This process results in a TPR that represents hierarchical compositional structure. Here we adopt a setting in which the filler of each role is an atom (e.g., a word), and hierarchical structure, e.g. of a tree, is encoded in the roles themselves, which include embedded roles such as first-child-of-second-child-of-root.
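As an illustration of the sum of filler-role tensor products described above, the following minimal Python sketch builds a TPR as a $d \times n$ matrix. The helper names (`outer`, `tpr`) and the toy vectors are hypothetical and chosen only for illustration; the roles are taken to be orthonormal so that the result is easy to read.

```python
def outer(f, r):
    """Tensor (outer) product of a filler vector f and a role vector r."""
    return [[fi * rj for rj in r] for fi in f]

def tpr(bindings, fillers, roles):
    """Sum the filler-role tensor products over the (i, j) pairs in bindings."""
    d, n = len(fillers[0]), len(roles[0])
    T = [[0.0] * n for _ in range(d)]
    for i, j in bindings:
        B = outer(fillers[i], roles[j])
        for a in range(d):
            for b in range(n):
                T[a][b] += B[a][b]
    return T

# Hypothetical toy data: two 3-dimensional fillers and two orthonormal string roles.
fillers = [[1.0, 2.0, 0.0], [0.0, -1.0, 3.0]]
roles = [[1.0, 0.0], [0.0, 1.0]]

# Filler 0 is bound to role 0 (first-position), filler 1 to role 1 (second-position).
T = tpr([(0, 0), (1, 1)], fillers, roles)
```

With these orthonormal roles, column j of `T` is exactly the filler bound to role j, which is the exact-recovery case discussed in the introduction.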

Invertibility: Unbinding from TPRs
TPRs are useful because they embed arbitrary symbolic structure in a vector space in such a way that simple linear algebra operations may be used to retrieve the form of the symbolic structure, including its compositional structure. The core operation in retrieving this structure is called unbinding. We may use unbinding to query a role for its filler. When the role vectors are linearly independent, there is a method for exact unbinding (see Smolensky, 1990, for details). When the dimension of the role-embedding space is smaller than the number of distinct roles, the case we explore below, we must use an approximate unbinding method. This 'self-addressing' unbinding method is what we will use to attempt to invert TPR embeddings to recover the filler of any given role as we test the Occupancy-Scaling Hypothesis, exploring how small a TPR we can use and still retain invertibility to a good degree of approximation. Self-addressing unbinding retrieves the filler $\mathbf{f}_i$ for the role $\mathbf{r}_i$ by simply computing the inner product between the role vector and the TPR: $\tilde{\mathbf{f}}_i = T \cdot \mathbf{r}_i = \sum_{j=1}^{k} (\mathbf{r}_j \cdot \mathbf{r}_i)\,\mathbf{f}_j$, where $\mathbf{f}_j$ denotes the filler bound to role $j$. (Here and henceforth we assume all role vectors have been normalized.) This unbinding is exact if the role vectors are orthogonal. Otherwise, the intrusion of the filler of role $j$, $\mathbf{f}_j$, into the unbound filler of role $i$, $\tilde{\mathbf{f}}_i$, is $\cos\theta_{ji}\,\mathbf{f}_j$, where $\theta_{ji}$ is the angle between the role vectors $\mathbf{r}_j$ and $\mathbf{r}_i$.
While this unbinding is not exact, often we are interested in the case in which there is a fixed, known filler vocabulary with a given vector embedding. In such a case, it may be possible to use a similarity metric to compare the vector obtained from unbinding to the vectors embedding the vocabulary of fillers and select the vocabulary item with the highest value of the metric. Here, the cosine similarity of the two vectors is used as the metric; thus, we say an unbinding error for role $i$ has occurred when there exists a vocabulary filler $\mathbf{f}_j \neq \mathbf{f}_i$ that is more similar to the unbound vector than the true filler is: $\cos(\tilde{\mathbf{f}}_i, \mathbf{f}_j) > \cos(\tilde{\mathbf{f}}_i, \mathbf{f}_i)$.
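The unbinding and nearest-filler decoding just described can be sketched as follows. This is a minimal pure-Python illustration, not the implementation used in the experiments; the function names and the sizes `d`, `n`, `k`, `N` are illustrative assumptions.

```python
import math
import random

def unit(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def random_unit(dim, rng):
    """Random unit vector obtained by normalizing Gaussian samples."""
    return unit([rng.gauss(0.0, 1.0) for _ in range(dim)])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def bind(fillers, roles):
    """TPR as a d x n matrix: sum over bindings of filler (outer) role."""
    d, n = len(fillers[0]), len(roles[0])
    T = [[0.0] * n for _ in range(d)]
    for f, r in zip(fillers, roles):
        for a in range(d):
            for b in range(n):
                T[a][b] += f[a] * r[b]
    return T

def unbind(T, role):
    """Self-addressing unbinding: contract the TPR with the role vector."""
    return [dot(row, role) for row in T]

def decode(vec, vocab):
    """Index of the vocabulary vector with the highest cosine similarity."""
    v = unit(vec)
    return max(range(len(vocab)), key=lambda j: dot(v, vocab[j]))

rng = random.Random(0)
d, n, k, N = 32, 64, 8, 50      # illustrative sizes, not the paper's settings
vocab = [random_unit(d, rng) for _ in range(N)]
roles = [random_unit(n, rng) for _ in range(k)]
truth = [rng.randrange(N) for _ in range(k)]
T = bind([vocab[i] for i in truth], roles)
decoded = [decode(unbind(T, r), vocab) for r in roles]
errors = sum(t != p for t, p in zip(truth, decoded))
```

Because the vocabulary vectors are already unit length, only the unbound vector needs normalizing before the cosine comparison.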

The geometry of $S^n$
In this section, we briefly present two geometric motivations for the hypothesis that, in high-dimensional spaces, random unit vectors may approximate orthogonality sufficiently for TPR unbinding. We also review a simple method for sampling from $S^n$, used throughout the paper to generate uniformly distributed unit vectors. The first factor to note is that for a unit vector $u \in \mathbb{R}^n$, as $n \to \infty$, the proportion of the n-dimensional unit sphere $S^n$ lying within an angle φ ≤ Θ of $u$ goes to 0 for all values of Θ < 90 degrees. This has the implication that the proportion of $S^n$ forming an angle of 90 − ε ≤ φ ≤ 90 + ε (thus within ε of orthogonality) goes to 1. We can empirically estimate the rate at which this limit is approached, using Li (2011). As seen in Fig. 1 (left), the rate at which this region grows slows as the dimension increases, but the area is nevertheless large even for fairly small dimensions.
Another manifestation of the increased mass of $S^n$ close to orthogonality to a given vector in higher dimensions can be found by considering the dot products of points selected at random from $S^n$. As shown in Appendix A, the mean and variance of the dot product of two random unit vectors in $\mathbb{R}^n$ are 0 and $1/n$, respectively. In high dimensions, the distribution appears to be well-approximated by a normal distribution: Fig. 1 (right) shows the distribution for $\mathbb{R}^{100}$. Therefore, most dot products are fairly small, and the larger the dot product, the less common it is. Finally, note that it is possible to sample uniformly from $S^n$ (and thus sample a random unit vector) simply by sampling from the standard normal distribution. Samples $Z_1, Z_2, \ldots, Z_n$ are taken from the standard normal distribution. Then the $i$th coordinate of the sampled vector $v \in S^n$ can be obtained by $v_i = Z_i / \sqrt{Z_1^2 + Z_2^2 + \cdots + Z_n^2}$ (Muller, 1959).
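The sampling procedure and the stated moments of the dot product can be checked numerically with a short sketch such as the one below. The function name `sample_unit_vector`, the seed, and the number of sampled pairs are assumptions made for illustration.

```python
import math
import random

def sample_unit_vector(n, rng):
    """Uniform sample from the unit sphere in R^n via normalized Gaussians."""
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in z))
    return [x / norm for x in z]

rng = random.Random(0)
n, pairs = 100, 2000

# Dot products of independently sampled pairs of unit vectors.
dots = []
for _ in range(pairs):
    u = sample_unit_vector(n, rng)
    v = sample_unit_vector(n, rng)
    dots.append(sum(a * b for a, b in zip(u, v)))

mean = sum(dots) / pairs
var = sum((x - mean) ** 2 for x in dots) / pairs
print(f"empirical mean = {mean:.4f}, empirical variance = {var:.5f}, 1/n = {1 / n:.5f}")
```

The empirical mean should be close to 0 and the empirical variance close to 1/n, in line with the result cited from Appendix A.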

Lower and Upper Bounds on Unbinding Error
First, we consider a lower bound on error: fully random TPRs. In this case, both filler vectors and role vectors are drawn uniformly from the unit sphere, and filler-role bindings are selected from the uniform distribution. This eliminates some potential contributions to error: there is no special relation between the representations of filler vectors, and no special co-occurrence properties of roles or fillers. In this case, the intrusions of other fillers on the unbinding will typically be destructive, and the expected value of the intrusion term will be 0. Nevertheless, as the number of bound roles becomes large compared to the role dimension, the variance of the intrusion term becomes larger, resulting in errors. In each simulation run, the size $N$ of the set of possible fillers and the filler embedding dimension $d$ were fixed. We perform a number of samples for each simulation, each time drawing a new set of n-dimensional role vectors $\{\mathbf{r}_i\}_{i=1}^{k} \sim U(S^{n-1})$ that will be bound to fillers (so the occupancy is $k$). In some simulations, the number of bindings $k$ is fixed and the dimension of the role vectors $n$ is varied, while in others $n$ is fixed and $k$ is varied. The former simulations can be thought of as answering the question "How large of role vectors do I need if my symbolic structures are no larger than k?", while the latter answer "How much information can be packed into TPRs using roles of size n?" The filler embedding that will be bound to each $\mathbf{r}_i$ is drawn IID from a uniform distribution over the set of $N$ possible fillers. For each fixed set of parameters, we select the filler-role bindings and create the TPR according to Equation (1). We then unbind all the roles using the self-addressing unbinding procedure and compute the similarities between the result of the unbinding $\tilde{\mathbf{f}}$ and each of the filler vectors $\mathbf{f}_j$, recording whether an error was made.
We divide the number of errors made by the total number of bindings to obtain a simple maximum likelihood estimate of the error probability for any one combination of N, d, n, and k. Simulations were computed in batches using PyTorch (Paszke et al., 2019).
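A small-scale version of this estimate can be sketched in pure Python as below. This is an illustrative re-creation rather than the batched PyTorch code used for the reported results: the function `error_rate`, the seed, and the parameter values are assumptions, and the sketch uses the identity that unbinding role i from the TPR equals the sum over j of (r_j · r_i) f_j, so the TPR itself is never materialized.

```python
import math
import random

def random_unit(dim, rng):
    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in z))
    return [x / norm for x in z]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def error_rate(n, k, d, N, trials, rng):
    """Fraction of bindings decoded to the wrong filler, pooled over all trials."""
    vocab = [random_unit(d, rng) for _ in range(N)]
    errors = 0
    for _ in range(trials):
        roles = [random_unit(n, rng) for _ in range(k)]
        ids = [rng.randrange(N) for _ in range(k)]   # filler index bound to each role
        for i in range(k):
            # Weights (r_j . r_i); the j == i term is 1 because roles are unit vectors.
            w = [dot(roles[j], roles[i]) for j in range(k)]
            unbound = [sum(w[j] * vocab[ids[j]][a] for j in range(k)) for a in range(d)]
            # Fillers are unit vectors, so the best cosine match is the best dot product.
            best = max(range(N), key=lambda m: dot(unbound, vocab[m]))
            errors += best != ids[i]
    return errors / (trials * k)

rng = random.Random(1)
sparse = error_rate(n=64, k=8, d=32, N=100, trials=5, rng=rng)   # few bindings per dimension
dense = error_rate(n=8, k=64, d=32, N=100, trials=5, rng=rng)    # many bindings per dimension
print(f"k/n = 0.125: error = {sparse:.3f};  k/n = 8: error = {dense:.3f}")
```

Sweeping `k` for fixed `n` (or the reverse) with this function gives a rough picture of how the error rate depends on the ratio of the two.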
This experiment was conducted both with the role dimension n fixed and the number of bindings k varied and vice versa, with the fixed value of n or k set to 25, 100, or 200, with d = 100 and N = 2000, 10000, or 50000. Results for fixed n and varied k and for fixed k and varied n showed analogous patterns of error, suggesting that the ratio n/k is the relevant factor for error, rather than their independent values. The number of possible fillers N did not seem to substantially affect the error rate. Overall, across all combinations of n and k, the error for k/n < 2 was generally less than 1%. This constitutes a confirmation of the Occupancy-Scaling Hypothesis, with scaling coefficient γ = 2: unbinding error is < 1% when k < 2n. Representative results are shown in Fig. 2.
Another relatively simple but more challenging setting is given by embedding English sentences. We use the Reuters corpus from the NLTK Python package (Bird et al., 2009), taking only the sentences of length ≤ 50, yielding 49442 sentences. Here we construct a TPR as follows: if $w_i$ is the $i$th word in a sentence from the corpus, the filler vector $\mathbf{f}_i$ is the embedding of $w_i$ in some vector space; this is bound to an embedding $\mathbf{r}_i$ of the role denoting the $i$th linear position in the sentence. As before, the role vectors are randomly chosen from the uniform distribution on the unit sphere $S^{n-1} \subset \mathbb{R}^n$. For the word embeddings, we use 300-dimensional word2vec vectors taken from the Google News vectors (Mikolov et al., 2013). There are a number of potential issues here. The fillers cannot be modeled as being drawn from a uniform distribution: since the fillers are words, common words will appear in TPRs more often, being drawn from a distribution which is approximately Zipfian (Zipf, 1949). If a word appears more than once in a sentence (and thus in the TPR), that increases the chance of constructive interference in the direction of that word.
Another challenge is the non-uniformity of word2vec vectors: word2vec creates vectors on the basis that words that occur near each other (e.g. in the same sentence) should have more similar vectorial embeddings. This means the embeddings of the intruding vectors will be closer to the true filler than a random vector would be expected to be, leading to a potential for errors. Finally, the density of the filler space in this experiment is much greater than in the previous experiments, as there are approximately 3,000,000 GoogleNews vectors with a dimension of 300 (N/d = 10000), increasing the chance of a random error. Because of this high filler density, we also consider a top-5 setting, which reduces errors. Despite these potential issues, using role dimension n = 25 and letting k (here, the number of words in the sentence) vary, we again find γ ≈ 2 with a tolerance of 1% error, as shown in Fig. 3.
Finally, we present a worst-case scenario. Consider a TPR where role vectors are uniformly drawn from the unit sphere $S^{n-1}$ and where each role is bound to one of only two fillers, $\hat{a}$ or $\hat{b}$. Specifically, $\mathbf{r}_0$ is bound to $\hat{a}$, and for all $i \neq 0$, $\mathbf{r}_i$ is bound to $\hat{b}$. When unbinding a role from a TPR where the fillers are widely scattered, there will be destructive interference causing cancellation between the intrusions of the fillers of other roles; in this case, however, when unbinding $\mathbf{r}_0$ there will be no such destructive interference, but instead constructive interference in the direction of $\hat{b}$. Thus, we call this scenario maximal intrusion. The error in this case risks being rather high, although this scenario is very unlikely in real-data settings. The probability that unbinding $\mathbf{r}_0$ will erroneously yield $\hat{b}$ rather than the correct result $\hat{a}$ is bounded by $P(\text{Error}) < e^{-n/2k} / \sqrt{2\pi n/k}$ (see Appendix B for proof). In Fig. 4, we can see that this bound is not tight, with the observed error being substantially lower.
Still, the bound has the property of being exponentially decreasing in n: for a fixed number of bindings k, the error drops off at a rate proportional to $1/e^{n}$, so even in this worst-case scenario it is possible to unbind with a low error rate with a number of roles favorably proportional to the number of bindings, provided k is not too large relative to the desired n.
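To get a feel for how quickly the worst-case bound falls, it can be evaluated directly. The sketch below assumes the bound reads $e^{-n/2k} / \sqrt{2\pi n/k}$, which is the Gaussian tail bound for a standard normal variable exceeding $\sqrt{n/k}$; the function name `maximal_intrusion_bound` is hypothetical.

```python
import math

def maximal_intrusion_bound(n, k):
    """Assumed form of the bound: exp(-n / (2k)) / sqrt(2 * pi * n / k)."""
    ratio = n / k
    return math.exp(-ratio / 2.0) / math.sqrt(2.0 * math.pi * ratio)

# The bound depends on n and k only through the ratio n / k.
for ratio in (1, 2, 5, 10, 20):
    bound = maximal_intrusion_bound(ratio * 100, 100)
    print(f"n/k = {ratio:>2}: bound = {bound:.3e}")
```

Under this reading, the bound is loose when n is comparable to k and becomes small once n is several times k.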

Invertible Tree Representations
The TPR embedding of trees is guaranteed to be perfectly invertible if the role embeddings are linearly independent. The roles here are the possible positions in a k-ary tree, so the number of tree roles grows exponentially with the depth of the tree; as such, maintaining the linear independence of all role vectors that is required for exact unbinding would require extremely large role vectors. However, the experiments presented in Section 3 suggest that sufficiently-high-dimensional random role vectors may be close enough to orthogonal that trees can be represented with relatively small role vectors while introducing only a small amount of error in inverting the embedding (i.e., only a small probability of unbinding errors). Purely random roles are nevertheless not feasible in the tree context, as the number of roles is potentially infinite and grows exponentially in the depth. In this section, we present a scalable role scheme for the representation of tree TPRs, and demonstrate its efficiency in representing syntax trees with minimal information loss (as demonstrated by reconstruction).

Cryptographic Role Embedding: A pseudorandom, deterministic role scheme
This tree representation scheme is designed with a few goals in mind. First, in conformity with the Occupancy-Scaling Hypothesis, the size of the representation should scale not with the depth, but with the number of filled nodes in the tree: the occupancy of the tree. Thus, sparse trees should enable a smaller representation than complete trees, even if the sparse trees are much deeper. Additionally, the roles should be close to random, independent samples from the unit sphere. The theoretical and empirical work presented here indicates that while orthogonality of role vectors is required for guaranteed perfect accuracy in role unbinding, a high degree of accuracy across varied scenarios is possible when roles are randomly drawn from the unit sphere of sufficiently high dimension. We seek then to represent positions in a tree as random points on the unit sphere; however, simply drawing randomly from the unit sphere is not scalable. For each position in the tree, a unique random vector is needed. The number of such positions in a k-ary branching tree of depth d is $(k^d - 1)/(k - 1)$. To avoid storing an exponential number of vectors and requiring a maximal depth, we propose a system in which no vectors need be stored, but the (pseudo)random vector for any position can instead be generated on demand, repeatedly.
The generation of an n-dimensional role vector in the proposed scheme can be divided into 4 steps:
1. tree position → variable-length bit string
2. bit string → 3n pseudorandom uniform bytes (cryptographic hash)
3. 3n pseudorandom uniform bytes → n pseudorandom independent Gaussian samples (Box-Muller)
4. n pseudorandom independent Gaussian samples → n-dimensional pseudorandom unit vector
Each role (tree-node position) is addressed by a variable-length bit string which encodes the path from the node to the root according to a simple set of rules. This string must then be used to deterministically generate a sequence of pseudorandom bits. Since the string representation is such that similar roles have similar strings (in terms of, e.g., edit distance), it is essential that this generation process not map similar strings to similar sequences of bits if the independence of roles is to be maintained.
Cryptographic hash functions are a class of functions designed to solve exactly this problem. These functions map input strings of any length to a fixed number of output bits such that 1) it is not feasible to find 2 inputs which map to the same output, 2) a small change to the input results in a large change in the output, and 3) the process is fully deterministic, relying on no source of randomness. These attributes of hash functions point towards the output bits of different tree position strings being random and uncorrelated. In this work, the SHAKE256 variable-length hash function was applied to the tree position strings to generate a sequence of output bits (FIPS 202, 2015).
The output bits of the SHAKE256 hash function are effectively uniformly distributed; however, in order to obtain a uniform sample of unit vectors, random samples from a Gaussian distribution are needed (see Section 2.3). The Box-Muller transform takes uniform samples on the interval [0, 1] and deterministically produces independent normally-distributed samples. To obtain samples on the interval [0, 1] from the output of the hash function (a sequence of random bits), 3 (8-bit) bytes of output were taken at a time and normalized by dividing by $2^{3 \cdot 8}$. This results in n uniform samples to which the Box-Muller transform may be applied. The Box-Muller transform takes 2 (pseudo)random uniform samples $U_1$ and $U_2$ as input, and produces 2 (pseudo)random Gaussian-distributed samples $Z_1$ and $Z_2$ as output according to the following formulae:
$$Z_1 = \sqrt{-2 \ln U_1}\,\cos(2\pi U_2), \qquad Z_2 = \sqrt{-2 \ln U_1}\,\sin(2\pi U_2)$$
While the formulae are similar, it has been proven that $Z_1$ and $Z_2$ are independent samples. The Box-Muller transform is applied iteratively to pairs of the uniform samples derived from the hash function output until n independent Gaussian samples are generated. With these Gaussian samples $Z_1, Z_2, \ldots, Z_n$, the $i$-th coordinate of the node's role vector $\mathbf{r}$ is given by $[\mathbf{r}]_i = Z_i / \sqrt{\sum_{j=1}^{n} Z_j^2}$. To test how close to uniformly distributed these role vectors are, the branching factor was set to 3 and the vectors for all tree positions up to depth 5 were generated (1093 vectors). The dot products of all pairs of vectors were computed, and the distribution was compared with an analogous distribution for 1093 randomly sampled unit vectors using Levene's test for variance equality, which showed equal variances with p > 0.99.
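The pipeline from position string to role vector can be sketched with Python's standard library as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the position strings are hypothetical stand-ins for the path encoding, the function name `role_vector` is invented for illustration, and 1 is added to each 3-byte integer before dividing by $2^{24}$ so that the logarithm in the Box-Muller transform is never taken at 0.

```python
import hashlib
import math

def role_vector(position, n):
    """Pseudorandom unit vector in R^n generated from a tree-position string."""
    m = n + (n % 2)   # Box-Muller consumes uniform samples in pairs
    raw = hashlib.shake_256(position.encode("utf-8")).digest(3 * m)
    # 3 bytes -> one uniform sample. Adding 1 keeps the value in (0, 1]
    # (an assumption of this sketch, made so that log(u1) is always defined).
    uniforms = [(int.from_bytes(raw[3 * i:3 * i + 3], "big") + 1) / 2 ** 24
                for i in range(m)]
    gauss = []
    for u1, u2 in zip(uniforms[0::2], uniforms[1::2]):
        radius = math.sqrt(-2.0 * math.log(u1))
        gauss.append(radius * math.cos(2.0 * math.pi * u2))
        gauss.append(radius * math.sin(2.0 * math.pi * u2))
    gauss = gauss[:n]   # drop the spare sample when n is odd
    norm = math.sqrt(sum(z * z for z in gauss))
    return [z / norm for z in gauss]

# Hypothetical position strings that differ in a single character.
r_a = role_vector("0110", 200)
r_b = role_vector("0111", 200)
cos_ab = sum(x * y for x, y in zip(r_a, r_b))
print(f"cosine between neighbouring positions: {cos_ab:.4f}")
```

Nothing is stored between calls: calling `role_vector` again with the same string regenerates the same vector, which is the property that removes the need for memoization.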

The filler space
We assume the vocabulary of words (terminal labels) and nonterminal labels is fixed and known, and relatively small compared to the number of tree positions. Thus, we memoize the filler vectors, and, in contrast to the roles, we need not use any sort of generation scheme. Each filler vector is an independent random sample from the unit sphere $S^{d-1}$ in $\mathbb{R}^d$, where d is the filler vector dimension.
In inverting a tree embedding, determining which positions are actually present in the tree and which are empty is a non-trivial issue: unless a node has the maximum number of children, it may have further children to the right of the child being processed; it is similarly possible that an apparent leaf node may have further children. In the TPR scheme these unfilled roles are implicitly bound to the 0 filler vector; however, empirically no threshold was identified for filler magnitude to reliably distinguish filled positions and unfilled positions. As a result, special fillers were created representing each filler when it is the rightmost child of a parent (adding the suffix "&") and when it is a leaf (adding the prefix "#"); these fillers and their associated vectors are used instead of the true fillers in the representations. When inverting the representation, these markers are used to guide the search through the tree to only filled positions and are stripped from the fillers in order to reconstruct the true tree.
Figure 6: The largest number of nodes (k) for which a given filler and role dimension combination has error < 1% for all smaller tree sizes.
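The marking convention can be illustrated with a short sketch. The tree encoding as `(label, children)` tuples, the helper names, and the choice to treat the root as a rightmost child are assumptions made for this example; it also assumes no original label begins with "#" or ends with "&".

```python
def mark(label, is_rightmost, is_leaf):
    """Attach the leaf prefix '#' and the rightmost-child suffix '&'."""
    return ("#" if is_leaf else "") + label + ("&" if is_rightmost else "")

def unmark(marked):
    """Return (label, is_rightmost, is_leaf) for a marked filler."""
    is_leaf = marked.startswith("#")
    is_rightmost = marked.endswith("&")
    label = marked[1:] if is_leaf else marked
    label = label[:-1] if is_rightmost else label
    return label, is_rightmost, is_leaf

def mark_tree(tree, is_rightmost=True):
    """Recursively mark a (label, children) tree.

    The root is treated as a rightmost child here (an assumption of this sketch).
    """
    label, children = tree
    last = len(children) - 1
    marked_children = [mark_tree(child, i == last) for i, child in enumerate(children)]
    return (mark(label, is_rightmost, not children), marked_children)

# Hypothetical toy parse tree.
tree = ("S", [("NP", [("Kim", [])]), ("VP", [("sleeps", [])])])
marked = mark_tree(tree)
```

In the scheme described above, each marked variant is a separate vocabulary item with its own filler vector, so the string manipulation here only stands in for that lookup.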

Experiments
In order to test the invertibility of this representation scheme and its utility in representing naturalistic data, we carry out experiments on syntactic trees from the MASC dataset (Ide et al., 2010). This dataset contains approximately 500,000 words of text from diverse domains, separated into sentences which are annotated into parse trees in the Penn Treebank format. This dataset contains a wide variety of trees: its most branching node has 98 children, and its deepest tree has a depth of 43 nodes. This means there are $98^{43} \approx 4.19 \times 10^{85}$ possible tree positions to be represented by role vectors. Extreme outliers in terms of number of nodes were removed by taking all trees of size (occupancy) < 183 (approx. 99% of trees), yielding 35,379 syntax trees. Inversion of the representation is accomplished by conducting a breadth-first tree traversal: possible tree positions are enqueued, and at each position the appropriate role vector is produced through the process outlined in Section 4.1, the role is unbound, and the decoded symbol is checked for rightmost or leaf marking, which is used to keep empty sibling and child positions out of the queue.
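The traversal logic can be sketched independently of the vector arithmetic. In the sketch below, `read_filler` is a hypothetical stand-in for the whole role-generation, unbinding, and nearest-filler step; positions are represented as tuples of child indices, which is an assumption of this example rather than the paper's bit-string encoding.

```python
from collections import deque

def invert(read_filler):
    """Breadth-first reconstruction of {position: label}.

    read_filler(position) returns the marked filler decoded at that position.
    """
    decoded = {}
    queue = deque([()])                      # the root is the empty path
    while queue:
        pos = queue.popleft()
        marked = read_filler(pos)
        is_leaf = marked.startswith("#")
        is_rightmost = marked.endswith("&")
        decoded[pos] = marked.strip("#&")    # assumes labels contain no markers
        if not is_leaf:
            queue.append(pos + (0,))         # a non-leaf has at least a first child
        if pos and not is_rightmost:
            queue.append(pos[:-1] + (pos[-1] + 1,))   # a further sibling exists
    return decoded

# Hypothetical error-free decoder backed by a lookup table of marked fillers.
stored = {
    (): "S&",
    (0,): "NP",
    (1,): "VP&",
    (0, 0): "#Kim&",
    (1, 0): "#sleeps&",
}
recovered = invert(stored.__getitem__)
```

With a noisy decoder, a misread marker would enqueue positions that are empty or skip positions that are filled, so a practical version would also need a cap on the number of positions visited.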
The reconstructed tree and the original tree may not contain the same set of nodes: there may be, for instance, positions found in the reconstruction that are not present in the true tree. Due to the multiple types of errors possible, to produce a unified metric we take the F-score over the pairs of positions and node labels (thus, an incorrect filler for a correct role is treated as 2 errors). 1 − F is then treated as the "error." Within each experiment, the role and filler dimensions are held constant and evaluated over all sentences in MASC. Sentences are grouped by the total number of nodes in their tree representations. The role and filler dimensions were then varied independently. Figure 6 shows the largest tree size (in terms of number of occupied nodes) for which the error is < 1% and the error for all smaller trees is also < 1%, that is, the last point at which error is consistently below 1%.
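The error metric can be sketched as follows, assuming the F-score meant is the balanced F1 and that trees are given as mappings from position to label; the function name `tree_error` and the toy trees are illustrative.

```python
def tree_error(true_nodes, decoded_nodes):
    """Return 1 - F1 over (position, label) pairs; inputs map position -> label."""
    gold = set(true_nodes.items())
    pred = set(decoded_nodes.items())
    hits = len(gold & pred)
    if hits == 0:
        return 1.0
    precision = hits / len(pred)
    recall = hits / len(gold)
    return 1.0 - 2 * precision * recall / (precision + recall)

# A wrong label at a correct position costs one false positive and one false negative.
gold = {(): "S", (0,): "NP", (1,): "VP"}
pred = {(): "S", (0,): "NP", (1,): "NP"}
print(f"error = {tree_error(gold, pred):.3f}")
```

Spurious positions lower precision and missed positions lower recall, so both kinds of traversal mistake are reflected in a single number.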
The data indicate that while filler dimension plays a role in representation quality, there seems to be a "threshold size" in these experiments, 150, above which the filler dimension does not substantially aid performance. In contrast, no such upper bound was identified for increasing role dimension. In terms of the Occupancy-Scaling Hypothesis, we find a more complex story here. For sufficiently large filler dimension d ≥ 150, values of γ = k/n range from 0.69 to 1. Further experimentation is needed to definitively confirm whether the Occupancy-Scaling Hypothesis holds in this case; if it does, the value of γ likely lies within this range.
We found that for trees of size < 150, error was consistently below 1% for role dimension 200 and filler dimension 150 (γ = 0.75). The deepest tree of size ≤ 150 has depth 29 and the widest branching is 98-ary, so it would take roles of dimension $(98^{29} - 1)/(98 - 1) \approx 5.7 \times 10^{55}$ to achieve the linear independence required to represent such trees exactly, yielding representations of approximately $8.6 \times 10^{57}$ units if using the 150-dimensional filler vectors used here.
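The size comparison in this paragraph can be reproduced with exact integer arithmetic. The variable names below are illustrative; the numbers are those stated in the text.

```python
branching, depth, filler_dim, role_dim = 98, 29, 150, 200

# Number of positions in a complete 98-ary tree of depth 29 (geometric series).
positions = (branching ** depth - 1) // (branching - 1)

# Exact unbinding needs one role dimension per position.
exact_size = positions * filler_dim

# Size of the Cryptographic Role Embedding used in the experiment.
crypto_size = role_dim * filler_dim

print(f"positions  = {positions:.2e}")
print(f"exact TPR  = {exact_size:.2e} units")
print(f"CRE TPR    = {crypto_size} units")
```

The ratio between the two sizes is what the Occupancy-Scaling Hypothesis exploits: the fixed-size embedding depends on the 150 occupied nodes rather than on the number of possible positions.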
The reconstruction of tree-structured data presents a significant challenge that may account for the higher error in this case than in previous cases: in a tree, a large percentage of nodes are either the rightmost child of their parent or a leaf. This means that there are many opportunities to make an error in reconstructing these nodes. When such an error is made, it often results in a final node being mistaken for a non-final node or vice versa, leading to spurious unbinding of roles that are unbound or to an entire subtree not being traversed. In addition, the branching nature of tree structure means that such errors easily result in an exponential number of additional errors. This is an issue with the nature of the task, not the representation. Preliminary investigation by the authors into augmenting the unbinding process with an oracle determining which nodes are and are not final indicated γ ≈ 2, similar to the random TPR and sentence TPR results in Section 3; however, this is not pursued here because it is a less challenging task that requires burdensome assumptions for use.

Summary and conclusion
Can trees of potentially large depth be encoded as distributed representations in such a way as to enable fully parallel processing and high-accuracy decoding without requiring representations with a dimensionality that grows rapidly with the maximum possible depth? General hand-designed methods for neural encoding of compositional structure, and learned internal neural representations for strongly compositional tasks, have been shown to be cases of tensor product representations (TPRs), which provide distributed representations that enable parallel processing and accurate decoding, but only if the dimension of the representation is very large, growing exponentially with tree depth. A method is presented here for designing distributed tree embeddings (TPRs) that, by contrast, have the same scaling properties as symbolic tree representations: they grow with the number of labelled tree nodes, independently of tree depth. This technique uses Cryptographic Role Embeddings to encode tree positions. This is essentially a means of hashing tree positions onto a high-dimensional unit sphere such that nearby tree positions are not mapped to nearby unit vectors, minimizing interference between decoding symbols that are close together in the tree. The use of a cryptographic hash means there are no burdensome requirements of memoization, allowing the flexible handling of extremely large or unbounded role schemes for complex compositional structures. Experiments show that trees that would require TPRs of size $8.6 \times 10^{57}$ to enable exact decoding can, with Cryptographic Role Embeddings, be embedded as vectors of dimension 30,000 while keeping the probability of error in decoding a tree position less than 1%. Because the representation size is fixed, any of the 80000 98·150 150 = $8.34 \times 10^{213}$ trees of 150 nodes can be represented with this scheme.
This success, coupled with the upper and lower bounds on error shown in Section 3, points to a powerful potential for TPR-based representations of many types of structure in language and other compositional domains.

B Proof of bound on maximal intrusion error
In the restricted worst-case scenario of maximal intrusion, we can prove an upper bound on a restricted case of error. We will consider it a Type I error when the unbinding $\tilde{\mathbf{f}}_0$ of $\mathbf{r}_0$ is closer to $\hat{b}$ than to the correct filler vector $\hat{a}$. What we will call a Type II error arises if there exists any $j < N$ such that $\mathbf{f}_j$ is closer to $\tilde{\mathbf{f}}_0$ than $\hat{a}$ is. Note that Type II error is the type of error we have been considering so far. In this case, all intrusion is in the direction of $\hat{b}$, so we expect that Type I errors will constitute the majority of Type II errors. We can express the unbinding of $\mathbf{r}_0$ as $\tilde{\mathbf{f}} = \hat{a} + s_{k,n}\hat{b}$, where $s_{k,n} \equiv \sum_{i=1}^{k} \mathbf{r}_i \cdot \mathbf{r}_0$ is a random variable with distribution determined by the independently uniform distribution of $\{\mathbf{r}_i\}_{i=0}^{k} \subset S^{n-1}$. A Type I error occurs iff $1 + s_{k,n}c < c + s_{k,n} \Leftrightarrow 1 - c < s_{k,n}(1 - c) \Leftrightarrow s_{k,n} > 1$, where $c \equiv \hat{a} \cdot \hat{b} \in [-1, 1]$. Thus, $P(\text{Type I error}) = P\left(s_{k,n} \equiv \sum_{i=1}^{k} \mathbf{r}_i \cdot \mathbf{r}_0 > 1 \;\middle|\; \mathbf{r}_i \sim U(S^{n-1})\right)$.
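Under the assumption $k \gg 1$, we can derive an upper bound on this error probability as a function of k and n. By the Central Limit Theorem, if the distribution of $s_{1,n} \equiv \mathbf{r}_i \cdot \mathbf{r}_0$ has mean zero and variance $\sigma_n^2$ ($\forall i = 1, \ldots, k$), then as $k \to \infty$, $P\left(\frac{1}{\sqrt{k}} \sum_{i=1}^{k} \mathbf{r}_i \cdot \mathbf{r}_0 > a\right) \to P(X > a \mid X \sim N(0, \sigma_n^2)) = P\left(Y > \frac{a}{\sigma_n} \;\middle|\; Y \sim N(0, 1)\right)$.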
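One way the bound quoted in Section 3 may follow from this limit is sketched below. The sketch assumes the variance $\sigma_n^2 = 1/n$ from Appendix A, applies the Central Limit Theorem with threshold $a = 1/\sqrt{k}$, and uses the standard Gaussian tail inequality $P(Y > t) < e^{-t^2/2} / (t\sqrt{2\pi})$ for $t > 0$; it is offered as a reading of the argument rather than as the original proof.

```latex
\begin{align*}
P(\text{Type I error})
  &= P\left(s_{k,n} > 1\right)
   = P\left(\tfrac{1}{\sqrt{k}}\, s_{k,n} > \tfrac{1}{\sqrt{k}}\right) \\
  &\approx P\left(Y > \frac{1/\sqrt{k}}{\sigma_n}\right)
   = P\left(Y > \sqrt{n/k}\right),
   \qquad Y \sim N(0,1),\ \sigma_n = 1/\sqrt{n}, \\
  &< \frac{e^{-(n/k)/2}}{\sqrt{n/k}\,\sqrt{2\pi}}
   = \frac{e^{-n/2k}}{\sqrt{2\pi n/k}} .
\end{align*}
```

The approximation in the second line is where the assumption $k \gg 1$ enters.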
While the bound is on Type I error and not the more general Type II error, we found empirically that almost all errors are of Type I, as all intrusion is in the direction of a Type I error.