NeuReduce: Reducing Mixed Boolean-Arithmetic Expressions by Recurrent Neural Network

Mixed Boolean-Arithmetic (MBA) expressions involve both arithmetic calculation (e.g., plus, minus, multiply) and bitwise computation (e.g., and, or, negate, xor). MBA expressions have been widely applied in software obfuscation, transforming programs from a simple form to a complex form. MBA expressions are challenging to simplify because the interleaved bitwise and arithmetic operations render ordinary mathematical reduction laws ineffective. Our goal is to recover the original, simple form from an obfuscated MBA expression. In this paper, we propose NeuReduce, a string-to-string method based on neural networks that automatically learns to reduce complex MBA expressions. We develop a comprehensive MBA dataset, including one million diversified MBA expression samples and their corresponding simplified forms. After training on the dataset, NeuReduce can reduce MBA expressions to simpler but mathematically equivalent forms. Compared with three state-of-the-art MBA reduction methods, our evaluation shows that NeuReduce outperforms all other tools in terms of accuracy, solving time, and performance overhead.


Introduction
Mixed Boolean-Arithmetic (MBA) expressions emerged as a software obfuscation technique (Collberg and Nagra, 2009; Collberg et al., 2012; Ceccato, 2014; Bardin et al., 2017), converting software into a syntactically different but semantically equivalent form. Software developers have broadly adopted MBA obfuscation to resist malicious reverse engineering and illegal cracking. For instance, software vendors (Mougey and Gabriel, 2014) and communication providers (Moghaddam et al., 2012) employ MBA obfuscation to protect critical assets such as Digital Rights Management (DRM) schemes, communication protocols, and users' private content.
MBA obfuscation technology draws strength from its neat design and rigorous mathematical foundation (Zhou and Zhou, 2006; Zhou et al., 2007). It transforms a simple expression into an equivalent but more complex form that mixes arithmetic and bitwise calculations. Existing mathematical reduction rules can hardly simplify complex MBA expressions because they fit either pure arithmetic or pure bitwise operations. Existing research explores diverse solutions to conquer MBA obfuscation, including bit-blasting (Eyrolles, 2017; Guinet et al., 2016), pattern matching, and program synthesis (Blazytko et al., 2017). Nevertheless, these methods treat MBA expressions as black boxes and neglect the expressions' inner structure, which leads to limitations such as low simplification accuracy or high performance penalties.
In this paper, we propose NeuReduce, a novel solution that utilizes neural networks to defeat complex MBA expressions. NeuReduce takes a complex MBA expression as a character string and outputs the simplified result. It leverages supervised learning to ensure the correctness and conciseness of its outputs. We also notice that no large-scale, diverse MBA expression dataset is available for training and evaluating our proposed approach. We therefore first generate an MBA dataset consisting of 1,000,000 MBA expressions with diversified features. To the best of our knowledge, this is the largest and most diverse MBA expression dataset. Second, we implement NeuReduce based on modern neural network models, i.e., Long Short-Term Memory, Gated Recurrent Unit, and attention-based recurrent networks. We train NeuReduce on our comprehensive MBA dataset and compare its performance with state-of-the-art reduction tools. For an impartial comparison, we carefully reviewed previous research and summarized three evaluation metrics, i.e., accuracy, complexity, and solving time, as described in Section 5. Our experiments show that NeuReduce outperforms advanced reduction tools in all three aspects.
In summary, we make the following contributions: • We develop a large-scale MBA expression dataset, including diversified types of obfuscated MBA expressions and their corresponding reduced forms. The dataset addresses the lack of sufficient MBA samples for in-depth MBA research.
• We propose a novel sequence-to-sequence model, NeuReduce, which can help security experts analyze software obfuscated with MBA rules. To the best of our knowledge, NeuReduce is the first to apply neural networks to defeating MBA obfuscation.
• We perform a comprehensive evaluation of NeuReduce's effectiveness against other state-of-the-art methods, and the results show that NeuReduce outperforms peer methods in various aspects.

MBA Obfuscation
Mixed Boolean-Arithmetic (MBA) obfuscation (Zhou et al., 2007) is a concise and practical software obfuscation approach. It complicates simple operations such as x + y by replacing them with complex but equivalent ones that mix arithmetic operations (e.g., +, −, ×, ...) and Boolean operations (e.g., ∧, ∨, ¬, ⊕, ...), which hampers reverse engineers from quickly obtaining important software information. Figure 1 presents an application of MBA obfuscation. Zhou's work proves that any simple operation such as x − y or x ∧ y can be transformed into a complicated but equivalent MBA rule, which lays the solid mathematical foundation of MBA obfuscation. Therefore, the MBA obfuscation technique has achieved great success in software safeguards (Liem et al., 2008; Quarkslab, 2019; Irdeto, 2017).
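To make the transformation concrete, such equivalences can be checked exhaustively over fixed-width words. The sketch below uses the classic identity x + y = (x ⊕ y) + 2·(x ∧ y) as an illustration; it is a well-known example from the MBA literature, not a rule drawn from Figure 1 or this paper's dataset:

```python
# Check a classic MBA identity over w-bit words:
#   x + y  ==  (x ^ y) + 2 * (x & y)   (mod 2**w)
# Illustrative example, not taken from the paper's dataset.

def mba_obfuscated_add(x: int, y: int, width: int = 8) -> int:
    """Obfuscated form of x + y, evaluated modulo 2**width."""
    mask = (1 << width) - 1
    return ((x ^ y) + 2 * (x & y)) & mask

def check_identity(width: int = 8) -> bool:
    """Exhaustively verify the identity for all width-bit inputs."""
    mask = (1 << width) - 1
    return all(
        mba_obfuscated_add(x, y, width) == (x + y) & mask
        for x in range(1 << width)
        for y in range(1 << width)
    )

print(check_identity())  # exhaustive over all 8-bit input pairs
```

Because the identity holds per bit position without interfering carries, it is valid for any word width, which is exactly the property MBA obfuscation exploits.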

Existing MBA Deobfuscation
Due to its simplicity and high efficiency, MBA obfuscation has been widely applied in software protection. On the other side of the arms race, researchers have started to investigate how to simplify MBA expressions.
Arybo (Guinet et al., 2016) converts all arithmetic operations into Boolean operations. It utilizes traditional Boolean simplification rules to reduce an intermediate Boolean expression into a bit-level symbolic expression, which represents the simplification result. Because of the high performance cost of transforming arithmetic operations into Boolean ones, Arybo can only deal with small MBA expressions. Moreover, the simplified results generated by Arybo are difficult for humans to interpret because they are in pure Boolean form.
SSPAM uses pattern matching to simplify MBA expressions. SSPAM can work out some real-world MBA expression cases reported by Mougey and Gabriel (2014). However, the effectiveness of pattern matching heavily relies on the collected substitution rules, which restricts SSPAM from handling generic MBA expressions.
Syntia (Blazytko et al., 2017) utilizes program synthesis to generate a comprehensible expression for a complex MBA expression. Its results show that Syntia can successfully synthesize 89% of the expressions in a synthesized dataset of 500 MBA expressions. Nevertheless, Syntia cannot guarantee the correctness of generated expressions due to the stochastic nature of program synthesis.
Other nonproprietary reduction tools such as LLVM compiler optimizations (Garba and Favaro, 2019) have a limited effect on MBA reduction. Eyrolles (2017) has shown that other popular symbolic computation software such as Maple, Wolfram Mathematica, SageMath, and Z3 (Moura and Bjørner, 2008) lacks the capability to handle MBA expressions.

Methodology
It has been proven that MBA deobfuscation is an NP-hard problem (Zhou et al., 2007), which means no general deterministic algorithm can solve it effectively. The existing methods discussed in Section 2.2 treat MBA obfuscation as a black box rather than understanding its mechanism. To address the limitations of existing MBA deobfuscation methods, we propose NeuReduce, a novel approach based on the sequence-to-sequence architecture (Sutskever et al., 2014) with an encoder-decoder (Cho et al., 2014b) to reduce MBA expressions. Considering the characteristics of the MBA reduction problem, i.e., reasoning from one sequence to another, we review and compare several deep neural networks and adopt the most effective model as the basic module of NeuReduce. We compare four broadly used neural networks: Long Short-Term Memory (Hochreiter and Schmidhuber, 1997), Gated Recurrent Unit, a recurrent neural network based on the attention mechanism (Bahdanau et al., 2014b), and Transformer (Vaswani et al., 2017). The following sections elaborate the techniques in NeuReduce in detail.

NeuReduce Design
We apply the Encoder-Decoder framework in NeuReduce to implement expression-to-expression reduction, as shown in Figure 2. The input of NeuReduce is an arbitrary-length MBA obfuscated expression represented as a sequence. NeuReduce uses character-level one-hot encoding to encode the input into a matrix and feeds it into an encoder composed of recurrent neural networks. The encoder transforms the input MBA expression into a fixed-length hidden state vector through a linear layer. The decoder in NeuReduce is responsible for generating output matrices through recurrent neural networks based on the encoder's output. From the resulting vector, we can then reconstruct the corresponding MBA expression through the character dictionary. To get the best result from NeuReduce, we adopt four neural networks as candidates and discuss how these four models are incorporated into NeuReduce in the next two subsections.
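The character-level encoding and the dictionary-based reconstruction can be sketched as follows; the alphabet, maximum length, and padding scheme are illustrative assumptions, not the paper's exact configuration:

```python
# Sketch of character-level one-hot encoding for MBA expression strings.
# Alphabet and max_len are illustrative choices.

ALPHABET = sorted(set("0123456789xyzt+-*&|^~() "))
CHAR_TO_IDX = {ch: i for i, ch in enumerate(ALPHABET)}

def one_hot_encode(expr: str, max_len: int = 100):
    """Encode an expression as a (max_len, |alphabet|) 0/1 matrix."""
    matrix = [[0] * len(ALPHABET) for _ in range(max_len)]
    for pos, ch in enumerate(expr[:max_len]):
        matrix[pos][CHAR_TO_IDX[ch]] = 1
    return matrix  # rows past len(expr) stay all-zero (padding)

def one_hot_decode(matrix) -> str:
    """Recover the expression string via the character dictionary."""
    chars = []
    for row in matrix:
        if 1 not in row:        # padding row: end of expression
            break
        chars.append(ALPHABET[row.index(1)])
    return "".join(chars)

m = one_hot_encode("(x&y)+(x|y)")
print(one_hot_decode(m))  # round-trips to "(x&y)+(x|y)"
```

The decoder side works in reverse: the network's output vectors are mapped back to characters through the same dictionary.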

Recurrent Architecture
LSTM is a powerful basic model for natural language processing and reaches state-of-the-art industry standards in many areas. Its gate-based units endow LSTM with the power to solve the vanishing gradient problem that often occurs in RNNs. With that, LSTM can capture long-term dependencies and discover potential relationships between variables or operators, which helps NeuReduce understand complicated MBA expressions.
In our first experiment, we use embedding layers as input receivers, accepting the complicated MBA expressions and their corresponding expected expressions, respectively. Two layers of LSTM with tanh activation functions are connected to the embedding layer. We use this configuration to construct NeuReduce's encoder and decoder. In the decoder, a linear layer with a softmax activation function is connected to the LSTM layer as the final output channel to export the prediction result. With the LSTM-based NeuReduce, we encode expressions into a fixed-size one-hot encoding matrix and feed it to NeuReduce. All hyperparameters of the network are derived by grid search.
Although LSTM has a strong ability to understand long sequences, its complex structure and numerous parameters mean it usually requires considerable time and computational resources to train. GRU is another variant of the recurrent neural network. Compared with LSTM, GRU has a more compact structure and fewer parameters, and its performance does not degrade significantly with the smaller model size. To compare the abilities of LSTM and GRU for reducing MBA expressions in the same environment, we replace the LSTM in NeuReduce's recurrent layer with GRU and keep the other configurations unchanged.
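To illustrate why GRU is lighter, a single GRU step needs only three weight pairs (update gate, reset gate, candidate state) against LSTM's four. This is a plain NumPy sketch with illustrative shapes and random weights, not NeuReduce's trained model:

```python
import numpy as np

# Minimal GRU cell step: 3 gate computations versus LSTM's 4,
# hence fewer parameters for the same hidden size.

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h, params):
    """One GRU time step: input x of shape (d,), hidden state h of shape (n,)."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    z = sigmoid(Wz @ x + Uz @ h)               # update gate
    r = sigmoid(Wr @ x + Ur @ h)               # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))   # candidate state
    return (1 - z) * h + z * h_tilde           # interpolated new state

rng = np.random.default_rng(0)
d, n = 16, 32                                  # illustrative dimensions
params = [rng.normal(scale=0.1, size=s)
          for s in [(n, d), (n, n), (n, d), (n, n), (n, d), (n, n)]]
h = np.zeros(n)
for _ in range(5):                             # unroll over a 5-step sequence
    h = gru_step(rng.normal(size=d), h, params)
print(h.shape)  # hidden state stays (32,) at every step
```

Because the new state is an interpolation between the previous state and a bounded candidate, the hidden values stay in [-1, 1], which keeps training stable despite the smaller parameter count.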

Attention Mechanism
The Encoder-Decoder model is the most popular model structure in neural machine translation (Stahlberg, 2019) and has achieved significant performance. However, as mentioned in Section 3.1, the encoder encodes the entire input into a fixed-length hidden vector and ignores the differences in priority caused by the brackets in expressions, which prevents the model from making full use of the heuristic information in expressions.
To further improve the capabilities of NeuReduce, we introduce attention into our architecture. Attention is an improvement of Encoder-Decoder models that gives neural networks the ability to distinguish valuable parts of a sequence. The design of attention is complicated, and the model's size increases sharply compared with LSTM; we consider attention as a comparative model for its excellent performance. For inputs of arbitrary length, we use the embedding layer to encode the input expression into a dense vector, which reduces the number of parameters and facilitates the calculation of context vectors with attention probability weights. We use global attention with the dot-based scoring function and softmax activation layer introduced by Luong et al. (2015) to assign weights to each character. A time-distributed layer gives the final prediction results in vector form. The most successful application of attention is the Transformer, the most advanced natural language processing network, which is made up entirely of linear layers, attention mechanisms, and normalization. We adopt it as a fundamental component of NeuReduce, like the previous three networks, to verify NeuReduce's expression reasoning ability.
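The global attention step with the dot scoring function can be sketched as below; dimensions and inputs are illustrative, and in the real model this computation happens at every decoding step inside the recurrent network:

```python
import numpy as np

# Sketch of Luong-style global attention with a dot scoring function:
# score each encoder state against the current decoder state, normalize
# with softmax, and take the weighted sum as the context vector.

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def luong_dot_attention(decoder_state, encoder_states):
    """decoder_state: (n,); encoder_states: (T, n), one row per source char.
    Returns the context vector and the attention weights over T positions."""
    scores = encoder_states @ decoder_state   # dot score per source position
    weights = softmax(scores)                 # attention distribution
    context = weights @ encoder_states        # weighted sum of source states
    return context, weights

rng = np.random.default_rng(1)
T, n = 12, 32                                 # 12 source characters, 32-dim states
h_dec = rng.normal(size=n)
H_enc = rng.normal(size=(T, n))
context, weights = luong_dot_attention(h_dec, H_enc)
print(weights.sum())  # ~1.0: a proper distribution over source characters
```

The softmax output shows how the model allocates larger weights to the source characters most relevant to the current output position.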


MBA Dataset
NeuReduce requires a large-scale dataset to train for good performance. Unfortunately, existing MBA research has contributed only a few MBA examples. We collected all existing samples and found them insufficient for training and evaluating NeuReduce. Therefore, we extend the algorithm introduced by Zhou et al. (2007) to build a large-scale, diversified MBA dataset. Our dataset includes 1,000,000 MBA samples, and each sample comprises a complex MBA form and the corresponding simple form. The complex MBA expression is guaranteed to be equivalent to the simple form by the theoretical foundation. Table 1 shows several examples from our dataset. More detailed information about the dataset is discussed below.
MBA Generation Approach. Zhou et al. (2007)'s work described a high-level principle for constructing MBA obfuscation rules from truth tables and a linear equation system. However, their work did not answer practical questions that arise when building MBA transformation rules at scale, such as the number of variables in one expression, the length of the MBA corpus, or the cost of generation.
Inspired by the existing work, we design a functional toolkit for generating MBA formulas. By the theorem, a bitwise expression E_n with n variables has 2^(2^n) different reduced Boolean expressions. We first synthesize the 2^(2^n) distinct Boolean expressions based on the truth tables, such as ¬x ∧ y and x ⊕ y. Then we generate one identity via the linear equation system. This method ensures that the generated rules are syntactically correct and semantically equivalent thanks to the solid mathematical foundation. Moreover, we verify the equality of each rule through the SMT solver Z3 (Moura and Bjørner, 2008). More complex MBA expressions can further be generated as linear combinations of multiple MBA rules.
Expression Format and Complexity. Each rule in the dataset is a tuple of the form (E_c, E_g), in which E_c represents the complex MBA expression and E_g the related simplified result, used as the ground truth. Given the complexity and practicability of MBA expressions, the number of different variables ranges from 2 to 10. Moreover, E_c and E_g are presented as character strings whose length ranges from 3 to 100 (the maximum exceeds 500).
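The truth-table construction described under MBA Generation Approach can be sketched as follows, assuming a handful of 2-variable terms and a small coefficient search; an exhaustive check over small bit widths stands in here for the Z3 verification, and the term set and coefficient range are illustrative choices rather than the toolkit's actual parameters:

```python
from itertools import product

# Truth-table-based generation (after Zhou et al., 2007): a linear
# combination of bitwise terms is an MBA identity over any word size
# iff the same combination of their 1-bit truth-table columns is zero.

TERMS = {
    "x":   lambda x, y: x,
    "y":   lambda x, y: y,
    "x&y": lambda x, y: x & y,
    "x|y": lambda x, y: x | y,
    "x^y": lambda x, y: x ^ y,
}

def truth_column(f):
    """Truth-table column of a 2-variable bitwise term over bits (0, 1)."""
    return [f(x, y) for x, y in product((0, 1), repeat=2)]

def find_identity(coeff_range=(-2, -1, 1, 2)):
    """Search small integer coefficients whose truth columns cancel."""
    names = list(TERMS)
    cols = [truth_column(TERMS[n]) for n in names]
    for coeffs in product((0,) + tuple(coeff_range), repeat=len(names)):
        if any(coeffs) and all(
            sum(c * col[i] for c, col in zip(coeffs, cols)) == 0
            for i in range(4)
        ):
            return dict(zip(names, coeffs))
    return None

def holds(identity, width=4):
    """Exhaustively verify the identity over all width-bit inputs."""
    mask = (1 << width) - 1
    return all(
        sum(c * TERMS[n](x, y) for n, c in identity.items()) & mask == 0
        for x in range(1 << width)
        for y in range(1 << width)
    )

ident = find_identity()
print(ident)         # e.g. a relation like x + y - (x&y) - (x|y) == 0
print(holds(ident))  # valid over 4-bit words, as the theorem guarantees
```

Each null combination found this way is a valid MBA rule for any word width; scaling the term set and recombining rules linearly yields arbitrarily complex obfuscated forms.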
Scale. In theory, the MBA generation method described above can produce an unlimited number of MBA rules. To serve the purpose of training and evaluating NeuReduce in practice, we use it to generate 1,000,000 MBA expressions. Eyrolles (2017) has shown that 2-variable and 3-variable MBA expressions are commonly used in practical software obfuscation. Therefore, we split the dataset into two parts: 800,000 samples of 2-variable and 3-variable MBA expressions, and 200,000 multi-variable MBA expressions for testing the model's adaptability and generality.

Experiment Settings
In this section, we present our experimental setup in detail, including the dataset settings, peer tool baselines, evaluation metrics, and configurations of model training.

Dataset Settings
First, we are interested in exploring NeuReduce's learning and generalization ability. We uniformly sample MBA expressions from the dataset to compose two training sets, Train_s and Train_l. Train_s includes 100,000 MBA expressions to train four different NeuReduce models, and Train_l, containing 1 million rules, is used to test how much NeuReduce's performance improves with more training samples. Table 2 illustrates the statistics of the training and testing datasets. In these two training sets, we use 95% of the data for training and 5% for validation. The Test dataset is generated separately rather than sampled from the training data, which ensures that every test sample differs from those in the training set. We use the following three features to measure the complexity of an MBA expression.

Peer Tools for Comparison
We investigate and collect existing state-of-the-art MBA reduction tools: Arybo, SSPAM, and Syntia. We download the three open-source tools from GitHub and run them on the same dataset as comparison baselines. Arybo is a Python tool that applies bit-blasting to simplify MBA expressions. SSPAM (Symbolic Simplification with Pattern Matching) is a Python tool that applies pattern matching to do simplification. Syntia generates input-output samples from the obfuscated code and then produces a simple expression by MCTS (Monte Carlo Tree Search)-based program synthesis.

Evaluation Metrics
We propose three metrics, accuracy, complexity, and solving time, to evaluate NeuReduce and the baseline tools.
Accuracy. Accuracy measures whether the expression E_p generated by the neural network is equivalent to the ground truth E_g. In one case, E_p is textually identical to E_g, so the model's output is correct. In the other, the format of E_p differs from E_g; we then use an SMT solver to check the equivalence between E_p and E_g. Let C_p be the total number of samples, C_eq the number of outputs identical to E_g, and C_sq the number of outputs semantically equivalent to E_g; accuracy is then defined as (C_eq + C_sq) / C_p. Higher accuracy means the tool can generate more correct simplified expressions. However, accuracy alone cannot reveal the comprehensive ability of a tool. For example, the bit-blasting method can ensure that every reduced expression is correct, but the result is hard for humans to understand.
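The accuracy metric can be sketched as follows; here an exhaustive evaluation over 8-bit inputs stands in for the SMT query, and the expressions are illustrative Python-syntax strings rather than samples from the dataset:

```python
# Sketch of the accuracy metric: a prediction counts if it is textually
# identical to the ground truth (C_eq) or semantically equivalent (C_sq).
# An exhaustive 8-bit check stands in for the paper's Z3 query.

def equivalent(expr_a: str, expr_b: str, width: int = 8) -> bool:
    """Stand-in equivalence check: evaluate both over all width-bit inputs."""
    mask = (1 << width) - 1
    return all(
        (eval(expr_a, {"x": x, "y": y}) - eval(expr_b, {"x": x, "y": y}))
        & mask == 0
        for x in range(1 << width)
        for y in range(1 << width)
    )

def accuracy(predictions, ground_truths) -> float:
    """accuracy = (C_eq + C_sq) / C_p."""
    c_eq = c_sq = 0
    for e_p, e_g in zip(predictions, ground_truths):
        if e_p == e_g:
            c_eq += 1                     # textually identical
        elif equivalent(e_p, e_g):
            c_sq += 1                     # equivalent modulo formatting
    return (c_eq + c_sq) / len(predictions)

preds = ["x+y", "(x^y)+2*(x&y)", "x-y"]
truth = ["x+y", "x+y",           "x&y"]
print(accuracy(preds, truth))  # 2 of 3 predictions count as correct
```

Counting semantic matches alongside exact matches is what lets differently formatted but correct outputs still score, while the SMT (or exhaustive) check keeps the metric sound.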
Complexity. Another metric for evaluating MBA expression simplification is complexity, or readability. For a reduced expression, higher complexity means lower readability for humans. We use the length of the expression (the number of characters in the string) to indicate its complexity: a shorter expression means lower complexity and higher readability.
Solving Time. The last metric tests the efficiency of a tool: the solving time for reducing an MBA expression. A simplification tool is impractical if its solving time is unbearable. We set 40 minutes as a practical timeout threshold for a simplification run. If a tool does not return a result within that period, we label the run as timed out.

Training Configurations
We use the same settings to train the four different neural network-based NeuReduce models. Adam (Kingma and Ba, 2014) is employed as our optimizer with categorical cross-entropy as the loss function. The initial learning rate is set to 10^-2, and we dynamically adjust it from 10^-2 to 10^-6 based on the validation loss. We train our models on NVIDIA Titan Xp GPUs for 1000 epochs with a batch size of 1024.

Results and Analysis
We use the small training set Train_s to train the four different neural networks: LSTM, GRU, Attention LSTM, and Transformer. After training, we compare the models with existing reduction tools on the Test dataset, which contains 10,000 MBA expressions and their simplified forms.
The evaluation results are shown in Table 3. Arybo does not output any wrong result, because it uses the bit-blasting method, which maps each variable to bits and then simplifies them. Although Arybo can ensure the correctness of simplified MBA expressions, it suffers from a high performance cost. The solving time of Arybo is up to 640s, and 90% of the MBA expressions cannot be simplified within 40 minutes. Another problem with Arybo is that its reduction result is more complicated than the original expression: the average length of its reduction results is 20k characters, which is unreadable and unacceptable for security experts.
Since the simplification rules for complex MBA expressions are not included in SSPAM's pattern matching library, SSPAM cannot simplify 85% of the MBA expressions in the Test dataset.
Syntia can simplify one MBA expression within 10 seconds, but only 1,576 MBA expressions are correctly simplified by it. Syntia's output largely relies on the quality of the input-output samples. Therefore, Syntia can hardly handle complex MBA expressions.
After training, NeuReduce can output a grammatically correct expression within 1 second. NeuReduce simplifies at least 71% of the MBA expressions in the Test dataset, and its simplification results are acceptable for humans. From the table, the accuracy of the attention-based model is slightly lower than that of the GRU-based one. From the aspect of expression representation, GRU-based NeuReduce uses a sparse 0/1 matrix to encode expressions, while the attention mechanism uses dense vectors. The dense vector reduces the number of model parameters, but it may withhold useful information from the model. On the other hand, the attention mechanism can effectively allocate large weights to critical information when processing long texts and filter out useless information; however, every character is essential in a correct MBA expression. The experiments show that the Transformer-based model can simplify more MBA expressions, but the GRU-based model outputs expressions faster.
To compare the outputs of these methods intuitively, we extract from the Test dataset one MBA expression that can be simplified by all peer tools and NeuReduce; the reduced results are shown in Table 4. Even though all methods output a correct solution, the answers of Arybo and SSPAM are not as concise and simple as those of Syntia and NeuReduce. Moreover, we want to know how much NeuReduce's performance improves when training with more samples. We used Train_l, as introduced in Section 5, to train the LSTM-based and GRU-based NeuReduce. The architecture and configuration of NeuReduce are the same as described in Section 3. After 40 hours of training for each model, we evaluate them on the Test dataset. The evaluation results show that their accuracy improves substantially: 96.43% for LSTM-based NeuReduce and 97.16% for GRU-based NeuReduce.

Related Work
Recent research has applied machine learning to mathematical reasoning. Evans et al. (2018) show how to use tree neural networks to predict whether one logical formula entails another. The work differs from NeuReduce in that their task is to determine the implicit relationship between two propositional logic formulas, which is a partial order, rather than to predict the equality of two expressions. Ling et al. (2017) and Kushman et al. (2014) use neural networks to extract mathematical problems from text and output correct answers. Their work focuses on natural language understanding of math problems rather than purely reasoning about the logical equivalence of different expressions. Saxton et al. (2019) present an extensive study of mathematical reasoning. They provide a dataset containing a variety of mathematical samples, from algebra problems to probability calculations. Their work shows that state-of-the-art neural networks can perform well on mathematical reasoning problems. However, the expression reduction samples in their work only involve simple exponential equation reduction, which does not match MBA expressions.
There has also been recent interest in solving mathematical problems with neural networks. Zaremba et al. (2014) show how to use a recurrent neural network to discover mathematical identities with a novel grammar framework. Kaiser and Sutskever (2015) use a convolutional neural network to solve addition and multiplication problems with excellent generalization. Selsam et al. (2018) use a message-passing network with a bipartite graph structure to determine the satisfiability of formulas in conjunctive normal form. Other relevant research is found in Allamanis et al. (2017); Bartosz et al. (2019); Arabshahi et al. (2018).

Conclusion
Mixed Boolean-Arithmetic (MBA) transformation, which uses arithmetic and bitwise operations to translate expressions, has been widely applied in software obfuscation. This paper introduces a new method, NeuReduce, to simplify complex MBA expressions with recurrent neural networks. Because the number of existing MBA expressions is insufficient for training our neural networks, we first extend a method to generate MBA expressions and develop a large-scale MBA expression dataset, including 1,000,000 diversified complex MBA samples and their simplified expressions. Four neural network models, LSTM, GRU, Attention LSTM, and Transformer, are trained and tested on the dataset. The evaluation results show that, compared with state-of-the-art tools, NeuReduce has the highest accuracy with negligible overhead. Our experiments also show that NeuReduce's performance can be further improved by training on more samples.