End-to-end NLP Pipelines in Rust

The recent progress in natural language processing research has been supported by the development of a rich open-source ecosystem in Python. Libraries that allow both NLP practitioners and non-specialists to leverage state-of-the-art models have been instrumental in the democratization of this technology. The maturity of the open-source NLP ecosystem, however, varies between languages. This work proposes a new open-source library aimed at bringing state-of-the-art NLP to Rust. Rust is a systems programming language for which the foundations required to build machine learning applications are available, but which still lacks ready-to-use, end-to-end NLP libraries. The proposed library, rust-bert, implements modern language models and ready-to-use pipelines (for example translation or summarization). This enables further development by the Rust community, from both NLP experts and non-specialists. It is hoped that this library will accelerate the development of the NLP ecosystem in Rust. The library is under active development and available at https://github.com/guillaume-be/rust-bert.


Introduction
Natural language processing (NLP) has undergone a rapid transformation over the last few years. Modern architectures based on the Transformer (Vaswani et al., 2017), which efficiently leverage the large amount of data available for unsupervised pretraining, have enabled significant progress on a variety of tasks including sentiment analysis, question answering, summarization and translation. These research efforts have been accompanied by the development of a rich Python ecosystem, from tokenization to deep learning architectures, that has democratized these technologies for both practitioners and users. The Transformers library is an example of a library offering APIs at various levels, either to promote further development of NLP models or to integrate them in higher-level applications.
The adoption of these technologies in other programming languages, Rust among them, has unfortunately not been as fast. Rust (Klabnik and Nichols, 2018) is a promising modern, statically and strongly typed language that offers execution speeds similar to C. Its built-in memory safety makes it an attractive alternative to C++ for the development of production machine learning systems. Rust does not include a garbage collector but instead relies on strict ownership rules for variables, dropping them when they go out of scope. Its string data model, which complies with the UTF-8 standard, is especially relevant to NLP applications. Finally, Rust includes a powerful utility called cargo to manage external dependencies. This supports the development of open-source ecosystems, similar to Python's PyPI (Python Packaging Authority, 2000) or Java's Maven (Miller et al., 2010).
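As a brief illustration of the two language properties mentioned above, the following sketch shows scope-based release of an owned value and the distinction between bytes and characters in a UTF-8 string. The `Buffer` type and the `char_count` helper are hypothetical names introduced only for this example; they are not part of rust-bert.

```rust
/// Hypothetical helper: counts Unicode scalar values rather than bytes.
fn char_count(text: &str) -> usize {
    text.chars().count()
}

/// Hypothetical type whose `Drop` implementation makes the release point visible.
struct Buffer {
    name: String,
}

impl Drop for Buffer {
    fn drop(&mut self) {
        println!("dropping {}", self.name);
    }
}

fn main() {
    // `String` and `&str` always hold valid UTF-8, so the byte length and the
    // character count differ as soon as the text contains non-ASCII characters.
    let word = "na\u{ef}ve"; // "naive" written with an i-diaeresis (U+00EF)
    println!("{} bytes, {} chars", word.len(), char_count(word));

    {
        let _scoped = Buffer {
            name: String::from("scoped buffer"),
        };
        // `_scoped` is owned by this block and is released at the closing
        // brace, without any garbage collector being involved.
    }
    println!("after inner scope");
}
```

Counting characters instead of bytes matters, for example, when token offsets have to be mapped back to positions in the original text.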
Rust is a modern programming language for which the foundations of a machine learning ecosystem are still being built. A number of initiatives, including array manipulation (rust-ndarray Team, 2011), low-level CUDA libraries and deep learning framework bindings for Tensorflow (Tensorflow Project, 2016) or Torch (Mazare, 2019), are now maturing. However, there is still a lack of end-to-end, ready-to-use libraries leveraging state-of-the-art NLP models. The proposed library aims to fill this gap: it exposes Transformers-based architectures to NLP practitioners in Rust, together with pipelines that are ready for integration in Rust-based back-ends. The proposed library, rust-bert, is available at https://github.com/guillaume-be/rust-bert or https://crates.io/crates/rust-bert and is shared under the Apache 2.0 license.

Related Work
This work leverages the rich open-source resources available in Python. Especially relevant is the Transformers library, from which large sections of the proposed Rust library were ported. The model architectures and layer names have been aligned with the Transformers implementation, and Rust-compatible pre-trained weights are available in Hugging Face's Model Hub (Hugging Face, 2019). The general API for the high-level, ready-to-use pipelines has been strongly inspired by the spaCy library (Honnibal and Montani, 2017).

Architecture Design
The library exposes three main features:
• Language model implementations, covering state-of-the-art architectures including for example BERT (Devlin et al., 2019) or GPT2 (Radford et al., 2019).
• Ready-to-use pipelines, combining these models with pre- and post-processing routines.
• Utilities to load external resources, including a converter from PyTorch (Paszke et al., 2019) pickled model files to a C-array format.
The language models and the pipelines are separated into different modules. Within the models, a sub-module is defined for each model (for example, BERT) with individual files for the major model components (for example, its attention mechanism). This promotes readability and modularity of the code base (Figure 1).
An important design aspect of the library is the choice of abstractions. Rust does not implement classes and inheritance the way Python does. Rather, data is arranged in structs that may implement associated methods in an impl block, or shared behavior via traits. As opposed to Python, layers do not inherit from a shared nn.Module, because Rust requires a strict definition of the names and types of the inputs and outputs (which may differ significantly from model to model). As a consequence, the registration of the model parameters in the variable store is done manually.
While the model architectures have generally been ported from the Python Transformers library, the proposed work is innovative in its handling of shared behavior. Models and configurations share capabilities using traits. This includes, for example, the possibility for a model to be used as a conditional text generator by implementing the LanguageGenerator trait. A given model implements the trait by providing model-specific methods (e.g. preparing inputs or reordering the cache). The complex text generation post-processing steps (beam search, sampling, non-repetition rules...) and the generation routine can then be readily leveraged by this model.
Shared behavior is also required for the ready-to-use pipelines, which implement logic valid for a wide range of language models. Here the mechanism instead relies on enums wrapping specific models in a shared abstraction. A given pipeline takes a model enum, a tokenizer enum and a configuration enum as inputs. The pipeline calls generic functions that are implemented by the enum (for example a forward pass). Each variant of the enum defines how the forward method is implemented. Note that this allows a common interface to be defined for models expecting different sets of inputs.
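The enum-based mechanism can be sketched as follows. This is a minimal, standard-library-only illustration and not the actual rust-bert code: `EncoderA`, `EncoderB`, `ModelOption` and `run_pipeline` are hypothetical names, and plain vectors of numbers stand in for tensors. The two toy models deliberately expect different inputs, and the enum exposes a single forward method for both.

```rust
/// Hypothetical stand-in for a model that needs token type ids.
struct EncoderA {
    scale: f32,
}

impl EncoderA {
    fn forward(&self, input_ids: &[u32], token_type_ids: &[u32]) -> Vec<f32> {
        input_ids
            .iter()
            .zip(token_type_ids)
            .map(|(id, ty)| self.scale * (*id as f32 + *ty as f32))
            .collect()
    }
}

/// Hypothetical stand-in for a model that only takes input ids.
struct EncoderB {
    bias: f32,
}

impl EncoderB {
    fn forward(&self, input_ids: &[u32]) -> Vec<f32> {
        input_ids.iter().map(|id| *id as f32 + self.bias).collect()
    }
}

/// Enum wrapping the concrete models behind one shared abstraction.
enum ModelOption {
    A(EncoderA),
    B(EncoderB),
}

impl ModelOption {
    /// Common interface: each variant decides which inputs it passes on.
    fn forward(&self, input_ids: &[u32], token_type_ids: Option<&[u32]>) -> Vec<f32> {
        match self {
            ModelOption::A(model) => {
                // Fall back to all-zero token type ids when none are given.
                let zeros = vec![0u32; input_ids.len()];
                model.forward(input_ids, token_type_ids.unwrap_or(zeros.as_slice()))
            }
            // This variant ignores the optional input entirely.
            ModelOption::B(model) => model.forward(input_ids),
        }
    }
}

/// A pipeline only sees the enum, never the concrete model type.
fn run_pipeline(model: &ModelOption, input_ids: &[u32]) -> Vec<f32> {
    model.forward(input_ids, None)
}

fn main() {
    let models = [
        ModelOption::A(EncoderA { scale: 2.0 }),
        ModelOption::B(EncoderB { bias: 0.5 }),
    ];
    for model in &models {
        println!("{:?}", run_pipeline(model, &[1, 2, 3]));
    }
}
```

Supporting an additional model in such a design amounts to adding a variant and one match arm, while the pipeline code itself stays unchanged.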
This pattern is similar to dependency injection (while the traits are closer to inheritance). It offers greater flexibility in the interface for model loading and forward methods, and reduces coupling between the models and the pipelines.
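For comparison, the trait-based sharing of generation logic can be sketched in a similar way. The sketch below is illustrative only and makes several assumptions: the `TextGenerator` trait and its method names are hypothetical and do not reproduce the LanguageGenerator trait of the library, and simple greedy decoding replaces the beam search and sampling routines mentioned above. It shows how a model that supplies a few model-specific methods inherits a generation routine written once as a provided trait method.

```rust
/// Hypothetical trait: model-specific hooks plus a shared, provided routine.
trait TextGenerator {
    /// Model-specific: a score for every candidate next token.
    fn next_token_scores(&self, context: &[usize]) -> Vec<f32>;

    /// Model-specific: id of the end-of-sequence token.
    fn eos_token(&self) -> usize;

    /// Shared: greedy decoding, written once against the hooks above.
    fn generate(&self, prompt: &[usize], max_new_tokens: usize) -> Vec<usize> {
        let mut tokens = prompt.to_vec();
        for _ in 0..max_new_tokens {
            let scores = self.next_token_scores(&tokens);
            // Select the index of the highest score, if any scores exist.
            let best = scores
                .iter()
                .enumerate()
                .max_by(|a, b| a.1.total_cmp(b.1))
                .map(|(index, _)| index);
            match best {
                Some(token) => {
                    tokens.push(token);
                    if token == self.eos_token() {
                        break;
                    }
                }
                None => break,
            }
        }
        tokens
    }
}

/// Toy model that always predicts "last token + 1" (modulo the vocabulary).
struct CountingModel {
    vocab_size: usize,
}

impl TextGenerator for CountingModel {
    fn next_token_scores(&self, context: &[usize]) -> Vec<f32> {
        let last = context.last().copied().unwrap_or(0);
        let mut scores = vec![0.0; self.vocab_size];
        scores[(last + 1) % self.vocab_size] = 1.0;
        scores
    }

    fn eos_token(&self) -> usize {
        self.vocab_size - 1
    }
}

fn main() {
    let model = CountingModel { vocab_size: 4 };
    // Generation stops early once the end-of-sequence token (3) is produced.
    println!("{:?}", model.generate(&[0], 10));
}
```

Only the two hook methods are specific to `CountingModel`; any other type implementing them would obtain `generate` for free, which is the property the traits provide in the library.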

Capabilities Overview
The library exposes an API at two different levels: the language models themselves, from which NLP pipelines can be built from scratch, and end-to-end pipelines that can readily be integrated in higher-level applications.
A Rust implementation of a wide range of language models is provided, including BERT.
A large user base of NLP technologies also benefits from the availability of state-of-the-art, end-to-end pipelines that require little to no familiarity with NLP to be integrated in higher-level applications. To answer these needs of the Rust community, the following capabilities have been implemented:
• Translation between 8 language pairs using either Marian (Junczys-Dowmunt et al., 2018) or T5 (Raffel et al., 2019) models.
• Summarization using a BART (Lewis et al., 2020) model trained on the CNN / Daily Mail summarization dataset (See et al., 2017).
• Question Answering using a DistilBERT model trained on the SQuAD dataset (Rajpurkar et al., 2016).
• Sentiment Analysis using a DistilBERT model trained on the SST-2 dataset (Socher et al., 2013).
• Named Entity Recognition for English, German, Spanish and Dutch trained on the CoNLL03 (Tjong Kim Sang and De Meulder, 2003) and CoNLL02 (Tjong Kim Sang, 2002) datasets.
These pipelines can be created and used in a few lines of code without prior knowledge of NLP. While the implementation of the language models is a prerequisite, the availability of powerful end-to-end pipelines is key to a broader adoption of NLP technology in Rust. These pipelines can easily be integrated with server back-ends running Rust with queuing and batching of incoming requests (Walsh, 2020).
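The queuing and batching of requests mentioned above can be sketched with the standard library alone. The example is a minimal illustration under stated assumptions and does not reproduce the cited integration: `predict_batch` is a hypothetical stand-in for a call to a pipeline on a batch of texts (here a trivial keyword rule), and `Request`, `serve` and `classify` are names introduced for this sketch. A single worker thread owns the model, waits for a request, drains whatever else is queued up to a maximum batch size, and sends each result back to its caller.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// One queued request: the input text and a channel to send the result back.
struct Request {
    text: String,
    reply: Sender<String>,
}

/// Hypothetical stand-in for a pipeline prediction on a batch of texts.
fn predict_batch(texts: &[String]) -> Vec<String> {
    texts
        .iter()
        .map(|text| {
            if text.contains("good") {
                "positive".to_string()
            } else {
                "negative".to_string()
            }
        })
        .collect()
}

/// Worker loop: block for one request, then drain the queue up to `max_batch`.
fn serve(queue: Receiver<Request>, max_batch: usize) {
    while let Ok(first) = queue.recv() {
        let mut batch = vec![first];
        while batch.len() < max_batch {
            match queue.try_recv() {
                Ok(request) => batch.push(request),
                Err(_) => break, // queue is empty or closed
            }
        }
        let texts: Vec<String> = batch.iter().map(|r| r.text.clone()).collect();
        for (request, label) in batch.iter().zip(predict_batch(&texts)) {
            // The caller may have gone away; a closed reply channel is ignored.
            let _ = request.reply.send(label);
        }
    }
}

/// Client side: enqueue a text and wait for its prediction.
fn classify(queue: &Sender<Request>, text: &str) -> Option<String> {
    let (reply, result) = channel();
    let request = Request {
        text: text.to_string(),
        reply,
    };
    queue.send(request).ok()?;
    result.recv().ok()
}

fn main() {
    let (queue, requests) = channel();
    let worker = thread::spawn(move || serve(requests, 8));
    for text in ["a good movie", "a dull plot"] {
        println!("{} -> {:?}", text, classify(&queue, text));
    }
    drop(queue); // closing the queue ends the worker loop
    worker.join().unwrap();
}
```

Keeping the model on one dedicated thread avoids sharing it between request handlers, and batching lets concurrent requests share a single forward pass.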

Benchmarks
This library was developed with the primary goal of making state-of-the-art NLP capabilities available to the Rust community rather than speeding up inference. Nevertheless, Rust is a high-performance language with execution speeds matching C or C++. Efficient prediction with NLP systems has become a key subject of research and engineering development over the past few months. Several methods have been investigated to improve prediction performance, including for example pruning, quantization and Huffman coding (Han et al., 2016; Shen et al., 2020), distillation, graph optimizations and layer fusing (Nvidia, 2020), or optimized runtimes such as ONNX. The high performance of state-of-the-art models usually comes with a significant computational cost.
It should be noted that the proposed library is based on bindings (Mazare, 2019) to LibTorch (Paszke et al., 2019), and therefore limited benefits can be expected for the tensor operations. These are executed in the CUDA layer, which is effectively shared with the Python-based models. The following investigates whether the high-performance features of the language translate into benefits for the proposed NLP pipelines.
Benchmarks between Python and Rust are shown in Figure 2, using a Turing RTX2070 GPU with an AMD 2700X CPU. For all experiments, the average time relative to Python is reported with the standard deviation. For all prediction tasks, the Transformers library (v3.2.0) is used as a reference. All experiments are run for 10 iterations, with varying numbers of samples (provided in brackets). For reference, the absolute Python execution time per iteration is provided.
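The reporting convention described above (mean time over 10 iterations, expressed relative to a reference, with its standard deviation) can be sketched as follows. This is not the benchmarking code used for the experiments: the function names are hypothetical, two toy workloads stand in for the reference and candidate pipelines, and the population standard deviation is an assumption, since the estimator is not specified in the text.

```rust
use std::time::Instant;

/// Mean and population standard deviation of a set of timings.
/// (Assumption: the population estimator; the text does not specify one.)
fn mean_and_std(samples: &[f64]) -> (f64, f64) {
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    let variance = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    (mean, variance.sqrt())
}

/// Runs `task` `iterations` times and returns each duration in seconds.
fn time_runs<F: FnMut()>(mut task: F, iterations: usize) -> Vec<f64> {
    (0..iterations)
        .map(|_| {
            let start = Instant::now();
            task();
            start.elapsed().as_secs_f64()
        })
        .collect()
}

/// Scales the mean and standard deviation of `samples` by a reference mean.
fn relative_to(samples: &[f64], reference_mean: f64) -> (f64, f64) {
    let (mean, std) = mean_and_std(samples);
    (mean / reference_mean, std / reference_mean)
}

fn main() {
    // Two toy workloads stand in for the reference and candidate pipelines.
    // The checksum keeps their results observable.
    let mut checksum = 0u64;
    let reference = time_runs(|| checksum += (0..200_000u64).sum::<u64>(), 10);
    let candidate = time_runs(|| checksum += (0..100_000u64).sum::<u64>(), 10);

    let (reference_mean, _) = mean_and_std(&reference);
    let (relative_mean, relative_std) = relative_to(&candidate, reference_mean);
    println!(
        "relative time: {:.2} +/- {:.2} (checksum {})",
        relative_mean, relative_std, checksum
    );
}
```

A relative mean below 1 indicates that the candidate is faster than the reference, which is how the relative timings reported in this section are read.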
The loading benchmarks represent the average time required to load models into the GPU buffer. Significant benefits can be observed for Rust. This is probably caused by the simpler serialization format based on C-arrays for Rust, and may be advantageous for event-driven applications loading models on a per-request basis (short warm-up time).
The forward pass results vary between applications. As expected, pipelines with very simple pre- and post-processing steps offer virtually identical performance (for example sentiment analysis).
Significant benefits can be observed for question answering, coming entirely from the tokenization process (at the time this document was prepared, the Transformers question answering pipeline did not yet leverage Rust-based tokenizers). Pipelines involving complex post-processing steps (text generation with sampling and beam search) can also show significant benefits. Marian-based translation models (Tiedemann and Thottingal, 2020) exhibit a 40% speedup (in line with the native C++ implementation (Junczys-Dowmunt et al., 2018)). The T5 implementation is faster for small effective batch sizes (with a beam size of 6) but slower for larger batches, indicating that optimization potential remains. In general, it was observed that the actual model forward pass (tensor operations) is comparable, albeit slightly slower, in Rust than in Python. A last experiment (large matrix multiplication) shows that the Rust LibTorch bindings seem to be 1 to 2% slower than the PyTorch equivalent.

Conclusion
Rust is a promising language for the development of NLP systems. Its concurrency capabilities, memory safety features and modern string data model make it a good alternative to C++ for production systems. While evolving quickly, the Rust NLP open-source ecosystem still lags behind Python's rich set of libraries. Complementing the availability of high-performance tokenizers, rust-bert makes state-of-the-art language models and end-to-end NLP pipelines available to the Rust community.

Acknowledgments
The list of contributors to the rust-bert project is available on the project repository.