NeuroNER: an easy-to-use program for named-entity recognition based on neural networks

Named-entity recognition (NER) aims at identifying entities of interest in a text. Artificial neural networks (ANNs) have recently been shown to outperform existing NER systems. However, ANNs remain challenging to use for non-expert users. In this paper, we present NeuroNER, an easy-to-use named-entity recognition tool based on ANNs. Users can annotate entities using a graphical web-based user interface (BRAT): the annotations are then used to train an ANN, which in turn predicts entities’ locations and categories in new texts. NeuroNER makes this annotation-training-prediction flow smooth and accessible to anyone.


Introduction
Named-entity recognition (NER) aims at identifying entities of interest in a text, such as locations, organizations, and temporal expressions. Identified entities can be used in various downstream applications such as patient note de-identification and information extraction systems. They can also be used as features for machine learning systems for other natural language processing tasks.
Early systems for NER relied on rules defined by humans. Rule-based systems are time-consuming to develop, and cannot be easily transferred to new types of texts or entities. To address these issues, researchers have developed machine-learning-based algorithms for NER, using a variety of learning approaches, such as fully supervised learning, semi-supervised learning, unsupervised learning, and active learning. NeuroNER is based on a fully supervised learning algorithm, which is the most studied approach (Nadeau and Sekine, 2007).
* These authors contributed equally to this work.
Fully supervised approaches to NER include support vector machines (SVM) (Asahara and Matsumoto, 2003), maximum entropy models (Borthwick et al., 1998), decision trees (Sekine et al., 1998) as well as sequential tagging methods such as hidden Markov models (Bikel et al., 1997), Markov maximum entropy models (Kumar and Bhattacharyya, 2006), and conditional random fields (CRFs) (McCallum and Li, 2003; Tsai et al., 2006; Benajiba and Rosso, 2008; Filannino et al., 2013). Similar to rule-based systems, these approaches rely on handcrafted features, which are challenging and time-consuming to develop and may not generalize well to new datasets.
More recently, artificial neural networks (ANNs) have been shown to outperform other supervised algorithms for NER (Collobert et al., 2011; Lample et al., 2016; Labeau et al., 2015). The effectiveness of ANNs can be attributed to their ability to learn effective features jointly with model parameters directly from the training dataset, instead of relying on handcrafted features developed from a specific dataset. However, ANNs remain challenging to use for non-expert users.
Contributions. NeuroNER makes state-of-the-art named-entity recognition based on ANNs available to anyone, by focusing on usability. To enable users to create or modify annotations for a new or existing corpus, NeuroNER interfaces with the web-based annotation program BRAT (Stenetorp et al., 2012). NeuroNER makes the annotation-training-prediction flow smooth and accessible to anyone, while leveraging the state-of-the-art prediction capabilities of ANNs. NeuroNER is open source and freely available online.

Related Work
Existing publicly available NER systems geared toward non-experts do not use ANNs.
Furthermore, in many cases, the NER systems assume that the user already has an annotated corpus formatted in a specific data format. As a result, users often have to connect their annotation tool with the NER systems by reformatting annotated data, which can be time-consuming and error-prone. Moreover, if users want to manually improve the annotations predicted by the NER system (e.g., if they use the NER system to accelerate the human annotations), they have to perform additional data conversion. NeuroNER streamlines this process by incorporating BRAT, a widely used and easy-to-use annotation tool.

System Description
NeuroNER comprises two main components: an NER engine and an interface with BRAT. NeuroNER also comes with real-time monitoring tools for training, and pre-trained models that can be loaded into the NER engine in case the user does not have access to labelled training data. Figure 1 presents an overview of the system.

NER engine
The NER engine takes as input three sets of data with gold labels: the training set, the validation set, and the test set. Additionally, it can also take as input the deployment set, which refers to any new text without gold labels that the user wishes to label. The files that comprise each set of data should be in the same format as used for the annotation tool BRAT or the CoNLL-2003 NER shared task dataset (Tjong Kim Sang and De Meulder, 2003), and organized in the corresponding folder.
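As an illustration of the CoNLL-2003-style input mentioned above, the following minimal sketch groups token lines into sentences. It assumes the usual layout of that format (one token per line, whitespace-separated columns with the token first and the entity label last, blank lines between sentences, `-DOCSTART-` lines between documents). The function name `read_conll` and the sample lines are illustrative and are not taken from NeuroNER's code.

```python
def read_conll(lines):
    """Group CoNLL-2003-style lines into sentences of (token, label) pairs."""
    sentences, current = [], []
    for raw in lines:
        line = raw.strip()
        # A blank line or a -DOCSTART- marker closes the current sentence.
        if not line or line.startswith("-DOCSTART-"):
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split()
        # First column is the token, last column is the entity label.
        current.append((fields[0], fields[-1]))
    if current:
        sentences.append(current)
    return sentences


# Illustrative sample only; the label scheme shown here is an assumption.
sample = """-DOCSTART- -X- -X- O

EU NNP B-NP B-ORG
rejects VBZ B-VP O
German JJ B-NP B-MISC
call NN I-NP O

Peter NNP B-NP B-PER
Blackburn NNP I-NP I-PER
""".splitlines()

sentences = read_conll(sample)
for sentence in sentences:
    print(sentence)
```

Keeping only the first and last columns makes the sketch tolerant of files that omit the part-of-speech and chunk columns.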
The NER engine's ANN contains three layers:
• Character-enhanced token-embedding layer,
• Label prediction layer,
• Label sequence optimization layer.
The character-enhanced token-embedding layer maps each token to a vector representation. The sequence of vector representations corresponding to a sequence of tokens is then input to the label prediction layer, which outputs a sequence of vectors containing the probability of each label for each corresponding token. Lastly, the label sequence optimization layer outputs the most likely sequence of predicted labels based on the sequence of probability vectors from the previous layer. All layers are learned jointly. The model architecture is detailed in .
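To make the role of the label sequence optimization layer more concrete, here is a minimal sketch of Viterbi decoding, one common way to pick the most likely label sequence from per-token label scores. It assumes that the layer combines per-token log-probabilities with label-to-label transition scores; the text above does not spell this out, so the transition matrix, the function name `viterbi`, and the toy numbers are all assumptions made for illustration.

```python
import math


def viterbi(emissions, transitions):
    """Return the highest-scoring sequence of label indices.

    emissions[t][j]: log-score of label j at position t.
    transitions[i][j]: log-score of moving from label i to label j.
    """
    n_labels = len(emissions[0])
    score = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_labels):
            # Best previous label for reaching label j at position t.
            best_i = max(range(n_labels), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Follow the back-pointers from the best final label.
    best = max(range(n_labels), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return path


labels = ["O", "B-PER", "I-PER"]
# Toy per-token probabilities, as a label prediction layer might output them.
probabilities = [[0.6, 0.3, 0.1], [0.2, 0.1, 0.7]]
emissions = [[math.log(p) for p in row] for row in probabilities]
# Assumed transition scores: forbid O -> I-PER, leave all others neutral.
transitions = [[0.0, 0.0, -math.inf], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

decoded = [labels[i] for i in viterbi(emissions, transitions)]
print(decoded)
```

In this toy example, picking the most probable label for each token independently would yield O followed by I-PER, which is not a well-formed label sequence; decoding over the whole sequence avoids that.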
Figure 1: NeuroNER system overview. In the NeuroNER engine, the training set is used to train the parameters of the ANN, and the validation set is used to determine when to stop training. The user can monitor the training process in real time via the learning curve and TensorBoard. To evaluate the trained ANN, the labels are predicted for the test set: the performance metrics can be calculated and plotted by comparing the predicted labels with the gold labels. The evaluation can be done at the same time as the training if the test set is provided along with the training and validation sets, or separately after the training or using a pre-trained model. Lastly, the NeuroNER engine can label the deployment set, i.e. any new text without gold labels.
The ANN as well as the training process have several hyperparameters such as character embedding dimension, character-based token-embedding LSTM dimension, token embedding dimension, and dropout probability. All hyperparameters may be specified in a configuration file that is human-readable, so that the user does not have to dive into any code. Listing 1 presents an excerpt of the configuration file.
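As a rough illustration of how a human-readable configuration file can carry the hyperparameters named in this section, the sketch below parses an INI-style text with Python's standard `configparser` module. The INI layout, the section names, the key names, and every value are assumptions chosen for the example; they are not NeuroNER's actual settings or defaults.

```python
import configparser

# Hypothetical configuration excerpt: section names, keys, and values are
# placeholders for illustration, not NeuroNER's actual settings or defaults.
CONFIG_TEXT = """
[ann]
character_embedding_dimension = 25
character_lstm_dimension = 25
token_embedding_dimension = 100

[training]
dropout_rate = 0.5
number_of_cpu_threads = 8
number_of_gpus = 0
"""


def load_hyperparameters(text):
    """Parse the INI text and return typed hyperparameter values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "character_embedding_dimension": parser.getint("ann", "character_embedding_dimension"),
        "character_lstm_dimension": parser.getint("ann", "character_lstm_dimension"),
        "token_embedding_dimension": parser.getint("ann", "token_embedding_dimension"),
        "dropout_rate": parser.getfloat("training", "dropout_rate"),
        "number_of_cpu_threads": parser.getint("training", "number_of_cpu_threads"),
        "number_of_gpus": parser.getint("training", "number_of_gpus"),
    }


hyperparameters = load_hyperparameters(CONFIG_TEXT)
print(hyperparameters)
```

Converting each value to its expected type at load time means a mistyped entry fails early with a clear error rather than deep inside training.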

Real-time monitoring for training
As training an ANN may take many hours, or even a few days on very large datasets, NeuroNER provides the user with real-time feedback during training for monitoring purposes. Feedback is given through two different means: plots generated by NeuroNER, and TensorBoard.
Plots. NeuroNER generates several plots showing the training progress and outcome at each epoch. Plots include the evolution of the overall F1-score over time, confusion matrices visualizing the number of correct versus incorrect predictions for each class, and classification reports showing the F1-score, precision and recall for each class.
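The quantities shown in these plots can be computed as in the following minimal sketch, which derives a confusion matrix and per-class precision, recall, and F1-score from gold and predicted labels. It works at the token level for simplicity, which is an assumption; the function name `classification_report` and the toy labels are illustrative and do not come from NeuroNER.

```python
from collections import Counter


def classification_report(gold, predicted):
    """Compute per-class precision, recall and F1 from two label sequences."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p, but the gold class differs
            fn[g] += 1  # gold class g was missed
    report = {}
    for label in sorted(set(gold) | set(predicted)):
        predicted_count = tp[label] + fp[label]
        gold_count = tp[label] + fn[label]
        precision = tp[label] / predicted_count if predicted_count else 0.0
        recall = tp[label] / gold_count if gold_count else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report


# Toy token-level labels for illustration.
gold = ["PER", "PER", "ORG", "O", "O", "LOC"]
predicted = ["PER", "O", "ORG", "O", "ORG", "LOC"]

# Confusion matrix as counts of (gold label, predicted label) pairs.
confusion = Counter(zip(gold, predicted))
report = classification_report(gold, predicted)
for label, scores in report.items():
    print(label, scores)
```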
TensorBoard. As NeuroNER is based on TensorFlow, it leverages the functionalities of TensorBoard. TensorBoard is a suite of web applications for inspecting and understanding TensorFlow runs and graphs. It lets the user view, in real time, the performance achieved by the ANN being trained. Moreover, since it is web-based, these results can be conveniently shared with anyone remotely. Lastly, since graphs generated by TensorBoard are interactive, the user may gain further insights into the ANN's performance.

Pre-trained models
Some users may prefer not to train any ANN model, either due to time constraints or unavailable gold labels. For example, if the user wants to tag protected health information, they might not have access to a labeled identifiable dataset.
To address this need, NeuroNER provides a set of pre-trained models. Users are encouraged to contribute by uploading their own trained models. NeuroNER also comes with several pre-trained token embeddings, either with word2vec (Mikolov et al., 2013a,b) or GloVe (Pennington et al., 2014), which the NeuroNER engine can load easily once specified in the configuration file.
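For readers unfamiliar with pre-trained token embeddings, the sketch below reads vectors stored in the plain-text layout used by GloVe releases, where each line holds a token followed by its space-separated vector components. The function names and the tiny three-dimensional vectors are illustrative assumptions; this is not how NeuroNER itself loads embeddings, and word2vec text files with a header line would need that line skipped first.

```python
def load_token_embeddings(lines):
    """Map each token to its vector, given GloVe-style text lines."""
    embeddings = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip empty or malformed lines
        embeddings[parts[0]] = [float(value) for value in parts[1:]]
    return embeddings


# Made-up three-dimensional vectors, for illustration only.
sample_lines = [
    "the 0.1 -0.2 0.3\n",
    "paris 0.7 0.1 -0.4\n",
]

embeddings = load_token_embeddings(sample_lines)
dimension = len(next(iter(embeddings.values())))


def lookup(token):
    """Return the vector for a token, or a zero vector if it is unknown."""
    return embeddings.get(token, [0.0] * dimension)


print(lookup("paris"))
print(lookup("unseen"))
```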

Annotations
NeuroNER is designed to smoothly integrate with the freely available web-based annotation tool BRAT, so that non-expert users may create or improve annotations. Specifically, NeuroNER addresses two main use cases:
• creating new annotations from scratch, e.g. if the goal is to annotate a dataset for which no gold label is available,
• improving the annotations of an already labeled dataset: the annotations may have been done by another human or by a previous run of NeuroNER.
In the latter case, the user may use NeuroNER interactively, by iterating between manually improving the annotations and running the NeuroNER engine with the new annotations to obtain more accurate annotations. NeuroNER can take as input datasets in the BRAT format, and outputs BRAT-formatted predictions, which makes it easy to start training directly from the annotations as well as visualize and analyze the predictions. We chose BRAT for two main reasons: it is easy to use, and it can be deployed as a web application, which allows crowdsourcing. As a result, the user may quickly gather a large number of annotations by using crowdsourcing marketplaces such as Amazon Mechanical Turk (Buhrmester et al., 2011) and CrowdFlower (Finin et al., 2010).
One limitation of NeuroNER is that it does not allow overlapping annotations in the BRAT format. However, NeuroNER is not restricted to named-entity recognition: it may be used for any sequence labeling, such as part-of-speech tagging and chunking.
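Since overlapping annotations are not supported, it can be useful to check a BRAT annotation file before training. The sketch below parses text-bound annotations from BRAT's standoff format (lines such as `T1<TAB>Person 0 12<TAB>Barack Obama`) and reports overlapping spans. The function names and the sample annotations are illustrative assumptions, and discontinuous spans (offsets separated by `;`) are simply skipped here.

```python
def parse_brat_entities(ann_text):
    """Extract text-bound annotations (T lines) from BRAT standoff text."""
    entities = []
    for line in ann_text.splitlines():
        if not line.startswith("T"):
            continue  # relations, events, notes, etc. are ignored
        ident, type_and_span, surface = line.split("\t", 2)
        label, span = type_and_span.split(" ", 1)
        if ";" in span:
            continue  # discontinuous span: not handled in this sketch
        start, end = (int(offset) for offset in span.split())
        entities.append(
            {"id": ident, "label": label, "start": start, "end": end, "text": surface}
        )
    return entities


def find_overlaps(entities):
    """Return (earlier_id, later_id) pairs whose character spans overlap."""
    overlaps = []
    furthest = None  # entity with the largest end offset seen so far
    for entity in sorted(entities, key=lambda e: (e["start"], e["end"])):
        if furthest is not None and entity["start"] < furthest["end"]:
            overlaps.append((furthest["id"], entity["id"]))
        if furthest is None or entity["end"] > furthest["end"]:
            furthest = entity
    return overlaps


# Annotations for the text "Barack Obama was born in Hawaii."
ann_text = "\n".join([
    "T1\tPerson 0 12\tBarack Obama",
    "T2\tLocation 25 31\tHawaii",
    "T3\tPerson 7 12\tObama",
    "R1\tOrigin Arg1:T1 Arg2:T2",
])

entities = parse_brat_entities(ann_text)
overlaps = find_overlaps(entities)
print(overlaps)
```

Each reported pair names an annotation and an earlier-starting annotation that it overlaps, which is enough to decide whether a file needs cleaning up before use.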

System requirements
NeuroNER runs on Linux, Mac OS X, and Microsoft Windows. It requires Python 3.5, TensorFlow 1.0 (Abadi et al., 2016), scikit-learn (Pedregosa et al., 2011), and BRAT. A setup script is provided to make the installation straightforward. It can use the GPU if available, and the number of CPU threads and GPUs to use can be specified in the configuration file.

Performances
To assess the quality of NeuroNER's predictions, we use two publicly and freely available datasets for named-entity recognition: CoNLL 2003 and i2b2 2014.
i2b2 2014 (Stubbs et al., 2015) was released as part of the 2014 i2b2/UTHealth shared task Track 1. It is the largest publicly available dataset for de-identification, which is a form of named-entity recognition where the entities are protected health information such as patients' names and patients' phone numbers. 22 systems were submitted for this shared task. Table 1 compares NeuroNER with state-of-the-art systems on CoNLL 2003 and i2b2 2014. Although the hyperparameters of NeuroNER were not optimized for these datasets (the default hyperparameters were used), the performance of NeuroNER is on par with the state-of-the-art systems.

Conclusions
In this article we have presented NeuroNER, an ANN-based NER tool that is accessible to non-expert users and yields state-of-the-art results. Addressing the need of many users who want to create or improve annotations, NeuroNER smoothly integrates with the web-based annotation tool BRAT.