SeNsER: Learning Cross-Building Sensor Metadata Tagger

Sensor metadata tagging, akin to the named entity recognition task, provides key contextual information (e.g., measurement type and location) about sensors for running smart building applications. Unfortunately, sensor metadata in different buildings often follows distinct naming conventions. Therefore, learning a tagger currently requires extensive annotations on a per building basis. In this work, we propose a novel framework, SeNsER, which learns a sensor metadata tagger for a new building based on its raw metadata and some existing fully annotated building. It leverages the commonality between different buildings: At the character level, it employs bidirectional neural language models to capture the shared underlying patterns between two buildings and thus regularizes the feature learning process; At the word level, it leverages as features the k-mers existing in the fully annotated building. During inference, we further incorporate the information obtained from sources such as Wikipedia as prior knowledge. As a result, SeNsER shows promising results in extensive experiments on multiple real-world buildings.


Introduction
Sensor metadata tagging aims at understanding the context (e.g., sensor function and location) of a sensor from its name, which is essential to any smart building technologies (Wang et al., 2018). As illustrated in Figure 1, sensor metadata is typically a concatenation of esoteric abbreviations, each encoding specific information about the sensor, including what they measure/control, where they are located, how they are related to each other, etc. For example, a sensor name SODA1R430 ART conveys: the building name (SOD), air conditioning equipment ID (A1), room ID (R430), and the measurement type, which is area room temperature (ART). Running any application would require such contextual information; for example, to detect overcooled rooms, one needs the temperature and the target temperature set for each room.
Figure 1: Sensor name tagging aims to partition a sensor name into segments (shown in color) that encode key contextual information about sensors. Different buildings adopt distinct vocabularies and naming conventions.
Currently, learning a metadata tagger for a new building in practice requires extensive human annotations, thus remaining a bottleneck in deploying smart building techniques widely and efficiently (Wang et al., 2018). This is due to the fact that sensor metadata is curated by building-specific vendors, and that their naming conventions vary drastically across buildings, as shown in Figure 1. Anecdotally, annotating one sensor name may cost several hundred dollars, and it takes weeks to do so for one typical building with thousands of sensing and control points. This manual approach is clearly neither economical nor scalable, and it calls for an automated solution.
As there usually exist buildings that are already tagged, leveraging this information could potentially expedite the tagging process in a new building. Thus, in this paper, we seek to answer the following question: Can we learn a sensor metadata tagger for a new building based on its raw metadata and some existing fully annotated building(s)?
Our problem faces unique challenges, despite its similarity with named entity recognition (NER) (Tjong Kim Sang and De Meulder, 2003). First, lacking pre-processing tools (e.g., a tokenizer) for the building domain, we have only raw character sequences as input to work with (rather than "word" sequences), and thus state-of-the-art NER models (Akbik et al., 2018; Peters et al., 2018; Devlin et al., 2018) do not apply. The choice of taggers is therefore confined to those working at the character level. Second, the heterogeneity of sensor names in source and target buildings hurts the performance of existing character-level taggers (e.g., Char-LSTM-CRF in Figure 2), resulting in unsatisfactory results. Last, one building typically has "only" a few thousand sensor names, and each sensor name has fewer than two dozen characters; however, there are more than 100 types for tagging.
Recognizing these challenges, we propose a novel framework, SeNsER. At the character level, together with the tagging objective function on the source building, we train bidirectional neural language models using sensor names from both source and target buildings; we expect such co-training to regularize the feature learning process for our tagger so that the model can be better applied to the target building. In addition, we propose to learn k-mer (i.e., a substring of length k) representations of the source building and align them with those of the target building, as there exist common character patterns across buildings similar to "words" in human language. For example, "T" or "temp" would almost always appear in sensors related to temperature. These aligned k-mers complement the language model as "word"-level information, namely, what phrases look like in sensor names. Moreover, during inference, because of a strong connection between raw names and entity types, we incorporate information (e.g., what an abbreviation stands for) obtained from resources such as Wikipedia as prior knowledge to narrow the gap between the limited input data and a large number of target classes.
In summary, our major contributions are:
• We study an important problem of exploiting existing annotated buildings to help train a sensor metadata tagger for a new building.
• We propose a novel framework, SeNsER, which leverages neural language models to regularize the feature learning process and utilizes k-mers from the source building to help annotate the target building, aided by prior knowledge extracted from sources such as Wikipedia.
• We conduct extensive experiments on real buildings consisting of thousands of sensor names. SeNsER achieves over 79% and 67% F1 in chunking and tagging, respectively, a notable 13-point improvement in tagging over the best compared method.
Reproducibility. We release our code and datasets on GitHub.

Related Work
We review the literature from two fields, namely, sensor name tagging and named entity recognition.
Sensor Metadata Tagging. The problem of tagging sensor metadata has seen increasing interest from the smart building and sensing communities, mainly following the active learning paradigm (Settles, 2009) to reduce manual labeling effort. These methods iteratively select "representative" metadata examples for a human to annotate and progressively craft custom regular expressions (Bhattacharya et al., 2015) or construct classical learning models such as logistic regression (Hong et al., 2015b; Ma et al., 2020) and conditional random fields (Balaji et al., 2015; Koh et al., 2018; Lin et al., 2019), in order to tag the sensor names. Despite the promising results, all these methods rely on building-specific domain knowledge and human effort, which often do not generalize across buildings.
Another attempt based on transfer learning (Hong et al., 2015a) leverages the information from existing buildings to classify sensor measurement type only, which is a sub-problem of sensor tagging. It is primarily built upon sensory timeseries data and therefore cannot generalize to other contextual information, such as the location and relationship with others. By contrast, we aim to understand all the information in the metadata.
Named Entity Recognition (NER). Our sensor metadata tagging problem can be viewed as a kind of NER task, while our tagging happens per character. Most of, if not all, NER models consume words as the basic unit and detect entity boundaries as a subset of word boundaries. However, in our problem, due to the lack of pre-processing tools, the input only contains raw character sequences. Such difference makes most of the recent neural NER models (Peters et al., 2018; Devlin et al., 2018; Akbik et al., 2018; Huang et al., 2015; Lample et al., 2016; Ma and Hovy, 2016; Liu et al., 2018b; Kuru et al., 2016) not directly applicable. After sifting through compatible modules from these models, the best applicable existing neural NER model becomes Char-LSTM-CRF, as we shall describe in Section 4. It performs well under the intra-building setting but poorly under our cross-building setting.
Figure 2: [...] (1) co-training of language models using both source and target buildings to guide feature learning, (2) k-mer-based "word"-level information to assist with alignment, and (3) prior knowledge obtained from external sources such as Wikipedia to help inference.
Our idea of introducing language models as regularization is inspired by LM-LSTM-CRF (Liu et al., 2018b). LM-LSTM-CRF imposes a language model objective on the NER's training set as additional guidance for feature extraction. In this paper, we further propose to train the language models using both source and target buildings, so as to better generalize the features learned from the source building to the target building.

Problem Formulation
In this paper, we study the cross-building metadata tagging problem. The input involves two buildings: (1) a fully annotated source building, and (2) a new target building with no annotation. The metadata of a sensor is a sequence of characters, denoted as $X = (x_1, x_2, \ldots, x_M)$, where $x_i$ is the i-th character and M is the length of the sequence. We denote the annotation of character $x_i$ as $y_i$. Similar to NER, the annotations follow the BIOES labeling scheme (Ratinov and Roth, 2009), but at the character level. We define a segment of a sensor name to be a substring expressing certain context (e.g., building name, room, measurement type, etc.) about the sensor, as illustrated in Figure 1. Given a segment in the sensor name, its beginning, middle, and ending characters are labeled as B-type, I-type, and E-type, respectively. Segments with only one character are labeled as S-type, and characters not belonging to any segment are marked as O. Every B, I, E, and S label is followed by the class of its segment. It is noteworthy that there are more than 100 classes for the different segments in sensor names, such as building name, room, heating (a sensor type), etc. Our goal is to learn a tagger for the target building, which can partition the sensor name into correct segments and classify them into the right classes.
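To make the character-level BIOES scheme concrete, the following minimal sketch converts annotated segments into one label per character. The function name, the (start, end) span convention, and the class names `building`, `ahu`, and `room` are illustrative assumptions, not identifiers from the released code.

```python
def segments_to_bioes(name, segments):
    """Turn (start, end_exclusive, cls) segments into per-character BIOES labels.

    Characters not covered by any segment keep the label "O".
    """
    labels = ["O"] * len(name)
    for start, end, cls in segments:
        if end - start == 1:
            # A one-character segment gets the S- prefix.
            labels[start] = f"S-{cls}"
        else:
            labels[start] = f"B-{cls}"
            for i in range(start + 1, end - 1):
                labels[i] = f"I-{cls}"
            labels[end - 1] = f"E-{cls}"
    return labels


# Hypothetical annotation of the prefix of the running example name.
example_name = "SODA1R430"
example_segments = [(0, 3, "building"), (3, 5, "ahu"), (5, 9, "room")]
example_labels = segments_to_bioes(example_name, example_segments)
```

Here `example_labels` contains one entry per character, so the tagger's output space is the set of prefix-class combinations plus `O`.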

Char-LSTM-CRF
As mentioned earlier, the best applicable existing neural tagging model to our problem is Char-LSTM-CRF, which was proposed as a strong baseline in (Liu et al., 2018a). Since SeNsER builds upon Char-LSTM-CRF, we briefly revisit this model to be self-contained.
As illustrated by the top part of Figure 2, Char-LSTM-CRF takes as input a character sequence $X = (x_1, x_2, \ldots, x_M)$, and applies bidirectional LSTMs to every character's embedding, obtaining $f_i$ and $r_i$ for the i-th character. Then, it gets the contextualized representation $z_i$ of the i-th character by concatenating the two embedding vectors:
$$z_i = [f_i; r_i].$$
Finally, it uses a Conditional Random Field (CRF) layer (Lafferty et al., 2001) to capture the label dependency, which defines the probability of generating the label sequence $Y = (y_1, y_2, \ldots, y_M)$ given $Z = (z_1, z_2, \ldots, z_M)$, namely,
$$p(Y \mid Z) = \frac{\prod_{j=1}^{M} \phi(y_{j-1}, y_j, z_j)}{\sum_{\hat{Y} \in \mathbf{Y}(Z)} \prod_{j=1}^{M} \phi(\hat{y}_{j-1}, \hat{y}_j, z_j)},$$
where $\mathbf{Y}(Z)$ is the set of all possible label sequences, $\phi(y_{j-1}, y_j, z_j) = \exp(W_{y_j} z_j + b_{y_{j-1}, y_j})$, and $W_{y_j}$ and $b_{y_{j-1}, y_j}$ are the weight and bias parameters in the CRF layer, respectively. During training, we maximize the likelihood of generating the ground-truth label sequences, hence the following loss function:
$$\mathcal{L}_{CRF} = -\sum_{i} \log p(Y_i \mid Z_i),$$
where $Y_i$ is the label sequence and $Z_i$ is the embedding for the i-th training example (i.e., sensor name). For inference, we use the Viterbi algorithm (Viterbi, 1967) to decode the best explanation given Z.
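The Viterbi decoding step can be sketched in a few lines of plain Python. This is a minimal illustration working on log-scores, not the authors' implementation: `emissions[j][y]` stands in for the per-character score of label y at position j, `transitions[p][y]` for the score of moving from label p to label y, and the start transition is assumed to be zero.

```python
def viterbi(emissions, transitions):
    """Return the highest-scoring label index sequence.

    emissions[j][y]   : score of label y at position j (log domain)
    transitions[p][y] : score of label p followed by label y (log domain)
    """
    n_labels = len(transitions)
    score = list(emissions[0])  # best score of any path ending in each label
    backptr = []
    for j in range(1, len(emissions)):
        new_score, ptr = [], []
        for y in range(n_labels):
            # Best previous label for reaching label y at position j.
            best_prev = max(range(n_labels),
                            key=lambda p: score[p] + transitions[p][y])
            new_score.append(score[best_prev] + transitions[best_prev][y]
                             + emissions[j][y])
            ptr.append(best_prev)
        score = new_score
        backptr.append(ptr)
    # Follow the back-pointers from the best final label.
    path = [max(range(n_labels), key=lambda y: score[y])]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    return path[::-1]


# Toy example with two labels: switching labels is heavily penalized.
toy_emissions = [[2.0, 0.0], [0.0, 1.0], [0.0, 1.5]]
toy_transitions = [[0.0, -5.0], [-5.0, 0.0]]
toy_path = viterbi(toy_emissions, toy_transitions)
```

In the toy example, picking the best label independently at each position would start with label 0 and then switch, whereas the decoded path stays on label 1 throughout because the transition penalty outweighs the first emission score. This is the label dependency the CRF layer is meant to capture.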

Our SeNsER Framework
As shown in Figure 2, our SeNsER framework builds upon Char-LSTM-CRF and further enhances it with (1) cross-building language models as regularization, (2) k-mer alignment as "word"-level complement, and (3) tailored decoding using a domain-specific dictionary as prior knowledge.

Language Models as Regularization
To address the heterogeneity between the source and target buildings, we propose to co-train the character-level neural language models (Char-LMs) on the raw sensor names from both buildings in addition to the tagging objective. Here, "co-training" means that the LSTM modules are shared between our bidirectional Char-LMs and the Char-LSTM-CRF tagging model, and that their parameters will be updated by two objectives together. We shall note that we only incorporate the raw sensor names, but not their labels for a target building in training the language model (Char-LMs). This way, the LSTM modules will also be regularized by the raw sensor names in the target building, significantly improving generalizability when we apply the trained tagger to the target building.
The forward Char-LM defines the generative probability of a character sequence as
$$P_{fw}(x_1, \ldots, x_M) = \prod_{i=1}^{M} P_{fw}(x_i \mid x_1, \ldots, x_{i-1}).$$
Denoting the representation after reading $x_1, \ldots, x_i$ in the forward Char-LM as $f^{LM}_i$, $P_{fw}(x_i \mid x_1, \ldots, x_{i-1})$ can be written as $P_{fw}(x_i \mid f^{LM}_{i-1})$. We apply softmax to $f^{LM}_{i-1}$ to obtain this probability. Inspired by previous work (Liu et al., 2018b), we adopt a highway layer to further introduce nonlinearity:
$$f^{H}_i = t \odot g(W_H f_i + b_H) + (1 - t) \odot f_i,$$
where $\odot$ is the element-wise product, $g(\cdot)$ is a nonlinear transformation (ReLU in our experiments), $W_H$ and $b_H$ are two parameters in the highway layer, $t = \sigma(W_H f_i + b_T)$ is called the transform gate, and $(1 - t)$ is called the carry gate. Here $\sigma(\cdot)$ is a nonlinear function such as sigmoid.
Similarly, one can define $r^{LM}_i$ and $P_{bw}(x_i \mid r^{LM}_{i+1})$. Adding the two directions together, the loss function of the language model part becomes:
$$\mathcal{L}_{LM} = -\sum_{i=1}^{M} \left[ \log P_{fw}(x_i \mid f^{LM}_{i-1}) + \log P_{bw}(x_i \mid r^{LM}_{i+1}) \right],$$
accumulated over the sensor names of both buildings. The contextualized representation $z_i$ of character $x_i$ is also revised accordingly. The $f_i$ and $r_i$ are passed through two highway units and become $f^{H}_i$ and $r^{H}_i$, respectively. Now, after enabling this co-training, it becomes:
$$z_i = [f^{H}_i; r^{H}_i].$$
Joint Optimization. We jointly optimize the Char-LSTM-CRF and Char-LM via a weighted combination of the tagging loss and the language model loss, where $\lambda \in [0, 1]$ is a weight balancing the effect of Char-LSTM-CRF and Char-LM on training. To ensure the model is not overfitted to the source building, in practice, we always start with $\lambda = 1$ and linearly decrease it as the training proceeds.
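The weight schedule described above can be sketched as follows. Two things here are assumptions made only for illustration: the additive form `crf_loss + lam * lm_loss` (the text only states that λ balances the two objectives), and the floor value `lam_min` at which the linear decay stops.

```python
def lm_weight(epoch, total_epochs, lam_min=0.0):
    """Linearly decay lambda from 1.0 at epoch 0 to lam_min at the last epoch."""
    if total_epochs <= 1:
        return 1.0
    frac = epoch / (total_epochs - 1)
    return max(lam_min, 1.0 - (1.0 - lam_min) * frac)


def joint_loss(crf_loss, lm_loss, lam):
    """Assumed combination: tagging loss plus a lambda-weighted LM loss."""
    return crf_loss + lam * lm_loss


# Early epochs weight the language model heavily; later epochs mostly tag.
schedule = [lm_weight(e, 11, lam_min=0.2) for e in range(11)]
```

With this schedule the language model objective dominates the shared LSTM updates early on, and the tagging objective gradually takes over.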

K-Mers as "Word"-level Complement
So far, SeNsER is solely built upon character-level information. We observe that some k-mers (i.e., substrings of length k) (Compeau et al., 2011) express the same meaning regardless of buildings, e.g., "T", "tmp", or "temp" almost always appear in sensor names related to temperature. Therefore, we propose to leverage such meaningful k-mers to complement the representation produced by the language model (i.e., $z_i$ defined in Eq. (1)) as "word"-level information. Specifically, in the source building, we obtain a k-mer vocabulary using the sensor names and their annotations: every ground-truth segment in a sensor name becomes a k-mer. We then apply word embedding techniques (e.g., word2vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014)) to learn representations of these k-mers. During training in the source building, we use its annotations to align every character with the k-mer it appears in. During inference in the target building, we match a raw sensor name string to the k-mers in the vocabulary by trying to cover as many characters as possible with the most informative k-mer combinations, where each k-mer is scored by its inverse "document" frequency (IDF); in our context, a "document" is a sensor name. Through a dynamic programming algorithm, we partition a raw string in the target building into pieces so as to maximize the total IDF of the matched k-mers. Characters that fail to be matched in this way are matched to "<unk>". Given a sensor name, for its character $x_i$, after aligning it with a k-mer we incorporate this k-mer representation $k_i$ into $z_i$, i.e., $z_i = [f^{H}_i; r^{H}_i; k_i]$.
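The dynamic-programming alignment used at inference time can be sketched as below. This is one possible reading of the description, not the released implementation: the k-mer vocabulary and IDF values are hypothetical, ties in total IDF are broken by the number of covered characters, and unmatched characters contribute a score of zero.

```python
def align_kmers(name, idf):
    """Assign each character of `name` a vocabulary k-mer or "<unk>".

    idf maps each known k-mer to its IDF score. The partition maximizes the
    total IDF of matched k-mers, breaking ties by covered characters.
    """
    n = len(name)
    max_len = max((len(k) for k in idf), default=0)
    # best[i] = (total_idf, covered_chars) for the prefix name[:i]
    best = [(0.0, 0)] + [None] * n
    back = [None] * (n + 1)  # back[i] = (previous index, k-mer or None)
    for i in range(1, n + 1):
        # Option 1: leave character i-1 unmatched.
        best[i] = best[i - 1]
        back[i] = (i - 1, None)
        # Option 2: end a vocabulary k-mer exactly at position i.
        for length in range(1, min(max_len, i) + 1):
            piece = name[i - length:i]
            if piece in idf:
                prev_idf, prev_cov = best[i - length]
                cand = (prev_idf + idf[piece], prev_cov + length)
                if cand > best[i]:
                    best[i] = cand
                    back[i] = (i - length, piece)
    # Trace back to recover the k-mer covering each character.
    per_char = ["<unk>"] * n
    i = n
    while i > 0:
        prev, piece = back[i]
        if piece is not None:
            for j in range(prev, i):
                per_char[j] = piece
        i = prev
    return per_char


# Hypothetical vocabulary: "TEMP" is more informative than "T" plus "EMP".
toy_idf = {"RM": 1.0, "TEMP": 2.0, "T": 0.5, "EMP": 0.3}
toy_alignment = align_kmers("RM430TEMP", toy_idf)
```

Each entry of the returned list would then be looked up in the k-mer embedding table to obtain the vector concatenated into the character representation.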

Inference with Domain Knowledge
In order to accommodate more than 100 tagging types, given the fact that a certain segment of a sensor name indicates its measurement type, we propose to develop a domain-specific abbreviation-phrase matching model and employ it as an additional prior during CRF decoding. We next discuss (1) how to build this matching model, and (2) how to incorporate it into the CRF layer.
Abbreviation-Phrase Matching. To get the most likely abbreviations of the type phrases, we propose a new character-level text similarity model based on the Siamese Network (Bromley et al., 1993). The structure of the similarity model is depicted in Figure 3. Specifically, type phrases and abbreviations are embedded into a common latent space considering both the characters and their absolute positions. Then, we apply two 1D convolutional neural networks (CNNs) to encode the context. To capture the mutual information between the two inputs, we adopt the co-attention idea (Ye and Ling, 2019) and apply it at the character level. After max-pooling, we get the final representations for the type phrase and abbreviation. We feed the concatenation of these two representations into a Multi-Layer Perceptron (MLP) with nonlinear activation to get a similarity score.
In order to train this model, we scraped a domain-specific abbreviation dataset from Wikipedia and technical documents in the building domain, which contains 574 abbreviations and 737 full names. We split the dataset into train, validation, and test sets with an 80%-10%-10% ratio of abbreviations, and 1:1 positive-negative pairs are sampled during training and testing. Following a prior work on learning text similarity (Neculoiu et al., 2016), we adopt the contrastive loss function for training. Evaluated as binary classification with a threshold of 0.5, our trained model achieves on average 98% test accuracy in matching an abbreviation to the full phrase, demonstrating its efficacy. Finally, we train a model on the entire abbreviation dataset we scraped and then obtain a set of potential tagging labels for each abbreviation with corresponding similarity scores.
We release our code and dataset for abbreviation-phrase matching on GitHub.
Additional Prior in CRF Decoding. In order to assign each character $x_i$ a similarity score, we conduct a substring search around it to assign it to an associated abbreviation. Specifically, we check all the substrings within ±2 positions around $x_i$ (inclusive), i.e., all substrings of $x_{[i-2:i+2]}$, and check the similarity between these substrings and different tagging labels. The longest and most similar substring match is assigned as the associated abbreviation for $x_i$. The similarity scores between this abbreviation and tagging labels are then propagated to $\mathrm{sim}(x_i, y_i)$. We incorporate this similarity into the CRF decoding stage as an additional prior on assigning label $y_i$ to character $x_i$. Since this term depends only on a single position and its label, the Viterbi algorithm (Viterbi, 1967) still applies without any computational overhead.
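The substring search can be sketched as follows. This is an illustrative reading of the procedure rather than the released code: `abbrev_scores` is a hypothetical lookup from an abbreviation to its label similarity scores (in the paper these come from the trained matching model), matches are ranked by length first and best similarity second, and characters without any match receive a zero prior.

```python
def associated_abbreviation(name, i, abbrev_scores, radius=2):
    """Pick the longest, then most similar, known substring of name[i-2:i+2]."""
    lo, hi = max(0, i - radius), min(len(name), i + radius + 1)
    best_key, best_sub = None, None
    for s in range(lo, hi):
        for e in range(s + 1, hi + 1):
            sub = name[s:e]
            if sub in abbrev_scores:
                key = (len(sub), max(abbrev_scores[sub].values()))
                if best_key is None or key > best_key:
                    best_key, best_sub = key, sub
    return best_sub


def prior_scores(name, labels, abbrev_scores):
    """Build sim(x_i, y) for every character i and every label y."""
    rows = []
    for i in range(len(name)):
        abbr = associated_abbreviation(name, i, abbrev_scores)
        sims = abbrev_scores[abbr] if abbr is not None else {}
        rows.append([sims.get(label, 0.0) for label in labels])
    return rows


# Hypothetical similarity table and label set.
toy_scores = {
    "ART": {"room_temperature": 0.9},
    "RM": {"room": 0.8},
    "T": {"temperature": 0.6},
}
toy_labels = ["room", "room_temperature"]
toy_prior = prior_scores("RM1ART", toy_labels, toy_scores)
```

Under the assumption that the prior is combined with the per-character label scores before decoding, each row of `toy_prior` only shifts the score of one position, so a standard Viterbi pass over the adjusted scores is unchanged in cost.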

Empirical Evaluation
In this section, we empirically evaluate SeNsER and compared models on real-world buildings. We first introduce the datasets and experimental settings. Then, we present chunking and tagging results. Finally, we present some case studies about k-mer embeddings and typical mistakes of our model.

Datasets
To evaluate SeNsER, we collect the sensor names from three office buildings on two different campuses, and the building names are anonymized as A, B, and C. The ground-truth labels of sensor names are created by the building vendors, which we subsequently convert to the character-level BIOES labels. The details of each building are summarized in Table 1. Buildings A and B are on the same campus and contracted with the same vendor, thus exhibiting similar naming conventions; yet their sensor names still contain unique tags due to different sensors and equipment deployed, and variations also exist even in the "codes" used for the same type of sensors, as illustrated in Table 1. Since it is impossible for the model to predict classes outside the training set, we only keep the overlapping classes between a pair of source and target buildings in evaluation. In other words, given a pair of buildings, if a class exists in only one of the two buildings, we mark it as an "other" class. As a result, a total of 70 classes, consisting of 69 regular classes and one "other" class, remain in our experiments between buildings A and B.
Building C is located on a second campus and is commissioned by a different vendor than A and B's; we use it to examine the generalizability of our method. There are only 4 classes in building C that appear in either building A or B, so there is not much difference between chunking and tagging. Therefore, we only evaluate chunking when training models based on building A and B and testing them on building C.

Metrics and Compared Methods
We evaluate the performance of SeNsER with regard to chunking and tagging using the precision, recall, and F1 scores, similar to NER tasks.
Specifically, for each sensor name, we get a few predicted triplets, i.e., (position_begin, position_end, category), and a triplet counts as a correct extraction only when both the position and category exactly match the ground-truth annotations. For chunking, we only consider the positions. Mathematically, we compare two sets of triplets, i.e., the predicted set and the ground-truth set. The true positives are the intersection of the two sets. The remaining triplets in the predicted and ground-truth sets are considered false positives and false negatives, respectively.
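The set-based metric definition above translates directly into code. The sketch below is illustrative (function and variable names are assumptions); to pool results over many sensor names, one would add a sensor identifier to each triplet so that spans from different names never collide.

```python
def precision_recall_f1(predicted, gold, chunking=False):
    """Exact-match precision, recall, and F1 over (begin, end, category) triplets.

    With chunking=True the category is ignored and only positions are compared.
    """
    if chunking:
        predicted = {(b, e) for b, e, _ in predicted}
        gold = {(b, e) for b, e, _ in gold}
    else:
        predicted, gold = set(predicted), set(gold)
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if true_pos == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy example: one segment is mistyped and one has a wrong boundary.
toy_gold = [(0, 2, "building"), (3, 4, "ahu"), (5, 8, "room")]
toy_pred = [(0, 2, "building"), (3, 4, "room"), (5, 7, "room")]
tagging_scores = precision_recall_f1(toy_pred, toy_gold)
chunking_scores = precision_recall_f1(toy_pred, toy_gold, chunking=True)
```

In the toy example the mistyped segment still counts as a correct chunk, which is why chunking scores are always at least as high as tagging scores for the same predictions.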
We compare SeNsER with the following methods as baselines:
• CRF. As the most straightforward baseline, we compare SeNsER with a standard CRF which is trained on the source building and applied to the target building. Particularly, 6 features are used in total: whether $x_i$ is a digit, whether $x_i$ is a letter, whether $x_{i \pm 1}$ is a digit, and whether $x_{i \pm 1}$ is a letter.
• Char-LSTM-CRF. As we described in Section 4, it first applies bidirectional LSTMs to every character's embedding and further feeds it into the CRF layer, and finally outputs labels for each character.
As a sanity check, we also examine two methods:
• Delimiter. Sensor names usually contain delimiters such as "-" and ".". Therefore, as a straightforward option for chunking, we segment sensor names at the positions of delimiters and then calculate the precision, recall, and F1.
• Dictionary (Dict). For this method, we use the dictionary created in §5.3 and decode the type of label using the Viterbi algorithm.
We also evaluate ablations of our model. SeNsER-Dict only keeps the use of the dictionary comprised of abbreviation-phrase pairs during inference by removing the k-mer alignments from SeNsER, and likewise, SeNsER-Kmer keeps only k-mer alignments by removing the use of the dictionary. We shall note that, technically, Char-LSTM-CRF is also the ablated version of SeNsER with none of the proposed components used, namely, co-training, k-mer matching, and dictionary as prior.
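The six hand-crafted features of the CRF baseline can be sketched as a per-character feature dictionary. The feature names and the dictionary layout are illustrative assumptions; positions outside the sensor name are treated as neither digit nor letter.

```python
def char_features(name, i):
    """Six binary features for character i: digit/letter tests on x_i, x_{i-1}, x_{i+1}."""
    def at(j):
        # Out-of-range neighbors behave like an empty string (all tests False).
        return name[j] if 0 <= j < len(name) else ""

    return {
        "cur_is_digit": at(i).isdigit(),
        "cur_is_letter": at(i).isalpha(),
        "prev_is_digit": at(i - 1).isdigit(),
        "prev_is_letter": at(i - 1).isalpha(),
        "next_is_digit": at(i + 1).isdigit(),
        "next_is_letter": at(i + 1).isalpha(),
    }


# Features for the digit in a hypothetical name fragment "A1-".
toy_features = char_features("A1-", 1)
```

Because these features only distinguish digits, letters, and everything else, they capture segment boundaries reasonably well but carry little information about which class a segment belongs to.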
We only use Char-LSTM-CRF as our NER model because we study char-level tagging with limited training data. Other models (e.g., BERT) typically require a large-scale corpus for pretraining and word-level input, which are not available in the building domain we study. Regular expressions (regexes) could be a solution to our problem, but they need to be exhaustive in covering all the possible patterns, which requires deep building-specific domain knowledge and significant manual effort at great cost on a per-building basis. Moreover, regexes for tagging patterns cannot transfer across buildings, which is our goal in this work. Therefore, regexes are neither an economical nor a scalable solution.
Table 2: Cross-building tagging and chunking performance (%). "X → Y" denotes training a tagger on building X and testing on building Y. All results are averaged over 5 runs. We omit the standard deviations as they are all ≤ 2%.
For a fair comparison, all baselines use the same amount of human labels. Because of the considerable amount of human effort needed for regexes, we do not include them for comparison in this work. The Delimiter method can be viewed as a special kind of regex, with a minimum amount of human effort.

Experimental Setup
During training, 80% of the sensor names in the source building are used as the training set and the remaining 20% are used as a development set; testing is performed on the sensor names in the target building. Mini-batch stochastic gradient descent with momentum is used for training all the neural models. For all three models, the batch size and momentum are set to 10 and 0.9, and the learning rate follows $\eta_t = \frac{\eta_0}{1 + \rho t}$, where $\eta_0$ is the initial learning rate and $\rho = 0.05$ is the decay ratio. We apply dropout with a ratio of 0.5. Models are trained for a maximum of 200 epochs, and early stopping is triggered when the best F1 score on the development set has not improved for 15 epochs.
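The learning-rate decay and the early-stopping rule can be sketched as follows. The initial learning rate is not stated in the text, so any value passed as `eta0` is a placeholder, and treating t as the epoch index (rather than the update step) is an assumption.

```python
def learning_rate(eta0, t, rho=0.05):
    """Inverse-time decay: eta_t = eta0 / (1 + rho * t)."""
    return eta0 / (1.0 + rho * t)


class EarlyStopper:
    """Signal a stop once the dev F1 has not improved for `patience` epochs."""

    def __init__(self, patience=15):
        self.patience = patience
        self.best = float("-inf")
        self.stale_epochs = 0

    def update(self, dev_f1):
        """Record this epoch's dev F1; return True if training should stop."""
        if dev_f1 > self.best:
            self.best = dev_f1
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience


# After 20 epochs the learning rate has halved (1 + 0.05 * 20 = 2).
halved = learning_rate(0.1, 20)
```

A training loop would call `learning_rate` once per epoch to set the optimizer's step size and `EarlyStopper.update` after each development-set evaluation, stopping at 200 epochs at the latest.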
The dimensions of the randomly initialized character embedding and the character-level LSTM state are set to 30 and 150, respectively. For the word embedding of k-mers, we apply word2vec (Mikolov et al., 2013) on these "words", and the dimension of the embedding is set to 30. Other word embedding techniques (e.g., GloVe (Pennington et al., 2014)) also work for this part. In the language models, we set the dimension of the LSTM state to 300. The parameter λ, which balances the effect of Char-LM and Char-LSTM-CRF during training, is initialized to 1 and decreases along the training process until it reaches a particular minimum value. This way, during the multitask training of Char-LM and Char-LSTM-CRF, the model in the early epochs focuses more on learning an effective LM for understanding the sequence characteristics, which benefits the learning of Char-LSTM-CRF in the later stage of training and transfer learning.

Cross-Building Performance
We summarize the cross-building chunking and tagging performance in Table 2. In general, our experimental results suggest that transferring from A to B is better than B to A. The main reason is that building A contains more types of metadata labels (i.e., 157) than building B (i.e., 134). SeNsER would be more effective if trained on a dataset with various sensors and applied to a dataset with relatively fewer types of sensors. Besides, we observe that the majority of correct chunks obtained by the delimiter method are the building names, which appear almost in all the metadata sequences at the beginning, followed by a delimiter. However, room or floor segments usually contain delimiters such as " " and "-", and thus will be incorrectly segmented by this method. As a sanity check, we also directly apply the dictionary built upon online documents such as Wikipedia, which consists of abbreviation codes used in the building domain. As the dictionary is not exhaustive, solely matching based on the abbreviations in the dictionary can only uncover a small fraction of segments, hence the limited chunking and tagging results.
As a common solution to NER, CRF with handcrafted features achieves decent chunking results (58.78% F1 on average), yet struggles with tagging (43.70% F1 on average), since the "codes" used in buildings A and B vary. To demonstrate the efficacy of the proposed k-mer-based alignments and dictionary as prior knowledge, we also incorporate them into the standard CRF as CRF-Kmer and CRF-Dict. As we see from the results, both can enhance a standard CRF in chunking and tagging.
Char-LSTM-CRF significantly improves over CRF by learning the features to represent the generative pattern in sensor names, achieving 78.20% and 54.52% on average in F1 for chunking and tagging, respectively. Compared to Char-LSTM-CRF, SeNsER-Kmer additionally employs the k-mer-based alignment procedure to help identify segments in sensor names in the target building. We see that it improves tagging by 6.83 points in F1 on average. On another front, SeNsER-Dict incorporates the dictionary of abbreviation-phrase pairs as prior knowledge during inference. Similar to what we have observed for the case of CRF, this knowledge clearly benefits both chunking and tagging on the two buildings. Finally, employing both the k-mer alignments and the dictionary of abbreviation-phrase pairs, SeNsER considerably outperforms the best baseline by 8.61 points in chunking on building B, and by an average of 12.65 points in tagging on both buildings.
The superior performance of SeNsER confirms the synergy between language models for capturing contextual information and k-mers for substring alignments in different buildings as well as a dictionary as prior knowledge (especially with a limited vocabulary).

Case Study
Similar K-mers. K-mers have demonstrated their power in recognizing the class of name segments, i.e., tagging. Here, we present a case study about the learned k-mer embedding results. It provides some insights into the usefulness of our k-mer alignment. In Table 3, we present three random k-mers from our vocabulary and retrieve their top-5 similar words according to cosine similarity. The results are reasonable, containing semantically correlated k-mers. For example, heating equipment commonly pairs with the corresponding cooling equipment to condition a room/zone, and involves reheating, measurements of supply airflow, and velocity pressure.
Typical Mistakes. The most common mistakes in our inference occur in the building name segments. SeNsER can effectively learn the common features of different buildings such as temperature and equipment operating status. However, the building names vary a lot in different buildings and share no similar features; for example, recalling the examples in Table 1, the building name phrases are EBU3b, ap&m, and SDH. Without human input, it is difficult for our model to correctly infer the meaning of such segments. However, as a possible future direction to pursue, based on the frequency, we could infer with a high probability that a segment is likely to be the building name, and therefore query a human for a one-time input to label all such segments.

Generalizability
We also examine the generalizability of our method, i.e., how it would perform when applied to a building with a completely distinct vocabulary and naming convention. In particular, we train a tagger using the sensor names and annotations in building A and B and apply it to building C.
Note that this is an extremely difficult task: Buildings A and B still share similar naming conventions (recall the examples in Table 1), despite moderately varied vocabulary; by contrast, building C almost completely differs in the naming convention and vocabulary. For example, "room temperature" is denoted as "ZN.T" in A and B but as "RMT" in C; in addition, due to the different vendors used, the types of equipment installed also vary significantly in Building C, compared to Buildings A and B. Due to the disparate vocabularies, tagging Building C based on the information in A and B is nearly impossible, and we thus only take the prefixes of the tags produced by the tagger (i.e., B-, I-prefixes) to evaluate the chunking results.
The results are summarized in Table 4. The delimiter-based chunking method achieves 35.29% in F1, with the hits mainly being the first segments of the metadata string denoting building names, which do not vary within the building. It is noteworthy that Char-LSTM-CRF performs worse than CRF, which indicates that learning solely based on data from buildings A and B may even hurt the performance on building C. SeNsER scores 78.18% F1, the best among all methods, in spite of the distinction between the source and target. Upon closer inspection, due to the employed Char-LMs, SeNsER can recognize the segments for sensor types and room IDs correctly.

Conclusions and Future Work
In this paper, we study the problem of automated cross-building sensor metadata tagging, a key to enabling any smart building applications. Capitalizing on the intuition that sensor names are created following some underlying rule, though varying across buildings, we design SeNsER. SeNsER builds upon Char-LSTM-CRF and guides the sensor name feature learning using both source and target buildings, well preparing the features for interpreting the metadata in the target building. We further leverage a k-mer-based matching procedure to provide "word"-level information, as well as a dictionary comprised of prior knowledge about sensor names, to boost the tagging performance. Promising experimental results demonstrate the synergy among neural language models, k-mer-based alignments, and the use of prior knowledge.
As future work, we plan to collect more domain-specific text data, e.g., sensor datasheets, which can provide more information about different naming conventions and abbreviations. We can then integrate such information into our model to make it generalize better.