Learning to Classify Human Needs of Events from Category Definitions with Prototypical Instantiation

We study the problem of learning an event classifier from human needs category descriptions, which is challenging for two reasons: (1) the descriptions use highly abstract concepts in natural language, and (2) it is difficult to choose the key concepts. To tackle these two challenges, we propose LEAPI, a zero-shot learning method that first automatically generates weak labels by instantiating high-level concepts with prototypical instances and then trains a human needs classifier on the weakly labeled data. To filter out noisy concepts, we design a reinforced selection algorithm that chooses high-quality concepts for instantiation. Experimental results on the human needs categorization task show that our method outperforms baseline methods, producing substantially better precision.


Introduction
Training accurate text classifiers often requires a large amount of manually labeled data, which is expensive to collect. In contrast, humans can often perform well on a classification task after only reading the category descriptions, which are easy to obtain. It would therefore be desirable for computers to learn classifiers automatically from class descriptions. In this work, we aim to learn an event classifier automatically from unlabeled events, using human needs category descriptions as supervision. Human needs categories have been proposed to explain why an event is positive or negative (Ding and Riloff, 2018a; Li and Hovy, 2017). For example, the event "I had cancer" is negative because it violates Health needs, while "I had steak" is usually positive because it matches Physiological needs. Human needs categorization of events (Ding and Riloff, 2018a) is the task of classifying events into eight categories associated with human needs (Maslow et al., 1970) in psychology: Physiological, Health, Leisure, Social, Finance, Cognition, Emotion, and None. However, learning a classifier directly from human needs category descriptions is challenging. First, human needs category descriptions often consist of highly abstract concepts. As shown in Fig. 1, the Physiological and Leisure needs are defined using abstract concepts (e.g., "food", "leisure activities") in order to cover all of their instances. As demonstrated in our experiments, it is not easy to represent the meanings of these abstract concepts accurately with existing methods. Second, it is not clear how to automatically choose the key concepts without access to manual labels.
In this work, we tackle these two challenges and propose LEAPI, a method to automatically Learn a classifier from human needs descriptions with Prototypical Instantiation. As shown in Fig. 1, we first generate candidate key concepts from the human needs descriptions (e.g., "food"). Then we automatically assign human needs category labels to events that contain prototypical instances of the key concepts, under the hypothesis that prototypical instances are accurate representations of abstract concepts. For example, we may assign the "Physiological Needs" label to the event ⟨I, had, eggs⟩ because "egg" is a prototypical instance of the key concept "food". Finally, we train a human needs classifier on the weakly labeled data. Since the automatically generated concepts are noisy (e.g., "person" is a general term and may not be a good key concept for recognizing Physiological Needs), we propose a reinforced concept selection algorithm to automatically choose high-quality concepts for instantiation. Experimental results show that our method outperforms the baselines, producing substantially better precision.

Related Work
There is a growing interest in studying affective events. Some previous work (Goyal et al., 2013; Deng and Wiebe, 2014; Ding and Riloff, 2016; Reed et al., 2017; Ding and Riloff, 2018b) aims to recognize the affective polarity of events. Recently, much research has focused on studying human needs and motives (Paul and Frank, 2019; Rashkin et al., 2018; Ding and Riloff, 2018a; Ding et al., 2019; Otani and Hovy, 2019) to achieve a deeper understanding of sentiment and emotion. However, all of this work builds classifiers using manually labeled data, or uses manual mapping rules from existing lexicons such as LIWC (Pennebaker et al., 2007), which requires a significant amount of manual effort.
Our work is related to zero-shot learning for text classification (Yin et al., 2019). As Yin et al. (2019) pointed out, there are two settings of zero-shot learning: (1) label-partially-unseen, in which some labels are still available for training, and many methods (Xia et al., 2018; Rios and Kavuluru, 2018) have been proposed under this setting; (2) label-fully-unseen, in which all labels are unseen; this setting is also called dataless classification in previous work (Chang et al., 2008; Song and Roth, 2014). Our task of learning a classifier from human needs category descriptions corresponds to the second, label-fully-unseen setting.
Researchers have also proposed methods (Srivastava et al., 2017; Hancock et al., 2018) to learn classifiers from natural language explanations. These methods require both crowdsourced labels and corresponding explanations of the labels, so they are not directly applicable to our problem. One key difference between their work and ours is that their methods convert explanations literally into logical forms or labeling rules as supervision, while we aim to learn classifiers from conceptual descriptions by considering the hyponyms of abstract concepts. For example, we need to understand that the concept "food" in Fig. 1 means all instances of food, not just the word "food". Our work is also related to reinforcement learning, which has been used in many NLP applications such as relation classification (Feng et al., 2018; Qin et al., 2018) and sentiment analysis (Wang et al., 2019).

Figure 2: Flow of our method LEAPI

Learning Classifiers from Descriptions
Our goal is to design an automatic method to learn a classifier from human needs descriptions and unlabeled events. The key idea is to generate weak labels by instantiating abstract concepts with prototypical instances. Fig. 2 shows the basic flow of our method. First, we generate candidate concepts from the category descriptions and collect prototypical instances for each concept. Then, we automatically assign human needs labels to unlabeled events that contain prototypical instances of the concepts corresponding to those labels. Finally, we train a classifier on the weakly labeled events. To filter out noisy concepts, we also design a reinforced selection method that chooses high-quality concepts for instantiation.

Concept Extraction
We hypothesize that the key concepts mentioned in the human needs category descriptions can be used to categorize events by their implied human needs. In our work, we use the human needs categories proposed by Ding and Riloff (2018a), which are motivated by Maslow's Hierarchy of Needs (Maslow et al., 1970) and Fundamental Human Needs (Max-Neef et al., 1991) in psychology. We use the manual annotation guidelines of Ding and Riloff (2018a) as our human needs category descriptions. Since the original guidelines are short and brief, we rewrote them into self-contained sentences. We include the descriptions in the Appendix.
We notice that the subject and object of a description sentence are often key concepts. Therefore, we extract subjects and objects from each description sentence as key concept candidates for each category, using the nsubj and obj dependency relations produced by the Stanford CoreNLP tool (Manning et al., 2014). For each pair of concepts a1 and a2 corresponding to the subject and the object, we construct three concept rules: Has(a1), Has(a2), and Has(a1 ∧ a2). A rule assigns its class label to any event that matches the rule.
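As a concrete illustration, the rule construction above can be sketched as follows. This is a minimal sketch: the function name and the example subject/object pair are hypothetical, and a rule is represented simply as the set of concepts that must all be matched.

```python
def make_concept_rules(a1, a2, label):
    """Construct the three concept rules Has(a1), Has(a2), and
    Has(a1 AND a2) for one (subject, object) concept pair extracted
    from a category description sentence."""
    return [({a1}, label), ({a2}, label), ({a1, a2}, label)]

# Hypothetical pair extracted from the Physiological Needs description.
rules = make_concept_rules("person", "food", "Physiological")
```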

Prototypical Instantiation
Our method is based on the intuition that the meaning of an abstract concept can be represented by its prototypical instances. For each concept, we collect the 20 most frequent instances of the concept from Probase (Wu et al., 2012) as its prototypical instances. 1 If a concept is not in Probase, we use the most similar concept from Probase based on cosine similarity computed over word embeddings. We then automatically assign human needs labels to unlabeled events using the constructed concept rules. In our work, we use the events previously extracted by Ding and Riloff (2018b) as our unlabeled events, where each event is represented as a 4-field tuple ⟨Agent, Predicate, Object, PrepositionalPhrase⟩. An event matches a concept rule if its fields are prototypical instances of the concepts in the rule. If an event matches a rule, it receives the human need label associated with the rule; if an event receives different labels, its final label is decided by majority vote. These weakly labeled events are used to train the final event classifier.
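The matching and majority-vote labeling can be sketched as below. The prototypical-instance table is a toy stand-in for the Probase lookups, and all names and example instances are illustrative, not the actual data.

```python
from collections import Counter

# Toy stand-in for Probase: concept -> its most frequent instances.
PROTOTYPES = {
    "food": {"egg", "bread", "steak"},
    "person": {"man", "woman", "friend"},
}

def matches(event, rule):
    """An event (a 4-field tuple) matches a rule if every concept in
    the rule has a prototypical instance among the event's fields."""
    fields = {f for f in event if f}
    return all(fields & PROTOTYPES.get(concept, set()) for concept in rule)

def weak_label(event, rules):
    """Label the event by majority vote over all matching concept
    rules; return None when no rule matches."""
    votes = Counter(label for rule, label in rules if matches(event, rule))
    return votes.most_common(1)[0][0] if votes else None

rules = [({"food"}, "Physiological"), ({"person", "food"}, "Physiological")]
weak_label(("I", "had", "egg", ""), rules)   # -> "Physiological"
```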

Human Needs Classifier of Events
Though the weakly labeled events can be accurate, their coverage may not be high. Therefore, we train a simple logistic regression classifier on the weakly labeled events obtained in the previous step, using event embeddings as features. As in Ding and Riloff (2018a), the embedding of an event is computed as the average of the embeddings of the words in the event.
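The event-embedding computation can be sketched as follows. The 3-dimensional embedding table is a toy stand-in for the real Word2Vec vectors; in practice these averaged vectors are the features fed to the logistic regression classifier.

```python
import numpy as np

# Toy 3-dimensional embedding table; the real system uses Word2Vec.
EMB = {
    "i":   np.array([0.1, 0.0, 0.2]),
    "had": np.array([0.0, 0.3, 0.1]),
    "egg": np.array([0.4, 0.1, 0.0]),
}

def event_embedding(event):
    """Average the embeddings of the event's words; out-of-vocabulary
    words are skipped, and an all-OOV event maps to the zero vector."""
    vecs = [EMB[w.lower()] for w in event if w.lower() in EMB]
    return np.mean(vecs, axis=0) if vecs else np.zeros(3)

event_embedding(("I", "had", "egg", ""))   # mean of the three vectors
```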

Reinforced Concept Selection
We notice that the automatically generated concepts are noisy. As shown in Fig. 1, "person", extracted from the definition of Physiological Needs, is a general term, and weak labels generated by the rule based on this concept can be very noisy for training the final classifier. Therefore, we propose a reinforced concept selection method to select high-quality concepts for instantiation. Since concepts are used via concept rules, we perform the selection over the concept rules. Specifically, we formulate concept rule selection as follows: given a set of pairs of concept rules c_i and their corresponding human need labels l_i, i.e., C = {(c_1, l_1), (c_2, l_2), ..., (c_n, l_n)}, our goal is to select a subset Ĉ of high-quality concept rules.

1 We also collect the 200 most confident sentiment words from the SemEval-2015 English Twitter Lexicon (Kiritchenko et al., 2014) as prototypical instances of the "sentiment words" concept.
State. We use s_i to denote the state of each (concept rule, label) pair (c_i, l_i) and represent it with a dense embedding, computed as the element-wise product of the concept rule embedding and the label embedding. The embedding of a concept rule is the average of the word embeddings of all concept words in the rule; label embeddings are simply the embeddings of the label names.
Policy Network. We use a two-layer neural network as our policy function, defined as π_θ(a_i = 1 | s_i) = σ(f_θ(s_i)), where f_θ(s_i) = W_2 ReLU(W_1 s_i + b_1) + b_2, the action a_i indicates whether a concept rule is selected (a_i = 1) or not (a_i = 0), σ is the sigmoid function, and the parameters are θ = {W_1, b_1, W_2, b_2}.
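A minimal numpy sketch of this policy network is shown below. The class name, dimensions, and initialization are illustrative; the selection probability is σ(W_2 ReLU(W_1 s + b_1) + b_2), applied to a state built as the element-wise product of rule and label embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class Policy:
    """Two-layer policy network: the probability of selecting a concept
    rule is sigmoid(W2 ReLU(W1 s + b1) + b2) for state embedding s."""
    def __init__(self, dim, hidden=32):
        self.W1 = rng.normal(0.0, 0.1, (hidden, dim))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, hidden)
        self.b2 = 0.0

    def select_prob(self, s):
        h = np.maximum(0.0, self.W1 @ s + self.b1)   # ReLU hidden layer
        return sigmoid(self.W2 @ h + self.b2)

# State: element-wise product of rule and label embeddings (toy vectors).
rule_emb, label_emb = rng.normal(size=4), rng.normal(size=4)
p = Policy(dim=4).select_prob(rule_emb * label_emb)
```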

Algorithm 1: Reinforcement Learning Algorithm for Concept Rule Selection
Input: concept rule and label pairs C, max episodes M, sampling times T, learning rate α, and policy network parameters θ
initialize θ
for episode m = 1 to M do
    for sampling time t = 1 to T do
        sample a selection action for each pair in C
        estimate the reward r_t with the selected concepts
    end
    estimate the baseline b = (1/T) Σ_t r_t
    adjust the rewards: r̃_t = r_t − b
    update θ ← θ + α Σ_t Σ_i r̃_t ∇_θ log π_θ(a_i | s_i)
end

Policy Optimization. We formulate concept rule selection as a policy optimization problem in which we aim to find a policy that selects a subset of rules with maximum expected reward U(θ), where U(θ) = E_{a_1,...,a_n}[r(a_1, ..., a_n | s_1, ..., s_n) − b]. We define the reward r to be the macro F1 score of event classification on the development dataset. For each trajectory, we receive a single reward only after the selection for all concepts is finished. To reduce variance, we adjust the rewards with a baseline b, computed as the average reward of the sampled trajectories. In our experiments, we use the REINFORCE algorithm (Williams, 1992) to optimize the policy network. The detailed concept rule selection algorithm is shown in Algorithm 1.
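One episode of this procedure can be sketched in numpy as follows. This is a simplified illustration: the policy here is a single logistic layer rather than the two-layer network, and the reward function is a toy stand-in for the dev-set macro F1.

```python
import numpy as np

rng = np.random.default_rng(1)

def reinforce_step(theta, states, reward_fn, T=30, alpha=1e-3):
    """One episode: sample T trajectories of rule selections, use the
    mean reward as the baseline b, and update theta with the REINFORCE
    gradient of the baseline-adjusted rewards."""
    def probs(th):
        return 1.0 / (1.0 + np.exp(-(states @ th)))   # Bernoulli policy
    actions, rewards = [], []
    for _ in range(T):
        a = (rng.random(len(states)) < probs(theta)).astype(float)
        actions.append(a)
        rewards.append(reward_fn(a))   # stand-in for dev-set macro F1
    b = np.mean(rewards)               # baseline
    grad = np.zeros_like(theta)
    for a, r in zip(actions, rewards):
        # gradient of sum_i log pi(a_i | s_i) for a sigmoid policy
        grad += (r - b) * ((a - probs(theta)) @ states)
    return theta + alpha * grad

states = rng.normal(size=(5, 3))   # toy state embeddings for 5 rules
theta = reinforce_step(np.zeros(3), states, reward_fn=lambda a: a.sum())
```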

Experimental Setting
Our experimental setup follows the label-fully-unseen setting of zero-shot learning (Yin et al., 2019), in which all labels are unseen. In our experiments, we used the 542 events with officially annotated human needs labels from Ding and Riloff (2018a) as our test set, and a distinct set of 300 events labeled in preliminary studies as our development set for hyperparameter tuning. We also used 30K unlabeled events as our unlabeled data.
We compared our method with the following methods.
Majority: We used the majority label of the human needs classes as the prediction for all test events.
ESA: We implemented ESA (Gabrilovich and Markovitch, 2007) using the 2019/01/20 Wikipedia dump. To predict an event's label, we first map both events and human needs category descriptions into sparse vectors represented using Wikipedia page titles. Then, for each event, we compute its cosine similarity with each category and predict its label as the most similar one.
Word2Vec: We computed the embeddings of events and category descriptions as the average of embeddings of words in them. Then, we predicted an event's label as the most similar category based on its cosine similarities with all categories.
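This nearest-category prediction can be sketched as below; the two-dimensional category vectors are toy values, not real description embeddings.

```python
import numpy as np

def predict(event_vec, category_vecs):
    """Predict the category whose description embedding is most similar
    (by cosine) to the event embedding."""
    def cos(u, v):
        return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    sims = {name: cos(event_vec, v) for name, v in category_vecs.items()}
    return max(sims, key=sims.get)

cats = {"Physiological": np.array([1.0, 0.0]),
        "Leisure": np.array([0.0, 1.0])}
predict(np.array([0.9, 0.1]), cats)   # -> "Physiological"
```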
BERT: We used the pre-trained BERT model (bert-base-uncased) (Devlin et al., 2018) to compute the embeddings of the words in events and descriptions, then averaged them to obtain the final embeddings. As with Word2Vec, we used cosine similarity to predict the labels of events.
Entail: We also experimented with three pre-trained entailment models, trained on MNLI (Williams et al., 2018), GLUE RTE, and FEVER (Thorne et al., 2018) respectively, as well as their ensemble proposed by Yin et al. (2019). We first manually converted both the human needs names and the descriptions into hypotheses following Yin et al. (2019), and then used the pre-trained entailment models to predict whether an event entails any of the hypotheses. If it does, we assign the corresponding label to the event.
Implementation Details For ESA, Word2Vec, and BERT, we used cosine similarity for prediction. Since the None category is defined to cover events that do not belong to any other class, its category description does not contain key concepts that can be used to identify events of this class. We predicted an event as None if its similarities with all other categories are < τ, where τ was selected on the dev set.
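The None-category thresholding can be sketched as follows; the similarity values and τ are illustrative, and "None" is represented here as a plain string.

```python
def predict_with_none(sims, tau):
    """Predict the None category when the event's best similarity to
    the content categories falls below tau (tuned on the dev set)."""
    best = max(sims, key=sims.get)
    return best if sims[best] >= tau else "None"

predict_with_none({"Physiological": 0.21, "Leisure": 0.18}, tau=0.3)  # -> "None"
```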
For our human needs classifier, we used the LR classifier in scikit-learn (Pedregosa et al., 2011) with default parameters. Since the None category description is not meaningful, we randomly selected K events from the unlabeled data as training samples for this class. Our reinforced policy network has around 10K parameters, with a hidden layer size of 32. We used Word2Vec (Mikolov et al., 2013) as our word embeddings. In our experiments, the maximum epoch number is M=200, and we manually searched for the hyperparameters on the development set over the following ranges: learning rate α ∈ {1e-2, 1e-3, 1e-4}, number of None class events K ∈ {100, 300, 500}, and sampling times T ∈ {10, 30, 50}. The best hyperparameters are α=1e-3, K=300, and T=30.

Results

Table 1 shows the performance of our method LEAPI and of the baseline methods that directly used the human needs category descriptions for prediction. The results show that Word2Vec performed best among the baselines. Without reinforced concept selection (RCS), our method achieved an F1 score of 31.3 on the dev set, similar to Word2Vec, and an F1 score of 35.6 on the test set. With RCS, our method selected on average 56 of the 89 candidate concept rules and obtained significantly better results than Word2Vec, yielding F1 gains of over +15 points on both the dev and test sets. Our method also significantly improved precision, from 32.7 to 55.8 on the dev set and from 33.3 to 51.6 on the test set.

Table 2 shows the per-category performance of the best baseline and our method; our method performed relatively poorly on two categories, including Cognition. We examined the predictions on these two categories and found that the semantic meanings of many of their events are expressed by the event predicates (e.g., "forgot" in "I forgot him", and "resign" in "I want to resign"). However, our method only considers noun concepts, so concepts for event predicates were not used; this could be improved in future work by extracting and instantiating concepts for event predicates.

Comparison with Manually Selected Concepts
To investigate the quality of the automatically selected concepts, we also evaluated our method and the baselines using concepts (29 in total) that were manually generated by the authors. Results are shown in Table 3; the automatic concepts were selected using our method. We find that LEAPI achieved much better performance than the baselines using manual concepts, and further improved the F1 score from 46.5 to 51.3 compared to using automatic concepts. Compared to the results using category descriptions directly (Table 1), both ESA and Word2Vec achieved better performance with both automatically and manually selected concepts, demonstrating the importance of concept selection. We also notice that, with manual concepts, ESA and Word2Vec did not perform better than with automatic concepts. One possible reason is that they cannot accurately represent the meanings of the selected concepts, which motivates our approach of instantiating abstract concepts with prototypical instances.

Table 3: Results on the test set using automatically and manually selected concepts. Results for automatic concepts are means and standard deviations across 10 random seeds.

Conclusion
In this work, we proposed a zero-shot learning method that learns a classifier from human needs category descriptions by instantiating abstract concepts with prototypical instances. We also proposed a reinforced concept selection method to automatically select high-quality concepts for instantiation. Our experimental results demonstrate that our method achieves significantly better performance than the baselines. We also observed that the semantics of some events are composed of several concepts; in future work, it would therefore be worthwhile to explore compositional concepts from category descriptions to further improve human needs categorization of events.