Aspect Extraction from Product Reviews Using Category Hierarchy Information

Aspect extraction abstracts the common properties of objects from corpora that discuss them, such as product reviews. Recent work on aspect extraction leverages the hierarchical relationship between products and their categories. However, such work focuses on the aspects of child categories and ignores those of parent categories. Hence, we propose an LDA-based generative topic model that incorporates two-layer categorical information (CAT-LDA) to balance the aspects of a parent category and its child categories. Our hypothesis is that child categories inherit aspects from parent categories, controlled by the hierarchy between them. Experimental results on 5 categories of Amazon.com products show that both the common aspects of a parent category and the individual aspects of its sub-categories can be extracted, and that they align well with common sense. We further evaluate the extracted aspects against manually annotated aspects of 16 products, resulting in an average hit rate of 79.10%.


Introduction
E-commerce provides a whole new way of shopping, in which product reviews posted by some consumers help others make their purchase decisions. One important task in analyzing online product reviews is to extract the properties of products, known as aspects. Aspect extraction has many applications, such as opinion mining (Liu, 2012;Liu et al., 2015), summarization (Bagheri et al., 2013;Hu and Liu, 2004), helpfulness prediction (Yang et al., 2016;Yang et al., 2015) and recommendation (Reschke et al., 2013;Jakob, 2011).
Statistical topic modeling, such as LDA (Blei et al., 2003) and its variants, has been shown to be successful for aspect extraction (Titov and McDonald, 2008;Zhao et al., 2010;Jo and Oh, 2011;Mukherjee and Liu, 2012;Moghaddam and Ester, 2013). Topic modeling clusters words based on their co-occurrences in sentences and documents to generate topics, each of which is a probability distribution over words. Because words that co-occur are often about the same topic, and a topic may describe one aspect of a product, one or more aspects can be associated with one or more topics. Earlier work on topic modeling was fully unsupervised, while more recent work has begun to incorporate knowledge bases (KB) into semi-supervised schemes (Wang et al., 2014;Zhai et al., 2010;Chen et al., 2014).
However, existing approaches have limitations. First, the extracted aspects are often terms strongly associated with a specific group of products (e.g., "multitouch" for touchscreen laptops) instead of the true ratable features of products (e.g., "battery life" for all laptops and even all portable electronic devices) (Titov and McDonald, 2008). Second, existing approaches require sufficiently large corpora, while many products do not have enough reviews, which is known as the cold-start problem (Moghaddam and Ester, 2013). For example, around 1/3 of the product categories used in our experiment from the Amazon.com Review Dataset (McAuley and Leskovec, 2013) have fewer than 100 reviews. Third, current approaches do not provide a good balance between child category aspects and parent category aspects.
Therefore, we develop a new aspect extraction approach, called categorical LDA (CAT-LDA), by leveraging the hierarchical relationship between products. We hypothesize that reviews of each sub-category (e.g., gaming laptops) all contribute to the topics of the corresponding general category (e.g., laptops), but with different weights. As a result, the aspects of a specific sub-category of products are the combination of its unique aspects and the aspects from its parent (which are thus shared with its siblings). This modeling also offers an approach to the cold-start problem by allowing aspects to be inherited from the parent category or transferred from sibling (sub-)categories.
Unlike most existing work, which models at the product item level, our model operates at the product category level. It can be easily extended to the product item level by creating one node for each item and attaching it to a leaf node of the category hierarchy. Factorized LDA (FLDA) (Moghaddam and Ester, 2013) is also based on the category level, but it only considers specific categories in which all items share a set of aspects. Our approach extends it by modeling aspects in both the general and the specific categories. Our model also relaxes the assumption in multi-grain LDA (MG-LDA) (Titov and McDonald, 2008) that only local topics contribute to product aspects, which aligns better with common sense. Aspects at different layers are all related to each other through the product tree. For example, all portable electronic devices have a common aspect: battery life.
Our empirical study is based on reviews from 5 general categories of the Amazon.com Review Dataset (McAuley and Leskovec, 2013). The model we propose can generate human-ratable product aspects for both general categories and sub-categories. We evaluate the extracted aspects for 16 product items of 9 categories against the annotations from (Hu and Liu, 2004;Ding et al., 2008;Liu et al., 2015). The experimental results are promising, with a hit rate of 79% on manually annotated aspects.

Problem Formulation
In the context of product aspect extraction, an aspect is an attribute or feature of a product item mentioned in reviews. Previous work on aspect extraction focuses on either an aspect term mentioned in the review text or an aspect category, which groups many aspect terms together (Zhai et al., 2010). Here we focus on the latter. In Section 4, however, we show that our model is also able to detect aspect terms in unseen text.
In this paper, we propose a generative topic model with two layers of hierarchy: the general categories and the sub-categories. For example, "pocket watches" is a sub-category under the general category "watches". Product hierarchy information (also called the product tree; Figure 2 shows an example for "watches") can be extracted from online shopping websites, e.g., Amazon.com. For the sake of simplicity, we flatten the product tree into two layers. General categories are at the top of the product hierarchy, and any category below a general category in the hierarchy is treated as its sub-category. Designing a unified model that extracts aspects by considering all the hierarchical layers remains an open question.
Our goal is to identify the aspects of both general categories and sub-categories. We hypothesize that reviews under the same general category share some common aspects because the products they discuss are similar. At the same time, because these products also differ, each sub-category has its own unique aspects.
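The flattening step can be illustrated with a minimal Python sketch. It assumes that each category is given as a root-to-leaf path of names; the function name `flatten_paths` and the example paths (including "Sport Watches") are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict


def flatten_paths(category_paths):
    """Collapse root-to-leaf category paths into two layers.

    The first node of each path is treated as the general category and
    every deeper node as one of its sub-categories (an assumption that
    follows the flattening rule stated in the text).
    """
    two_layer = defaultdict(set)
    for path in category_paths:
        general = path[0]
        two_layer[general]  # register the general category even if it has no children
        for node in path[1:]:
            two_layer[general].add(node)
    return {general: sorted(subs) for general, subs in two_layer.items()}


# Hypothetical product-tree paths, for illustration only.
paths = [
    ["Watches", "Pocket Watches"],
    ["Watches", "Wrist Watches", "Sport Watches"],
    ["Electronics", "Computers", "Laptop Computers"],
]
flat = flatten_paths(paths)
```

Under this rule, intermediate and leaf nodes of the same tree end up as siblings, which is the price of the two-layer simplification.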

Methodology
According to our hypothesis, when composing a review, a consumer considers aspects of both the general category and the sub-category that the product belongs to. This generative process can be represented by the graphical model in Figure 1. We refer to an "aspect" as a "topic" in the context of topic modeling.
Denote by P the set of general categories and by C the set of sub-categories. Each general category p ∈ P has a topic distribution θ_p, while each sub-category c ∈ C has a topic distribution θ_c. When generating a sentence, a topic distribution is picked first using a switch x following a Bernoulli distribution µ. As in standard topic modeling, each topic t is a distribution over words, denoted as ϕ_t. Further, there is a set of background words whose distribution is denoted as ϕ_b. To choose between background words and topic words, we assume another switch y following a Bernoulli distribution π.
When a sentence is generated, given its sub-category c and its general category p, we first sample a value for the switch x based on µ. Let θ = θ_c (e.g., "wrist watches" or "pocket watches" in Figure 2) if x = 0 (i.e., picking the topic of a sub-category); otherwise let θ = θ_p (e.g., "watches" in Figure 2) (i.e., picking the topic of a general category). A topic z is then chosen based on the topic distribution θ. For each word position in the sentence, we first sample a value for the switch y based on π and then pick the word from the word distribution ϕ_t of the topic z if y = 0, or from the background word distribution ϕ_b otherwise. Figure 2 illustrates the generative process using watches as an example, showing the top 3 aspects and their probabilities.
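The sentence-level generative process described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: µ and π are read as the probabilities of x = 1 and y = 1, the sentence length is passed in explicitly, and all distributions and words below are toy values.

```python
import random


def sample_categorical(dist, rng):
    """Draw one key from a dict that maps outcomes to probabilities."""
    keys = list(dist)
    return rng.choices(keys, weights=[dist[k] for k in keys], k=1)[0]


def generate_sentence(theta_c, theta_p, phi, phi_b, mu, pi, length, rng):
    """Generate one sentence following the switch-based process.

    Assumption: mu = P(x = 1) and pi = P(y = 1).
    """
    # Switch x: 0 picks the sub-category topics, 1 the general-category topics.
    x = 1 if rng.random() < mu else 0
    theta = theta_p if x == 1 else theta_c
    # One topic z per sentence.
    z = sample_categorical(theta, rng)
    words = []
    for _ in range(length):
        # Switch y: 0 picks a topic word, 1 a background word.
        y = 1 if rng.random() < pi else 0
        source = phi_b if y == 1 else phi[z]
        words.append(sample_categorical(source, rng))
    return x, z, words


# Toy parameters (hypothetical values, for illustration only).
theta_c = {"strap": 0.7, "dial": 0.3}          # sub-category topics
theta_p = {"value": 0.6, "shipping": 0.4}      # general-category topics
phi = {
    "strap": {"band": 0.6, "leather": 0.4},
    "dial": {"face": 0.5, "hands": 0.5},
    "value": {"price": 0.7, "worth": 0.3},
    "shipping": {"arrived": 0.5, "delivery": 0.5},
}
phi_b = {"the": 0.5, "a": 0.5}                 # background words

rng = random.Random(0)
x, z, words = generate_sentence(theta_c, theta_p, phi, phi_b,
                                mu=0.5, pi=0.3, length=6, rng=rng)
```

In the full model the distributions θ_c, θ_p, ϕ_t and ϕ_b would themselves be drawn from Dirichlet priors; they are fixed here only to keep the sketch short.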
All distributions θ_c, θ_p, ϕ_t, ϕ_b are generated from Dirichlet priors with hyperparameters α, α′, β, and β′, respectively. In the generation process, N_d denotes the number of words in document d, "Dir" refers to "Dirichlet", and "Multi" refers to "Multinomial". Each multinomial distribution is governed by a symmetric Dirichlet distribution. We use Gibbs sampling to perform model inference and present the sampling formulas as follows.
Let τ be the set of hyperparameters {α, α′, β, β′, µ, π}, and let c and p be the sub-category and general category of document d's n-th aspect. We collapse out all the θ_c, θ_p, ϕ_t, and ϕ_b, and jointly sample the switch x_d and the aspect label z_d. In the sampling formula, n^t_{x=0,c} is the number of times topic t and sub-category c co-occur, and n^t_{x=1,p} is the number of times topic t and general category p co-occur.
Similarly, we sample y_{d,n} as follows:

Experiment
Reviews from 5 categories (details in Table 1

Qualitative Results
We select the top topics at different levels and manually examine whether they can be aligned with certain aspects. Because the top ranked topics are the topics mentioned most often in reviews, we can treat these topics as the most important aspects. For better presentation, we also manually assign an "aspect" label to each topic.
[Table 3 fragment, topic label followed by top words. (label missing): return, shipping, back, days, received, item, order, ordered, refund... Shipping: great, product, arrived, fast, quality, shipping, easy, advertised, received, delivery...]
Top words for the top topics discovered in each general category are presented in Table 2, one topic per line, along with the top ranked words of that topic. To save space, only three topics are presented. They align well with the product aspects we would expect from common sense.
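The ranking used for these tables can be sketched as below. The sketch assumes that inference yields one topic label per sentence and a word distribution per topic; the function names and all numbers are hypothetical and only illustrate ranking topics by how often they are mentioned and listing their most probable words.

```python
from collections import Counter


def rank_topics(sentence_topics):
    """Order topic labels by the number of sentences assigned to them."""
    return [topic for topic, _ in Counter(sentence_topics).most_common()]


def top_words(phi_t, k=10):
    """Return the k most probable words of a topic (ties broken alphabetically)."""
    ranked = sorted(phi_t.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


# Hypothetical sentence-level topic assignments and word distributions.
assignments = ["Value", "Return", "Value", "Shipping", "Value", "Return"]
phi = {
    "Value": {"price": 0.4, "worth": 0.3, "money": 0.2, "cheap": 0.1},
    "Return": {"return": 0.5, "refund": 0.3, "back": 0.2},
    "Shipping": {"arrived": 0.4, "fast": 0.35, "delivery": 0.25},
}

ranking = rank_topics(assignments)
table = {topic: top_words(phi[topic], k=3) for topic in ranking}
```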
For example, Value is the aspect that buyers of baby products care about most, followed by Service and Return. Electronics products have the same highest ranked aspects, but in a different order. Unlike other categories, the top aspects for Software are Product, Support and Install, which common sense suggests are unique to software.
Table 3 shows the top five topics and their top words across all categories. Not surprisingly, Value, Return and Shipping are still the most important aspects for customers who shop online. Review, essentially "the reviews from other customers", is also mentioned frequently, indicating that customers are indeed influenced by the reviews of others. Finally, people like to talk about their Experience and compare it with their experience at other retailers, local or online.
Lastly, we are interested in the top topics for specific categories. Due to space limits, we pick Laptop Computers for study (Table 4). Unlike the topics for general categories, the top topics for Laptops are closely tied to the product: Spec, Design, System, Warranty and Screen.

Quantitative Results
We then quantitatively study whether our model can really extract aspects. The ground truth is the sentence-level manual aspect annotations in a combined dataset from (Hu and Liu, 2004;Ding et al., 2008;Liu et al., 2015), which contains 10,993 reviews of 17 products in total. Among them, we select 16 products that can be linked to the 5 general categories used to train our model above. The 16 products belong to 9 categories (Table 5). Note that not all sentences are annotated; we only make predictions for sentences with human annotations. For comparison, MG-LDA (Titov and McDonald, 2008) is used as the baseline.
We first attach each product to its closest category in the category hierarchy. For each sentence with manual aspect annotations, the model described above is used to find its most likely topic. We then select the 3 words from the sentence with the highest probability under the detected topic as highlighted words, in the hope that the highlighted words cover the manually annotated aspect terms. However, the manual annotations can also involve words that do not appear in the sentence. We therefore also include the top 3 topic words of the detected topic, because they are the words that best describe the topic. We count a "hit" if the highlighted words and the top 3 topic words of a sentence together cover all manually annotated aspect words, and a "miss" otherwise. For example, consider the camera review sentence "Also as someone who at least knows a little bit about the technical work of taking a photo i really miss having manual controls." The words "manual controls" are annotated as aspect terms. The highlighted words extracted by CAT-LDA are photo, manual and controls, and the topic words are control, controls, remote. This is a "hit" because the aspect terms are covered by the highlighted words and topic words. The hit rates for the different products are given in Table 5. To be fair, sentences used for the quantitative test are not used to train the topic models.
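The hit/miss criterion can be made concrete with a short Python sketch. It assumes that the detected topic is available as a word-probability dictionary; the probabilities, the regular-expression tokenizer and the function names are hypothetical and were chosen only so that the camera example from the text is reproduced.

```python
import re


def highlight_words(tokens, phi_t, k=3):
    """Pick the k distinct sentence words most probable under the detected topic."""
    unique = list(dict.fromkeys(tokens))  # drop duplicates, keep sentence order
    ranked = sorted(unique, key=lambda w: phi_t.get(w, 0.0), reverse=True)
    return ranked[:k]


def top_topic_words(phi_t, k=3):
    """Return the k most probable words of the detected topic."""
    return [w for w, _ in sorted(phi_t.items(), key=lambda kv: -kv[1])[:k]]


def is_hit(annotated_terms, highlighted, topic_words):
    """A hit requires every annotated aspect word to be covered."""
    covered = set(highlighted) | set(topic_words)
    return all(term in covered for term in annotated_terms)


def hit_rate(outcomes):
    """Fraction of annotated sentences that are hits."""
    return sum(outcomes) / len(outcomes)


sentence = ("Also as someone who at least knows a little bit about the technical "
            "work of taking a photo i really miss having manual controls.")
tokens = re.findall(r"[a-z]+", sentence.lower())

# Hypothetical word probabilities for the detected topic.
phi_t = {"control": 0.20, "controls": 0.18, "remote": 0.15,
         "manual": 0.10, "photo": 0.05}

highlighted = highlight_words(tokens, phi_t)
topic_words = top_topic_words(phi_t)
hit = is_hit(["manual", "controls"], highlighted, topic_words)
```

With these toy probabilities the highlighted words are controls, manual and photo, the topic words are control, controls and remote, and the sentence counts as a hit, matching the example above.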
Because MG-LDA was not originally designed to extract aspects for general categories, we train one MG-LDA model for each category in Table 5 to avoid putting MG-LDA at a disadvantage. As above, we first find the closest category for each product in the category hierarchy and then train a model on all reviews of this category.
The results of CAT-LDA are very promising, with an average hit rate of 79.10% across all 9 categories of products. Physical products of the computer or electronics type have very high hit rates, with the highest being 91.21% for PC monitors. The low hit rates for diaper champ and software are due to their lack of components, especially descriptive ones, and their limited functionality. CAT-LDA outperforms MG-LDA in all 9 categories of products, with an average hit rate improvement of 12%.
The results could be further improved by considering synonyms of aspect terms or by adding more features such as part-of-speech tags and dependency rules (Hu and Liu, 2004;Yu et al., 2011). Because this is not the main focus of this paper, we leave it as future work.

Conclusion
In this paper, we propose a generative model for aspect extraction that leverages the product category hierarchy. Our hypothesis is that any product's aspects are a mixture of aspects from its parent category and aspects unique to itself. Topic models built in this way can successfully balance the aspects of a product itself and those of its parent category. Experimental results show a 79% hit rate on manually annotated aspect terms of 16 products covering 9 categories.