On the Faithfulness of E-commerce Product Summarization

In this work, we present a model for generating e-commerce product summaries. Consistency between the generated summary and the product attributes is an essential criterion for the e-commerce product summarization task. To enhance this consistency, we first encode the product attribute table to guide the process of summary generation. Second, we identify the attribute words in the vocabulary and constrain them to appear in summaries only through copying from the source, i.e., attribute words not in the source cannot be generated. We construct a Chinese e-commerce product summarization dataset, and experimental results on this dataset demonstrate that our model significantly improves faithfulness.


Introduction
The fast growth of the e-commerce market causes information overload, which hinders users from finding the products they need and slows the upgrading of marketing policies on e-commerce platforms. Product summarization provides users and e-commerce platforms with text containing the most valuable information about products, which is of practical value for addressing the problem of information overload.
In e-commerce scenarios, unfaithful product summaries that are inconsistent with the corresponding product attributes, e.g., generating "cotton" for a "silk dress", mislead users and decrease the public credibility of the e-commerce platform. Thus, faithfulness is a basic requirement for product summarization.
Recently, sequence-to-sequence (seq2seq) methods have shown promising performance on general text summarization tasks (Rush et al., 2015; Chopra et al., 2016; Zhou et al., 2017; Li et al., 2020b) and have been adopted for text generation tasks in the field of e-commerce (Khatri et al., 2018; Daultani et al., 2019; Chen et al., 2019; Li et al., 2020a). Although applicable, they do not attempt to improve faithfulness for product summarization. In this paper, we aim to produce faithful product summaries from heterogeneous data, i.e., the textual product description and the product attribute table, as shown in Figure 1.
First, as the words in product attribute tables are explicit indicators of the product attributes, we propose a dual-copy mechanism that can selectively copy tokens from the textual product description and attribute words from the table into the summary. Second, we only allow attribute words to appear in the summary through copying from the source. In this way, attribute words that do not belong to a given product cannot appear in the corresponding summary, so the generated summary will not contain incorrect attributes that contradict the product.
Our main contributions are as follows: • We propose an e-commerce product summarizer that can copy tokens from both the textual product description and the product attribute table.
• We design an attribute-word-aware decoder that guarantees attribute words can appear in the summaries only through copying from the source.

Figure 1: The framework of our model.
• We construct a Chinese e-commerce product summarization dataset containing approximately half a million product descriptions paired with summaries, and experimental results on this dataset demonstrate the effectiveness of our model.

Overview
We start by defining the e-commerce product summarization task. The input is a textual product description and a product attribute table, and the output is a product summary. As shown in Figure 1, a product attribute table, a.k.a. a product knowledge base (Shang et al., 2019), contains a wealth of attribute information about a product. To explore the guidance effect of the product attribute table in producing faithful attribute words, we propose a dual-copy mechanism that can selectively copy tokens from both the textual product description and the product attribute table. To guarantee that unfaithful attribute words are not presented in the summary, all attribute words are allowed to appear in the summary only through copying. In Figure 1, k_{1,1}="Display", k_{1,2}="mode", k_{2,1}="Motor", k_{2,2}="type", v_{1,1}="LED", v_{1,2}="digital", v_{1,3}="display", v_{2,1}="Variable", and v_{2,2}="frequency", where k_{i,j} and v_{i,j} denote the j-th token of the i-th attribute key and value, respectively.

Dual-Copy Mechanism
A bidirectional LSTM encoder converts x, k, and v into hidden sequences h^x, h^k, and h^v. Then, a unidirectional LSTM decoder generates the hidden sequence s as follows:

s_t = LSTM(s_{t-1}, [e(y_{t-1}); c^x_{t-1}])

where s_t is the hidden state at timestep t, and c^x_t is the source sequence context vector generated by the attention mechanism (Bahdanau et al., 2015):

α^x_{t,i} = softmax_i(v^T tanh(W_s s_t + W_h h^x_i)),   c^x_t = Σ_i α^x_{t,i} h^x_i

where α^x_{t,i} is the attention weight on the i-th source word at timestep t. Similarly, we obtain the attention α^v_{t,i,j} over each attribute word, from which we compute the attribute attention and the attribute context vector c^v_t. Our model is based on the pointer-generator network (PGNet) (Gu et al., 2016; See et al., 2017), which predicts words from the probability distributions of a generator and a pointer (Vinyals et al., 2015). The generator produces a vocabulary distribution P_gen over a fixed target vocabulary:

P_gen = softmax(W_v [s_t; c^x_t; c^v_t] + b_v)

The dual pointer copies word w from both the source sequence and the attribute table. The copy distribution over the source sequence is given by the attention distribution over the source:

P^x_copy(w) = Σ_{i: x_i = w} α^x_{t,i}

We adopt a coarse-to-fine attention (Liu et al., 2019) to calculate the final copy distribution P^v_copy over the attribute word sequence under the guidance of attribute-level semantics.
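The additive attention step above can be sketched numerically. This is a shape-level illustration in NumPy under assumed dimensions, not the model's training code; the parameter names (`W_s`, `W_h`, `v`) mirror the equation's symbols.

```python
import numpy as np

def softmax(e):
    """Numerically stable softmax over a 1-D score vector."""
    e = e - e.max()
    z = np.exp(e)
    return z / z.sum()

def additive_attention(s_t, h, W_s, W_h, v):
    """Bahdanau-style additive attention (sketch, assumed shapes):
    s_t: (d,) decoder state; h: (n, d) encoder states;
    W_s, W_h: (d, d) projections; v: (d,) score vector.
    Returns attention weights alpha (n,) and the context vector c (d,)."""
    e = np.tanh(s_t @ W_s + h @ W_h) @ v   # (n,) unnormalized scores
    alpha = softmax(e)                      # attention distribution over positions
    c = alpha @ h                           # context vector: weighted sum of states
    return alpha, c
```

The same routine, applied to the attribute-word states instead of the source states, would yield the attribute attention and the attribute context vector.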
The overall copy distribution is a weighted sum of the source sequence copy distribution P^x_copy(w) and the attribute word copy distribution P^v_copy(w):

P_copy(w) = γ_t P^x_copy(w) + (1 − γ_t) P^v_copy(w)

The final distribution combines P_gen and P_copy:

P(w) = λ_t P_gen(w) + (1 − λ_t) P_copy(w)

where λ_t ∈ [0, 1] is the generation probability at timestep t:

λ_t = σ(w_c^T [c^x_t; c^v_t] + w_s^T s_t + b)

The loss function L is the average negative log-likelihood of the ground-truth target word y_t over all timesteps t:

L = −(1/T) Σ_{t=1}^{T} log P(y_t)
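The distribution mixing and the loss can be checked with a small numerical sketch. Here `lam` and `gamma` stand in for the timestep-dependent gates λ_t and the copy-mixing weight; the function names are illustrative assumptions.

```python
import numpy as np

def final_distribution(p_gen, p_copy_src, p_copy_attr, lam, gamma):
    """Mix the generator and the two copy distributions (all over the same
    extended vocabulary): a convex combination, so the result is itself a
    valid probability distribution whenever the inputs are."""
    p_copy = gamma * p_copy_src + (1.0 - gamma) * p_copy_attr
    return lam * p_gen + (1.0 - lam) * p_copy

def nll_loss(probs_of_targets):
    """Average negative log-likelihood of the ground-truth words y_t."""
    return float(-np.mean(np.log(probs_of_targets)))
```

Because each mixing weight lies in [0, 1] and the inputs each sum to one, the final distribution needs no extra renormalization.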

Only-Copy Strategy for Attribute Words
To avoid generating summaries inconsistent with the product attributes, we constrain attribute words to appear in the summary only through copying from the source, so that incorrect attribute words cannot be generated. To achieve this goal, for each attribute word y_att, we set P_gen(y_att) = ε in Equation 11, where ε = 1e-9.
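Operationally, the constraint amounts to masking the generator distribution at the attribute-word indices before mixing it with the copy distribution. A minimal sketch (function name assumed for illustration):

```python
import numpy as np

EPS = 1e-9  # the paper's value for P_gen on attribute words

def mask_attribute_words(p_gen, attribute_ids):
    """Force the generator probability of every attribute word down to EPS,
    so attribute words can effectively reach the summary only via copying."""
    p_gen = np.asarray(p_gen, dtype=float).copy()
    p_gen[list(attribute_ids)] = EPS
    return p_gen
```

Attribute words still receive probability mass through the copy distribution whenever they actually occur in the source, so faithful attributes remain generable.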
We design a heuristic method to collect the attribute words. Through data analysis, we find that an attribute word in the target tends to be present in the source. Thus, for each source-target pair, we look up each target word in the source and extract the mismatched words as general-word candidates. To guarantee the precision of general-word extraction, we regard Chinese words with character-level intersections as matched words. We regard target words that are almost never recognized as general-word candidates as the attribute words. The set of attribute words will be released along with our dataset.
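The per-pair extraction step of this heuristic can be sketched as follows. The character-level intersection rule is the one described above; the function names and the share-any-character matching criterion are illustrative assumptions about how "intersection" is operationalized.

```python
def has_char_overlap(w1, w2):
    """Matching rule: treat two words as matched if they share any character
    (a character-level intersection)."""
    return bool(set(w1) & set(w2))

def general_word_candidates(source_words, target_words):
    """Target words with no character-level intersection against any source
    word are extracted as general-word candidates for this pair. Aggregated
    over the corpus, target words almost never extracted this way are kept
    as attribute words."""
    return [tw for tw in target_words
            if not any(has_char_overlap(tw, sw) for sw in source_words)]
```

A corpus-level pass would then count, for each target word, how often it lands in the candidate list, keeping the near-zero-count words as attribute words.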

Dataset
We collect the dataset from a mainstream Chinese e-commerce platform. Each sample is a (product textual description, product attribute table, product summary) triplet. The product summaries are written by thousands of qualified experts, and the auditing groups of the e-commerce platform strictly verify their quality. We collect 361,158 and 92,886 samples for the Home Appliances and Bags categories, respectively. For Home Appliances, we randomly select 10,000 samples as the validation and test sets. For the Bags category, we randomly select 5,000 samples as the validation and test sets. The remaining samples are used as the training set. The average numbers of Chinese characters in the source text and the summary are 325.54 and 79.24, respectively. The average number of attributes per product is 13.87.

Experimental Results
We compare against the following baselines. Lead is a simple baseline that takes the first 79 characters of the input as the summary. Seq2seq is a standard attention-based seq2seq model without a copy mechanism. PGNet is the pointer-generator network. C-AttriTable concatenates the source text and the attribute words from the attribute table as the input. AttriTable denotes the dual-copy mechanism.
AttriVocab denotes the only-copy strategy for attribute words. Details of the experimental settings can be found in our code. Table 1 shows the results for the different models. Overall, PGNet with attribute tables and the only-copy strategy for attribute words achieves the highest ROUGE scores. Although "AttriTable" and "AttriVocab" aim to improve faithfulness, they maintain acceptable ROUGE performance. Considering that ROUGE is criticized for its poor correlation with human judgment, especially for evaluating the correctness of generated summaries (Novikova et al., 2017; Kahn Jr et al., 2009), we also perform a human evaluation of faithfulness and readability.

Human Evaluations
Three expert annotators evaluate faithfulness (0 or 1) and readability (1 to 5, with 5 the best) on 100 instances sampled from the test set. From Table 2, we find that only 64.33% of the summaries produced by PGNet are faithful to the source, illustrating that faithfulness is an urgent problem for the e-commerce product summarization task. The "AttriTable" and "AttriVocab" strategies alleviate unfaithfulness to a large extent. For readability, all models obtain comparable results, so we conclude that our "AttriTable" and "AttriVocab" strategies do not hurt fluency. A two-tailed paired t-test gives p-value < 0.01.

Conclusion
We present an e-commerce product summarization model that aims to improve the consistency between the generated summary and the product attributes. We propose two strategies. First, we introduce a dual-copy mechanism that can selectively copy words from both the textual product description and the product attribute table, which inclines our model toward producing faithful attribute words that exist in the product attribute table. Second, we design a heuristic method to recognize attribute words, and unfaithful attribute words are not allowed into the summaries through generation from the target vocabulary. We construct a large-scale Chinese e-commerce product summarization dataset, and our dataset and code are publicly available.

Acknowledgments
This work is partially supported by National Key R&D Program of China (2018YFB2100802).