Non-ingredient Detection in User-generated Recipes using the Sequence Tagging Approach

Recently, the number of user-generated recipes on the Internet has increased. In such recipes, users are generally supposed to write a title, an ingredient list, and steps to create a dish. However, some items in an ingredient list in a user-generated recipe are not actually edible ingredients. For example, headings, comments, and kitchenware sometimes appear in an ingredient list because users can freely write the list in their recipes. Such noise makes it difficult for computers to use recipes for a variety of tasks, such as calorie estimation. To address this issue, we propose a non-ingredient detection method inspired by a neural sequence tagging model. In our experiment, we annotated 6,675 ingredients in 600 user-generated recipes and showed that our proposed method achieved a 93.3 F1 score.


Introduction
Today, many people upload their recipes to the Internet. For example, over 6.7 million recipes have been uploaded to Cookpad, one of the largest recipe sharing services in the world. Most of the recipes on the service are posted by ordinary users. Figure 1 shows an example of a recipe. Note that we use Japanese examples in this study because approximately half of the recipes on Cookpad are written in Japanese. As seen in the figure, a recipe generally consists of a title, an ingredient list, and steps. An ingredient list is a set of items, each of which has an ingredient name and a quantity.
However, some items in an ingredient list in a user-generated recipe are not actually edible ingredients. For example, the third item (seasoning) in Figure 1 is not an ingredient but the heading for the following ingredients. In a user-generated recipe, people freely use the ingredients field to describe ingredients. The resulting noise makes it difficult for computers to use recipes for a variety of tasks, such as calorie estimation.

Figure 1: Example of a recipe. N/A means that the user (i.e., the recipe author) has not written the information.
In this paper, we propose a method to detect non-ingredient items in an ingredient list in a user-generated recipe. Inspired by a sequence tagging approach, our method solves the problem by predicting a label (ingredient or non-ingredient) for each item in an ingredient list sequentially. In our experiment, we annotated 6,675 ingredients in 600 recipes from Cookpad and investigated the performance of our method using the recipes.

Related Work
The increase in the number of recipes on the Internet has led to an increase in studies on these data, such as recipe analysis (Sasada et al., 2015; Hiramatsu et al., 2019), recipe organization (Kiddon et al., 2015; Jermsurawong and Habash, 2015), and recipe generation (Salvador et al., 2019; Kiddon et al., 2016). Additionally, many recipe-related corpora and datasets have been published recently to promote studies on recipes (Mori et al., 2014; Harashima et al., 2016; Salvador et al., 2017; Yagcioglu et al., 2018).
Among such recipe-related studies, the following two previous works focused on informal text in user-generated recipes, like our study. Harashima and Yamada (2018) converted ingredients written in a user-generated recipe into their canonical forms in an ingredient dictionary. However, unlike our study, that study did not assume that non-ingredients appear in a recipe.
By contrast, Inuzuka et al. (2018) distinguished actual steps in a user-generated recipe from non-steps that are unrelated to cooking, such as an advertisement for the author's recipe books. Unlike their work, our study focuses on the ingredients in a recipe; that is, we distinguish non-ingredients written in a user-generated recipe from actual ingredients.
Our study is the first to pay attention to non-ingredients in a user-generated recipe. This contributes to a variety of recipe-related studies, particularly those based on the ingredients in a recipe, such as calorie estimation (Harashima et al., 2020), recipe clustering (Nadamoto et al., 2016), and recipe-related term detection (Chung, 2012).

Task Definition
The primary task in this study is to classify an item in an ingredient list as an ingredient or a non-ingredient. In this work, we define non-ingredient items based on edibility. Figure 2 shows examples of ingredient lists in user-generated recipes. The item "white sauce" in Figure 2(a) is not an ingredient but a heading. The items below it are ingredients for white sauce. An item without a quantity is likely to be a non-ingredient. By contrast, the item "favorite vegetables" in Figure 2(b) is an actual ingredient. As shown by this example, an item without a quantity is not always a non-ingredient. The item "(↑ you can use butter)" in Figure 2(c) is used as a comment that refers to the previous ingredient, "margarine". The item "bamboo skewers" in Figure 2(d) is a non-ingredient because it is not edible. As in this example, kitchenware appears in the ingredient list of some recipes. The goal of this study is to detect these inedible items as non-ingredients.

Proposed Method
In this study, we detect non-ingredient items in an ingredient list using a neural sequence tagging model, shown in Figure 3.

Ingredient Representations
First, we convert each item in the ingredient list into its ingredient representation, which consists of an ingredient name representation and additional features. The former is obtained as follows:

TF-IDF: We compute TF-IDF vectors for each item in the ingredient list. The term frequency and inverse document frequency are given as

$$\mathrm{tf}_{i,j} = \frac{n_{i,j}}{\sum_{k} n_{k,j}}, \qquad \mathrm{idf}_{i} = \log \frac{|D|}{|\{d \in D : t_i \in d\}|},$$

where $n_{i,j}$ is the number of occurrences of word $t_i$ in the $j$-th ingredient name, $d$ is the set of tokenized words in an ingredient name, and $D$ is the set of all ingredient names in the recipe dataset. We tokenize each ...

char-CNN: Instead of TF-IDF, we can also use a CNN-based sequence encoder (Zhang and Wallace, 2017) to obtain the character-level features of ingredient names. We compute the features using different kernel sizes (2, 3, 4, 5) and concatenate them.
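As a rough illustration of the TF-IDF option, the following sketch computes one vector per ingredient name using the standard TF-IDF definition. The function name `tfidf_vectors`, the English toy tokens, and the unsmoothed logarithm are assumptions made for illustration; they are not taken from the paper's implementation, which tokenizes Japanese text.

```python
import math
from collections import Counter


def tfidf_vectors(tokenized_names):
    """Return (vocabulary, one TF-IDF vector per tokenized ingredient name)."""
    vocab = sorted({tok for name in tokenized_names for tok in name})
    n_names = len(tokenized_names)
    # Document frequency: number of ingredient names that contain each token.
    df = Counter(tok for name in tokenized_names for tok in set(name))
    idf = {tok: math.log(n_names / df[tok]) for tok in vocab}

    vectors = []
    for name in tokenized_names:
        counts = Counter(name)
        total = sum(counts.values())
        # Term frequency is the token count normalized by the name length.
        vectors.append(
            [counts[tok] / total * idf[tok] if total else 0.0 for tok in vocab]
        )
    return vocab, vectors


# Hypothetical, already-tokenized ingredient names.
names = [["white", "sauce"], ["butter"], ["white", "pepper"], ["butter"]]
vocab, vecs = tfidf_vectors(names)
```

Tokens that appear in many ingredient names receive a small weight, whereas rare tokens such as "sauce" in this toy data receive a larger one.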
Note that we do not use pre-trained embeddings such as GloVe or fastText for the ingredient name representation because these embeddings did not perform well in our preliminary experiments.
In addition to the ingredient name representation, we use additional features of the ingredient name and quantity: character count and ingredient name frequency. The ingredient name frequency is computed from the recipe dataset, which is also used for the TF-IDF calculations. We count the ingredients with the same name and use the log-scaled value as the name frequency.
Finally, we concatenate the ingredient name representation and additional features into one vector to create the ingredient representation.
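The concatenation step can be sketched as follows. This is a minimal example under stated assumptions: the helper names are hypothetical, the character count is taken for both the name and the quantity string, and `log1p` (that is, log(1 + count)) is used so that unseen names map to 0. The paper only states that the frequency is log-scaled.

```python
import math
from collections import Counter


def additional_features(name, quantity, name_counts):
    """Character counts plus a log-scaled name frequency (smoothing assumed)."""
    return [
        float(len(name)),               # character count of the ingredient name
        float(len(quantity)),           # character count of the quantity string
        math.log1p(name_counts[name]),  # log(1 + frequency); 0.0 for unseen names
    ]


def ingredient_representation(name_vector, name, quantity, name_counts):
    """Concatenate the name representation with the additional features."""
    return list(name_vector) + additional_features(name, quantity, name_counts)


# Hypothetical name counts gathered from a recipe dataset.
name_counts = Counter(["salt", "salt", "salt", "sugar", "white sauce"])
rep = ingredient_representation([0.0, 0.7], "salt", "1 tsp", name_counts)
```

Here `[0.0, 0.7]` stands in for a TF-IDF or char-CNN vector; the resulting vector is what a downstream classifier would receive for one item.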

Model
We use the sequence tagging model shown in Figure 3. Each time step corresponds to one item in the ingredient list. The model takes all items in the ingredient list as its inputs in the order in which they appear in the recipe. The inputs of our model are the ingredient representations described in the previous section, and the outputs are binary predictions, each of which labels an item as an ingredient or a non-ingredient. By treating non-ingredient detection as a sequence tagging problem, the model can make predictions by taking the items before and after the target item into account. In many recipes, related ingredients are usually written close to each other. As shown in Figure 2(a), if an item in an ingredient list is used as a heading, related ingredients are listed below it.

Table 1: Statistics of our dataset.
# of recipes          600
# of ingredients      5,829
# of non-ingredients  846
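The tagging setup described in this section can be sketched in PyTorch as follows. This is an illustrative sketch rather than the authors' implementation: the class name `BiGRUTagger`, the input dimension of 16, the single linear output layer, and the 0.5 threshold are assumptions. The two layers and hidden size of 128 follow the configuration reported in the Methods section.

```python
import torch
from torch import nn


class BiGRUTagger(nn.Module):
    """Bidirectional GRU that emits one logit per item in an ingredient list."""

    def __init__(self, input_dim, hidden_dim=128, num_layers=2):
        super().__init__()
        self.gru = nn.GRU(
            input_dim,
            hidden_dim,
            num_layers=num_layers,
            bidirectional=True,
            batch_first=True,
        )
        # Forward and backward states are concatenated, hence 2 * hidden_dim.
        self.out = nn.Linear(2 * hidden_dim, 1)

    def forward(self, items):
        # items: (batch, list_length, input_dim), in recipe order.
        states, _ = self.gru(items)
        return self.out(states).squeeze(-1)  # (batch, list_length) logits


torch.manual_seed(0)
model = BiGRUTagger(input_dim=16)      # 16 is a hypothetical feature size
recipe = torch.randn(1, 5, 16)         # one recipe with five items
logits = model(recipe)
is_non_ingredient = torch.sigmoid(logits) > 0.5  # one boolean per item
```

Because the GRU runs in both directions, the logit for each item depends on the items listed before and after it, which is what lets a heading be recognized from the ingredients that follow it.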

Dataset
In our experiment, we chose 600 recipes from Cookpad. More precisely, we collected recipes whose ingredient lists contained items without quantity information because such items tended to be non-ingredients in our preliminary investigation. Each item in the ingredient lists of the recipes was labeled as an ingredient or a non-ingredient by three domain-expert annotators. The gold labels were decided by majority vote. Table 1 shows the statistics of our dataset.
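The majority vote over three annotators can be sketched as below. The item names and votes are made-up examples, and `majority_label` is a hypothetical helper; with three annotators and two labels, a strict majority always exists, so no tie-breaking rule is needed.

```python
from collections import Counter


def majority_label(votes):
    """Return the label chosen by most annotators."""
    label, _ = Counter(votes).most_common(1)[0]
    return label


# Hypothetical annotations: three votes per item.
annotations = {
    "white sauce": ["non-ingredient", "non-ingredient", "ingredient"],
    "favorite vegetables": ["ingredient", "ingredient", "ingredient"],
    "bamboo skewers": ["non-ingredient", "ingredient", "non-ingredient"],
}
gold = {item: majority_label(votes) for item, votes in annotations.items()}
```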

Methods
In our experiment, we compared the performance of the following methods using 10-fold cross validation:
Random forest (baseline model): We used RandomForestClassifier included in scikit-learn as a baseline model. The input of the random forest model was the ingredient representation described in the previous section. This model predicted a label for each item in an ingredient list independently.
BiGRU model (our model): We used a two-layer bidirectional GRU (BiGRU) (Cho et al., 2014). The dimension of the BiGRU hidden layer was 128. We trained the BiGRU model for 50 epochs using the Adam optimizer.

Table 2 shows the experimental results. We evaluated the methods using F1, precision, and recall. As shown in Table 2, the BiGRU + char-CNN model with the ingredient frequency achieved the highest F1 score of 93.3. The BiGRU-based model outperformed the random forest, which suggests that a sequence labeling approach is effective for the non-ingredient detection task.
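The evaluation protocol can be sketched in plain Python as follows. Two points are assumptions, since the text does not state them: folds are formed at the recipe level so that an ingredient list is never split across training and test data, and "non-ingredient" is treated as the positive class for precision, recall, and F1. The function names are hypothetical.

```python
import random


def k_fold_splits(n_recipes, k=10, seed=0):
    """Yield (train, test) recipe-index lists for k-fold cross validation."""
    indices = list(range(n_recipes))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def precision_recall_f1(gold, pred, positive="non-ingredient"):
    """Compute precision, recall, and F1 for the positive class."""
    pairs = list(zip(gold, pred))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


splits = list(k_fold_splits(600))

# Toy labels: one true positive, one false positive, one false negative.
N, I = "non-ingredient", "ingredient"
scores = precision_recall_f1([N, I, I, N, I], [N, I, N, I, I])
```

In a full experiment, a model would be trained on each `train` list and scored on the matching `test` list, with the metrics averaged or pooled across the ten folds.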

Results and Discussion
The ingredient name frequency improved the F1 scores for both the random forest and BiGRU models. Table 3 shows the most frequent ingredient names, which were calculated from approximately 3 million recipes from Cookpad. As shown in Table 3, many ingredient names that occurred frequently in recipes were actual ingredients, so the ingredient name frequency is important for ingredient detection. The ingredient name frequency can serve as an alternative to an ingredient dictionary, which is rarely available.
In most cases, an item without a quantity in the ingredient list was used as a heading or comment, as described in Section 3. In a further investigation, simply predicting every item without a quantity as a non-ingredient yielded an F1 score of 85.3. However, some items, such as vegetables or fruits, had no quantities although they were actually ingredients. Using the ingredient name frequency, it became possible to correctly predict such items as ingredients because the names of vegetables and fruits occurred frequently in our recipe dataset.
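The quantity-based rule discussed here is simple enough to state in a few lines. The sketch below uses made-up items in English (including a hypothetical quantity for the skewers), and the function name `no_quantity_baseline` is an assumption; it is meant only to show where such a rule succeeds and fails.

```python
def no_quantity_baseline(items):
    """Label an item as a non-ingredient exactly when its quantity is empty."""
    return [
        "non-ingredient" if not quantity.strip() else "ingredient"
        for _name, quantity in items
    ]


# Hypothetical (name, quantity) pairs modeled on the Figure 2 examples.
items = [
    ("white sauce", ""),          # heading: correctly flagged
    ("butter", "20 g"),           # ingredient: correctly kept
    ("favorite vegetables", ""),  # real ingredient: wrongly flagged
    ("bamboo skewers", "4"),      # kitchenware: wrongly kept
]
pred = no_quantity_baseline(items)
```

The third and fourth items show the two failure modes of the rule: an edible item listed without a quantity, and an inedible item listed with one.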

Conclusion
In this paper, we introduced a non-ingredient detection task for user-generated recipes and proposed a neural model based on the sequence tagging approach. We used a BiGRU-based model to predict a label for each ingredient over an ingredient sequence. To evaluate our method, we constructed a dataset that contained 6,675 ingredients of 600 recipes from Cookpad. Our experimental results showed that the proposed method achieved a 93.3 F1 score in the task. In future work, we plan to verify the effectiveness of our method for downstream tasks, such as calorie estimation.