Shaping a social robot’s humor with Natural Language Generation and socially-aware reinforcement learning

Humor is an important aspect of human interaction: it regulates conversations and increases interpersonal attraction and trust. For social robots, humor is one way to make interactions more natural and enjoyable and to increase credibility and acceptance. In combination with appropriate non-verbal behavior, natural language generation offers the ability to create content on-the-fly. This work outlines the building blocks for providing an individual, multimodal interaction experience by shaping the robot’s humor with the help of Natural Language Generation and Reinforcement Learning based on human social signals.


Introduction
Humor is an important aspect of human interaction. It regulates conversations and increases interpersonal attraction and trust. For embodied conversational agents, including social robots, humor makes interactions more natural and enjoyable and increases credibility and acceptance (Nijholt, 2007). Canned jokes are the first type of humor that comes to mind. In Human-Robot Interaction (HRI), they are used for entertainment purposes like stand-up comedy and joke telling. Moreover, there are several types of conversational humor (Dynel, 2009) which are employed in human conversation. Generating such humorous content computationally is hard, not only because it is often context-dependent but also because it usually requires human creativity. Several research projects have already investigated humor generation for chat bots and joke generation.
Natural Language Generation (NLG) is a key component for social robots to generate humorous content on-the-fly, as it opens up the possibility to react to user input and to generate utterances without the need to prepare scripted content in advance. Expressing humor also requires incorporating other modalities in the presentation, mainly gestures, gaze and facial expressions.
Keeping the diversity of interaction contexts, tasks and human preferences in mind, social robots should not only express humor, but also adapt it accordingly. We propose an approach to realize this by combining NLG and Reinforcement Learning (RL) to adapt the robot to the individual user's preferences. Being able to dynamically generate and present humorous content in a multimodal manner is one step to explore how to increase perceived social intelligence and naturalness of interactions. As an example for the NLG part we focus on ironical contents here.
First, we outline related work covering humor from the perspective of language, gestures, gaze and facial expressions, as well as adaptive social robots. Afterwards, we look at how to implement expression of multimodal irony by combining NLG with non-verbal behaviors. Finally, we propose the use of RL in combination with human social signals to optimize parameters for aforementioned robot modalities automatically, resulting in personalized interaction experiences for the human user.

Related Work
We split related work into two research areas: (1) computational humor and experiments that investigate how to generate and present jokes, as well as the role of humor for robots, and (2) adaptation of social robots, with a focus on Reinforcement Learning.

Humor
Experiments on generating humor in text form include the "Light Bulb Joke Generator" (Attardo and Raskin, 1994), "JAPE" and "STANDUP" for punning riddles (Binsted and Ritchie, 1997; Black et al., 2007) and "HAHACRONYM" for humorous acronyms (Stock and Strapparava, 2002), to name only a few. Looking at entertainment, Sjöbergh and Araki (2008) found that jokes presented by robots are rated significantly funnier than their text-only equivalents. Further scenarios include Japanese Manzai (Hayashi et al., 2008), stand-up comedy (Nijholt, 2018; Knight, 2011; Katevas et al., 2015) and joke telling (Weber et al., 2018), where the robot presents scripted content to the audience. Apart from canned jokes, there are many types of conversational humor (Dynel, 2009). For embodied conversational agents, humor is one aspect that contributes to the naturalness of an interaction: it can help to solve communication problems and to increase acceptance of natural language interfaces when used sparingly and carefully (Binsted et al., 1995). Appropriateness plays an important role, as humor will lead to misunderstanding in the wrong situation (Nijholt, 2007).
In the context of robots, research by Mirnig et al. (2016) comes to the conclusion that positively attributed forms of humor (self-irony) are rated significantly higher than negative ones (Schadenfreude) when it comes to robot likability. Their results also indicate a general positive effect of humor and an interaction effect between user personality and preferred type of humor. Results from recent studies by Mirnig et al. (2017) indicate that adding unimodal verbal or non-verbal, humorous elements to non-humorous robot behavior does not automatically result in increased perceived funniness. They point out that humor is multilayered and results from several modalities.

Social Adaptation
Social robots, which adapt their behaviors to human users, are used in a variety of settings. Aly and Tapus (2016) employ NLG with a NAO robot for user-robot personality matching. Both gestures and speech are adapted to the human's personality profile while the user can get information about several restaurants from the robot. Another approach is used by Tapus et al. (2008): the authors use RL to optimize the robot's personality in the context of post-stroke rehabilitation therapy. They use scripted utterances in the context of exercises.
RL is often used as a machine learning framework for adapting social robots' behaviors. For example, it is used to learn social behavior (Barraquand and Crowley, 2008), for student tutoring (Gordon et al., 2016), to maintain long-term user engagement when playing games (Leite et al., 2011) and for intervention with children with autism spectrum disorder (Liu et al., 2008).
Different kinds of data are used to provide the RL feedback signal (reward), including task-related information like user performance (e.g. in exercises or games) and human social signals. Tactile (Barraquand and Crowley, 2008) or prosodic (Kim and Scassellati, 2007) feedback, interaction distance, gaze meeting, motion speed, timing (Mitsunaga et al., 2008), gesture and posture (Najar et al., 2016; Ritschel et al., 2017), or gaze direction (Fournier et al., 2017) are used in different scenarios. Another option is to use physiological data from ECG (Liu et al., 2008) or EEG (Tsiakas et al., 2018). In the context of humor, smile and gaze (Leite et al., 2011; Gordon et al., 2016; Hemminghaus and Kopp, 2017), as well as laughter (Hayashi et al., 2008; Knight, 2011; Katevas et al., 2015; Weber et al., 2018) are used, as these are contemporaneous human reactions that indicate whether a joke is good or bad from the perspective of the human listener.

Adaptive Robot Humor with NLG
Shaping the humor of a social robot requires both humorous content and an approach for adapting it to the human's preferences. Since language plays an important role in communicating information, we take a look at NLG for generating ironical statements, combined with multimodal markers including facial expression, gaze or gestures. In combination, these can result in humorous content and elicit human social signals, which can serve as an indication of whether the robot's behavior is pleasing or not.

Generating Ironical Statements
Computational creation of creative, humorous content is very hard. However, there are many findings concerning types and multimodal markers of humor (Burgers and van Mulken, 2017), especially for irony (Attardo et al., 2003), which can result in humor, too. We focus on ironical content here because the generation task can be realized as illustrated in Figure 1. First, Natural Language Processing (NLP) is used to check whether the input utterance can be transformed into an ironical statement. Then, NLG converts the original utterance by inverting it and applying linguistic markers. Apart from the semantic content of an ironical utterance, the way in which it is presented plays a crucial role. While written text may use direct, typographic or morpho-syntactic markers to help the reader identify ironical content, linguistic, paralinguistic and visual markers are of special interest here. Finally, these should be expressed by a robot with non-verbal behavior. Otherwise, irony might not be perceived by the listener. Facial expressions that indicate irony include raised or lowered eyebrows, wide open eyes, squinting or rolling the eyes, winking, nodding, smiling or a "blank face". Moreover, there are different acoustic parameter modulations. However, these are not consistent and differ from language to language.
Figure 1: Generating ironical statements in multiple stages
Figure 2: Overview of the adaptation process
The mentioned findings form a good starting point to implement expressive multimodal humorous contents for social robots by emphasizing spoken words generated by NLG with matching gaze, facial expressions and gestures in real-time.
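The staged generation described in this section can be sketched in a few lines of Python. This is a minimal illustration under strong assumptions, not the implementation behind Figure 1: the antonym lexicon, the intensifier "really" as a linguistic marker, the `IronicUtterance` type and the animation label names are all hypothetical, and a real system would rely on proper NLP and NLG components instead of word lookup.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Toy antonym lexicon: an assumption for illustration, not taken from the paper.
ANTONYMS = {
    "terrible": "great",
    "awful": "wonderful",
    "bad": "good",
    "boring": "exciting",
}

# Non-verbal markers based on the facial expressions listed in the text;
# the label names are hypothetical robot animation identifiers.
DEFAULT_NONVERBAL_MARKERS = ["raised_eyebrows", "wink", "smile"]


@dataclass
class IronicUtterance:
    text: str
    nonverbal_markers: List[str] = field(default_factory=list)


def make_ironic(utterance: str) -> Optional[IronicUtterance]:
    """Return an ironic variant of the utterance, or None if it cannot be transformed."""
    tokens = utterance.rstrip(".!?").split()
    for i, token in enumerate(tokens):
        # Stage 1 (NLP check): look for an evaluative word that can be inverted.
        opposite = ANTONYMS.get(token.lower())
        if opposite is None:
            continue
        # Stage 2 (NLG): invert the evaluation and add an intensifier
        # as a simple linguistic marker.
        tokens[i] = "really " + opposite
        text = " ".join(tokens) + "!"
        # Stage 3: attach non-verbal markers for the robot to display.
        return IronicUtterance(text, list(DEFAULT_NONVERBAL_MARKERS))
    return None
```

With this sketch, `make_ironic("The weather is terrible.")` produces the text "The weather is really great!" together with the three marker labels, while an utterance without an invertible evaluative word yields `None`, so the robot can fall back to a non-ironic response.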

Adaptation Process
Adaptation of humorous content is often based on human social signals, primarily by sensing vocal laughter and smiles to estimate the spectator's amusement. This applies to the aforementioned Japanese Manzai, stand-up comedy and joke telling scenarios. These experiments adapt the presented content and its delivery in terms of animation, sound or voice parameters, but without generating content on-the-fly with the help of NLG.
Figure 2 outlines our suggested adaptation mechanism for learning which humor the user prefers. It is based on the general idea of including human social signals in the learning process of the robot (Ritschel, 2018). The user's social signals are captured via camera and microphone. Signal processing extracts the user's smiles and vocal laughter, similar to the operationalization in Weber et al. (2018). This information can be used to shape the reward of the machine learning process. RL is used to manipulate the generation of the humorous content by altering parameters for NLG and animation, e.g. determining whether ironical comments are used in a given situation. There are many options for what can be learned, including humor types or parameters of animation generation, e.g. to optimize non-verbal aspects of joke presentation, which might have different effects when expressed by a robot than by a human. By incorporating the user's feedback in terms of smile and laughter, the agent is able to learn how to make the user laugh by means of language, facial expression, gaze or gestures. Combining NLG with the generation of additional multimodal behaviors allows social robots to add humorous elements to conversations. It provides the opportunity to personalize and adapt the interaction experience to the individual preferences of the human user.
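To make the adaptation loop more concrete, the following sketch uses an epsilon-greedy multi-armed bandit as a deliberately simple stand-in for the RL component. The text does not commit to a specific algorithm, so the bandit, the equal weighting of smile and laughter in `social_reward`, and the action names are assumptions made for illustration only. Smile and laughter are assumed to arrive as estimates in [0, 1] from the signal processing step.

```python
import random
from typing import Dict, List, Optional


def social_reward(smile: float, laughter: float,
                  w_smile: float = 0.5, w_laugh: float = 0.5) -> float:
    """Combine smile and laughter estimates (each in [0, 1]) into a scalar reward.

    The weights are hypothetical; they are not specified in the text.
    """
    return w_smile * smile + w_laugh * laughter


class EpsilonGreedyHumorBandit:
    """Learns which humor/presentation setting earns the highest social reward."""

    def __init__(self, actions: List[str], epsilon: float = 0.1,
                 seed: Optional[int] = None) -> None:
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts: Dict[str, int] = {a: 0 for a in self.actions}
        self.values: Dict[str, float] = {a: 0.0 for a in self.actions}

    def select_action(self) -> str:
        # Explore with probability epsilon, otherwise pick the best-known action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.values[a])

    def update(self, action: str, reward: float) -> None:
        # Incremental mean of the rewards observed for this action.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]
```

In an interaction loop, the robot would call `select_action()` to choose, for example, between `"plain"` and `"ironic"` delivery, present the utterance, measure smile and laughter, and pass `social_reward(smile, laughter)` to `update()`. A full RL formulation would additionally condition the choice on interaction state, which a bandit ignores.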

Conclusion
We have outlined the important role of NLG and the opportunities it offers to increase the credibility and acceptance of the robot and the naturalness of interactions. Generating content on-the-fly makes it possible to add humorous elements on demand. We have described an adaptation process to realize individualized interaction experiences for the human user. By incorporating human social signals in the RL process, the robot can optimize the presentation of humorous content depending on interaction context, task and human preferences.