Synthesizing the finger alphabet of Swiss German Sign Language and evaluating the comprehensibility of the resulting animations

Introduction
Sign languages are natural languages and, as such, fully developed linguistic systems. They are often the preferred means of communication of Deaf signers.
Sign languages make use of a communication form known as the finger alphabet (or manual alphabet), in which the letters of a spoken language word are fingerspelled, i.e., dedicated signs are used for each letter of the word. The letters of the alphabet of the most closely corresponding spoken language are used, e.g., English for American, British, and Irish Sign Language; German for German, Austrian, and Swiss German Sign Language, etc. Figure 1 shows the manual alphabet of Swiss German Sign Language (Deutschschweizerische Gebärdensprache, DSGS). Some fingerspelling signs are iconic, i.e., their meaning becomes obvious from their form. Most manual alphabets, like the one for DSGS, are one-handed, an exception being the two-handed alphabet for British Sign Language.
Tools for learning the finger alphabet of a sign language typically display one still image for each letter, thus not accounting for all of the salient information inherent in fingerspelling [3]: According to Wilcox [4], the transitions are more important than the holds for perceiving a fingerspelling sequence. The transitions are usually not represented in sequences of still images.
More recently, 3D animation has been used in fingerspelling learning tools. This approach "has the flexibility to shuffle letters to create new words, as well as having the potential for producing the natural transitions between letters" [3]. The difference between an animation and a still-only representation is shown in Figure 2 for the example of the American Sign Language (ASL) fingerspelling sequence T-U-N-A [5].
This paper reports on the work in synthesizing the finger alphabet of DSGS as a first step towards a fingerspelling learning tool for this language. Sign language synthesis is an instance of automatic sign language processing, which in turn forms part of natural language processing (NLP) [6]. The contribution of this paper is twofold: Firstly, the process of creating a set of hand postures and transitions for the DSGS finger alphabet is explained, and secondly, the results of a study assessing the comprehensibility of the resulting animations are reported. The comprehension rate of the signing avatar was highly satisfactory at 90.06%.
The remainder of this paper is organized as follows: Section 2 gives an overview of previous work involving linguistic analysis (Sections 2.1 to 2.3) and synthesis (Section 2.4) of fingerspelling. Section 3 explains how we produced a set of hand postures and transitions for DSGS fingerspelling synthesis. Section 4 presents the results of the study assessing the comprehensibility of synthesized DSGS fingerspelling sequences.

Domains of use
Fingerspelling is often used to express concepts for which no lexical sign exists in a sign language. Apart from that, it may serve other purposes: In ASL, fingerspelling is sometimes applied as a contrastive device to distinguish between "the everyday, familiar, and intimate vocabulary of signs, and the distant, foreign, and scientific vocabulary of words of English origin" [7]. Fingerspelling is also used for quoting from written texts, such as the Bible. In Italian Sign Language, fingerspelling is used predominantly for words from languages other than Italian [7].
Padden and Gunsauls [7], looking at 2164 fingerspelled words signed by 14 native ASL signers, found that nouns are by far the most commonly fingerspelled parts of speech, followed by adjectives and verbs. Within the noun category, occurrences of fingerspelling were evenly distributed among proper nouns and common nouns.

Frequency of use and speed
Frequency of use and speed of fingerspelling vary across sign languages. ASL is known to make heavy use of the finger alphabet: 10 to 15% of ASL signing consists of fingerspelling [7]. Native signers have been shown to fingerspell more often (18% of the signs in a sequence of 150 signs) than non-native signers (15% of the signs). Within the first group, native signers with a more advanced formal education (college or postgraduate level) have been demonstrated to use more fingerspelling (21% of the signs in a sequence of 150 signs) than native signers at the high school level (15% of the signs) [7]. In ASL, fingerspelled words continue to be used even after lexical signs have been introduced for the same concepts [7]. Some fingerspelled words have also been lexicalized in this language: For example, the sign FAX is performed by signing -F- and -X- in the direction from the subject to the object. This is different from the fingerspelled word F-A-X, which is not reduced to two fingerspelled letters and does not exhibit directionality [7].
Compared to 10 to 15% in ASL, British Sign Language (BSL) has been shown to contain only about 5% fingerspelling [8]. In BSL, fingerspelled words are typically abandoned once lexicalized signs have been introduced for a concept.
In DSGS, fingerspelling is even less common than in BSL.
As Boyes Braem and Rathmann [9] pointed out, "few DSGS signers are as yet as fluent in producing or reading fingerspelling". Until recently, DSGS signers used mouthings to express technical terms or proper names for which no lexical sign existed, which partly accounts for the heavy use of mouthing in this language [11]. Nowadays, fingerspelling is used more often in these cases, particularly by younger DSGS signers. In addition, it is used for abbreviations. Keane and Brentari [13] reported fingerspelling rates between 2.18 and 6.5 letters per second (with a mean of 5.36 letters per second) based on data from different studies. The speed of ASL fingerspelling is known to be particularly high [7], whereas fingerspelling in DSGS is much slower. Accordingly, in a recent focus group study aimed at evaluating a DSGS signing avatar, the seven participants, all of them native signers of DSGS, found the default speed of fingerspelling of the avatar system to be too high [14].

Comprehensibility
A few studies have looked at the comprehensibility of fingerspelling sequences produced by human signers. Among them is that of Hanson [15], who presented 17 Deaf adult signers (15 of whom were native signers) with 30 fingerspelled words and non-words each. The participants were given ten seconds to write the letters of the item presented and decide whether it was a word or a non-word.
Geer and Keane [16] assessed the respective importance of holds and transitions for fingerspelling perception. Sixteen L2 learners of ASL saw 94 fingerspelled words. Each word was presented exactly twice. Following this, the participants were asked to type its letters on a computer. The findings of the study complement those of Wilcox [4] introduced in Section 1: Ironically, the motion between the letters, which is what experts utilize [4], confuses language learners. It is therefore imperative that study tools help language learners learn to decode motion.

Synthesis
There are three essential elements required for realistic fingerspelling synthesis:
• Natural thumb motion. Early efforts relied on related work in the field of robotics; however, this proved inadequate, as the approximation of the thumb used in many grasping models does not accurately reflect the motions of the human thumb [17].
• Highly realistically modelled hand with a skeletal deformation system. Early systems used a segmented hand comprised of rigid components, and lacked the webbing between thumb and index finger, and the ability to deform the palm.
• Collision detection or collision avoidance. There is no physicality to a 3D model, so there is no inherent method to prevent one finger from passing through another. Collision detection or avoidance systems can prevent these types of intersections and add to the realism of the model.
An early effort used VRML [18] to allow users to create the hand postures representing individual letters of a manual alphabet. Users could type text and see a segmented hand interpolate between subsequent hand postures. All of the joint coordinates were aligned with world coordinates and did not reflect the natural anatomy of the hand. There were no allowances for collision detection or avoidance.
McDonald [19] created an improved hand model that facilitated the behavior not only of the thumb but of all of the phalanges in the hand. This was coupled with Davidson's [20] initial work on collision avoidance to produce a set of six words, which were tested by Deaf high school students. Although they had few problems in identifying the words, test participants found the appearance of the hand off-putting because it was segmented and lacked webbing between the thumb and index finger.
Adamo-Villani and Beni [21] solved this problem by creating a highly realistic hand model with a skeletal deformation system, allowing the webbing to stretch and wrinkle as does a human hand. In 2006, Wolfe et al. [5] integrated the natural thumb movement and a highly realistic hand model with an enhanced system of collision avoidance. The collision system involved an exhaustive search of all possible letter transitions and correcting any that generated collisions through manual animation.
In 2008, Adamo-Villani [22] confirmed that manually created animations for fingerspelling are more "readable" than ones generated through motion capture. The research described in this section focused exclusively on ASL, but several groups have explored animating manual alphabets for other signed languages. In 2003, Yeates [23] created a fingerspelling system for Auslan (Australian Sign Language) that utilized a segmented hand; similarly, van Zijl [24] and Krastev [25] generated fingerspelling using the International Sign Alphabet. In addition, Kennaway [26] explored fingerspelling for BSL.
While only a small body of work has dealt with the comprehensibility of fingerspelling produced by human signers, even fewer studies have investigated the comprehensibility of synthesized fingerspelling. Among them is the study of Davidson et al. [20], who presented fluent ASL users with animated fingerspelling sequences at three different speeds to validate their animation approach.

Creating a set of hand postures and transitions for DSGS fingerspelling synthesis
Section 2.2 discussed the increasing use of fingerspelling in DSGS. To our knowledge, only one fingerspelling learning tool for DSGS exists. This tool displays one illustration for each letter of a fingerspelling sequence as mentioned in Section 1. Ours is the first approach to synthesizing the finger alphabet of DSGS as a first step towards a learning tool for this language.
Synthesizing the DSGS manual alphabet consisted of producing hand postures (handshapes with orientations) for each letter of the alphabet and transitions for each pair of letters. Figure 1 showed the finger alphabet of DSGS. Note that it features dedicated signs for -Ä-, -Ö-, and -Ü- as well as for -CH- and -SCH-.
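Because the alphabet has dedicated signs for -CH- and -SCH-, a synthesis front end has to decide how a typed word is split into fingerspelling units. The following is a minimal sketch of one possible approach, a greedy longest match; the inventory constants and the function name `to_sign_sequence` are assumptions made for illustration and are not taken from the system described here. Whether signers use the -SCH- or -CH- sign wherever these letter sequences occur (for instance across morpheme boundaries) is not addressed in this paper, so the greedy rule is an assumption as well.

```python
# Hypothetical inventory for illustration: base letters, umlauts, and the
# multi-letter signs. Longer multi-letter signs come first so that SCH is
# preferred over CH.
MULTI_LETTER_SIGNS = ("SCH", "CH")
SINGLE_LETTER_SIGNS = set("ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÜ")


def to_sign_sequence(word):
    """Split a word into fingerspelling units using greedy longest match."""
    text = word.upper()
    units = []
    i = 0
    while i < len(text):
        for multi in MULTI_LETTER_SIGNS:
            if text.startswith(multi, i):
                units.append(multi)
                i += len(multi)
                break
        else:
            # No multi-letter sign starts here, so consume a single letter.
            letter = text[i]
            if letter not in SINGLE_LETTER_SIGNS:
                raise ValueError(f"no sign for character {letter!r}")
            units.append(letter)
            i += 1
    return units


if __name__ == "__main__":
    print(to_sign_sequence("Rhäzüns"))
    print(to_sign_sequence("Buchs"))  # illustrative input containing CH
```

A lookup table of exceptions could be layered on top of this function if some words should be spelled letter by letter instead.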
Because of the similarity between the ASL and DSGS manual alphabets, our work built on a previous system that synthesized the manual alphabet of ASL [5]. In addition to the five new letters or letter combinations cited above, the DSGS manual alphabet contains four handshapes, -F-, -G-, -P-, and -T-, that are distinctly different from their ASL counterparts. Further, the five letters -C-, -M-, -N-, -O-, and -Q- have a similar handshape in DSGS, but required smaller modifications, such as a different orientation or small adjustments in the fingers. Hence, the DSGS finger alphabet features 14 out of 30 hand postures that needed modification from the ASL manual alphabet. All hand postures were reviewed by native signers.
As with ASL, there was also the issue of collisions between the fingers during handshape transitions. Here, we again leveraged the similarity between the ASL and DSGS manual alphabets. The previous ASL fingerspelling system identified the collection of letter pairs, such as the N→A transition in T-U-N-A in Figure 2, which caused finger collisions under naïve interpolation. To remove the collisions, its authors created a set of transition handshapes that are inserted between two letters to force certain fingers to move before others, creating the clearance needed to avoid collision. Such a handshape can be seen in the eighth frame of the second row in Figure 2. Details of this method can be found in Wolfe et al. [5]. Because of the overlap between the DSGS and ASL manual alphabets, along with the fact that most of the new or modified hand postures had handshapes that were generally open, in the sense of Brentari's handshape notation [27], it was possible to use the exact same set of transition handshapes as the original ASL system.
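The idea of inserting a transition handshape between colliding letter pairs can be sketched as a lookup during keyframe construction. This is only an illustration under stated assumptions: the table `TRANSITION_HANDSHAPES`, the handshape name `transition_N_A`, and the function `build_keyframes` are hypothetical, and the actual set of colliding pairs and the interpolation itself are described in Wolfe et al. [5], not reproduced here.

```python
# Hypothetical table: letter pairs whose direct interpolation would collide,
# mapped to the name of an intermediate handshape. Only the N->A pair is
# mentioned in the text; the handshape name is invented for this sketch.
TRANSITION_HANDSHAPES = {("N", "A"): "transition_N_A"}


def build_keyframes(units, transitions=TRANSITION_HANDSHAPES):
    """Return the posture sequence with transition handshapes inserted."""
    if not units:
        return []
    keyframes = [units[0]]
    for previous, following in zip(units, units[1:]):
        helper = transitions.get((previous, following))
        if helper is not None:
            # Insert the intermediate posture so that some fingers move
            # first and clear the path for the others.
            keyframes.append(helper)
        keyframes.append(following)
    return keyframes


if __name__ == "__main__":
    print(build_keyframes(["T", "U", "N", "A"]))
```

An animation layer would then interpolate between consecutive entries of the returned list rather than between the letters alone.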

Assessing the comprehensibility of synthesized DSGS fingerspelling sequences
The aim of the study presented here was to assess the comprehensibility of animated DSGS fingerspelling sequences produced from the set of hand postures and transitions described in Section 3.

Study instrument and design
We conducted the study online using a remote testing system, LimeSurvey. This approach has advantages over face-to-face testing because it affords a large recruitment area and allows participants to complete the survey at any time. The survey was accessible from most web browsers and compatible across major operating systems.
Any person with DSGS fingerspelling skills was invited to participate in the study. The call for participation was distributed via an online portal for the DSGS community as well as through personal messages to persons known to fulfill the recruitment criteria.
Participants accessed the study through a URL provided to them. The first page of the website presented information about the study in DSGS (video of a human signer) and German (video captions that represented a back-translation of the DSGS signing and text). Participants were informed of the purpose of the study, that participation was voluntary, that answers were anonymous, that items could be skipped, and that they could fully withdraw from the study at any time. Following this, they filled out a background questionnaire, which included questions about their hearing status, first language, preferred language, and age and manner of DSGS acquisition. No personally identifiable information was kept.
A detailed instruction page followed, on which the participants were informed that they were about to see 22 fingerspelled words signed by either a human or a virtual human (sign language avatar). Following this, the participants' task was to type the letters of the word in a text box. Figure 3 shows a screenshot of the study interface for each of these cases. The videos of the human signer had been resized and cropped so as to match the animations.
The participants were told that the fingerspelled words they were going to see were names of Swiss towns described in Ebling [14]. In contrast to the studies discussed in Section 2.3, an effort had been made to include only fingerspelled words that denote concepts for which no well-known lexical sign exists in DSGS. This was deemed an important prerequisite for a successful study. The items had been chosen based on the following criteria:
• They were names of towns with train stations that were among the least frequented based on a list obtained from the Swiss Federal Railways;
• The town names were of German or Swiss German origin;
• The town names in the resulting set of items varied with respect to their length (number of letters); and
The study items were assigned to participants such that each item appeared as either a video of a human signer or as an animation. Each participant saw 10 videos and 10 animations, and items were presented in random order. The study items were preceded by two practice items that were the same for all participants: The first was a video of a human signer fingerspelling S-E-O-N, the second an animation of R-H-Ä-Z-Ü-N-S.
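The assignment of items to the two presentation modes can be sketched as follows. The paper does not state how the assignment was implemented, so this is one plausible scheme under stated assumptions: the function name `assign_items` and the modality labels are hypothetical, and the split is drawn at random per participant.

```python
import random


def assign_items(items, rng):
    """Give each item one modality (even split) and shuffle presentation order.

    The even split and per-participant randomization are assumptions made
    for this sketch.
    """
    if len(items) % 2 != 0:
        raise ValueError("an even number of items is needed for an even split")
    shuffled = list(items)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    plan = [(item, "human_video") for item in shuffled[:half]]
    plan += [(item, "avatar_animation") for item in shuffled[half:]]
    # Shuffle again so the two modalities are interleaved in random order.
    rng.shuffle(plan)
    return plan


if __name__ == "__main__":
    placeholder_items = [f"item_{i:02d}" for i in range(20)]
    for item, modality in assign_items(placeholder_items, random.Random(1)):
        print(item, modality)
```

Passing a seeded `random.Random` instance keeps a participant's plan reproducible, which is convenient when responses are later matched to the modality shown.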
The human signer was a female native DSGS signer (Deaf-of-Deaf) who had been asked to sign at a natural speed but without using mouthings. This resulted in an average fingerspelling rate of 1.76 letters per second. The same rate was used for the animations. Note that it is below the minimum rate of 2.18 reported by Keane and Brentari [13] (cf. Section 2.2), which again points in the direction of a lower speed of fingerspelling in DSGS.
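To make the rate concrete, the sketch below converts letters per second into an average time per letter and a total duration for a sequence. It assumes that the rate is simply the number of letters divided by the total signing time, which the paper does not spell out; the function names are hypothetical.

```python
def seconds_per_letter(letters_per_second):
    """Average time per letter implied by a fingerspelling rate."""
    return 1.0 / letters_per_second


def sequence_duration(number_of_letters, letters_per_second):
    """Total duration of a sequence, assuming a constant average rate."""
    return number_of_letters / letters_per_second


if __name__ == "__main__":
    rate = 1.76  # letters per second, as reported for the human signer
    print(round(seconds_per_letter(rate), 3))
    print(round(sequence_duration(7, rate), 2))  # a seven-letter sequence
```

At 1.76 letters per second, each letter takes a little over half a second on average, compared with under half a second at the minimum rate of 2.18 reported for other studies.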
The participants were informed that they could view a video as many times as they wanted. Limiting the number of viewings was felt to exert undue pressure. This approach was different from the study of Geer and Keane [16] (Section 2.3), who allowed subjects to view a video exactly twice, and Hanson [15], who presumably showed each video once. Not restricting the number of viewings in the present study also meant that there was no limit to the response time for an item. The response time was recorded as metadata.
Once participants had completed the main part of the study, they were asked to provide feedback on the following aspects:
• Appropriateness of the rate of fingerspelling;
• Comprehensibility of the individual letters and transitions between letters; and
• General feedback on the fingerspelling sequences shown
On the final page, participants were thanked for their contribution and given the possibility to leave their e-mail address if they wanted to receive information on the results of the study. If provided, the e-mail address was not saved together with the rest of the data to ensure anonymity. All data was stored in a password-protected database.
The entire study was designed so as to take a maximum of 20 minutes to complete. This was assessed through a pilot study with three participants, in which the average time spent to complete the study was 17 minutes.

Results and discussion
The study remained online for one week. During this time, 65 participants completed it, of whom 31 were hearing, 24 Deaf, and 6 hard-of-hearing. Four participants indicated that they did not fall into the three categories proposed for hearing status, referring to themselves as "using sign and spoken language", "deafened", "CODA" (child of Deaf adult), and "residual hearing/profoundly hard-of-hearing". The average time taken to complete the entire survey was 20 minutes and 12 seconds.
For the 20 main study items (excluding the two practice items), 1284 responses were submitted. In relation to the 1300 possible responses (20 items × 65 participants), this meant that a total of 16 responses had been skipped. They were treated as incorrect responses.
For each of the 1284 responses given, we determined whether it was correct, ignoring umlaut expansions (ä→ae, etc.) and differences in case. Table 1 displays the comprehension rates: The mean percentage of correct responses was 93.91% for sequences fingerspelled by the human signer and 90.06% for sequences fingerspelled by the avatar. Also displayed are the binomial confidence intervals at a confidence level of 95%. They indicate a 95% confidence that the comprehension rate of the signing avatar is above 87.75% and below 92.37%. This result is highly satisfactory.
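The scoring rule and the interval computation can be sketched as below. Several points are assumptions: the paper does not state which binomial interval was used, so the normal-approximation (Wald) interval here is only one common choice; stripping surrounding whitespace is an added convenience; and the counts in the usage example (580 correct out of 644) are illustrative, chosen to give a rate near the reported 90.06%, not figures taken from the study. All function names are hypothetical.

```python
import math

# Umlauts and their conventional two-letter expansions.
UMLAUT_EXPANSIONS = {"ä": "ae", "ö": "oe", "ü": "ue"}


def normalize(text):
    """Lower-case, trim, and expand umlauts so that 'ä' and 'ae' compare equal."""
    result = text.strip().lower()
    for umlaut, expansion in UMLAUT_EXPANSIONS.items():
        result = result.replace(umlaut, expansion)
    return result


def is_correct(response, target):
    """Score a typed response against the target word."""
    return normalize(response) == normalize(target)


def wald_interval(successes, trials, z=1.96):
    """Normal-approximation confidence interval for a proportion.

    z = 1.96 corresponds to a confidence level of about 95%.
    """
    p = successes / trials
    half_width = z * math.sqrt(p * (1.0 - p) / trials)
    return max(0.0, p - half_width), min(1.0, p + half_width)


if __name__ == "__main__":
    print(is_correct("RHAEZUENS", "Rhäzüns"))
    # Illustrative counts only (assumption, not data from the study).
    lower, upper = wald_interval(580, 644)
    print(f"{lower:.4f} {upper:.4f}")
```

For proportions close to 0 or 1, or for small samples, a Wilson or exact (Clopper-Pearson) interval is usually preferred over the Wald interval.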
Comprehension rates below 100% for human signing have been reported in previous studies [28,29]. We surmise that in this case, they were due at least partly to the fact that mouthings were absent from the signing performances. While this was a methodological decision made to ensure that what was being measured was core fingerspelling comprehension, several participants alluded to the lack of mouthings in the post-study questionnaire.
A comprehension rate of 100% was obtained for three sequences fingerspelled by the human signer (Realp, Reutlingen, and Sedrun) and also for three sequences produced by the signing avatar (Bever, Hurden, and Mosen).
To obtain information about individual letters that may have been hard to comprehend with the signing avatar, we performed a confusion analysis. The results show that three letters were mistaken for other letters more often in sequences fingerspelled by the signing avatar than in sequences fingerspelled by the human signer: -F- (confused with -T- and -B-), -P- (confused with -G- and -H-), and -R- (confused with -U-). One letter, -H-, was confused more often in sequences fingerspelled by the human signer than in sequences fingerspelled by the signing avatar; it was mistaken for -G-, -L-, and -U-.
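A letter-level confusion count of this kind can be sketched as a position-wise comparison of target and response. The paper does not describe how responses were aligned with targets, so this sketch makes a simplifying assumption: only responses with the same number of units as the target are compared, and all others are skipped (handling insertions and deletions would require an edit-distance alignment). The function name `count_confusions` and the input sequences are hypothetical.

```python
from collections import Counter


def count_confusions(pairs):
    """Count (target_letter, response_letter) mismatches position by position.

    `pairs` is an iterable of (target_units, response_units) sequences.
    Responses whose length differs from the target are skipped (assumption).
    """
    confusions = Counter()
    for target, response in pairs:
        if len(target) != len(response):
            continue
        for target_letter, response_letter in zip(target, response):
            if target_letter != response_letter:
                confusions[(target_letter, response_letter)] += 1
    return confusions


if __name__ == "__main__":
    # Made-up letter sequences for illustration, not study data.
    example = [
        (["F", "I", "L", "P"], ["T", "I", "L", "G"]),
        (["F", "I", "L", "P"], ["F", "I", "L", "P"]),
    ]
    for (target_letter, response_letter), n in count_confusions(example).items():
        print(f"-{target_letter}- read as -{response_letter}-: {n}")
```

Running the same count separately for avatar and human-signer responses gives the two tallies that the comparison in the text relies on.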
A confusion analysis between pairs of letters was also performed to obtain pointers to transitions that potentially needed to be improved. Comprehension was lower for four transitions with the signing avatar than with the human signer: F-I (mistaken for T-I and B-I), L-P (mistaken for L-G and L-H), L-R (mistaken for L-U), and R-I (mistaken for U-I). This overlaps with the qualitative feedback in the post-study questionnaire that asked for letters and transitions that were particularly hard to understand: Several participants mentioned the avatar's transitions into -G-, -I-, -P-, and -Q- as well as the transitions between -D- and -Q- and between -L- and -P-. In addition, 12 out of 65 participants deemed the hand orientation of -Q- inaccurate.
In the general comments section, a number of participants remarked that the fingerspelling of the human signer was easier to understand than that of the signing avatar; some participants noted that this was due to the hand appearing too small in the animations. On the other hand, multiple participants commented on the quality of the signing avatar as being "surprisingly good". Repeated mention was made of the impression that short fingerspelled sequences were easier to understand than longer ones, regardless of whether they were signed by a human or an avatar.
One participant encouraged the introduction of speed controls for the signing avatar. In the post-study questionnaire rating of the speed of fingerspelling, the majority of the participants (number of responses: 62) deemed the speed appropriate (56.45%), followed by 35.48% who rated it as being too fast. 4.84% classified it as too slow, and 3.23% deemed it much too fast. No participant rated the speed as being much too slow. The numbers are summarized in Table 2.

Conclusion and outlook
We have presented the first work in synthesizing the finger alphabet of DSGS, an application of natural language processing. We have reported on the process of creating a set of hand postures and transitions as well as on a study assessing the comprehensibility of the resulting animations. The results showed that the comprehension rate of the signing avatar was highly satisfactory at 90.06%. Three of the sequences fingerspelled by the avatar yielded a comprehension rate of 100%.
The speed of fingerspelling chosen for the signing avatar was rated as appropriate by the majority of the participants. At the same time, a lower yet substantial number of participants rated it as being too high, which suggests that introducing speed controls would be beneficial.
The results of the study also offered pointers to aspects of the signing avatar that would benefit from further improvement, such as the hand postures of a number of letters as well as the transitions between some letters.
While the primary aim of the study was to assess the comprehensibility of the newly created DSGS fingerspelling animations, the data obtained provides a wealth of information that can be used to inform other research questions. For example, we intend to investigate the individual effects of the variables hearing status, age of DSGS acquisition, and speed-of-fingerspelling rating on the comprehension scores. The work presented in this paper represents the first step towards a fingerspelling learning tool for DSGS. As a next step, we will complete the development of the tool interface. Following this, we are going to conduct a study that assesses the usability of the interface.