Characterization of Divergence in Impaired Speech of ALS Patients

Approximately 80% to 95% of patients with Amyotrophic Lateral Sclerosis (ALS) eventually develop speech impairments, such as defective articulation, slow laborious speech and hypernasality. The relationship between impaired speech and asymptomatic speech may be seen as a divergence from a baseline. This relationship can be characterized in terms of measurable combinations of phonological characteristics that are indicative of the degree to which the two diverge. We demonstrate that divergence measurements based on phonological characteristics of speech correlate with physiological assessments of ALS. Speech-based assessments offer benefits over commonly-used physiological assessments in that they are inexpensive, non-intrusive, and do not require trained clinical personnel for administering and interpreting the results.


Introduction
Amyotrophic lateral sclerosis (ALS) or Lou Gehrig's Disease, the most common form of motor neuron disease, is a rapidly progressive, neurodegenerative condition. It is characterized by muscle atrophy, muscle weakness, muscle spasticity, hyperreflexia, difficulty speaking (dysarthria), difficulty swallowing (dysphagia), and difficulty breathing (dyspnea). Mean survival time for ALS patients is three to five years from the time it is diagnosed; however, death may occur within months, or survival may last decades.
Most physiological assessments used to determine the functional status of patients with ALS are invasive, involving the use of expensive equipment and requiring trained clinical personnel to administer the tests and interpret the results. This is the case for a number of standardized objective assessments of bulbar function in ALS patients (Green et al., 2013), for example: breathing patterns, articulatory patterns, and voice loudness. These are generally measured by technologies that record chest wall movements, oral pressures and flows, oral movement and strength, and speech acoustics.
This paper lays the foundation for the development of less invasive phonologically-inspired measures that correlate strongly with (more invasive) physiological measures of ALS. Speech impairments eventually affect 80% to 95% of patients with ALS (Beukelman et al., 2011). In fact, Yorkston et al. (1993) noted that speech impairments may be present up to 33 months prior to diagnosis of ALS. Several previous studies have shown that speech impairments correlate with physiological changes associated with ALS. We therefore focus on correlating measures based on phonological features with standard physiological measures, enabling new, non-invasive measures for assessing the functionality of an ALS patient without significant overhead for personnel training and administration.
To bring this about, we determine the degree of divergence of symptomatic speech from asymptomatic speech taken as a baseline. This determination is based on phonological features in speech, most of which have been previously identified in the literature as being associated with ALS, e.g., monoloudness, hypernasality and distorted vowels; see Duffy (2013). These are annotated, for the current study, by specialists, i.e., a phonologist and a speech therapist experienced in working with ALS patients. The degree of divergence is correlated with physiological assessments of ALS, namely %FVC (Forced Vital Capacity) in sitting (%FVC-SIT) as well as supine (%FVC-SUP) positions.
The rest of the paper is organized as follows: In Section 2 we discuss related work that motivates and informs our research. Section 3 describes data used for our experiments. A discussion of speech divergence is presented in Section 4. Section 5 presents an assessment of the degree to which divergent characteristics in the speech match the level of progress of the ALS condition. This is followed by a discussion of future work and conclusions in Section 6.

Related Work
A number of past studies have investigated the utility of measuring the "voice signal" in order to answer questions about a speaker's state from their speech (Schuller et al., 2015, 2011). Such studies have attempted to distinguish classes of individuals with various speech impairments, such as stuttering (Nöth et al., 2000), aphasia (Fraser et al., 2014), and developmental language disorders (Gorman et al., 2016). The recognition of impaired speech has been employed to detect Alzheimer's (Rudzicz et al., 2014). Various speech-related features have been employed to detect whether speech is affected by Parkinson's Disease (Bocklet et al., 2011). Relatedly, variations in speech properties under intoxicated and sober conditions have also been studied (Biadsy et al., 2011).
Our work differs from prior approaches in that we explore perceivable phonological characteristics through the analysis of language divergences. One of the motivations for using phonological features exclusively, rather than also using other features employed in prior studies, was that phonological features do not require expensive equipment to collect data from speakers, as a feature like maximum subglottal pressure would. Since the goal of this work is to develop a measure that is completely based on speech features that can be identified with a simple click on a device such as a phone, we focused on phonological features that a machine can be trained to analyze automatically. Our focus on correlations with phonological features, tied to the notion of divergence from a baseline, is a significant contribution beyond what has been investigated previously.
Note: %FVC-SUP refers to the percent value of the Forced Vital Capacity while the person is in supine position, and %FVC-SIT refers to the percent value of the Forced Vital Capacity while the person is in sitting position. See (Brinkmann et al., 1997; Czaplinski et al., 2006) for additional information about use of FVC in ALS assessments.
The notion of divergence itself is not a new one in natural language processing. The characterization of divergence classes (Dorr, 1994) has been at the heart of solutions to many different problems ranging from word alignment to machine translation to acquisition of semantic lexicons (Olsen et al., 1998). Finding the minimal primitive units, and determining their possible combinations, was the foundation for this earlier work. However, in these earlier studies, primitives consisted of properties that were syntactic, lexical, or semantic in nature, whereas the primitives for the current work consist of properties that are phonological in nature. Beukelman et al. (2011), Duffy (2013), Green et al. (2013), and Orimaye et al. (2014) have established that pronunciation varies systematically within categories of speech impairment. Silbergleit et al. (1997) and Carrow et al. (1974) have shown that ALS speech shows deviant characteristics. For example, Ball et al. (2001) observe that ALS speakers manifest altered voice quality. A number of speaker-level characteristics associated with impaired speech studied in prior work have been leveraged for our speech-related divergence detection. For example, Duffy (2013) specifically has enumerated speaker-level characteristics, such as monopitch and monoloudness. Rong et al. (2015) and others have previously attempted to identify measures of speech motor function for ALS speech. While certain components of speech such as speaking rate, breathing patterns, and voice loudness have proven too variable to provide a reliable marker (Green et al., 2013), we demonstrate that divergence measurements based on phonological characteristics of speech correlate with physiological assessments of ALS. In addition to speaker-level characteristics and associated properties, our work defines divergence in terms of speech/span-level characteristics, as described in Section 3.
Smaller vowel space areas have been found in ALS speech (Turner et al., 1995; Weismer et al., 2001), which suggests that vowels may be distorted in ALS speech. Similarly, Kent et al. (1990) found place and manner of articulation for some consonants, and regulation of tongue height for vowels, to diverge from asymptomatic speech; these were expected to result in imprecise consonants and distorted vowels. Caruso and Burton (1987) observed that ALS speakers and asymptomatic speakers exhibited significant differences in stop-gap durations as well as in vowel durations. Prior work has also shown a correlation between speaking rate and physiological measures of ALS, specifically the ALS Functional Rating Scale (ALSFRS). Our own work differs from this prior work in that we define divergence in terms of a wider range of speech characteristics (beyond speaking rate) and then demonstrate that divergence measures correlate with physiological measures of ALS.

Data: Transcriptions and Phonological Annotations
The data for our experiments consist of recorded speech of 16 recruited subjects with ALS in a clinical setting, collected quarterly for each subject. The subjects range from 35 to 74 years of age. Their age distribution is as follows: one subject in their 30s, two subjects in their 40s, one subject in their 50s, five subjects in their 60s, and seven subjects in their 70s. Of the 16 subjects, one is female and the other 15 are male. In terms of race of the subjects, we have the following distribution: White (12), Asian (1), African-American (1), Not reported (2). The criteria for the recruitment of a particular subject are that the subject: (1) has been diagnosed with ALS; (2) is a native monolingual speaker of American English; (3) has bulbar involvement identified during initial ALS inpatient evaluation; (4) has a forced vital capacity (FVC) of greater than 50% of the expected value for age; and (5) has an ALSFRS-R score of 40 or greater. Excluded from the study are those who have received a diagnosis of dementia, FVC of less than 50%, inability to speak, or inability to follow directions.
Table 1: The four sentences recorded at each visit.
1. She held your dark suit in greasy wash water all year.
2. Don't ask me to hold an oily rag like that.
3. The big dog loved to chew on the old rag doll.
4. Chocolate and roses never fail as a romantic gift.

Speech recordings of the same four preselected sentences are made during each (quarterly) visit of each of the patients. The four sentences, presented in Table 1, are selected from the Texas Instruments/MIT (TIMIT) corpus (Garofolo et al., 1993) and were designed to be phonetically rich, thus providing solid coverage of the phonetic space from each subject. The data also include recordings of four control (asymptomatic) subjects, two of whom are female and two are male, reading the same four TIMIT sentences in the same setting as the symptomatic subjects. These are used as the baseline speech against which divergence scores (defined in the next section) are calculated for the ALS symptomatic speech.
Note: All uses of these data as reported in this paper have been approved by the relevant Institutional Review Board (IRB).
Note: TIMIT sentences 1 and 2 as used here are slightly different from the original TIMIT sentences, which are as follows: (1) She had your dark suit in greasy wash water all year; (2) Don't ask me to carry an oily rag like that.
Our hypothesis is that a higher divergence is indicative of the progression of the ALS condition. This study focuses on divergence with respect to asymptomatic speech, taken as a baseline, to determine whether the divergence is speaker dependent or whether it is more generally indicative of ALS progression. If the latter, this would help diagnose patients for whom no previous/longitudinal data is available. ALS speech data for the 16 subjects was transcribed and annotated via speech-analysis software called Praat (Boersma and van Heuven, 2001) for the 14 phonological characteristics enumerated in Table 2 (cf. Duffy (2013): p248).

Table 2: Phonological characteristics annotated in the ALS speech data.

Speaker level characteristics
  monopitch: Voice lacks inflectional changes; pitch does not change much.
  monoloudness: Voice for which the volume/loudness does not change, hence lacking normal variations in loudness.

Speech/span related characteristics
  harshness: Voice seems harsh, rough and raspy (sometimes referred to as pressed voice), similar to what happens when a person talks while lifting a heavy load.
  imprecise consonants: Consonant sounds lack precision. There may be slurring, inadequate sharpness, distortions, lack of crispness, and clumsiness in transitioning from one consonant to another. For example, a "w" may be produced instead of a "b".
  distorted vowels: Vowel sounds distorted throughout their total duration. For example, an "a" may be produced instead of an "i".
  prolonged phonemes: A phoneme (i.e., a consonant or a vowel) is prolonged, i.e., its sound (when it is produced) continues over an unusual period of time.
  inappropriate silences: Pauses that are produced not at syntactic or prosodic boundaries.
  hypernasality: Vowels that are supposed to be non-nasalized are instead nasalized in speech.
  strained or strangled quality: Tenseness in voice (as with overall muscular tension). Perceived as increased effort; may seem tense or harsh, as if talking and lifting at the same time or as if talking with breath held.
  breathiness: Voice seems breathy, weak and thin. May seem like a sighing sound. There may be non-modulated turbulence noise in the produced sound, i.e., audible air escape in voice or bursts of breathiness.
  audible inspiration/stridor: Noisy breathing and wheezing may accompany inhaling. There may be a harsh, crowing, or vibratory sound of variable pitch resulting from turbulent air flow caused by partial obstruction of the respiratory passages.
  unusual stress: Speech sounds where the most important parts of a sentence may be de-stressed, or all parts of a sentence are stressed as if all are important, or speech sounds may be perceived as robotic, with no variation in stress throughout the sentence/phrase/word/syllable.
  hoarseness: Abnormal voice changes, where voice may sound breathy, raspy, strained, or there may be changes in volume (loudness) or pitch (how high or low the voice is).
  vocal fry: Popping or rattling sound of a very low frequency, also known as a creaky voice.

The phonological annotations were made by two specialists: one of whom was a phonologist and the other was a speech therapist with experience working with ALS speakers. Two classes of phonological characteristics served as the basis of annotations, each with a set of primitive phonological features: speaker level characteristics and speech/span related characteristics. Speaker level characteristics refer to features in speech that are more characteristic of a specific speaker's voice, independent of individual sounds/spans, e.g., monopitch, which indicates the lack of inflectional changes in voice. These were annotated only once for each speaker.
Speech/span related characteristics, on the other hand, refer to features in speech that are characteristic of a specific sound or are observed for a portion of speech, as opposed to features that are characteristic of the voice itself. For example, the feature imprecise consonants refers to the portion of speech where a specific consonant is produced imprecisely; it may involve slurring or inadequate sharpness, e.g., producing a "w" instead of a "b". For these annotations, spans in speech were marked over which these features were observed.
For each of these characteristics, the annotators also assigned a 1-10 Likert scale (Likert, 1932) rating to indicate the severity of the characteristic when it is observed, where 10 indicates "very severe" and 1 indicates "negligible."
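One way such annotations might be represented in software is sketched below. This is only an illustration under stated assumptions: the record layout, the field names, and the use of seconds for span boundaries are hypothetical and are not taken from the study's annotation files. The sketch encodes the two classes of characteristics and the 1-10 severity rating described above.

```python
from dataclasses import dataclass
from typing import Optional

# Speaker level characteristics from Table 2; all others are span related.
SPEAKER_LEVEL = frozenset({"monopitch", "monoloudness"})


@dataclass(frozen=True)
class Annotation:
    """One annotated characteristic (hypothetical record layout)."""
    speaker_id: str
    annotator: str
    characteristic: str
    severity: int                  # Likert rating: 1 (negligible) to 10 (very severe)
    start: Optional[float] = None  # span start in seconds (assumed unit); None if speaker level
    end: Optional[float] = None    # span end in seconds (assumed unit); None if speaker level

    def __post_init__(self) -> None:
        if not 1 <= self.severity <= 10:
            raise ValueError("severity must be on the 1-10 Likert scale")
        has_span = self.start is not None and self.end is not None
        if self.characteristic in SPEAKER_LEVEL:
            # Speaker level characteristics describe the voice as a whole.
            if self.start is not None or self.end is not None:
                raise ValueError("speaker level characteristics carry no span")
        elif not has_span:
            raise ValueError("span related characteristics need start and end")
        elif self.end <= self.start:
            raise ValueError("span end must come after span start")
```

For instance, `Annotation("S01", "A1", "hypernasality", 7, start=0.4, end=1.1)` would describe one span-level observation, while `Annotation("S01", "A1", "monopitch", 3)` would describe a speaker-level one; the identifiers and values here are made up.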

Divergence in Speech
Understanding the relationship between impaired speech and asymptomatic speech is facilitated by measuring the degree to which symptomatic speech diverges from a baseline. For the current study, asymptomatic speech, which serves as a baseline, was created from a combination of recordings from asymptomatic speakers as described in Section 3. Simplistically, the degree of divergence is defined as the sum of the changes in a speech utterance from its asymptomatic equivalent. For a correlation to be supported, a large number of changes in speech (i.e., a strong divergence from asymptomatic speech) would correspond to advanced progression of the disease. The relationship between impaired speech and asymptomatic speech is characterized in terms of measurable combinations of phonological characteristics that are indicative of the degree to which the two diverge. The degree of divergence can be used as a diagnostic tool at regular intervals for checking the severity of physiological changes.
Multiple methods have been applied in order to calculate divergence scores:
1. Feature count based divergence score: Feature count refers to the number of characteristics observed in the speech samples. (The counts from the two annotators were combined in five different ways, described in Section 5.1.) The feature count based divergence score for each ALS speaker is the difference between the feature count for the ALS speech and the feature count for the control asymptomatic speech. Four variations of this score are used, based on how the feature count for the control asymptomatic speech is obtained:
(a) Average feature count for controls: It is assumed that asymptomatic speakers may display characteristics identified in Table 2 but to a much smaller extent. Thus, a simple average of the feature count for control speakers is taken to be most representative of all asymptomatic speakers. The feature count for the control asymptomatic speech is the average of the feature count for all the control speakers.
(b) Minimum feature count for controls: The control speaker with the minimum number of characteristics in his/her speech is assumed to be the most asymptomatic. Hence, the feature count for the control asymptomatic speech is the minimum of the feature counts for all control speakers.
(c) Gender dependent average feature count for controls: The presence (or absence) of characteristics may be dependent on the gender of the speaker. To calculate divergence scores, it is best if speakers of the same gender are compared. Hence, the feature count for the control asymptomatic speech is the average of the feature count for all the control speakers of the same gender as the ALS speaker.
(d) Gender dependent minimum feature count for controls: It is assumed the control speaker with the minimum number of characteristics in his/her speech is the most asymptomatic, but the presence (or absence) of characteristics may be gender dependent. To calculate divergence scores, it is best if speakers of the same gender are compared. Hence, the feature count for the control asymptomatic speech is the minimum of the feature count for all the control speakers of the same gender as the ALS speaker.
2. Total frequency based divergence score: For each speaker, an observed frequency score is computed as an aggregate of the frequencies of all observed characteristics in the speech of the speaker. The average of the observed frequency scores of both annotators for a given speaker is taken as the frequency score for the speaker. The divergence score for an ALS speaker is the difference between the frequency score for the ALS speech and the frequency score for the control asymptomatic speech. The same four variations of this score are examined as described in the case of the feature count based divergence score, depending on how the frequency score for the control asymptomatic speech is obtained.
3. Likert Scale rating based divergence score: It is assumed that each of the characteristics may be as indicative of the condition as other characteristics in various ALS speakers. It is also assumed that the severity of the characteristics indicates progression of ALS. An observed Likert score of the speech samples from a speaker is taken to be an aggregate of the products of the Likert Scale rating assigned by an annotator for each occurrence of a characteristic with the weight of the characteristic (which is uniformly taken to be 1 for all the characteristics in the current analysis). A Likert score for a speaker is calculated as an average of the two annotators' observed Likert scores of the speech samples from the speaker. A Likert Scale rating based divergence score for each ALS speaker is then taken to be the difference between the Likert score for the ALS speech and the Likert score for the control asymptomatic speech. The same four variations of this score are examined as described in the case of the feature count based divergence score, depending on how the Likert score for the control asymptomatic speech is obtained.
For each of the three divergence measures defined above, a higher score indicates that the patient's speech diverges from an asymptomatic speech baseline more than would be indicated by a lower score. Divergence scores are expected to correlate with physiological measures of changes associated with ALS. Increasing divergence scores would thus serve as an indicator of the disease progression, analogous to decreasing physiological outputs (lower scores) associated with ALS; thus, the two measures are expected to be negatively correlated. Table 3 presents two physiological assessment scores (%FVC-SUP and %FVC-SIT) and three divergence scores (defined above) for the 16 ALS speakers. The scores are sorted by %FVC-SUP. The table indicates that the %FVC scores tend to drop as the divergence scores go up. As expected, a decrease in %FVC scores indicates disease progression, and similarly, a higher divergence score indicates disease progression.
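The scoring scheme above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions rather than the implementation used in the study: the function names, the data layout (one `(gender, score)` pair per control speaker), and all numbers are hypothetical. The same baseline logic would apply whether the per-speaker score is a feature count, a total frequency, or a Likert score.

```python
from statistics import mean


def control_baseline(controls, variant, gender=None):
    """Baseline score for the control (asymptomatic) speakers.

    controls: list of (gender, score) pairs, one per control speaker.
    variant:  'a' average, 'b' minimum,
              'c' gender dependent average, 'd' gender dependent minimum.
    gender:   gender of the ALS speaker; needed for variants 'c' and 'd'.
    """
    if variant not in ("a", "b", "c", "d"):
        raise ValueError("variant must be one of 'a', 'b', 'c', 'd'")
    if variant in ("c", "d"):
        if gender is None:
            raise ValueError("gender is required for variants 'c' and 'd'")
        scores = [s for g, s in controls if g == gender]
    else:
        scores = [s for _, s in controls]
    if not scores:
        raise ValueError("no control speakers available for this baseline")
    return mean(scores) if variant in ("a", "c") else min(scores)


def divergence(als_score, controls, variant, gender=None):
    """Divergence = ALS speaker's score minus the control baseline."""
    return als_score - control_baseline(controls, variant, gender)


def likert_score(occurrences, weights=None):
    """Sum of rating * weight over all annotated occurrences.

    occurrences: list of (characteristic, rating) pairs from one annotator.
    weights:     optional per-characteristic weights; defaults to 1 for all,
                 matching the uniform weighting described in the text.
    """
    weights = weights or {}
    return sum(rating * weights.get(name, 1) for name, rating in occurrences)


# Hypothetical feature counts for four control speakers.
controls = [("F", 2), ("F", 4), ("M", 1), ("M", 5)]
print(divergence(9, controls, "a"))              # average over all controls
print(divergence(9, controls, "d", gender="F"))  # minimum over female controls
```

Keeping the baseline computation separate from the per-speaker score makes it straightforward to apply the four variants to each of the three measures without duplicating logic.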

Dealing with differences in Annotations to Calculate Divergence Scores
Since the nature of the annotated phonological characteristics was such that multiple characteristics might share various aspects of speech, annotators were asked to mark all characteristics that [...] Table 2 were used as heuristics by the annotators, providing additional help in identifying the characteristics.
Note: Only variant (d) of each of the divergence scores computed using the three methods is presented in Table 3 to maintain clarity. Variant (d) refers to the divergence scores calculated with the gender dependent minimum feature count for controls setting, as described in Section 4 above.
In order to resolve differences across annotations, we used five different methods to combine the two sets of annotations. Table 3 shows a representative combination of the first case below for the feature count measure and the third case below for both the total frequency measure and the Likert scale measure:
1. Union: Characteristics identified by either annotator were considered, with those identified by both annotators counted only once.
2. Intersection: Only the features annotated by both the annotators were considered.
3. Max: The maximum of the two annotators' feature counts was used.
4. Min: The minimum of the two annotators' feature counts was used.
5. Avg: An average of the two annotators' feature counts was used.
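The five combination methods listed above can be sketched in Python as follows. This is a hedged illustration for the feature count measure only; the function name, the representation of each annotator's output as a set of characteristic names, and the example sets are all assumptions made for the sketch.

```python
def combine_feature_counts(annotator_a, annotator_b, method):
    """Combine two annotators' marked characteristics into one feature count.

    annotator_a, annotator_b: iterables of characteristic names marked by
                              each annotator for one speaker.
    method: 'union', 'intersection', 'max', 'min', or 'avg'.
    """
    a, b = set(annotator_a), set(annotator_b)
    if method == "union":
        return len(a | b)          # marked by either; shared ones counted once
    if method == "intersection":
        return len(a & b)          # marked by both annotators
    if method == "max":
        return max(len(a), len(b))
    if method == "min":
        return min(len(a), len(b))
    if method == "avg":
        return (len(a) + len(b)) / 2
    raise ValueError(f"unknown combination method: {method}")


# Hypothetical annotations for one speaker.
first = {"monopitch", "hypernasality", "harshness"}
second = {"hypernasality", "breathiness"}
for name in ("union", "intersection", "max", "min", "avg"):
    print(name, combine_feature_counts(first, second, name))
```

For the total frequency and Likert measures, the same max, min, and avg options would operate on the two annotators' numeric scores rather than on set sizes.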

Association between Divergence Scores and Physiological Scores
To determine whether there was an association between any or all of the divergence scores and the physiological measures of ALS, we correlated the divergence scores with the physiological assessment scores, %FVC-SUP and %FVC-SIT, using Pearson's correlation coefficient. The results are presented in Table 4. For simplicity, we report the correlations in the table as −1 * correlation. Refer to Section A for correlations with all the divergence scores.
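A minimal sketch of this computation, using only the Python standard library, is given below. The function names and the example values are hypothetical and are not the study's data; the sketch simply computes Pearson's r and applies the sign flip used for reporting.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def reported_correlation(divergence_scores, fvc_scores):
    """Sign-flipped correlation (-1 * r), as reported in the tables."""
    return -1 * pearson_r(divergence_scores, fvc_scores)


# Made-up values: divergence rises while %FVC falls.
divergence_scores = [1, 2, 3, 4]
fvc_sup = [80, 60, 40, 20]
print(reported_correlation(divergence_scores, fvc_sup))
```

Because divergence and %FVC are expected to move in opposite directions, the sign flip turns the expected negative association into a positive reported value.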
We observe that while divergence scores do not seem to correlate with the %FVC-SIT score, they do show a moderate correlation with the %FVC-SUP score (0.49 < r < 0.66) with moderate p-values (p < 0.05). The stronger correlation effect we observe with %FVC-SUP than with %FVC-SIT may be due to higher difficulty in breathing that a patient may experience when (s)he is in supine position than in sitting position.
Consistent with the point above, patients with other pulmonary conditions have also been reported to experience higher difficulty in breathing when in supine position than in sitting or standing positions. Since the patients need to exert higher effort to achieve the same result in supine position than in sitting position, they may not be physiologically able to perform the same in the two positions, i.e., %FVC-SUP may be more sensitive than %FVC-SIT to the condition's progression. Since speech symptoms have also been found to be more readily apparent than other physiological symptoms (Yorkston et al., 1993), this results in a stronger correlation of the speech divergence scores with %FVC-SUP than with %FVC-SIT.
The table also indicates that divergence scores based on a simple measure (counts of features observed in ALS speech) correlate even better with %FVC-SUP scores than divergence scores that are based on slightly more complicated measures such as features' frequency or the Likert Scale ratings.

Conclusion and Future Work
This paper has presented a case for viewing the relationship between impaired speech and asymptomatic speech as a divergence from a baseline. Novel divergence measures have been developed for distinguishing asymptomatic speech from symptomatic speech, and these have been tested for correlations with physiological measures of ALS progression.
These speech divergence measures are a first step toward developing automated speech-based assessments of progression of the ALS condition that are both less expensive and less intrusive than their physiological counterparts. The current approach has enabled the identification of speech-based measures that correlate well with other physiological measures currently used to monitor the progression of the ALS condition. The next step is to test if these measures can be used to predict the values for the currently used physiological measures including %FVC.
Also, the current study is based on manual annotations provided by human specialist annotators. Future research will involve exploration of approaches that can be trained to produce such annotations automatically. These could, in turn, be used to calculate divergence scores and eventually to predict values for other physiological measures.
The theoretical groundwork for developing speech-based measures defines speech divergence in terms of clinically-informed phonological speech characteristics associated with ALS symptomatic speech. We presented three methods, with four variants apiece, to compute speech divergence scores for symptomatic speech. We also showed that speech divergence scores are indeed correlated with physiological assessment scores for the progression of the disease.
Future research will investigate other methods to compute divergence between the symptomatic and asymptomatic speech that yield even stronger correlations with the physiological assessment measures. For example, it would be useful to explore whether the proportion of speech that is affected by the characteristics listed in Table 2 has any relation to the progression of the disease. Divergence scores that incorporate characteristics related to a proportion of the span are expected to be strongly correlated with the progression of ALS.
Two possible variants of how one may compute divergence scores based on such proportion-related information are as follows: (1) Take proportion to be the proportion of speech that is affected by any of the characteristics. One may calculate a divergence score for each ALS speaker as the difference between the proportion of speech of the ALS speaker affected by these characteristics and the proportion of controls' speech affected by these characteristics. An average of the proportion in the annotators' annotations may be used for the calculation of the divergence score.
(2) As a simple analytic, one may also consider proportion-based divergence scores corresponding to each of the characteristics for each ALS speaker. This analytic may be useful for providing a direct relation between a specific characteristic and the progression of the condition. However, it may also be useful to explore divergence classes based on groupings of characteristics that are similarly affected due to the progression of the condition, if any. Some characteristics may be grouped to further explore divergences. Green et al. (2013) grouped features according to the speech subsystem involved (e.g., respiratory, phonatory, resonatory and articulatory). A reviewer also mentioned that gender-specific degree of severity of certain features would be interesting to explore. For example, there seems to be evidence that voicing control is more vulnerable in male patients (Kent et al., 1994). Such findings suggest that characteristics such as gender and possibly age may also need to be considered while developing speech divergence-based measures.
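As a rough illustration of variant (1) above, the proportion of speech affected by any characteristic could be computed from annotated spans as in the following Python sketch. Everything here is an assumption made for illustration: the span representation as `(start, end)` pairs in seconds, the merging of overlapping spans so that no stretch of speech is counted twice, the use of the mean control proportion as the baseline, and the function names and numbers.

```python
def affected_proportion(spans, total_duration):
    """Fraction of a recording covered by at least one annotated span.

    spans: list of (start, end) pairs; overlapping spans are merged so
           that shared stretches are counted only once.
    total_duration: length of the recording, in the same unit as the spans.
    """
    if total_duration <= 0:
        raise ValueError("total_duration must be positive")
    covered = 0.0
    cur_start = cur_end = None
    for start, end in sorted(spans):
        if end <= start:
            raise ValueError("each span must have end > start")
        if cur_end is None or start > cur_end:
            # Close the previous merged span and open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlaps the current merged span: extend it.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return min(covered / total_duration, 1.0)


def proportion_divergence(als_proportion, control_proportions):
    """ALS speaker's affected proportion minus the mean control proportion."""
    if not control_proportions:
        raise ValueError("need at least one control proportion")
    baseline = sum(control_proportions) / len(control_proportions)
    return als_proportion - baseline


# Hypothetical spans (seconds) from one annotator for a 10-second recording.
spans = [(0.0, 1.0), (0.5, 2.0), (3.0, 4.0)]
p = affected_proportion(spans, 10.0)
print(p, proportion_divergence(p, [0.05, 0.15]))
```

The per-characteristic variant (2) could reuse the same function by passing in only the spans annotated with a single characteristic.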
In addition, for the current study, each of the characteristics was treated uniformly with respect to ALS. Future work will explore the hypothesis that certain characteristics are more indicative than others with respect to the progression of ALS.
Finally, while prior studies indicate that prosodic recognition is not affected in ALS speakers (Zimmerman et al., 2007), articulatory or phonatory deficits might alter the correct production of interrogative, imperative, or declarative sentences (Congia et al., 1987). These may be found to be useful in the development of speech-based measures of ALS. Thus, future work will investigate the extent to which these variables would be more or less difficult to analyze automatically.

Acknowledgments
This work is supported in part by the VA Office of Rural Health Program "Increasing access to pulmonary function testing for rural veterans with ALS through at home testing" (N08-FY15Q1-S1-P01346) and Award Number 1I21RX001902 titled "DESIPHER: Speech Degradation as an Indicator of Physiological Degeneration in ALS." Additionally, the work is supported by resources and the use of facilities at the James A. Haley Veterans' Hospital. The contents of this paper represent solely the views of the authors and not the views of the Department of Veterans Affairs or the United States Government. The authors are also grateful for constructive feedback from anonymous reviewers. An additional thank you is extended to the two specialists for their time and expertise in providing us with annotations of the phonological features in ALS speech that enabled us to conduct the analyses reported herein.
[...] be used to determine the proportion of speech for a specific speaker that is affected by the characteristics.

A Supplemental Material
The correlation results between each of the three types of speech divergence scores with all of their variants and the %FVC-SUP are presented in Tables 5, 6, and 7. As mentioned before, for simplicity, we report the correlations in the tables as −1 * correlation.