Evaluation of a Runyankore grammar engine for healthcare messages

Natural Language Generation (NLG) can be used to generate personalized health information, which is especially useful when provided in one's own language. However, templates—the NLG technique widely used across domains and languages—have been shown to be inapplicable to Bantu languages, due to their characteristic agglutinative structure. We present here our use of the grammar-engine NLG technique to generate text in Runyankore, a Bantu language indigenous to Uganda. Our grammar engine adds to previous work in this field with new rules for cardinality constraints, prepositions in roles, the passive, and phonological conditioning. We evaluated the generated text with linguists and non-linguists, who regarded most of the text as grammatically correct and understandable, and over 60% of them regarded all the text generated by our system as having been authored by a human being.


Introduction
The vast majority of doctor-patient interactions in a healthcare setting are verbal. The provision of written information to patients, to complement and augment the face-to-face session, increases the amount of information they retain (DiMarco et al., 2005). Additionally, studies in health communication have found that patient information is likely to be more effective if it is personalized for a specific patient (Cawsey et al., 2000) and presented in an understandable form and manner (DiMarco et al., 2009). This assumes that such information is communicated in one's first language, which may not be the case in multilingual societies. Language is important here because problems with language exacerbate literacy difficulties, which are further confounded in situations of ill health (DiMarco et al., 2009).
There are several Natural Language Generation (NLG) systems that generate customized patient information (Cawsey et al., 2000; DiMarco et al., 1995; de Rosis et al., 1999; Hussain et al., 2015; Lindahl, 2005; Mahamood and Reiter, 2011). These systems generate text in English, and one strategy to account for other languages could be to translate the generated English text into the target language. However, this requires correct machine translation for medical information, which does not exist for most languages, including our language of interest, Runyankore. Runyankore is a Bantu language indigenous to the south-western part of Uganda (Asiimwe, 2014; Tayebwa, 2014; Turamyomwe, 2011), a country where English is the official language, whereas indigenous languages are still predominantly spoken in rural areas. There is therefore a need to investigate NLG for Runyankore.
We have limited our scope to generating patient summaries, drug prescription explanations, and treatment instructions. The kind of text generated could include: the number of pills to be taken, listing the active ingredient(s), what the medication does not contain, and the general classification of the medication (for example, that hydrocodone is an opiate). These are largely knowledge-to-text cases, for which ontologies, such as the medical terminology SNOMED-CT, can easily be used. There are several NLG systems that take ontologies as input, mainly for English (Kaljurand and Fuchs, 2007), but also Latvian and Lithuanian (Gruzitis et al., 2010; Barzdins, 2011), and Greek (Androutsopoulos et al., 2013). These systems apply the template NLG technique. However, as was demonstrated in (Keet and Khumalo, 2017), templates are inapplicable to agglutinating Bantu languages, such as Runyankore.
Runyankore, like other Bantu languages, has a complex verbal morphology (14 tenses) and a noun class system with 20 noun classes, and is highly agglutinative. A noun class (NC) determines the affixes of the nouns belonging to it, and this in turn determines the agreement markers on associated lexical categories such as adjectives and verbs. To illustrate the agglutinative nature of Runyankore (taken from (Turamyomwe, 2011)):

Verb: titukakimureeterahoganu
English: 'We have never ever brought it to him'
Decomposition: ti-tu-ka-ki-mu-reet-er-a-ho-ga-nu

Our previous work on Runyankore NLG (Byamugisha et al., 2016a; Byamugisha et al., 2016b) is not adequate, as it covers neither cardinality constraints (e.g., 'take [exactly] 3 pills') nor the passive (e.g., 'operated by'), and it ignores the phonological conditioning of vowels in the agglutination process. Additionally, no evaluation of the grammatical correctness of the generated text was done. All these aspects are needed for our scope of medical text generation. We addressed these shortcomings by: analyzing further details of the language so as to also process minimum, maximum, and exact cardinality, and devising algorithms for them; adding the phonological conditioning rules required for the scope; and extending the CFG of (Byamugisha et al., 2016b) with the passive. These novel algorithms, together with the existing ones, were implemented and evaluated with 100 Runyankore speakers in rural Uganda and three Runyankore linguists. Most of the evaluated sentences were regarded as grammatically correct and understandable, and all computer-generated text was considered by a majority to have been written by a human being.
The paper is structured as follows: Section 2 introduces the new rules and algorithms required to generate more expressive text. Section 3 presents the experimental evaluation. We discuss in Section 4 and conclude in Section 5.

New rules and algorithms
Due to the limitations in our previous work on Runyankore NLG, as well as the structure of information in our domain of interest, healthcare, we developed new rules and algorithms to account for these gaps. These rules are added to those that already take care of the basic constructors of ontology languages, namely named class subsumption ('is a', ⊑), conjunction ('and', ⊓), negation ('not', ¬), existential quantification ('at least one', ∃), and universal quantification ('all/each', ∀) (Byamugisha et al., 2016a).

Cardinality constraints
In terms of the knowledge-to-text input, our previous language coverage was the description logic (DL) language ALC. Adding qualified cardinality constraints brings the language feature coverage to ALCQ, which is an important fragment of OWL 2 DL (Motik et al., 2009). We describe the verbalization patterns for maximum (≤), minimum (≥), and exact cardinality (=) in this section.
Maximum cardinality is typically worded in English as 'a maximum of', 'not more than', or 'at most'. We use the Runyankore equivalent of 'not more than', -tarikurenga, which is the preferred wording. However, to form the full word for 'not more than', the subject prefix of the concept quantified over is required. We illustrate this with the following example. Consider Axiom 1 below, where 'symptom' has to be pluralized to 'symptoms', which is in noun class (NC) 4. The plural prefix for NC 4 is emi (making emicucumo 'symptoms'), which takes the subject prefix gi that is attached to -tarikurenga 'not more than'. Compare this with Axiom 2, where 'courses' amashomo is in NC 6 and therewith goes with the subject prefix ga.

Axiom 1: Diabetes ⊑ ≤ 3 has.Symptoms
Buri ndwara ya shukari eine emicucumo gitarikurenga 3.
'Every disease of diabetes has at most 3 symptoms'

Axiom 2: Student ⊑ ≤ 10 takes.Course
Buri mwegi natwaara amashomo gatarikurenga 10.
'Every student takes not more than 10 courses'

The algorithm to generate these coordinating elements is included in Algorithm 2.1.
Algorithm 2.1: verbalizing maximum cardinality (noun n1, number a1)
  if a1 = 1 then
    {use the noun class to obtain the subject prefix}
    {verbalize with the appropriate subject prefix}
  else
    np ← getPlural(n1) {pluralize the noun}
    {get the plural noun class}
    {get the plural subject prefix}
    Result ← "spp tarikurenga a1" {verbalize with the plural noun and subject prefix}
  end if
  return Result

Minimum cardinality (≥) is typically rendered in English as 'a minimum of', 'not less than', or 'at least'. We use 'at least' for the Runyankore verbalization, again because it is the more directly translatable version. Like the verbalization of existential quantification (∃) in (Byamugisha et al., 2016a), it uses hakiri for 'at least'. However, unlike ∃, where we always have -mwe for 'one', ≥ has the number instead (unless the number is 1). Thus a verbalization similar to that of ∃ is used, also using the subject prefix of the concept quantified over. The examples below illustrate this case; e.g., e is the subject prefix of NC 9, for diguri 'degree':

Axiom 1: Panado ⊑ ≥ 4 has.ActiveIngredient
Buri Panado hakiri eine ebirungo by'amaani 4
'Every Panado has at least 4 active ingredients'

Axiom 2: Student ⊑ ≥ 1 has.Degree
Buri mwegi hakiri aine diguri emwe
'Every student has at least 1 degree'

Similar to the verbalization of ≤, the noun needs to be pluralized whenever the number after ≥ is greater than 1 (as is the case in Axiom 1 above). Due to space limitations we omit the algorithm, as it is similar to Algorithm 2.1.
The English verbalization of exact cardinality (=) is 'exactly'. However, Runyankore does not have a direct translation of 'exactly' and uses 'only' instead. The word for 'only', -onka, requires the subject prefix to form the full word. In the following examples, bw and ky are the subject prefixes of NC 14 and NC 7, to which the nouns obujuma 'pills' and ekitabo 'book' respectively belong:

Axiom 1: Patient ⊑ = 2 takes.Pill
Buri murweire natwara obujuma 2 bwonka
'Every patient takes only 2 pills'

Axiom 2: Child ⊑ = 1 has.Book
Buri mwana aine ekitabo 1 kyonka
'Every child has only 1 book'

As with ≤ and ≥, the noun is pluralized whenever the number is greater than 1. The algorithm is fairly similar to the others and is therefore omitted due to space limitations.
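The three cardinality patterns share the same structure: obtain the (plural) subject prefix from the noun class and attach it to the quantity word. The following is a minimal Python sketch, not our actual implementation; the function names are hypothetical and the subject-prefix table is restricted to the noun classes of the worked examples.

```python
# Subject prefixes for the noun classes appearing in the examples above
# (NC4 -> gi, NC6 -> ga, NC7 -> ky, NC9 -> e, NC14 -> bw).
SUBJECT_PREFIX = {4: "gi", 6: "ga", 7: "ky", 9: "e", 14: "bw"}

def verbalize_max(noun: str, noun_class: int, n: int) -> str:
    """<= n: noun + subject-prefixed -tarikurenga ('not more than') + number."""
    sp = SUBJECT_PREFIX[noun_class]
    return f"{noun} {sp}tarikurenga {n}"

def verbalize_min(noun: str, n: int) -> str:
    """>= n: hakiri ('at least') + noun + number (noun pre-pluralized if n > 1)."""
    return f"hakiri {noun} {n}"

def verbalize_exact(noun: str, noun_class: int, n: int) -> str:
    """= n: noun + number + subject-prefixed -onka ('only')."""
    sp = SUBJECT_PREFIX[noun_class]
    return f"{noun} {n} {sp}onka"

# Fragments matching the worked examples:
print(verbalize_max("emicucumo", 4, 3))   # emicucumo gitarikurenga 3
print(verbalize_exact("obujuma", 14, 2))  # obujuma 2 bwonka
```

The generated fragments correspond to the noun phrases of the axioms above, e.g., emicucumo gitarikurenga 3 'not more than 3 symptoms'; the surrounding Buri … sentence frame is omitted for brevity.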

Processing of Prepositions
The presence of prepositions in roles (relations), such as 'works for', and of passives, such as 'operated by', changes the pattern in which the role is verbalized. While the algorithms cannot yet deal with arbitrary prepositions, we cover those that appeared in our test ontologies: 'with' na, 'in' omu, 'of' (which depends on the NC of the noun), and 'by' w. Our implementation of 'of' and 'by' is limited to the situations where the former appears after a noun and where the verb of the latter is in the past tense. Examples of roles containing these prepositions are: works with → naakora na; offered in → neherezibwa omu; part of → ekicweka kya; and driven by → naavugwa. 'With' and 'in' are translated as na and omu respectively, except when the verb is in the past tense, in which case the passive is introduced (as with neeherezibwa). This case is similar to the verbalization of 'by' (see also Section 2.3). The verbalization of 'of' is different: the NC of the noun (NC 7 in the example above) is required to obtain the genitive ekya, which then drops its initial vowel to form kya. Algorithm 2.2 shows the verbalization process of 'of'. The corresponding rules have been added to the ruleset.
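The 'of' pattern can be sketched in Python as follows; this is illustrative only, and the genitive table is hypothetical, populated just with the NC 7 entry from the 'part of' example:

```python
# Hypothetical NC -> genitive table; NC7 'ekya' comes from the
# 'part of' -> ekicweka kya example above.
GENITIVE = {7: "ekya"}

# Direct translations of the simple prepositions.
SIMPLE_PREPOSITIONS = {"with": "na", "in": "omu"}

def verbalize_of(noun_class: int) -> str:
    """'of': take the genitive for the noun's class and drop its initial vowel."""
    genitive = GENITIVE[noun_class]
    return genitive[1:]  # ekya -> kya

print(verbalize_of(7))  # kya, as in 'part of' -> ekicweka kya
```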

The Passive in Context-Free Grammars
In Byamugisha et al. (2016b), we used a CFG for verb conjugation. To cater for the passive, as explained in Section 2.2, we added a new non-terminal to our original CFG, which had 6 non-terminals. The passive falls under the 'extensions' grammatical slot (Turamyomwe, 2011), represented here as the non-terminal EX, which is placed between the verb stem VS and the final vowel FV. This extended CFG was also added to our ruleset.
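As a rough illustration of the extended slot order, the sketch below concatenates the verbal slots with EX between VS and FV; the segmentation of naavugwa 'driven' into naa-vug-w-a is our own approximation and is not taken from the CFG itself:

```python
def conjugate(prefix: str, verb_stem: str, extension: str, final_vowel: str) -> str:
    """Concatenate the verbal slots, with EX (e.g. the passive -w-)
    placed between the verb stem (VS) and the final vowel (FV)."""
    return prefix + verb_stem + extension + final_vowel

# Passive of -vug- 'drive', as in 'driven by' -> naavugwa:
print(conjugate("naa", "vug", "w", "a"))  # naavugwa
```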

Phonological Conditioning
Due to the agglutinative nature of Runyankore, the text resulting from our algorithms sometimes contains letter combinations that do not exist in Runyankore phonology. When this happens, phonological rules are used to make the changes required to reflect the sound change in the language. This is referred to as phonological conditioning, which is carried out as a last step of text generation. Table 1 shows a sample of the inputs to this step, the process under which each occurs, the output after phonological conditioning, and the reason why this is the case. Note that while some cases requiring phonological conditioning appear similar, and might thus be assumed to have the same solution, this is actually not the case. Take the example of baona and baonka, which could be assumed to both be resolved by a double vowel, yielding boona and boonka respectively. Instead, each case is assessed individually: due to the presence of the nasal compound nk, the results are boona and bonka.
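The baona/baonka example can be sketched as two ordered rewrite rules; the rule ordering and the regular expressions are illustrative only, not our full rule set:

```python
import re

def condition(word: str) -> str:
    """Apply ordered phonological-conditioning rules for the a+o sequence."""
    # Before the nasal compound nk, the a is deleted: baonka -> bonka.
    word = re.sub(r"ao(?=nk)", "o", word)
    # Elsewhere, a+o assimilates to a long vowel: baona -> boona.
    word = re.sub(r"ao", "oo", word)
    return word

print(condition("baonka"))  # bonka
print(condition("baona"))   # boona
```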
All algorithms and rules have been implemented and first verified with four ontologies: SNOMED-CT, university, people, and family. We were able to generate text for all axioms containing our selected constructors.

Evaluation of Generated Text
The typical method of evaluating the performance of NLG systems is to ask subjects to read and judge the generated text, as compared to human-authored text (Bouayad-Agha et al., 2012). Another form of evaluating NLG systems is to present people with a text composed of both human-authored and computer generated sentences and ask them to identify which is which (de Rosis et al., 1999; Hussain et al., 2015). We used both methods in our evaluation. Similar to Hussain et al. (2015), we both rated the generated text for grammatical correctness and understandability and distinguished between human-authored and computer generated text.

Materials and Methods
A questionnaire survey was used to evaluate the generated text. The questionnaire had three main sections: (1) age, highest level of education, occupation, and first language; (2) 10 generated sentences, varied by the DL constructor being verbalized, as well as by the presence of special conditions such as prepositions, the passive, and definitions; and (3) 10 sentences, of which 5 were human-authored and 5 computer generated. The study was conducted in Mbarara, a district in Uganda where Runyankore is predominantly spoken. We used purposive sampling, selecting only participants who could read, write, and speak Runyankore. We evaluated with both linguists and non-linguists. We obtained 100 non-linguists from a single village, Mirama. To inform our target population that we were looking for study participants, an announcement was made at the local Catholic church after the Sunday service. This information was relayed to the headmaster of a nearby school, who agreed to let his students and staff take part in our study. All our study participants were at least 18 years old. We also contacted 3 linguists from the Department of African Languages, College of Humanities and Social Sciences, Makerere University in Uganda.
We used a modified version of the questionnaire for the linguists, which had 4 sections. The first 3 were similar to the questionnaire given to non-linguists, but we added a fourth section to evaluate the output of the CFG. This section had 99 conjugated verbs, testing the standard CFG and deviations from it, negation, several verb stems from the ontologies, and phonological conditioning.
Grammatical Correctness and/or Understandability Each of the 10 sentences was to be graded according to four criteria: grammatically correct and understandable; incorrect grammar but understandable; grammatically correct but not understandable; and incorrect grammar and not understandable. Table 2 shows all the sentences in the questionnaire, the DL axioms they verbalize, and the specific constructor whose verbalization was being tested.
Sentence G originally had 'ProfessorInHCIorAI' and 'AIStudent' as concepts in the axiom. However, 'AI' and 'HCI' were replaced with 'Science' before the evaluation, because they were unfamiliar to the study participants, which could have negatively affected how the sentence was graded.
Computer Generated versus Human-Authored 10 sentences, 5 authored by a Runyankore linguist (H) and 5 computer generated (C), were presented to the study participants. They were then required to grade each sentence as either human-authored or computer generated, based on its construction. The sentences used in this part of the questionnaire are presented in Table 3.1, along with the DL axioms they verbalize.
All study participants received a questionnaire containing the questions for evaluating grammatical correctness and understandability, and for distinguishing computer generated from human-authored text.
Grammatical Correctness and/or Understandability Our preferred outcome for this evaluation was to have all sentences graded as 'grammatically correct and understandable' by more than 50% of the study participants. The results are summarised in Figure 1. Sentences A, D, E, G, and H, which evaluated the verbalization of medical ∃, ≤, =, ∀ with a preposition, and ¬, were regarded as 'grammatically correct and understandable' by over 50% of the study participants (66%, 80%, 86%, 71%, and 92% respectively), which was a very positive outcome. Sentences B and J received 'grammatically correct and understandable' as the highest score among their gradings (47% and 38% respectively), but this was followed closely by 'incorrect grammar but understandable' (41% for B and 35% for J). Sentences C, F, and I, which evaluated the verbalization of ∃ with a preposition, ≥, and definitions, were regarded as 'grammatically correct and understandable' by only 8%, 34%, and 26% of the study participants respectively, and were graded with the worst outcome (incorrect grammar and not understandable) by 18%, 13%, and 35% respectively. The gradings of sentences C, F, and I were due to a lack of vowel assimilation, issues of syntax versus semantics, and wrong pluralization of definitions, respectively. The algorithms were updated to perform vowel assimilation and pluralize definitions accordingly.

Computer Generated Versus Human-Authored
The desired outcome for this part of the evaluation was to have all computer generated sentences graded as human-authored by at least 66% of the study participants. Hussain et al. (2015) evaluated for the same, but with 3 professionals, and their best result was that 64% of the overall text was regarded as human-authored. Our results are summarised in Figure 2. All computer generated sentences (C) except C4 (with 64%) were above this threshold: C1, C2, C3, and C5 were regarded as human-authored by 78%, 71%, 90%, and 97% of the study participants respectively. On the other hand, some human-authored text performed under the desired threshold: H2 just made it with 66%, while H3 and H5 were below it, with 64% and 56% of the study participants respectively regarding them as human-authored. These actually performed worse than the computer generated text.
That most study participants (> 60%) regarded all generated text as having been written by a human being is a positive outcome.
Results from a Linguist We have so far received feedback from one of the 3 linguists we contacted, and we present that feedback here. Of the sentences in Table 2 (except I) for evaluating grammatical correctness and/or understandability, A, D, E, and G were graded as 'incorrect grammar but understandable', due to, respectively: the issue of translating medical terminologies, the nature of the text from the axiom, a lack of proper phonological conditioning, and the incorrect arrangement of the prepositional phrase. C was graded as both incorrect and not understandable because of an error in the algorithm. This led to a modification of the grammar rules and algorithms. After the poor performance of sentence I in the non-linguists' questionnaire, arising from the translation of 'pet', we used a different axiom for the linguists, in order to focus their evaluation on the verbalization of conjunction. We used: OldLady ⊑ ∃ reads.Publication ⊓ ∀ reads.Tabloid. This was verbalized as: Buri mukaikuru hakiri nashoma ekihandiiko ekishohoziibwe kimwe, kandi naashoma taburoyidi zoona. It was regarded as 'grammatically correct but not understandable' because 'tabloid' was naturalized as taburoyidi. This is, however, the conventional way of translating loan words, and its negative effect on the grading of the sentence was unexpected. Sentences B, F, H, and J (except for the issue with the nasal compound) were graded as 'grammatically correct and understandable'. Except for H, these gradings differ from the evaluation by the non-linguists, where F was regarded as 'incorrect grammar but understandable' by 35%, 1% more than those who regarded it as 'grammatically correct and understandable', and where B and J were only marginally regarded as grammatically correct and understandable. It will be interesting to see whether all linguists give a similar evaluation.
Of the sentences in Table 3.1, C1, C2, H2, C4, and H3 were graded as human-authored, while H1, C3, H4, C5, and H5 were graded as computer generated. The reasons for the differences between the linguist's and the non-linguists' assessments are unclear, as neither group explained their choices. It is, however, encouraging that more computer generated text was considered human-authored by a linguist.
When evaluating the performance of the CFG, 11 of the 99 conjugated verbs presented in the questionnaire were considered incorrect: 1 due to a wrong subject prefix, 2 due to the need for an extra suffix, and 8 because vowel harmony was not implemented. The subject prefix and vowel harmony errors have since been fixed. The need for an extra suffix was only identified for the verb stem for 'eat' (ry); we are still investigating this error.

Discussion
Our work here adds to the growing efforts to verbalize ontologies in multiple languages. It also provides a basis to consider that the underlying theories could be generalizable to other Bantu languages. In our previous work (Byamugisha et al., 2016a; Byamugisha et al., 2016c), we showed that the factors affecting verbalization in isiZulu and Runyankore are the same, and that both languages have similar exceptions to pluralizing nouns according to the standard NC table. Further, the passive as a grammatical slot is present in the verbal morphologies of both languages.
Perhaps most interesting for our domain of interest is that using a small sample of SNOMED-CT, a very large healthcare ontology, during testing helped us investigate how to translate medical jargon into Runyankore. In some cases the term is maintained, as for common terms like 'Panado'; in others it is given context, e.g., hydrocodone translated as omubazi gwa hydrocodone 'medicine of hydrocodone'; or a mixed translation-and-context approach is used, e.g., endwara ya shukari 'disease of sugar', where 'sugar' is a common translation for 'diabetes' in a healthcare context; while in extreme cases the term is defined, e.g., opiate translated as omubazi ogukusinza ogukwejunisibwa kukyendeza obusaasi 'medicine which intoxicates as treatment to reduce pain'. There are several terms for anatomy, diseases, and drugs that are not directly translatable into Runyankore, but this offers a starting point showing alternative ways to handle them. Further, the results from the evaluation of the generated text are very encouraging, from both the linguist and the non-linguists. Additionally, important feedback was obtained, which enabled us to modify the initial rules and algorithms.
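These four strategies (keep, contextualize, mixed, define) can be summarized as a simple lookup; the dictionary below is an illustrative sketch recording only the examples given in the text, with unknown terms falling back to the loan word itself:

```python
# Illustrative jargon-handling table, not our actual lexicon; entries
# are the four strategy examples from the text.
STRATEGIES = {
    "Panado": "Panado",                        # common term: kept as-is
    "hydrocodone": "omubazi gwa hydrocodone",  # given context: 'medicine of hydrocodone'
    "diabetes": "endwara ya shukari",          # mixed: 'disease of sugar'
    "opiate": "omubazi ogukusinza ogukwejunisibwa kukyendeza obusaasi",  # defined
}

def translate_term(term: str) -> str:
    """Look up a medical term; unknown terms fall back to the loan word."""
    return STRATEGIES.get(term, term)

print(translate_term("diabetes"))  # endwara ya shukari
```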

Conclusion
New algorithms for knowledge-to-text verbalisation into Runyankore have been developed. These include: i) rules and algorithms for cardinality constraints (maximum, minimum, and exact), thereby extending the language coverage; ii) handling of some prepositions in complex roles; iii) processing of the passive with a CFG; and iv) phonological conditioning. We evaluated the text generated by these algorithms and rules with a group from the general population for grammatical correctness and understandability, and for whether participants could distinguish computer generated from human-authored text. The data demonstrated that most sentences were evaluated as grammatically correct, understandable, and human-authored. We are currently investigating the architecture required to implement this as an NLG system.