Using NLG for speech synthesis of mathematical sentences

People with sight impairments can access to a mathematical expression by using its LaTeX source. However, this mechanisms have several drawbacks: (1) it assumes the knowledge of the LaTeX, (2) it is slow, since LaTeX is verbose and (3) it is error-prone since LATEX is a typographical language. In this paper we study the design of a natural language generation system for producing a mathematical sentence, i.e. a natural language sentence expressing the semantics of a mathematical expression. Moreover, we describe the main results of a first human based evaluation experiment of the system for Italian language.


Introduction
The recent progress of computational linguistic techniques and frameworks had a deep impact in the field of the assistive technologies. For instance, the recent development of commercial platforms for building speech dialogue systems, which are designed for not-impaired people, can also help people with disabilities in daily activities. For example a vocal command can be used to unlock a door in a house. However, for more specialized activities one needs to understand the necessity of specific communities in specific domains.
In the case of mathematical domain, blind people can access to a mathematical expression by using its L A T E X source. However, this process have several drawbacks. First of all, it assumes the knowledge of the L A T E X. Second, listening L A T E X is slow, since L A T E X is verbose. Finally, it is errorprone since L A T E X is a typographical language, that is a language designed for specifying the details of typographical visualization rather than for efficiently communicate the semantics of a mathematical expression. For instance, the simple L A T E X expression $f (x)$ is just a typographical description and so it represents both the function application of f to x, and the multiplication of the constant f for the constant x surrounded by parenthesis.
In this paper we study the design of a natural language generation (NLG) system for producing a mathematical sentence, that is a natural language sentence containing the semantics of a mathematical expression. Indeed, humans, when have to orally communicate mathematical expressions, use their most sophisticated communication technology, that is natural language. However, with respect to other domains, the mathematical domain has a number of peculiarities for speech that needed to be accounted for (see Section 4).
We have three main research goals in this paper. The first goal is answering to the question: what is the linguistic status of a mathematical expression? In other words, we want to investigate about the possibility to use the standard notions of linguistics, primarily syntax, for mathematical sentences. The second goal concerns the possibility to use a standard NLG architecture, that is a sentence planner and a realizer, for the production of a mathematical sentence. The third goal concerns the possibility to simplify the listening of a mathematical expression by using speech features during the speech synthesis. Indeed, in contrast with other fields, the mathematical domain is essentially a spoken domain (Chang, 1983). Indeed, by only listening the audio format of mathematical sentence, that is without accessing to its written form, the standard precedence of the mathematical operators are hardly recognizable. In other words, speech features, as pauses and prosody, can modify the perceived structure of the mathematical sentence in a peculiar way.
The schematic architecture of the developed framework is designed in Figure 1 Enhanced CMML w r i t t e n m a t h . s e n t e n c e audio math. sentence Figure 1: The software architecture for the generation of mathematical sentences. The information flow starts from (1) the L A T E X representation of the expression, (2) its translation in CMML, (3) enhancement of CMML, (4) generation of the written form of the mathematical sentence, (5) production of the audio form of the mathematical sentence. and Somer, 1992). The process of generating a mathematical expression from its L A T E X source is a two-step algorithm. In the first step the L A T E X is analyzed and its semantics is represented in Content MathML (CMML henceforth), a W3C standard 1 for the syntax and semantics of mathematical expressions. In the second step, the CMML representation is used as input of the S2S (Semantics to Speech) module, that is a NLG module, to generate the mathematical sentence and, after the introduction of parenthesis or pauses, its audio format encoding.
The paper is structured as follows. In Section 2 we give a short review of the accessibility problem of mathematical expressions for visually impaired people. In Section 3 we describe the first step of the algorithm, that is the process to extract the CMML representation from its L A T E X representation. In Section 4 we describe our assumptions about the syntactic structures associated to mathematical operators. In Section 5 we describe the second step of the algorithm, that is the NLG of the mathematical expression from its CMML representation. In Section 6 we describe a first human-based evaluation of the system for Italian language performed by four blind people. Finally, Section 7 closes the paper with some considerations and pointing to future work.

Related work
Research to enable people with sight impairments to access mathematical notation has been conducted in two main directions. On one hand, different techniques have been investigated to preserve mathematical notation in a source format, that can be processed by a screen reader, through-out the whole workflow of a scientific document. In particular, nowadays it is possible to embed mathematical expressions in web pages not only as images, which cannot be processed by screen readers, but through MathML or MathJax (Cervone, 2012) and in PDF documents produced from LaTeX through the LaTeX package Axessibility (Ahmetovic et al., 2018). On the other side, many research works have investigated how people with sight impairments can read and understand mathematical notation, along two directions: conversion into Braille and speech reading. Since Braille is not a universal standard, different converters have been developed. The most widespread include conversion from LaTeX to Japanese Braille (Hara et al., 2000), to Nemeth code mostly used in English speaking countries (Papasalouros and Tsolomitis, 2015), to Marburg code mostly used in German speaking countries (Murillo-Morales et al., 2016) and from MathML to Spanish, French and Italian Braille codes (Soiffer, 2016). Nonetheless, Braille cannot support all the notations that can be expressed through LaTeX or presentation MathML (e.g., category theory, computational logic) and it has not a mechanism to introduce new notations hence the converters have a number of limitations. For what concerns speech reading, different techniques have been investigated. First, the most common approach transforms LaTeX or MathML expressions into a readable sentence by mapping a sequence of mathematical symbols to an aural equivalent for English (Raman, 1996), Spanish, German and French (Soiffer, 2007), Polish (Bier and Sroczynski, 2015) and Thai (Boonprakong et al., 2017). This approach is totally unambiguous, but it is very verbose (e.g. multiple nested parentheses can be hardly retained). Moreover, only a limited number of mathematical contexts are managed. Second, in addition to se- quential speech reading, hierarchical exploration of the mathematical expression is provided (Soiffer, 2007;Sorge et al., 2014). This approach reduces the mental workload to retain the chunks of the expression. Third, speech reading is generated in a controlled environment such as a mathematical editor (Waltraud Schweikhardt, 2006;Raman and Gries, 1994). The context is defined by the author, hence the speech reading can be more accurate with respect to the semantics.
The idea to use mathematical sentences for improving the accessibility of mathematical expressions has been previously presented and experimented in (Ferres and Fuentes Sepúlveda, 2011;Fuentes Sepúlveda and Ferres, 2012) for Spanish language. However, in contrast to (Ferres and Fuentes Sepúlveda, 2011;Fuentes Sepúlveda and Ferres, 2012), we use a linguistic-based NLG architecture rather than a template-based one. In particular, by using the SimpleNLG realization engine for Italian, we allow both (i) for portability of the system to other languages, and (ii) a major and simple customization of the mathematical sentences. Indeed, the linguistic nature of the Sim-pleNLG input format allow for a very simple implementation of linguistic operations, as coordination or punctuation insertion (e.g. parenthesis. Cf. (van Deemter et al., 2005) for a discussion on the advantages and limitations of the template-based approach).

From L A T E X to CMML
The first step of our algorithm is the generation of CMML associated to a L A T E X formula. We based this step on an external tool named LatexML (Miller, 2007). However, the CMML obtained from this tool needed to be enhanced by a postprocessing procedure for two distinct reasons.
1. We decided to clean the CMML obtained from LatexML since we wanted to remove some extra characters in the tag names (e.g. the suffix m:). With the same aim, we replaced all non standard characters from the tag values, e.g. some variable names which are written by the LatexML with italics font. Moreover, in certain cases LatexML generate some tags with the open math standard, as the case of conditional-set, that is converted by using the csymbol tag. For the sake of generality, in order to simplify the generation step of the system, we decided to uniform these cases to their corresponding pure CMML tag.
2. There are case in which the typographical origin of the L A T E X creates ambiguity that cannot always correctly solved by LatexML. If $y = f (x)$ is the L A T E X representation of formula y = f (x), LatexML cannot autonomously decide the correct relation between the symbols f and x. One could force its interpretation as a function application (i.e. f is a function and x is his argument) or in alternative could force its interpretation as a multiplication (i.e. both f and x are just variables). Indeed, by default LatexML always assumes latter option. As a consequence, we had to fix it by hand in a number of cases.
The output of LatexML and the enhanced version with the hand fix applied are shown in Fig [x=a] [a] terance 2/3 can be pronounced as two thirds.
Establishing what are the similarities between mathematical language and natural language is important for theoretical and practical considerations since this would imply the existence of linguistic structures which could be exploited for language generation. Also very simple mathematical expression can often be pronunced in different ways which underlay different mathematical sentences and so different syntactic structures, as 2/3 that can also be pronunced with the sentence two over three (Chang, 1983). In contrast, there are cases in which the parsing of a mathematical expression unambiguously conducts to a specific mathematical sentence and to a unambiguous syntactic structure. For example, the mathematical expression |x|, that corresponds to the mathematical sentence the absolute value of x seems obviously to be a noun phrase modified both by an adjective and by a propositional phrase. In contrast, for the mathematical sentence x belongs to A, the most natural syntactic analysis seems to be a declarative sentence. However, there are many cases where the syntactic structure of a mathematical sentence is not so simple to represent. For example, x plus three could be analyzed as composed by the subject x and the object three, but this declarative representation of the sentence clashes with the impossibility to assign the part of speech verb to the word plus in the standard language.
As working hypothesis, we decided to assume a "specialized" syntactic analysis for a number of mathematical objects. For instance, x plus three indicates the action of adding one quantity to another, so it can be represented as a declarative structure. As a consequence, plus can be analysed as verb and this assumption can be extended to all the mathematical sentences.
In this paper we considered only the mathematical structures belonging to the subfield of the mathematical analysis. In particular we considered all the mathematical expressions in an Italian analysis book (Pandolfi, 2013). By using this corpus of expressions and by assuming that all numbers and variables can be treated as nouns and that all arithmetic operators can be treated as verbs, we found eight additional categories for representing all complex mathematical expressions and we defined a specific syntactic construction for each category. We reported these eight categories in Table 1.
We decided to analyse the mathematical sentences of relational operators as copula sentences (a is greater than b), algebraic operators as declarative sentences (a cartesian product b), logical operators as conjunctions (a or b), elementary operators (e.g. radical sign), sequence (e.g. limit), calculus (e.g. integral) as noun phrases (the square root of x), pairs and conditional sets as reduced relatives (the set of x such that x is less than 3). It is worth noting that our syntactic representations for mathematical operators in the analysis domain could have alternative representations or could be specialized in a more refined classification (c.f. (Chang, 1983)). However, we decided to use only eight category for the sake of simplicity.

Building Mathematical Sentences with NLG
Traditional NLG architectures split the generation process into three distinct phases, that are document planning, sentence planning and realization (Reiter and Dale, 2000;Gatt and Krahmer, 2018).
In particular document planning decides what to say and sentence planning and realization decides how to say it. In this project the content of the communication is specified by the input mathematical expression, so the content selection phase is not necessary at all. In Section 5.1 we will give some details on the rule-based sentence planner designed for managing mathematical sentences and in Section 5.2 we will describe the use of the SimpleNLG-it realizer  for the case of mathematical domain.

Building a Sentence Planner for Mathematical Sentences
The input of the sentence planner is a mathematical expression in the form of enhanced CMML, that is a sort of linguistic semantic tree for the mathematical expression. In order to associate a sentence plan, that is a sort of under-specified tree-based syntactic structure, we devised a recursive algorithm that traverses top-down the CMML structure.
We classified all the mathematical expressions into a number of predefined categories, as discussed in the Section 4 (see Table 1). In particular, for each category we designed a prototypical sentence plan that will be used in the recursive process. Each prototype builds a specific linguistic construction (e.g. copula, reduced relative etc.), that is designed for giving syntactic roles to the arguments of the specific mathematical construction. For instance, on the left of the Figure 3, we reported the prototypical sentence plan for the conditional set mathematical structure and on the right of we reported an example of its instantiation. Note that in the produced structures: (1) the leaves of the sentence plan are lemmas rather than words, (2) the syntactic relations among the nodes are expressed using both dependency relations (e.g. subj, complement) as well as constituency nodes (e.g. Prepositional Phrase, PP). This peculiar representation is typical of some re-alization engines. and will be exploited in the next step of the generation.
In order to build a sentence plan for a mathematical sentence, there are two important issues: (i) the perception of precedence of the arithmetic operator and (ii) its most economical non ambiguous representation.
(i) Listening mathematics has some peculiarities with respect to reading it. For instance, division is granted a higher precedence than addition, and during the reading process the expression a+b/c is parsed as a+ b c without ambiguities. A different result arises if one listens the equivalent mathematical sentence a plus b divided by c without reading the expression: we experimented that the most frequent perceived parse is a+b c . After a limited number of experiments in listening arithmetic expressions with distinct (blind and not blind) people, we decided to state as working hypothesis that the precedence of the arithmetic operators are perceived in the reverse order when one listens a mathematical expressions without reading it. We are aware that this speculation should be supported by specific experimental studies but, at the best of our knowledge, we have not been able to find them.
(ii) A different problem concerns the most efficient way to represent the correct precedence of a mathematical expression. In other words, how we can build a mathematical sentence unambiguously equivalent to a + b c ? A trivial but effective solution is to use parenthesis, that is to produce the mathematical sentence a plus open parenthesis b divided by c close parenthesis.
However, the drawback of this solution is the length of the sentence that, for very complex expressions, can augment substantially.
In order to account for the problem of the precedence of the operators and its representation, we modified the sentence planner in two ways. First, we decided to model parenthesis as first-class citizens in the sentence plan, that is we considered open-parenthesis and closed-parenthesis as two new lexical items of the SimpleNLG lexicon which can be used as pre-modifier and postmodifier of a mathematical sentence respectively. Second, similar to (Fuentes Sepúlveda and Ferres, 2012), we allowed to use a speech pause as a synonymous of open/closed-parenthesis items.
In order to experiment the pros and cons of using parentheses and pauses in the understanding of a mathematical sentence, we decided to implement three distinct parenthesization strategies, called parenthesis, pause, and smart.
1. In the parenthesis strategy, all the necessary parentheses are inserted in the sentence plan. Note that a parenthesis has to be considered necessary with respect to the inverted precedence order hypothesis stated above.
2. In the pause strategy, all the necessary pauses are inserted in the sentence plan.
3. In the smart strategy, all the necessary parentheses are inserted in the higher nodes of the sentence plan, and the necessary pauses are inserted close to the leaves of the sentence plan. This is a hybrid strategy that combines parentheses and pauses in order to have a less verbose mathematical sentence.
The evaluation of the performance of these parenthesization strategies is one of the goal of the experimentation described in the Section 6.

Using SimpleNLG for spoken mathematics
In order to produce a spoken mathematical sentences in Italian with the SimpleNLG-it realizer , we needed to account for the construction of a domain specific lexicon for the field of the mathematical analysis. SimpleNLG-it is the Italian porting of the SimpleNLG realizer, that was originally designed only for English (Gatt and Reiter, 2009). As default Italian lexicon, SimpleNLG-it uses a basic vocabulary of around 7000 words, that is a simple lexicon studied to be perfectly understood by most Italian people (Mazzei, 2016). However, for this specific project we needed to augment the basic lexicon with a mathematical specialized lexicon, that contains both new lexical entries (as arcotangente, arctangent), than new associated value for lexical entry that are yet in the basic lexicon (as the noun value for the part of speech of the lemma integrale, integral). This specialized lexicon contains 113 entries that are mostly categorized as nouns (e.g. logaritmo, logarithm), verbs (e.g. intersecare, intersect), adjective (e.g. iperbolico, hyperbolic). In the lexicon, there are only two new instances of adverbs (that are relativamente and propriamente, relative, properly), and only one instance of "prepositional locution" (that is tale che, such that). Finally, we added specific lexical items to realize both parenthesis (that are parentesi aperta and parentesi chiusa, open/closed parenthesis) and speech pause. This latter item will be finally realized by using the SSML (Speech Synthesis Markup Language) tag <break/>, that can be processed by many speech synthesis engines 2 . An example of written mathematical sentence generated for the mathematical expression n √ x = x 1/n is <break time="1000ms"/>La radice n-esima di xè uguale a x elevato a <break time="500ms"/> 1 diviso n (the n-th root of x is equal to x raised to 1 divided by n). The actual version of the mathematical sentence generator has been interfaced with two speech synthesis engines, that are the web service provided by the IBM-Watson framework 3 (W-engine henceforth), and the Espeak API 4 (Eengine henceforth). W-engine is a commercial, closed software based on deep learning, while Eengine is a free, open-source software based on formant synthesis algorithms. Note that for not visual impaired people W-engine sounds more fluent but for visual impaired people sounds more familiar since it is used by the free NVDA screen reader.

Evaluation
In order to have a first evaluation of the generation system, we wanted to test the hypothesis that the system produces mathematical sentences which are understandable by visually impaired people. So, we built a web-based test explicitly designed for this class of users. We designed a questionnaire composed by a 6 multiple choices questions concerning personal data, a core of 25 open questions each one concerning the listening of a mathematical sentence and its comprehensibility, 1 Likertscale question globally comparing L A T E X and system comprehensibility, 1 open question for free comments.
The 25 core questions have a all the same schema: there is a audio file encoding a mathematical sentence and there is a open form for transcribing it. In the compilation instructions, we asked the users to fill this section by using "L A T E X or with other non ambiguous formal representation". The mathematical expressions obtained have been manually translated to CMML for evaluation. We implemented the questionnaire by using the Google Form framework, that was preliminarily judged accessible by a blind person.
We have built the 25 core questions by using 10 mathematical expressions belonging to the analysis book used for developing the generation module (Pandolfi, 2013) (see Table 2). We have selected simple 5 expressions (less than 15 nodes in their MathML representation) and 5 complex expressions (more than 15 nodes). By varying (i) the synthesis engines (W-engine and E-engine), and (ii) the parenthesization strategies (parentheses, pauses and smart), we instantiate 25 possible mathematical sentences of the 10 mathematical expressions in the questionnaire.
In order to score the comprehension of the user we used two distinct metrics on the CMML expressions, that are Exact Match and SPICE (Anderson et al., 2016) metrics. The exact match returns 1 (or 0) value if the starting CMML tree and the perceived CMML are equals (or not). The SPICE score is a tree similarity measure previously used in the context of automatic caption generation. SPICE is obtained by computing the F-score of the overlap between two trees: the overlap is measured by decomposing trees in typed elementary substructures, that are operands, operators and their relations. For instance, the expression x − 1 is decomposed as 1, x, minus, (op: minus, first: x), (op: minus, second: 1) (cf. (Anderson et al., 2016) for more details).
For the experimentation, we recruited 4 visually impaired people with personal invitation without any rewards. All users are Italian mother tongue, have a good knowledge of mathematical analysis and have a bachelor degree (only one related to mathematics). We believe that for a preliminary evaluation of the system, 4 blind people is a valu-able number. Indeed, note that all the users found the complete experimentation quite tiring and they spend around one hour to complete it. Note that in previous work on the generation mathematical sentences (Ferres and Fuentes Sepúlveda, 2011;Fuentes Sepúlveda and Ferres, 2012), the evaluations have been performed without blind people 5 . So, in our knowledge, this is the first NLG study on mathematical sentence with a realistic users' test-set.
We also submitted the questionnaire to a not visually impaired analysis teacher. She found the task of understanding mathematical sentence only by listening extremely hard. We did not report her (very low) scores since it is outside of the main goal of the experimentation, but this poor results show the different perception difficulty among different kind of users.

Results and Error Analysis
In Table 3 and in Table 4 are reported the scores obtained by the users for Exact Match and SPICE measures.
From the first column of   we can see that score variation seems strongly depend by the user. Indeed, User 2 is the only one with a Master Degree related to mathematics. A possible comparison can be done with (Fuentes Sepúlveda and Ferres, 2012), where a user group of 20 not-blind engineers have been exposed to 15 mathematical sentences generated from mathematical formulas obtained from Wikipedia. (Fuentes Sepúlveda and Ferres, 2012) obtained a final score for the exact match between 76 − 79%, that is comparable with the results of Table 3.
The SPICE scores in Table 4 confirms the dependence from the user of the results and evidences that only small portions of the mathematical expressions have been misunderstood. A manual inspection of the results showed that most of misunderstood sentence have structural errors in the perceived tree (11 errors), while are less frequent erroneous symbols (6 errors). For instance, a common mistake regarded the perception of the derivation operator, that is la derivata di f di x (the derivative of f of x).
By considering the Simple and Complex columns of the Table 3 is evident that, not surprisingly, the complexity of the expression have a strong impact on the comprehension and the t-test returns a value of 0.01 (two-tailed p-value). However, the application of t-test to the SPICE values for Simple and Complex expression do not consider them significantly different (0.75 two-tailed p-value).
In Table 5 we reported the averaged values of SPICE for three distinct categorizations of the mathematical expressions, that are parenthesization strategy, synthesis engine and fame. This last category expresses that the last two complex expressions in Table 2 are two well-known mathematical definitions and this fact could affect their comprehension.
With respect to the main goal of searching for specific strategy for simplify the listening of a mathematical expression, we note that the statistical analysis of the values in Table 5 concerning category parenthesization did not suggest any significant variations among the strategies. For instance, by comparing with t-test the SPICE scores of parenthesis and pauses strategies, we obtain a not significant value (0.75 two-tailed p-value). In contrast we could assert as posthoc hypotheses that both synthesis engine and fame have a significant effect on the performance of the system: by applying the t-test we obtained for both 0.02 twotailed p-value. However, new experiments with more users are necessary to confirm these conclu-

Conclusion
In this paper we have presented a study on the generation of mathematical sentences, i.e. sentences in natural language expressing mathematical expressions 6 .
We have described the peculiarities of the mathematical domain with respect to the task of NLG. In particular, we have considered the practical problem of analysing the semantics expressed by a mathematical expression in L A T E X, and its speech production in Italian language by using the SimpleNLG-it realizer and two distinct synthesis engines. We have proposed three different strategies for parenthesization of ambiguous mathematical expressions based on both parenthesis and pauses in speech. Finally, we have conducted a human-based evaluation of the system by using a tree-based measures of similarity. The results of the experimentation suggests (1) a preference for formant-based synthesizer with respect to NNbased synthesizer, (2) the performances on simple expressions are good, but (3) on more structurally complex expressions need improvements In future work we intend to expand the lexicon in order to use English language and so to perform evaluation with a larger number of users and with more complex procedures of parenthesization. In particular, we want to experiment the use of acoustic signals for represents the opening and the closure of nested parentheses. Moreover, we intend to integrate the system into a dialogue system architecture with the idea to allow user to ask for repetition and clarification in the case of very complex expressions. 6 The open-source licensed software can be downloaded at https://bitbucket. org/tesimagistralemonticone/ formula-to-speech/