Formalization of Speech Verbs with NooJ for Machine Translation: the French Verb accuser

The mediocrity of sentences generated by online translators (Jacqueline, 1998; Hutchins, 2001) prompts us to look for ways to obtain more reliable translations. This is a very difficult task because of the ambiguity of natural languages and, above all, because translation systems lack syntactic and semantic knowledge. How can we make machine translation more reliable and unambiguous? Our main objective is to generate text in which French verbs are translated into Arabic without ambiguity. In this contribution, we attempt to formalize a particular class of verbs, the so-called verbs of speech. We limit ourselves to the verb accuser 'to accuse' as presented in Dubois & Dubois-Charlier's (1997) electronic dictionary, Les verbes français. We take this verb as a prototype to show how NooJ can perform reliable machine translation and generate good text without ambiguity.


Introduction
Building machine translation systems capable of properly translating verbs takes place in two stages: a theoretical stage and an application stage. The first stage analyzes and interprets the phenomena under study; this analysis is then developed in the second stage, whose role is to produce automatic translations. In this contribution, we attempt to process automatically a class of verbs, the so-called verbs of speech.
To do this, we adopted the NooJ platform for the syntactic description of the French verb accuser 'to accuse' as an example of speech verbs. Our two main objectives, which fall within applied linguistics, are:
- the production of a system for analyzing and recognizing the syntactic patterns of the verb accuser according to the classification of French verbs;
- adequate and reliable machine translation and the generation of sentences in Arabic.
Our work will therefore be divided into four parts: Derivational and inflectional formalization; Syntactic formalization; Implementation of the verb accuser in NooJ for machine translation; Automatic translation and generation of sentences in Arabic.

Derivational and inflectional formalization
NooJ has its own tools for automatic verb analysis and processing (Silberztein, 2003). To use them, we need to formalize the linguistic data so that the program can automatically analyze and process the verb accuser in all of its shades of meaning in our corpus, and then translate them accurately.
We will therefore create the paradigms needed to link each derived or conjugated form of the verb to its infinitive.

Creation of derivational paradigms
We have chosen our example verb from Dubois & Dubois-Charlier's (1997) electronic dictionary, Les verbes français (LVF). This work provides derivational codes, which can be adjectival (-able, -ant) or nominal (-age, -ment, -ion, -eur, -ure).
In the LVF, the verb accuser has only one nominal derivative, accusateur 'accuser'. We created its derivational paradigm in NooJ, which we called N1 = accuser<B2>ateur/N. The LVF clearly indicates the derivatives of each verb, but it does not give their feminine or plural inflections, so we had to create the corresponding inflectional paradigm for NooJ to recognize the inflected derivative forms: accusateur = <E>/m+s | s/m+p | <B3>rice/f+s | <B3>rices/f+p.
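To make the operators in these paradigms concrete, here is a minimal Python sketch that emulates only two NooJ morphological operators as we understand them: <E> (the empty string, i.e. keep the form unchanged) and <Bn> (delete the last n characters), followed by appending a literal suffix. It is not NooJ's implementation; the function names, the encoding of paradigms as (operation, tags) pairs and the analysis string format are our own assumptions for illustration.

```python
import re

# Hypothetical mini-interpreter for two NooJ-style operators:
#   <E>  : empty string (the form is left unchanged)
#   <Bn> : delete the last n characters of the current form
# Any other text in an operation string is a literal suffix to append.
OP_PATTERN = re.compile(r"<E>|<B(\d+)>|[^<]+")


def apply_operation(form: str, operation: str) -> str:
    """Apply an operation string such as '<B3>rice' to a word form."""
    result = form
    for match in OP_PATTERN.finditer(operation):
        token = match.group(0)
        if token == "<E>":
            continue  # empty string: keep the form as it is
        if match.group(1) is not None:
            n = int(match.group(1))
            if n > 0:
                result = result[:-n]  # delete n trailing characters
        else:
            result += token  # literal suffix


    return result


# Derivational paradigm N1 as given in the text: accuser<B2>ateur, category N.
DERIVATION_N1 = ("<B2>ateur", "N")

# Inflectional paradigm of the derived noun, transcribed from the text:
# <E>/m+s | s/m+p | <B3>rice/f+s | <B3>rices/f+p
INFLECTION = [
    ("<E>", "m+s"),
    ("s", "m+p"),
    ("<B3>rice", "f+s"),
    ("<B3>rices", "f+p"),
]


def derive_and_inflect(verb: str) -> list[tuple[str, str]]:
    """Return (form, analysis) pairs, each linked back to the verb lemma."""
    operation, category = DERIVATION_N1
    noun = apply_operation(verb, operation)  # accuser -> accusateur
    return [
        (apply_operation(noun, op), f"{verb},{category}+{tags}")
        for op, tags in INFLECTION
    ]


for form, analysis in derive_and_inflect("accuser"):
    print(form, analysis)
```

Running it prints accusateur, accusateurs, accusatrice and accusatrices, each paired with an analysis such as accuser,N+f+s, which is how every derived form stays linked to the infinitive.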

Creation of inflectional paradigms
NooJ can automatically recognize the conjugated forms of a verb only once the conjugation models indicated in the LVF have been described with NooJ inflectional operators. To this end, Max Silberztein¹ matched the LVF conjugation codes with those of NooJ.
For example, the inflectional paradigm of the verb aimer 'to love' describes all tenses and moods for all personal pronouns using NooJ operators. Thanks to this model, we can conjugate a large number of verbs, including our example verb accuser.
NooJ can now recognize all the conjugated occurrences of the verb accuser in our corpus, lemmatize them and link them to the list of its various uses.
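Conjugation follows the same principle as the derivational paradigms described above: an operator trims the infinitive and a suffix is appended. As an illustration only, the sketch below encodes a small fragment of an aimer-like paradigm (the present indicative, our choice; the real paradigm covers all tenses and moods), applies it to several verbs and builds a reverse index from each conjugated form to its lemma and features, which is roughly what recognition and lemmatization amount to. The paradigm name AIMER_PRESENT, the tag codes (PR = present, 1/2/3 = person, s/p = number) and the index structure are hypothetical, not NooJ's internal format.

```python
from collections import defaultdict

# Hypothetical fragment of an aimer-like paradigm: present indicative only.
# Every operation here has the shape <B2> + ending (drop "-er", add ending),
# so only the ending is stored.
AIMER_PRESENT = [
    ("e", "PR+1+s"),
    ("es", "PR+2+s"),
    ("e", "PR+3+s"),
    ("ons", "PR+1+p"),
    ("ez", "PR+2+p"),
    ("ent", "PR+3+p"),
]


def conjugate(infinitive: str) -> list[tuple[str, str]]:
    """Apply <B2> + ending for each cell of the paradigm fragment."""
    stem = infinitive[:-2]  # <B2>: delete the last two characters ("er")
    return [(stem + ending, tags) for ending, tags in AIMER_PRESENT]


def build_index(verbs: list[str]) -> dict[str, list[tuple[str, str]]]:
    """Map each conjugated form to all (lemma, tags) analyses it can receive."""
    index: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for verb in verbs:
        for form, tags in conjugate(verb):
            index[form].append((verb, tags))
    return dict(index)


index = build_index(["aimer", "accuser"])
print(index["accusons"])  # [('accuser', 'PR+1+p')]
print(index["accuse"])    # [('accuser', 'PR+1+s'), ('accuser', 'PR+3+s')]
```

Note that accuse receives two analyses (first and third person singular): lemmatization links it unambiguously to accuser, while the remaining morphological ambiguity is left to the syntactic level.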
¹ Author of the NooJ software.

Syntactic formalization
In this phase, we describe the syntactic schemas, which are written in the form of codes: we replace the codes with the verb and its arguments, and assign semantic features to these arguments based on their syntactic schemas. Syntactic schemas are defined by the nature of the constituents of the sentence, their properties and their relations, and by the words of the lexicon that fill the various types of constituents.
a. Accuser: direct transitive verb
After this phase of derivational, inflectional and syntactic formalization of the verb accuser with NooJ operators, we proceed to implement these formal data in NooJ.

Implementation of the verb accuser in NooJ for automatic translation
In this phase of formalization, we show how to integrate the verbal entry accuser into NooJ for automatic translation. For this, we created a bilingual French-Arabic dictionary and formal grammars for the different constructions of the verb accuser.

Creation of a bilingual French-Arabic dictionary
Implementing the verb in a bilingual French-Arabic dictionary first requires reformulating the LVF information about accuser in terms of NooJ operators; the resulting dictionary is then applied to translate the text automatically into Arabic (Figure 1). We added the Arabic translation to each verb and to all the other words so that sentences could be generated in Arabic without ambiguity.
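As a rough illustration of what such a dictionary holds, the Python sketch below parses entries written in a notation loosely modeled on NooJ dictionary lines (lemma, then category, then +-separated features and KEY=value properties) and indexes the Arabic property AR by lemma and construction code. The sample lines are hypothetical: the FLX value and the attachment of construction codes to entries are our assumptions, and the AR[...] strings are placeholders standing in for the Arabic equivalents, which we do not reproduce here.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    lemma: str
    category: str
    features: set[str] = field(default_factory=set)        # bare features
    properties: dict[str, str] = field(default_factory=dict)  # KEY=value


def parse_entry(line: str) -> Entry:
    """Parse a line shaped like 'lemma,CAT+feature+KEY=value'."""
    lemma, info = line.split(",", 1)
    category, *rest = info.split("+")
    entry = Entry(lemma, category)
    for item in rest:
        if "=" in item:
            key, value = item.split("=", 1)
            entry.properties[key] = value
        else:
            entry.features.add(item)
    return entry


# Hypothetical sample lines; "AR[...]" are placeholders, not real translations.
LINES = [
    "accuser,V+FLX=AIMER+T11b0+AR=AR[T11b0]",
    "accuser,V+FLX=AIMER+P10b0+AR=AR[P10b0]",
    "accuser,V+FLX=AIMER+T1907+AR=AR[T1907]",
]


def translation_table(lines: list[str]) -> dict[tuple[str, str], str]:
    """Index Arabic equivalents by (lemma, construction code)."""
    table: dict[tuple[str, str], str] = {}
    for line in lines:
        entry = parse_entry(line)
        # In these sample lines the only bare feature is the construction code.
        for code in entry.features:
            table[(entry.lemma, code)] = entry.properties["AR"]
    return table


print(translation_table(LINES)[("accuser", "P10b0")])  # AR[P10b0]
```

Keying the translation on the construction rather than on the lemma alone is what lets each use of accuser carry its own Arabic equivalent.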

Creation of formal grammars for the different constructions of the verb accuser
For a reliable automatic translation of sentences containing the verb accuser, we created formal grammars (Figure 2 for the T11b0 and P10b0 constructions; Figure 3 for the T1907 construction) that remove the ambiguities among its various syntactic constructions. Since Boons et al. (1976) argue that the meaning of a verb is related to the type of its subject and complement (human, concrete, non-animate, etc.), our formal grammars take into account the type of the object and the type of the complement.
To develop a formal grammar capable of correctly recognizing and translating the sentences that contain the verb accuser, we also had to add other electronic dictionaries for the detection of lexical units. We used electronic dictionaries already integrated into the NooJ platform, such as:
- Le DM, a dictionary of French words (Trouilleux, 2011). This linguistic resource is integrated into the NooJ platform and contains 67,997 entries: determiners, pronouns, prepositions, conjunctions, numerals, adverbs, nouns, interjections, adjectives and verbs.
The grammar we built must account for every possible construction without ambiguity. We therefore created a separate path for each syntactic construction, independent of the other constructions, and used the precise semantic features (concrete, abstract, human, etc.) of each argument to remove all ambiguities.
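The idea of one independent path per construction can be sketched in Python, with a clause reduced to a handful of semantic and syntactic features. This is only an analogy to NooJ's graphical grammars, not their implementation. The constraints on T11b0 (accuser qqn de qqch) and P10b0 (s'accuser de qqch) are our own illustrative assumptions; the actual paths are those of Figures 2 and 3, and T1907 is omitted because we do not reproduce its constraints. The example sentences and the feature labels Hum, Conc and Abst are also ours.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Clause:
    """A drastically simplified analysis of a clause headed by 'accuser'."""
    subject: str                 # semantic feature of the subject, e.g. "Hum"
    obj: Optional[str]           # feature of the direct object, None if absent
    pronominal: bool             # True for "s'accuser"
    preposition: Optional[str]   # preposition introducing the complement
    complement: Optional[str]    # feature of the prepositional complement


# One independent path per construction (assumed constraints, for illustration).
PATHS: dict[str, Callable[[Clause], bool]] = {
    # accuser qqn de qqch: human subject, human object, complement in "de"
    "T11b0": lambda c: (not c.pronominal and c.subject == "Hum"
                        and c.obj == "Hum" and c.preposition == "de"),
    # s'accuser de qqch: pronominal, human subject, complement in "de"
    "P10b0": lambda c: (c.pronominal and c.subject == "Hum"
                        and c.obj is None and c.preposition == "de"),
}


def recognize(clause: Clause) -> str:
    """Return the single matching construction; raise if none or several match."""
    matches = [code for code, path in PATHS.items() if path(clause)]
    if len(matches) != 1:
        raise ValueError(f"ambiguous or unknown construction: {matches}")
    return matches[0]


# "Le juge accuse le ministre de corruption"
c1 = Clause("Hum", "Hum", False, "de", "Abst")
# "Il s'accuse de négligence"
c2 = Clause("Hum", None, True, "de", "Abst")
print(recognize(c1), recognize(c2))  # T11b0 P10b0
```

Because each path is tested on its own, adding a construction never changes the behavior of the others, and a clause that matches zero or several paths is reported instead of being silently mistranslated.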

Automatic translation and generation of verbal predicates into Arabic
The automatic translation process is applied to the communication predicates obtained after the analysis and recognition phases. We applied our process of analysis and recognition of syntactic patterns to a corpus of articles from the newspaper Le Monde. In the results, the patterns and verbs found are disambiguated and annotated with the appropriate syntactic constructions (Figure 4).
The formal grammars thus allowed us not only to recognize the different constructions automatically, but also to translate the verb automatically into Arabic. Moreover, our system can translate not only the verb but the entire sentence.
To do so, we added translation grammars ($v$AR) to each formal grammar, and obtained the Arabic translation of the different sentences containing the verb accuser in its various constructions:
- Translation of accuser+T11b0+AR = الم
Our system succeeded in automatically translating these sentences into Arabic, taking into account the meaning of the verb, which varies according to the construction.
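A translation output such as $v$AR can be read as "the value of the AR property of the lexeme stored in variable v". The Python sketch below mimics only that substitution step, producing an annotation in the accuser+T11b0+AR=... format shown above. The lexicon, the annotate helper and the binding structure are hypothetical, and AR[...] are placeholders for the Arabic equivalents; NooJ itself evaluates such outputs inside its grammars.

```python
import re

# Hypothetical lexicon: one Arabic equivalent per (lemma, construction).
# "AR[...]" are placeholders, not real translations.
AR = {("accuser", "T11b0"): "AR[T11b0]", ("accuser", "P10b0"): "AR[P10b0]"}

VAR_PROP = re.compile(r"\$(\w+)\$(\w+)")  # matches $var$PROP, e.g. $v$AR


def expand(output: str, bindings: dict[str, dict[str, str]]) -> str:
    """Replace each $var$PROP with property PROP of the lexeme bound to var."""
    return VAR_PROP.sub(lambda m: bindings[m.group(1)][m.group(2)], output)


def annotate(lemma: str, construction: str) -> str:
    """Bind variable v to the recognized verb and expand the output template."""
    bindings = {"v": {"AR": AR[(lemma, construction)]}}
    return expand(f"{lemma}+{construction}+AR=$v$AR", bindings)


print(annotate("accuser", "T11b0"))  # accuser+T11b0+AR=AR[T11b0]
```

Since the property is looked up only after the construction has been recognized, the same French lemma yields a different Arabic equivalent for each construction.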

The software thus made it possible to generate natural sentences in Arabic without syntactic or semantic ambiguity.
The naturalness of the generated text is due to an automatic syntactic and semantic analysis of an open corpus, based on a broad description of the vocabulary. Our example adds to those of Silberztein (2015, 2016) in confirming that NooJ brings a significant qualitative leap to text generation.

Conclusion
In this contribution, we formalized the verb accuser as an example of verbs of speech and integrated it into the NooJ software. Thanks to the linguistic richness of the LVF verbal entries, which helps computational tools perform syntactic, semantic and morphological analysis of verbs, we succeeded in automatically recognizing, extracting and processing occurrences of accuser in a corpus of considerable size drawn from the newspaper Le Monde.
This formalization was needed to obtain reliable automatic translation. We therefore created formal grammars to remove the ambiguity among syntactic constructions, since the meaning of the verb depends on the type of its subject and complement (human, concrete, abstract, etc.).
These grammars enable the automatic recognition of syntactic constructions, which in turn removes ambiguities and makes it possible to generate Arabic sentences that reflect the meaning of the verb in its original French context.