Conversing with the elderly in Latin America: a new cohort for multimodal, multilingual longitudinal studies on aging

Many studies have found that language alterations can aid in the detection of certain medical afflictions. In this work, we present an ongoing project for collecting multilingual conversations with the elderly in Latin America. This project, so far, involves the combined efforts of psychogeriatricians, linguists, computer scientists, research nurses and geriatric caregivers from six institutions across the USA, Canada, Mexico and Ecuador. The recordings are being made available to the international research community. They consist of conversations with adults aged sixty and over, of different nationalities and socio-economic backgrounds. Conversations are recorded on video, transcribed and time-aligned. Additionally, we are in the process of receiving written texts, recent or old, authored by the participants and provided voluntarily. Each participant is recorded at least twice a year to allow longitudinal studies. Furthermore, information such as medical history, educational background, economic level, occupation, medications and treatments is being registered to aid research on treatment progress and pharmacological effects. Potential studies derived from this work include speech, voice, writing, discourse, and facial and


Introduction
The Carolinas Conversations Collection (Pope and Davis, 2011), a project for collecting conversations with elderly people who live in North and South Carolina, started in 2008. This project was initially supported by the USA National Library of Medicine. For the collection, the conversations were transcribed, marked, time-aligned and made available to the international research community by means of a secured website. The collection has grown steadily since then and, at present, holds over 460 conversations with adults over sixty years old, either healthy or suffering from a medical condition. A fourth of these conversations were held with participants afflicted with Alzheimer's disease.
In 2015, we started to extend the coverage of this collection to incorporate different languages. The first additional language to be incorporated is Latin American Spanish. We are currently adding conversations with new participants: elderly Spanish speakers from Ecuador and Mexico. Additionally, we are incorporating new information and language modalities to increase the robustness of possible studies that may use this corpus. So far, this project has involved the combined efforts of six institutions across four countries.

Methodology
The recordings are made at least twice a year with each participant. In Ecuador, we are working in collaboration with "Universidad Técnica Particular de Loja" (UTPL), and with the "Perpetuo Socorro" Foundation, a home for elderly people. In Mexico, the psychogeriatricians from the Psychiatric Hospital "Fray Bernardino Álvarez" have agreed to work as our medical experts and advisors for this project. Furthermore, the Foundation and the Psychiatric Hospital have made arrangements to allow us to communicate with their residents, patients and their guardians, and invite them to participate in our Latin American recordings.
In Ecuador, none of the institutions involved has an Institutional Review Board (IRB) for the protection of human subjects, or any formal ethics guidelines. For this reason, our institutional IRB took over that role. Consequently, a person who is authorized under the protocol and holds a Canadian or American certification of training in ethics for research with human subjects must be present, in person, during all recording sessions. In Mexico, the hospital has its own IRB, and its staff are trained in ethics. This allows them to record the conversations without any member of the team from Canada or the USA needing to be present.
Before the recordings, the participants and their caregivers are given a short explanation of the project and its aims. Provided they agree to participate in the project, they sign an informed consent form and, with the help of their primary psychiatric care providers or their primary caregiver, we fill out a questionnaire with the participant's medical information. In this questionnaire we ask for all the medications that the participants are currently taking, as well as their medical conditions. With first-time participants, we also record their demographic data, such as birth date, gender, educational level, occupation (prior to retirement), first language, and ethnic affiliation. To protect the privacy of the participants, all names are replaced by aliases. In the case of Ecuador, aliases are randomly chosen from a pool of names of characters or writers of classic Latin American novels; in the case of Mexico, they are chosen from names of congresspeople. We select aliases that correspond with the gender of the participants.
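The alias assignment described above can be illustrated with a minimal Python sketch. It is not the project's actual procedure: the pool contents, the site and gender keys, the participant identifiers and the function name assign_aliases are all hypothetical placeholders, and drawing without replacement (so that no two participants share an alias) is our assumption rather than something the protocol states.

```python
import random

# Hypothetical alias pools keyed by (site, gender). The real pools (names from
# classic Latin American novels for Ecuador, names of congresspeople for
# Mexico) are not reproduced here; these strings are placeholders.
ALIAS_POOLS = {
    ("Ecuador", "F"): ["EcF-alias-1", "EcF-alias-2", "EcF-alias-3"],
    ("Ecuador", "M"): ["EcM-alias-1", "EcM-alias-2"],
    ("Mexico", "F"): ["MxF-alias-1", "MxF-alias-2", "MxF-alias-3"],
    ("Mexico", "M"): ["MxM-alias-1", "MxM-alias-2"],
}


def assign_aliases(participants, pools, seed=None):
    """Map each participant id to a random alias from the matching pool.

    participants: iterable of (participant_id, site, gender) tuples.
    Aliases are drawn without replacement (an assumption of this sketch).
    """
    rng = random.Random(seed)
    # Work on shuffled copies so the caller's pools are left untouched.
    available = {}
    for key, names in pools.items():
        shuffled = list(names)
        rng.shuffle(shuffled)
        available[key] = shuffled

    aliases = {}
    for participant_id, site, gender in participants:
        pool = available[(site, gender)]
        if not pool:
            raise ValueError(f"alias pool exhausted for {site}/{gender}")
        aliases[participant_id] = pool.pop()
    return aliases
```

Passing a fixed seed makes the mapping reproducible, which may help when the same participant returns for a follow-up session and must keep the same alias.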
The interviewers are the caregivers at the Foundation (Ecuador), and the primary psychiatric care providers (Mexico). All interviews take place in the Foundation's and the psychiatric hospital's facilities. We believe that having free topics, and a familiar interviewer and environment, helps provide a more comfortable atmosphere for the participants.
All our interviewers have been trained in techniques to motivate the participants to talk, even if they are afflicted by some type of cognitive impairment. We have created animated videos and other training materials to instruct interviewers on how to elicit free conversations with patients. The strategies that we provide come from practices developed over years of experience interviewing elderly participants in North and South Carolina for this collection. These materials are available online to facilitate long-distance knowledge exchange.
While training the interviewers, we usually start by explaining the context of the project. We then emphasize the importance of letting the participants talk and express themselves as much as possible. We ask the interviewers to be patient and allow the participants some time to process their questions and then answer. We also give them cues such as repeating the participants' last utterance when they are stuck; giving encouraging feedback and signs of interest, such as making eye contact and responding with interjections, body language and facial expressions that match the mood of the conversation; and keeping the conversation flowing by mentioning any information that they have gathered about the participants over the time they have known them.
The conversations are free in the sense that there is no specific theme to talk about, although the most common topics are the early lives of the participants, their hobbies, their health and their views on life in general. There is no time limit to these conversations. Some of the common questions used to start the conversation are: "Tell us about your life", "What do you like to do?", "How was your childhood?", "Do you have any hobbies?", "Who is accompanying you today?", "Do you have any pets?", "What did you use to do for a living?".
The conversations from Mexico and Ecuador are being manually transcribed and time-aligned by our collaborators of the Linguistic Engineering Group (LEG) at the National University of Mexico. We selected the LEG due to their vast experience in the creation of corpora in Spanish. The transcriptions are labelled with markings that indicate pauses, interruptions, external noises, participants' noises (e.g., laughter, crying, coughing, hawking), intonation and emphasis (e.g., whispering, yelling), actions (e.g., winking, hand gesturing, finger snapping, clapping), and unconventional pronunciations.
In addition to the recordings of the conversations, in Mexico we are also asking the participants and/or their guardians for copies (digital or physical) of written texts, such as old letters, messages, etc., authored by the participants, recently or in years prior to this study, including letters from their youth or middle age. This is to encourage research on written language, in the vein of the famous Nun Study (Snowdon et al., 1996).
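To make the annotation scheme more concrete, the following Python sketch shows one way a time-aligned, marked-up transcript could be represented. The corpus's actual file format is not described in this paper, so the class names, field names and the speaking_time helper are illustrative assumptions; only the marking categories are taken from the list above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Marking(Enum):
    """Marking categories taken from the transcription scheme above."""
    PAUSE = "pause"
    INTERRUPTION = "interruption"
    EXTERNAL_NOISE = "external noise"
    PARTICIPANT_NOISE = "participant noise"      # e.g., laughter, coughing
    INTONATION_EMPHASIS = "intonation/emphasis"  # e.g., whispering, yelling
    ACTION = "action"                            # e.g., winking, clapping
    UNCONVENTIONAL_PRONUNCIATION = "unconventional pronunciation"


@dataclass
class Segment:
    """One time-aligned utterance (hypothetical layout, times in seconds)."""
    start: float
    end: float
    speaker: str  # e.g., "interviewer" or a participant alias
    text: str
    markings: list = field(default_factory=list)


def speaking_time(segments):
    """Return the total aligned speaking time per speaker, in seconds."""
    totals = {}
    for seg in segments:
        totals[seg.speaker] = totals.get(seg.speaker, 0.0) + (seg.end - seg.start)
    return totals
```

Keeping the interviewer's turns in the same structure as the participant's would make it straightforward to relate interviewer strategies to the length of the responses that follow them.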

Description of the samples
Recordings in Ecuador started in May 2015. For the first series we interviewed 12 participants, and recorded a total of 15 conversations. The second series was recorded in January 2016, and it incorporated 4 new participants and a total of 10 interviews. So far, the cumulative recorded time of conversations in Ecuador is over six hours and 45 minutes, and the average length of the conversations is 16 minutes. The participants' ages range from 70 to 91 years, with an average age of 83 years (see Table 1).
We started recording in Mexico in February 2016. For these sessions, the psychiatric care providers interview the participants after their routine consultations. Therefore, recordings and follow-ups are carried out throughout the year. At the time of writing, the sessions in Mexico have just begun and, so far, they include 9 participants, all female. However, we expect to record at least one conversation per week. Here the participants' ages range from 61 to 82 years, with an average age of 69 years.
Table 1: Socio-demographic overview of the participants of the collection
As shown in Table 1, the majority of our participants in all countries are female. We attribute this phenomenon to two main factors. First and foremost, women have shown a significantly higher willingness than men to participate in this project, especially in Mexico. Secondly, women have a higher life expectancy than men, so the elderly male population is smaller. We are currently making efforts to increase the number of male participants to balance the sample.

Implications, applications and prospects
The longitudinal, multilingual and multimodal attributes of our collection, as well as the registration and follow-up of the medical treatments taken by the participants and their demographic information, will allow researchers to perform a wide variety of studies. Some of these studies have been attempted before. However, in most cases authors have used small, monolingual and homogeneous samples that do not allow generalization. Furthermore, many of the datasets used for these studies are not shared with the research community, limiting the advancement of research. Our collection has the advantage of containing a multiethnic sample, not to mention the heterogeneity gained by including participants from three different countries. These attributes will make for robust research that will support the study of intra-language and inter-language variations, as well as intermodal linguistic analyses (see Figure 1). Additionally, they will allow researchers to control for alterations attributable to race, demographic factors, specific diseases, medications and treatments. Longitudinal studies will make it possible to follow the course of aging in the elderly, and the differences between healthy and pathological decline. This collection also provides data to improve automatic transcription and face recognition for this particular cohort, which tends to present particular challenges. Some of the clearest research possibilities for this collection are those focused on improving communication with the elderly, and on medical applications.

Improving communication
It is important to maintain and preserve communication with the elderly, especially since it has been suggested (Arkin, 2007) that maintaining language-enriched conversations along with exercise can delay the effects of dementia. Our collection not only contains the utterances and transcriptions of the elderly participants, but also includes the entire transcription of the exchanges with the interviewers. This allows studies aimed at improving communication by analyzing which strategies prove more successful in promoting conversations with the elderly. Other authors (Davis, 2005) have placed strong emphasis on the importance of preserving communication with elderly people, and have worked on the development of specific communication strategies, particularly with those suffering from dementia. In addition to explicit linguistic barriers, there are other factors that limit our ability to communicate with elderly people. For example, Freudenberg et al. (2015) found that young people have trouble correctly interpreting facial expressions in the elderly, often perceiving neutral expressions as negative emotions. This makes studying emotions in this population a challenge, but doing so could provide insights into how to preserve effective communication with them. The analysis of emotions also has other purposes, since alterations in the expression of emotions can be signs of certain disorders (Hamm et al., 2014; Adams and Oliver, 2011).

Medical applications
Automatic language analysis for studying neurodegenerative diseases in elderly people has been gaining momentum in recent years. Authors like Jarrold et al. (2010, 2014), Guerrero et al. (2016), López-de-Ipiña et al. (2015), and König et al. (2015) have studied language alterations that may aid in the automatic detection, or even prediction, of Mild Cognitive Impairment and Alzheimer's disease in its mild and moderate stages, with promising results. Additionally, Goberman et al. (2010), Holtgraves et al. (2013), and Cardona et al. (2013) have studied the linguistic features associated with Parkinson's disease. To support further research of this type, we prioritize the inclusion of participants suffering from different cognitive and mental afflictions (see Table 2).

Conclusions and future work
In this paper we presented a report on our first recordings of conversations with elderly people in Latin America, as well as the characteristics of this ongoing multidisciplinary, multicenter research project. We plan to continue these recordings for the following two to five years. Additionally, we are initiating the necessary collaboration agreements with Canadian institutions to incorporate a cohort of Canadian French speakers and English speakers into our collection. With this cohort we will add a new language and a new variety of English. Furthermore, in Ecuador we are making arrangements to incorporate some elderly Quechua speakers into our sample. To our knowledge, there is no available research on linguistic analysis of this indigenous population. Finally, we are currently working on our first research using this corpus. We believe that our recordings can be of use for speech, voice, writing, discourse, and facial and corporal expression-based analysis to further our understanding of the progression of cognitive degenerative diseases, and ultimately to help improve our communication strategies with the elderly, thus improving their quality of life.