Jejueo talking dictionary: A collaborative online database for language revitalization



Introduction
The purpose of this paper is to present the ongoing development of the Jejueo Talking Dictionary as an example of applying interdisciplinary methodology to create an enduring, multipurpose record of an endangered language. In this paper I examine strategies for gathering extensive data to create a multimodal online platform aimed at a wide variety of uses and user groups. The Jejueo Talking Dictionary project is tailored to diverse user communities on Jeju Island, South Korea, where Jejueo, the indigenous language, is critically endangered and underdocumented, but where the population's smartphone penetration rate is 75% (Lee, 2014) and semi-speakers are highly proficient users of technology (Song, 2012). The Jejueo Talking Dictionary is also intended for Jejueo speakers of varying degrees of fluency in Osaka, Japan, where up to 126,511 diasporic Jejuans reside (Southcott, 2013). A third aim of the Jejueo Talking Dictionary is to create extensive linguistic documentation of Jejueo that will be available to the wider scientific community, as the vast majority of existing documentary materials on Jejueo are published in Korean. The Jejueo Talking Dictionary will serve as an online open-access repository of over 200 hours of natural and ceremonial language use, with interlinear glossing in Jejueo, Korean, Japanese and English.

Language context
Very closely related to Korean, Jejueo is the indigenous language of Jeju Island, South Korea. Jejueo has 5,000-10,000 speakers located throughout the islands of Jeju Province and in a diasporic enclave in Osaka, Japan. With most fluent speakers over 75 years old, Jejueo was classified as critically endangered by UNESCO in 2010. The Koreanic language family consists of at least two languages, Jejueo and Korean. Several regional varieties of Korean are spoken across the Korean peninsula, divided loosely along provincial lines. Jejueo and Korean are not mutually intelligible, owing to Jejueo's distinct lexicon and grammatical morphemes. Pilot research (Yang, 2013) estimates that 20-25% of the lexicons of Jejueo and Korean overlap, and a recent study (O'Grady, 2015) found that Jejueo is at most 12% intelligible to speakers of Korean on Korea's mainland.¹ Jejueo conserves many Middle Korean phonological and lexical features lost to Modern Standard Korean (MSK), including the Middle Korean phoneme /ɔ/ and terms such as pɨzʌp : Jejueo pusʌp 'charcoal burner' (Stonham, 2011: 97). Extensive lexical and morphological borrowing from Japanese, Mongolian and Manchurian is evident in Jejueo, owing to the Mongolian colonization of Jeju in the 13th and 14th centuries, Japan's annexation of Korea and occupation of Jeju between 1910 and 1945, and centuries of trade with Manchuria and Japan (Martin, 1993; Lee and Ramsey, 2000). Several place names in Jeju are arguably Japonic in origin, e.g. Tamna, the first known name of Jeju Island (Kwen, 1994: 167; Vovin, 2013). Moreover, several names for indigenous fruits and vegetables on Jeju are borrowed from Japanese, e.g. mik͈aŋ 'orange'. Mongolic speakers left the lexical imprint of a robust inventory of terms describing horses and cows, e.g. mɔl 'horse'. Jejueo borrowed grammatical morphemes from the Tungusic language Manchurian, e.g. the dative suffixal particle *de < ti 'to' (Kang, 2005).

Current status of Jejueo
The present situation in Jeju is one of language shift, where fewer than 10,000 people out of a population of 600,000 are fluent in Jejueo, and features of Jejueo's lexicon, morphosyntax and phonology are rapidly assimilating to Korean (Kang, 2005; Saltzman, 2014). Recent surveys on language ideologies of Jejueo speakers (Kim, 2011; Kim, 2013) show that a roughly diglossic situation is maintained by present-day language ideologies. In a series of qualitative interviews on language ideologies, Kim (2013: 33) finds common themes suggesting that Korean is used as a means of showing respect to unfamiliar interlocutors, as Korean "...is perceived as the language of distance and rationality". Likewise, Jejueo is considered appropriate whenever interpersonal boundaries, such as distinctions within social hierarchies, are perceived as less salient than the intimacy and mutual trust two or more people share (Kim, 2013). Yang's (2013) pilot survey on language attitudes finds that while community members recognize Jejueo as a marker of Jeju identity worth transmitting to future generations, few speakers feel empowered to reverse the pattern of language shift to Korean. There are no longer monolingual speakers of Jejueo on Jeju or in Osaka. The examples below are samples of the same declarative construction produced by a fluent Jejueo speaker in (1), a typical younger Jejueo semi-speaker in (2), and the Korean translation (3). Jejueo morphemes in (2) are in boldface.
¹ In a 2015 study, O'Grady and Yang found that speakers of Korean from four provinces on the mainland had rates of 8-12% intelligibility for Jejueo, based on a comprehension task of a one-minute recording of Jejueo connected speech.
(  3) have several cognate forms, the majority of grammatical particles are genetically unrelated.
The accusative particle -ɯl is shared by Korean and Jejueo, although in Jejueo the nominative and accusative markers are most commonly dropped. In example (2), the Jejueo morphemes have been replaced by Korean morphemes, save 'grandmother' and the verbal ending, a pattern typical of nonfluent speakers of Jejueo (Saltzman, 2014).

Jejueo lexicography and sustainability
Because most Korean linguists view Jejueo as a conservative dialect of Korean (Sohn, 1999; Song, 2012), lexical documentation of Jejueo has not been a scientific priority. The few Jejueo lexicographic projects have been carried out in the last 30 years by linguists native to Jeju Island and are all bilingual in Korean and Jejueo. Two large-scale Jejueo-Korean print dictionaries have been published (Song, 2007; Kang, 1995), though Kang's oft-cited dictionary was given a small distribution to local community centers and libraries, and was not made commercially available. In 2011 Kang and Hyeon published an abridged Korean-Jejueo version of the dictionary. The remaining lexicographic studies of Jejueo are a handful of dictionaries tailored to individual semantic domains, such as 제주어 속담 사전 [Jejueo Idiom Dictionary] (Ko, 2002), 무가본풀이 사전 [Jeju Dictionary of Shamanic Terms] (Jin, 1991), and 문학 속의 제주 방언 [Jeju Dialect in Literature] (Kang et al., 2010), an alphabetized introduction to the Jejueo lexicon through Jeju folk literature. No major reference materials on Jejueo's lexicon or other linguistic features provide English glossing, although an English sketch grammar of Jejueo is currently in development (Yang, in preparation).
It is well established that lexicographic materials can contribute significant symbolic support to a given language variety (Corris et al., 2004; Crowley, 1999; Hansford, 1991), particularly for unwritten non-prestige codes. Bartholomew and Schoenhals (1983) note that publication of lexicographic materials may even help indigenous languages be perceived as 'real languages' in the sociolinguistic marketplace.
Lexicographic materials alone, however, do not engender sufficient motivation for a speech community to maintain the use of its heritage language. Fishman (1991) warns against dictionary projects that become 'monuments' to a language rather than stimulating language use and intergenerational transmission.
A recent study by O'Grady (2015) found that the level of Jejueo transmission between generations shows a drastic decline. Given the task of answering content questions based on a one-minute recording of Jejueo connected speech, heritage speakers in the 50-60 age bracket demonstrated a comprehension level of 89%, while heritage speakers between 20 and 29 showed just 12% comprehension, equal to that of citizens of Seoul. In my previous fieldwork in Jeju I found fluent Jejueo speakers and most semi-speakers unmotivated to access available Jejueo lexicographic materials. While these lexicographic works provide extensive data for the scholarly community, they arguably contribute to a growing body of Jejueo documentation and revitalization projects which are discrete, temporary and organized from the 'top down' without community collaboration.
Sun Duk Mun, a Jejueo linguist with the Jeju Development Institute (JDI), reasons that Jeju parents must take pride in Jejueo and use it in the home, and that Jeju teachers should allow Jejueo in classrooms, in order to expand Jejueo's declining domains of use (Southcott, 2015). However, in a highly competitive society where the majority of classroom hours are allocated to the Seoul-based national standard language (Song, 2012), and even entertainment media reflects the nation's emphasis on 'correct' usage of Korean, status planning for Jejueo revitalization is crucial. Beyond Kim's (2013) and Kim's (2011) studies on Jejueo language ideologies, no sociolinguistic research on Jejueo has been conducted, leaving issues like bilingualism, domains of use and the socio-historical factors for language shift speculative at best in the literature. A successful campaign for the reversal of Jejueo language endangerment will hinge on the development of tools for documentation and language learning which reflect the socio-historical background and desires of the speech communities involved. To initiate such a campaign, ideological clarification and collaboration between the Jeju provincial government, Jejueo scholars, native speakers and educators are needed. The aim of the Jejueo Talking Dictionary is to match the diverse desires of Jeju users with collaborative methodology for data collection, as we will see in the next section.

Community-based data collection
A primary goal of the Jejueo Talking Dictionary project is to train language activists in field linguistics to create a sustainable infrastructure for data collection, analysis, publication and archiving. In this way, Jeju community members will drive the scope of the Jejueo Talking Dictionary by adding the types of linguistic data that are found most useful to Jejueo-speaking communities in Jeju and Osaka. By training community members in linguistic documentation, Jejueo speakers and semi-speakers will have a foundation in field linguistics from which to build collaborative networks for crowdsourcing and status planning with Jejueo scholars, the provincial government, educators and elderly fluent speakers. At present, the team of foreign and local linguists developing the Jejueo Talking Dictionary is training local college students and activists at Jeju Global Inner Peace. Members of the team record elderly fluent speakers of Jejueo, annotate the recordings, and upload files into an open-access working corpus of data using LingSync, a free online program for sharable audio and video files of annotated linguistic data. Linguists from Jeju National University, Jejueo specialists from the Jejueo Preservation Society, and I analyze the Jejueo data and check its accuracy with native speakers, ensuring the quality of the corpus. In September 2016 we will develop the corpus into a free online program and an Android application for smartphones.

Building an interdisciplinary network
A second goal of the Jejueo Talking Dictionary project is to build an interdisciplinary network for data collection. Our team of linguists from Jeju and abroad, language preservationists, activists and community elders has recruited ethnomusicologists, historians, experts in the indigenous religion and anthropologists to lend their expertise to the collection of lexemes and texts of various genres. In this way we aim to create a methodology of interdisciplinary data collection that builds a multidimensional record of Jejueo, to serve a wide range of uses and user groups.
At present we have incorporated approximately 200 hours of previously unpublished annotated video data including Jeju oral history, shamanic rituals, indigenous music and cuisine preparation. Our team aims to enlist the support of ethnobotanists and ethnozoologists who can assist the team in collecting data on the indigenous flora and fauna of Jeju Island. In the future, data collection can be connected to language revitalization programs, such as master-apprentice programs (see Hinton, 1997), where semi-speakers join speakers of Jejueo in their usual activities: farming, foraging for roots, herbs and vegetables in the mountains, picking seasonal fruit, and diving for seafood near Jeju's shores.

Contents of Jejueo Talking Dictionary
The Jejueo Talking Dictionary is intended to serve as a tool for both cultural education and language acquisition. With this in mind, we give equal attention to the collection of archaic and ceremonial speech, and the most frequently used lexemes and expressions. The Jejueo Talking Dictionary will compile existing annotated video corpora of Jejueo songs, conversational genres and regional mythology into a multimedia database, supplemented by original annotated video recordings of natural language use. Lexemes and definitions will be accompanied by audio files of their pronunciation, listings of frequent collocations and, for items native to Jeju, occasional photos. The audio and video data will be tagged in Jejueo, Korean, Japanese and English, allowing users to search or browse the dictionary in any of these languages. Like Korean, Jejueo features complex agglutination of case, TMA and discourse register particles on verbs and nouns (Sohn, 1999). Videos showing a range of discourse types will have interlinear glossing, so that users may search Jejueo particles as well as lexemes and grammatical topics, and find the tools to construct original Jejueo speech. At the time of writing, we have recorded and annotated approximately 500 audio files of individual lexemes and 300 hours of video data. Below I itemize genres of Jejueo speech we have collected.

Inclusiveness versus usability
As the Jejueo Talking Dictionary is intended to serve a variety of uses, creating a cultural and linguistic repository of Jejueo stands somewhat at odds with developing a user-friendly dictionary for language education. One solution we have found is to develop separate modules for the dictionary (see Vamarasi, 2013), so that it may be viewed according to individual purposes. In addition to viewing the dictionary page translated into Jejueo, Korean, Japanese or English, users can access separate modules from the main page. At present, these include modules for language lessons, browsing cultural topics, browsing photos, and browsing conversational genres and grammatical features. In the language education module, users access Jejueo lessons based around the most frequently used lexemes in the language, illustrated with photos. For this module we are also developing language-learning games and a 'word of the day' feature. All of the lexical entries in the dictionary and lexical items used in the narrative videos are tagged, so that searching a Jejueo word from any of the modules brings up a textual sample of the lexeme in a grammatical construction, and videos featuring the lexeme in natural language use, where that data is available. Transcripts of all of the videos may be downloaded and printed, and users may select a transcript with interlinear glossing, the Jejueo transcription only, or a translation in Korean, Japanese or English.
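The tagging and transcript options described above can be pictured with a minimal sketch. It is an illustration only, not the project's implementation: the class names, the `search` and `render` functions, and the language codes "ko", "ja" and "en" are all assumptions made for this example. The single sample line reuses mɔl 'horse' from the Language context section as a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class GlossedLine:
    """One transcript line: Jejueo text, a gloss, and translations."""
    jejueo: str
    gloss: str
    translations: dict[str, str]  # keyed by hypothetical codes "ko", "ja", "en"


@dataclass
class Video:
    """An annotated recording with the lexemes it has been tagged for."""
    title: str
    lines: list[GlossedLine]
    lexeme_tags: set[str] = field(default_factory=set)


def search(lexeme: str, videos: list[Video]) -> list[tuple[str, str]]:
    """Return (video title, first line containing the lexeme) for tagged videos."""
    hits = []
    for video in videos:
        if lexeme not in video.lexeme_tags:
            continue  # no data available for this lexeme in this video
        sample = next(
            (ln.jejueo for ln in video.lines if lexeme in ln.jejueo.split()), ""
        )
        hits.append((video.title, sample))
    return hits


def render(video: Video, mode: str) -> str:
    """Render a transcript as 'interlinear', 'jejueo', or a translation code."""
    out = []
    for ln in video.lines:
        if mode == "jejueo":
            out.append(ln.jejueo)
        elif mode == "interlinear":
            out.append(f"{ln.jejueo}\n{ln.gloss}\n{ln.translations['en']}")
        else:
            out.append(ln.translations[mode])
    return "\n".join(out)


# Placeholder data: a one-word "transcript", not a real recording.
demo = Video(
    title="demo",
    lines=[GlossedLine("mɔl", "horse", {"en": "horse", "ko": "말", "ja": "馬"})],
    lexeme_tags={"mɔl"},
)
print(search("mɔl", [demo]))
print(render(demo, "interlinear"))
```

Keeping the tag set separate from the transcript lines is one way to let a search from any module reach the same videos, while the rendering mode is chosen only at download time.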

Language standardization
It is important to note that the standardization of Jejueo orthography is still ongoing, and among the several regional varieties of Jejueo, none has been designated as the standard. For the Jejueo Talking Dictionary project we adopt the orthographic preferences of the most recent Jejueo lexicographic materials (Kang and Hyeon, 2011; Kang, 2007), and list headwords and regional variants in the order assigned in those materials.
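One way to picture this policy is a lookup in which every regional variant resolves to its headword, while the listing order of the source materials is preserved. The sketch below is an assumption made for illustration: the entry list, its placeholder spellings and both function names are hypothetical and are not attested Jejueo forms or project code.

```python
# Hypothetical entries in source order: (headword, [regional variants]).
# The spellings are placeholders, not attested Jejueo forms.
ENTRIES = [
    ("headword_a", ["variant_a1", "variant_a2"]),
    ("headword_b", []),
]


def build_variant_index(entries):
    """Map every headword and variant spelling to its headword."""
    index = {}
    for headword, variants in entries:
        index.setdefault(headword, headword)
        for variant in variants:
            # The first listing wins, so the source ordering decides ties.
            index.setdefault(variant, headword)
    return index


def forms_in_source_order(entries, headword):
    """Return the headword followed by its variants, in the order listed."""
    for listed, variants in entries:
        if listed == headword:
            return [listed, *variants]
    return []


index = build_variant_index(ENTRIES)
print(index["variant_a2"])
print(forms_in_source_order(ENTRIES, "headword_a"))
```

Because no variety is treated as the standard here, the index records only which headword the source materials file a form under, not which form is preferred.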

Conclusion
An open-ended lexicographic resource such as an online talking dictionary lends itself well to incorporating a variety of data designed to serve diverse uses and user groups. The Jejueo Talking Dictionary can be continuously and cost-effectively modified as we obtain more data and gain feedback on the usability of the dictionary. We aim for the Jejueo Talking Dictionary to be an accessible multipurpose repository of the Jejueo language, where the content and the collection of linguistic data are both driven by the Jeju community. With Jejueo in a state of critical endangerment, incorporating community members in the development and dissemination of language-learning materials is key.