Dafydd Gibbon

2016

Legacy language atlas data mining: mapping Kru languages
Dafydd Gibbon
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

An online tool based on dialectometric methods, DistGraph, is applied to a group of Kru languages of Côte d’Ivoire, Liberia and Burkina Faso. The inputs to this resource consist of tables of languages × linguistic features (e.g. phonological, lexical or grammatical), and statistical and graphical outputs are generated which show similarities and differences between the languages in terms of the features as virtual distances. In the present contribution, attention is focussed on the consonant systems of the languages, a traditional starting point for language comparison. The data are harvested from a legacy language data resource based on fieldwork in the 1970s and 1980s, a language atlas of the Kru languages. The method on which the online tool is based extends beyond documentation of individual languages to the documentation of language groups, and supports difference-based prioritisation in education programmes, decisions on language policy and documentation and conservation funding, as well as research on language typology and heritage documentation of history and migration.
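The idea of turning a languages × features table into pairwise "virtual distances" can be sketched as follows. This is only an illustration: the abstract does not say which distance measure DistGraph uses, so a normalised Hamming distance is assumed here, and the language names, feature list and values are invented placeholders, not data from the Kru atlas.

```python
from itertools import combinations

# Hypothetical placeholder data: rows are languages, columns are binary
# features (1 = consonant present in the inventory). Not real Kru data.
FEATURES = ["p", "b", "kp", "gb", "ɓ"]
TABLE = {
    "LangA": [1, 1, 1, 1, 0],
    "LangB": [1, 1, 1, 0, 0],
    "LangC": [1, 0, 0, 0, 1],
}


def hamming_distance(a, b):
    """Normalised Hamming distance: share of features on which a and b differ."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_table(table):
    """Return {(lang1, lang2): distance} for every unordered language pair."""
    return {
        (l1, l2): hamming_distance(table[l1], table[l2])
        for l1, l2 in combinations(sorted(table), 2)
    }


if __name__ == "__main__":
    for (l1, l2), d in distance_table(TABLE).items():
        print(f"{l1} - {l2}: {d:.2f}")
```

A table of this kind could then be passed to a clustering or plotting routine to obtain the graphical outputs the abstract mentions; that step is left out here.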

2014

Annotation Pro + TGA: automation of speech timing analysis
Katarzyna Klessa | Dafydd Gibbon
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper reports on two tools for the automatic statistical analysis of selected properties of speech timing on the basis of speech annotation files. The tools, one online (TGA, Time Group Analyser) and one offline (Annotation Pro+TGA), are intended to support the rapid analysis of speech timing data without the need to create specific scripts or spreadsheet functions for this purpose. The software calculates, inter alia, mean, median, rPVI, nPVI, slope and intercept functions within interpausal groups, provides visualisations of timing patterns, as well as correlations between these, and parses interpausal groups into hierarchies based on duration relations. Although many studies, especially in speech technology, use computational means, enquiries have shown that a large number of phoneticians and phonetics students do not have script creation skills and therefore use traditional copy+spreadsheet techniques, which are slow, preclude the analysis of large data sets, and are prone to inconsistencies. The present tools have been tested in a number of studies on English, Mandarin and Polish, and are introduced here with reference to results from these studies.
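Two of the measures named in the abstract, rPVI and nPVI, can be illustrated with a short sketch. It uses the standard pairwise variability index definitions (mean absolute difference between successive durations, raw or normalised by the mean of each pair); whether TGA computes them in exactly this way is an assumption, and the function names and sample durations are made up for the example.

```python
def _successive_pairs(durations):
    """Return adjacent duration pairs; at least two durations are required."""
    pairs = list(zip(durations, durations[1:]))
    if not pairs:
        raise ValueError("need at least two durations")
    return pairs


def rpvi(durations):
    """Raw PVI: mean absolute difference between successive durations."""
    pairs = _successive_pairs(durations)
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def npvi(durations):
    """Normalised PVI: each difference is divided by the mean of its pair,
    and the average is scaled by 100."""
    pairs = _successive_pairs(durations)
    if any(a <= 0 or b <= 0 for a, b in pairs):
        raise ValueError("durations must be positive")
    return 100 * sum(abs(a - b) / ((a + b) / 2) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Invented syllable durations in milliseconds for one interpausal group.
    group = [100, 200, 100]
    print(f"rPVI = {rpvi(group):.2f}, nPVI = {npvi(group):.2f}")
```

Perfectly even durations give 0 for both indices; the more adjacent units alternate in length, the higher the values.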

2012

ULex: new data models and a mobile environment for corpus enrichment.
Dafydd Gibbon
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The Ubiquitous Lexicon concept (ULex) has two sides. In the first kind of ubiquity, ULex combines prelexical corpus based lexicon extraction and formatting techniques from speech technology and corpus linguistics for both language documentation and basic speech technology (e.g. speech synthesis), and proposes new XML models for the basic datatypes concerned, in order to enable standardisation and data interchange in these areas. The prelexical data types range from basic wordlists through diphone tables to concordance and interlinear glossing structures. While several proposals for standardising XML models of lexicon types are available, these more basic prelexical data types, which are important in lexical acquisition, have received little attention. In the second area of ubiquity, ULex is implemented in a novel mobile environment to enable collaborative cross-platform use via a web application, either on the internet or, via a local hotspot, on an intranet, which runs not only on standard PC types but also on tablet computers and smartphones and is thereby also rendered truly ubiquitous in a geographical sense.

2010

Medefaidrin: Resources Documenting the Birth and Death Language Life-cycle
Dafydd Gibbon | Moses Ekpenyong | Eno-Abasi Urua
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Language resources are typically defined and created for application in speech technology contexts, but the documentation of languages which are unlikely ever to be provided with enabling technologies nevertheless plays an important role in defining the heritage of a speech community and in the provision of basic insights into the language oriented components of human cognition. This is particularly true of endangered languages. The present case study concerns the documentation both of the birth and of the endangerment within a rather short space of time of a spirit language, Medefaidrin, created and used as a vehicular language by a religious community in South-Eastern Nigeria. The documentation shows phonological, orthographic, morphological, syntactic and textual typological features of Medefaidrin which indicate that typological properties of English were a model for the creation of the language, rather than typological properties of the enclaving language, Ibibio. The documentation is designed as part of the West African Language Archive (WALA), following OLAC metadata standards.

2009

Gesture Theory is Linguistics: On Modelling Multimodality as Prosody
Dafydd Gibbon
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 1

2008

An Automatic Close Copy Speech Synthesis Tool for Large-Scale Speech Corpus Evaluation
Dafydd Gibbon | Jolanta Bachan
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The production of rich multilingual speech corpus resources on a large scale is a requirement for many linguistic, phonetic and technological tasks, in both research and application domains. It is also time-consuming and therefore expensive. The human component in the resource creation process is also prone to inconsistencies, a situation frequently documented in cross-transcriber consistency studies. In the present case, corpora of three languages were to be evaluated and corrected: (1) Polish, a large automatically annotated and manually corrected single-speaker TTS unit-selection corpus in the BOSS Label File (BLF) format, (2) German and (3) English, the second and third being manually annotated multi-speaker story-telling learner corpora in Praat TextGrid format. A method is provided for supporting the evaluation and correction of time-aligned annotations for the three corpora by permitting a rapid audio screening of the annotations by an expert listener for the detection of perceptually conspicuous systematic or isolated errors in the annotations. The criterion for perceptual conspicuousness was provided by converting the annotation formats into the interface format required by the MBROLA speech synthesiser. The audio screening procedure is complementary to other methods of corpus evaluation and does not replace them.
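The conversion step described above, from time-aligned annotations to the synthesiser's input format, might look roughly like the sketch below. It assumes the basic MBROLA `.pho` line layout (phoneme symbol, duration in milliseconds, optional position/pitch pairs, `_` for silence) and a flat pitch value; the function name, the treatment of empty labels as silence, and the sample intervals are illustrative assumptions, not the authors' tool, and no mapping from annotation labels to a voice's phoneme set is attempted.

```python
def intervals_to_pho(intervals, f0_hz=None):
    """Convert (label, start_s, end_s) intervals into MBROLA-style .pho text.

    Empty labels are treated as silence ("_"), which is an assumption of this
    sketch. If f0_hz is given, one flat pitch target is placed at 50% of each
    non-silent segment.
    """
    lines = []
    for label, start, end in intervals:
        if end <= start:
            raise ValueError(f"interval {label!r} has non-positive duration")
        dur_ms = round((end - start) * 1000)
        symbol = label.strip() or "_"
        if f0_hz is not None and symbol != "_":
            lines.append(f"{symbol} {dur_ms} 50 {f0_hz}")
        else:
            lines.append(f"{symbol} {dur_ms}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Invented example tier: a pause followed by two segments.
    tier = [("", 0.0, 0.1), ("a", 0.1, 0.25), ("m", 0.25, 0.33)]
    print(intervals_to_pho(tier, f0_hz=120), end="")
```

Because the durations are copied straight from the annotation, a misplaced boundary or wrong label becomes audible in the resynthesised output, which is the point of the screening procedure.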

2006

Feature-based Encoding and Querying Language Resources with Character Semantics
Baden Hughes | Dafydd Gibbon | Thorsten Trippel
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this paper we discuss the explicit representation of character features pertaining to written language resources, which we argue are critically necessary for the long-term archiving of language data. Much of the focus in creating and preserving language resources is at the level of the corpus itself; however, it is generally accepted that long-term interpretation of these language resources requires more than a best-practice data format. In particular, where language resources are created in linguistic fieldwork, and especially for minority languages, the need to preserve not only the resource itself but also additional metadata which allows the resource to be accurately interpreted in the future is becoming a topic of research in itself. In this paper we extend earlier work on semantically based character decomposition to include representation of character properties in a variety of models, and a mechanism for exploiting these properties through queries.

A BLARK extension for temporal annotation mining
Dafydd Gibbon | Flaviane Romani Fernandes | Thorsten Trippel
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

The Basic Language Resource Kit (BLARK) proposed by Krauwer is designed for the creation of initial textual resources. There are a number of toolkits for the development of spoken language resources and systems, but there is a lack of tools for second level resources, that is, resources which are the result of processing primary level speech resources such as speech recordings. Typically, processing of this kind in phonetics is done manually, with the aid of spreadsheets and multi-purpose statistics software. We propose a Basic Language and Speech Kit (BLAST) as an extension to BLARK and suggest a strategy for integrating the kit into the Natural Language Toolkit (NLTK). The prototype kit is evaluated in an application to examining temporal properties of spoken Brazilian Portuguese.

Discourse functions of duration in Mandarin: resource design and implementation
Dafydd Gibbon | Shu-Chuan Tseng
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

A dedicated resource, consisting of annotated speech, tools, and workflow design, was developed for the detailed investigation of discourse phenomena in Taiwan Mandarin. The discourse phenomena have functions which are associated with positions in utterances and with temporal properties, and include discourse markers (NAGE, NA, e.g. hesitation, utterance initiation), discourse particles (A, e.g. utterance finality, utterance continuity, focus, etc.), and fillers (UHN, hesitation). The distribution of particles in relation to their position in utterances and the temporal properties of particles are investigated. The results of the investigation diverge considerably from claims in existing grammars of Mandarin with respect to utterance position, and show that the particles are in general longer than regular syllables. These properties suggest the possibility of developing an automatic discourse item tagger.

2004

Securing Interpretability: The Case of Ega Language Documentation
Dafydd Gibbon | Catherine Bow | Steven Bird | Baden Hughes
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

WALA: A Multilingual Resource Repository for West African Languages
Dafydd Gibbon | Firmin Ahoua | Eddi Gbéry | Eno-Abasi Urua | Moses Ekpenyong
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

The West African Language Archive (WALA) initiative has emerged from a number of concurrent projects, and aims to encourage local scholars to create high quality decentralised repositories documenting West African languages, and to make these repositories available to language communities, language planners, educationalists and scientists via an internet metadata portal such as OLAC (Open Language Archive Community). A wide range of criteria has to be met in designing and implementing this kind of archive. We discuss these criteria with reference to experiences in documentation work in three very different ongoing language documentation projects, on designing an encyclopaedia, on documenting an endangered language, and on creating a speech synthesiser. We pay special attention to the provision of metadata, a formal variety of catalogue or housekeeping information, without which resources are doomed to remain inaccessible.

Concept-based Queries: Combining and Reusing Linguistic Corpus Formats and Query Languages
Felix Sasaki | Andreas Witt | Dafydd Gibbon | Thorsten Trippel
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Consistent Storage of Metadata in Inference Lexica: the MetaLex Approach
Thorsten Trippel | Felix Sasaki | Dafydd Gibbon
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)
