Linguistic landscaping of South Asia using digital language resources: Genetic vs. areal linguistics

Lars Borin, Anju Saxena, Taraka Rama, Bernard Comrie


Abstract
Like many other research fields, linguistics is entering the age of big data. We are now at a point where it is possible to see how new research questions can be formulated, and old research questions addressed from a new angle or established results verified, on the basis of exhaustive collections of data rather than small, carefully selected samples. For example, South Asia is often mentioned in the literature as a classic example of a linguistic area, but there is no systematic, empirical study substantiating this claim. Examination of genealogical and areal relationships among South Asian languages requires a large-scale quantitative and qualitative comparative study, encompassing more than one language family. Further, such a study cannot be conducted manually, but needs to draw on extensive digitized language resources and state-of-the-art computational tools. We present some preliminary results of our large-scale investigation of the genealogical and areal relationships among the languages of this region, based on the linguistic descriptions available in the 19 tomes of Grierson’s monumental “Linguistic Survey of India” (LSI, 1903-1927). The LSI is currently being digitized with the aim of turning its linguistic information into a digital language resource suitable for a broad array of linguistic investigations.
Anthology ID:
L14-1175
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3137–3144
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/159_Paper.pdf
Cite (ACL):
Lars Borin, Anju Saxena, Taraka Rama, and Bernard Comrie. 2014. Linguistic landscaping of South Asia using digital language resources: Genetic vs. areal linguistics. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 3137–3144, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
Linguistic landscaping of South Asia using digital language resources: Genetic vs. areal linguistics (Borin et al., LREC 2014)
PDF:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/159_Paper.pdf