publication . Article . 2018

Bridging the Gaps between Digital Humanities, Lexicography, and Linguistics: A TEI Dictionary for the Documentation of Mixtepec-Mixtec

Bowers, Jack; Romary, Laurent
English
  • Published: 01 Jan 2018
  • Publisher: HAL CCSD
  • Country: France
Abstract
This paper discusses the digital dictionary component of an ongoing language documentation project for the Mixtepec-Mixtec language (ISO 639-3: mix). Mixtepec-Mixtec (Sa’an Savi ‘rain language’) is an Otomanguean language spoken by roughly 9,000–10,000 people in the Juxtlahuaca district of Oaxaca and in parts of the Guerrero and Puebla states of Mexico. Creating a digital dictionary for an under-resourced language entails a number of challenges that require unique and nuanced encoding solutions, in which a delicate balance must be found between the linguistic content, data structure, potential linked resources, and editorial metadata. Herein we demonstrate how we use TEI to create a reusable, extensible, and machine-readable language resource. We emphasize how our solutions, which combine novel and established TEI dictionary structures, enable us to address our specific needs for Mixtepec-Mixtec and also provide a relevant roadmap for similar under-resourced language projects.
Subjects
Free-text keywords: Dictionary encoding, TEI, Mixtec, Digital humanities, Language documentation, [SCCO.LING] Cognitive science/Linguistics, [INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL], [INFO.INFO-DS] Computer Science [cs]/Data Structures and Algorithms [cs.DS], [INFO.INFO-DL] Computer Science [cs]/Digital Libraries [cs.DL]