publication . Article . 2018

Bridging the Gaps between Digital Humanities, Lexicography, and Linguistics: A TEI Dictionary for the Documentation of Mixtepec-Mixtec

Bowers, Jack; Romary, Laurent;
English
  • Published: 01 Jan 2018
  • Publisher: HAL CCSD
  • Country: France
Abstract
This paper discusses the digital dictionary component in an ongoing language documentation project for the Mixtepec-Mixtec language (ISO 639-3: mix). Mixtepec-Mixtec (Sa’an Savi ‘rain language’) is an Otomanguean language spoken by roughly 9,000–10,000 people in the Juxtlahuaca district of Oaxaca and in parts of the Guerrero and Puebla states of Mexico. Creating a digital dictionary for an under-resourced language entails a number of challenges that require unique and nuanced encoding solutions in which a delicate balance between the linguistic content, data structure, potential linked resources, and editorial metadata must be found. Here...
Subjects
free text keywords: [INFO.INFO-DL] Computer Science [cs]/Digital Libraries [cs.DL], [INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL], Dictionary encoding, TEI, Mixtec, Digital humanities, Language documentation, [SCCO.LING] Cognitive science/Linguistics, [INFO.INFO-DS] Computer Science [cs]/Data Structures and Algorithms [cs.DS]