publication . Conference object . 2018

TEI Lex-0: A Target Format for TEI-Encoded Dictionaries and Lexical Resources

Romary, Laurent; Tasovac, Toma
Open Access
  • Published: 09 Sep 2018
  • Publisher: Zenodo
  • Country: France
Abstract
Achieving consistent encoding within a given community of practice has been a recurrent issue for the TEI Guidelines. The topic is of particular importance for lexical data if we think of the potential wealth of content we could gain from pooling together the information available in the variety of highly structured, historical and contemporary lexical resources. Still, the encoding possibilities offered by the Dictionaries Chapter in the Guidelines are too numerous and too flexible to guarantee sufficient interoperability and a coherent model for searching, visualising or enriching multiple lexical resources.

Following the spirit of TEI Analytics [Zillig, 2009], developed in the context of the MONK project, TEI Lex-0 aims at establishing a target format to facilitate the interoperability of heterogeneously encoded lexical resources. This is important both in the context of building lexical infrastructures as such [Ermolaev and Tasovac, 2012] and in the context of developing generic TEI-aware tools such as dictionary viewers and profilers. The format itself should not necessarily be one which is used for editing or managing individual resources, but one to which they can be univocally transformed to be queried, visualised, or mined in a uniform way. We are also aiming to stay as aligned as possible with the TEI subset developed in conjunction with the revision of the ISO LMF (Lexical Markup Framework) standard so that coherent design guidelines can be provided to the community (cf. [Romary, 2015]).

The paper will provide an overview of the various domains covered by TEI Lex-0 and the main decisions that were taken over the last 18 months: constraining the general structure of a lexical entry; offering mechanisms to overcome the limits of when used in retro-digitized dictionaries (by allowing, for instance, and as children of ); systematizing the representation of morpho-syntactic information [Bański et al., 2017]; providing a strict -based encoding of sense-related information; deprecating ; dealing with internal and external references in dictionary entries; providing more advanced encodings of etymology (see submission by Bowers, Herold and Romary); as well as defining technical constraints on the systematic use of @xml:id at different levels of the dictionary microstructure. The activity of the group has already led to changes in the Guidelines in response to specific GitHub tickets.
Subjects
free text keywords: dictionaries, modeling, TEI, TEI Lex-0, [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL], [SCCO.LING]Cognitive science/Linguistics
Communities
  • DARIAH EU
Funded by
EC| ELEXIS
Project
ELEXIS
European Lexicographic Infrastructure
  • Funder: European Commission (EC)
  • Project Code: 731015
  • Funding stream: H2020 | RIA
Validated by funder