International audience; This paper describes the workflow of the Grammateus project, from gathering data on Greek documentary papyri to the creation of a web application. The first stage is the selection of a corpus and the choice of metadata to record: papyrology specialists gather data from printed editions, existing online resources and digital facsimiles. In the next step, this data is converted to EpiDoc, a TEI XML encoding standard, to facilitate its reuse by others, and processed for HTML display. We also reuse existing text transcriptions available on . Since these transcriptions may be regularly updated by the scholarly community, we aim to access them dynamically. Although the transcriptions follow the EpiDoc guidelines, the wide diversity of the papyri and small inconsistencies in encoding make data reuse challenging. Currently, our data is available in an institutional GitLab repository, and we will archive our final dataset according to the FAIR principles.
Publication. Part of book or chapter of book. 2016
International audience; This chapter gives an overview of one possible staged methodology for structuring LCI data by presenting a new scientific object, LEarning and TEaching Corpora (LETEC). First, the chapter clarifies the notion of corpora, which is used in many different ways in language studies, and underlines how corpora differ from raw language data. Second, using examples taken from actual online learning situations, the chapter illustrates the methodology used to collect, transform and organize data from these situations so that they can be shared through open-access repositories. The ethics and rights involved in releasing a corpus as open data are discussed. Third, the authors suggest how the transcription of interactions may become more systematic and what benefits may be expected from analysis tools. Finally, the CALL research perspective on LCI is extended to its applications in teacher training in Computer-Mediated Communication (CMC), and to the common interests the CALL field shares with Corpus Linguistics researchers working on CMC.
International audience; One of the project proposals funded under DARIAH’s 2015 Open Humanities call was “Open History: Sustainable digital publishing of archival catalogues of twentieth-century history archives”. Building on the experiences of the Collaborative EuropeaN Digital Archival Research Infrastructure (CENDARI) and the European Holocaust Research Infrastructure (EHRI), the main goal of the “Open History” project was to enhance the dialogue between (meta-)data providers and research infrastructures. Integrating archival descriptions, where these already existed, from a wide variety of twentieth-century history archives (from classic archives to memorial sites, libraries and private archives) into research infrastructures has proven to be a major challenge, one that required anything from limited to extensive pre-processing or other preparatory work. The “Open History” project organized two workshops and developed two tools: an accessible, general article on why standardization and sharing matter and how they can be achieved; and a model that provides checklists archival institutions can use for self-analysis. The text that follows is the article we developed. It intentionally remains at a general level, without much jargon, so that it can be easily read by non-archivists and non-IT specialists. We therefore hope it will be easy to understand both for those who describe the sources at various archives (with or without IT or archival science degrees) and for decision-makers (directors and advisory boards) who wish to understand the benefits of investing in the standardization and sharing of data. It is important to note that this text is a first step, not a static, final result. Not all aspects of the standardization and publication of (meta-)data are discussed, nor are updates or feedback mechanisms for annotations and comments.
The idea is that this text can be used in full or in part, and that it will be extended with further chapters and updated sections over time as other communities begin using it. Some archives will read through much of it and find confirmation of what they have already been implementing; others, especially smaller institutions such as private memory institutions, will find it a low-key, hands-on introduction to support their efforts.