International audience; In this paper we describe the development and evaluation of a visual analytics tool to support historical research. Historians continuously gather data related to their scholarly research from archival visits and background search. Organising and making sense of all this data can be challenging as many historians continue to rely on analog or basic digital tools. We built an integrated note-taking environment for historians which unifies a set of functionalities we identified as important for historical research including editing, tagging, searching, sharing and visualization. Our approach was to involve users from the initial stage of brainstorming and requirement analysis through to design, implementation and evaluation. We report on the process and results of our work, and conclude by reflecting on our own experience in conducting user-centered visual analytics design for digital humanities.
Several Research Infrastructures (RIs) exist in the Humanities and Social Sciences, some of which, such as CLARIN, DARIAH and CESSDA, address specific areas of interest: linguistic studies, digital humanities and social science data archives. RIs are also unique in their scope and application, largely tailored to the needs of their specific communities. However, commonalities do exist, and it is recognised that benefits are to be gained from them, such as efficient use of resources, enabling multi-disciplinary research and sharing good practices. A bridging project, PARTHENOS, has therefore worked closely with CLARIN and DARIAH as well as ARIADNE (archaeology), CENDARI (history), EHRI (Holocaust studies) and E-RIHS (heritage science) to identify, develop and promote these commonalities. In this paper, we present some specific examples of cross-discipline and trans-border applications arising from joint RI collaboration, allowing for entirely new avenues of research.
CIDOC CRM is an ontology intended to facilitate the integration, mediation and interchange of heterogeneous cultural heritage information. The Semantic Web, with its Linked Open Data (LOD) cloud, enables scholars and cultural institutions to publish their data in RDF, using CIDOC CRM as an interlingua that enables a semantically consistent re-interpretation of their data. More and more projects have now mapped legacy datasets to CIDOC CRM, and successful Extract-Transform-Load data-integration processes have been carried out in this way. A next step is enabling people and applications to actually explore autonomous datasets dynamically, using the semantic mediation offered by CIDOC CRM. This is the purpose of OpenArchaeo, a tool for querying archaeological datasets on the LOD cloud. We present its main features: the principles behind its user-friendly query interface and its SPARQL endpoint for programs, together with its overall architecture, designed to be extendable and scalable so as to handle transparent interconnections with evolving distributed sources while achieving good efficiency.
This paper presents the work carried out within the EU Cendari project to provide an appropriate customisation of the EAG format that would fulfil the expectations of researchers in contemporary and medieval history by describing where they can find collections and documents of specific interest. After describing the general data landscape that we have to deal with in the Cendari project, we specifically address the data entry and acquisition scenario to identify how this impacts the actual data structures to be handled. We then present how we implemented such constraints by means of a full TEI/ODD specification of EAG and point out the main changes we made, which we think could also contribute to the further evolution of the EAG setting at large. We conclude by providing a wider picture of what we think could be the future of archival formats (EAG, EAD, EAC) if we want them to be more coherent and more sustainable at the service of both archives and researchers.
Bibliographic data can be produced by different kinds of people, responding to different purposes. An author can provide information about the origin of a quotation; a bookseller can offer the reader a catalogue of what he has for sale; a printer keeps exact accounts of his stock; a librarian needs a file showing the precise location of a specific copy… From archives to printed books, we try to give an overview of the different sources that can provide bibliographic data.
Multilingualism is a cultural cornerstone of Europe and firmly anchored in the European treaties, which include full language equality. However, language barriers impacting business, cross-lingual and cross-cultural communication are still omnipresent. Language Technologies (LTs) are a powerful means to break down these barriers. While the last decade has seen various initiatives that created a multitude of approaches and technologies tailored to Europe's specific needs, there is still an immense level of fragmentation. At the same time, AI has become an increasingly important concept in the European Information and Communication Technology area. For a few years now, AI, with its many opportunities and synergies but also its misconceptions, has been overshadowing every other topic. We present an overview of the European LT landscape, describing funding programmes, activities, actions and challenges in the different countries with regard to LT, including the current state of play in industry and the LT market. We present a brief overview of the main LT-related activities on the EU level in the last ten years and develop strategic guidance with regard to four key dimensions. Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020). To appear.
Darhri, Anas Alaoui M.; Vincent Baillet; Bastien Bourineau; Alessio Calantropio; Gabriella Carpentiero; Medhi Chayani; Livio de Luca; Iwona Dudek; Bruno Dutailly; Hélène Gautier; Eleonora Grilli; Valentin Grimaud; Christoph Hoffmann; Adeline Joffres; Nenad Jončić; Michel Jordan; Justin Kimball; Adeline Manuel; Patrick Mcinerney; Imanol Muñoz Pandiella; Ariane Néroulidis; Erica Nocerino; Anthony Pamart; Costas Papadopoulos; Marco Potenziani; Emilie Saubestre; Roberto Scopigno; Dorian Seillier; Sarah Tournon-Valiente; Martina Trognitz; Jean-Marc Vallet; Chiara Zuanni
Publisher: HAL CCSD
Project: EC | PARTHENOS (654119)
Through this White Paper, which gathers contributions from experts in 3D data as well as professionals concerned with the interoperability and sustainability of 3D research data, the PARTHENOS project aims to highlight some of the current issues they face, including discipline-specific points, and potential practices and methodologies for dealing with these issues. During the workshop, several tools addressing these issues were introduced and confronted with the participants' experiences; this White Paper now goes further by also integrating the participants' feedback and suggestions for potential improvements. Therefore, even if the focus is on specific tools, the main goal is to contribute to the development of standardized good practices for the sharing, publication, storage and long-term preservation of 3D data.
Achieving consistent encoding within a given community of practice has been a recurrent issue for the TEI Guidelines. The topic is of particular importance for lexical data, given the potential wealth of content we could gain from pooling together the information available in the variety of highly structured, historical and contemporary lexical resources. Still, the encoding possibilities offered by the Dictionaries chapter of the Guidelines are too numerous and too flexible to guarantee sufficient interoperability and a coherent model for searching, visualising or enriching multiple lexical resources. Following the spirit of TEI Analytics [Zillig, 2009], developed in the context of the MONK project, TEI Lex-0 aims at establishing a target format to facilitate the interoperability of heterogeneously encoded lexical resources. This is important both for building lexical infrastructures as such [Ermolaev and Tasovac, 2012] and for developing generic TEI-aware tools such as dictionary viewers and profilers. The format itself need not be the one used for editing or managing individual resources, but one to which they can be univocally transformed so as to be queried, visualised or mined in a uniform way. We also aim to stay as aligned as possible with the TEI subset developed in conjunction with the revision of the ISO LMF (Lexical Markup Framework) standard, so that coherent design guidelines can be provided to the community (cf. [Romary, 2015]).

The paper provides an overview of the various domains covered by TEI Lex-0 and the main decisions taken over the last 18 months: constraining the general structure of a lexical entry; offering mechanisms to overcome the limits of the standard entry element when used in retro-digitized dictionaries (by allowing, for instance, certain elements as children of others); systematizing the representation of morpho-syntactic information [Bański et al., 2017]; providing a strict sense-based encoding of sense-related information; deprecating certain legacy elements; dealing with internal and external references in dictionary entries; providing more advanced encodings of etymology (see the submission by Bowers, Herold and Romary); as well as defining technical constraints on the systematic use of @xml:id at different levels of the dictionary microstructure. The activity of the group has already led to changes in the Guidelines in response to specific GitHub tickets.
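To make the structural decisions above concrete, the sketch below programmatically builds a minimal dictionary entry in the spirit described: a sense-based structure with @xml:id at entry and sense level. The element names (entry, form, orth, gramGrp, pos, sense, def) are standard TEI dictionary elements, but this is an illustration under those assumptions, not an authoritative TEI Lex-0 sample.

```python
# Sketch: building a minimal sense-based dictionary entry with @xml:id at the
# entry and sense levels, using only the standard library. Illustrative only;
# not an authoritative TEI Lex-0 example.
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def make_entry(lemma: str, pos: str, definitions: list[str]) -> ET.Element:
    """Build an <entry> with one lemma <form>, a <gramGrp>, and one <sense> per definition."""
    entry = ET.Element(f"{{{TEI_NS}}}entry", {XML_ID: lemma})
    form = ET.SubElement(entry, f"{{{TEI_NS}}}form", {"type": "lemma"})
    ET.SubElement(form, f"{{{TEI_NS}}}orth").text = lemma
    gram_grp = ET.SubElement(entry, f"{{{TEI_NS}}}gramGrp")
    ET.SubElement(gram_grp, f"{{{TEI_NS}}}pos").text = pos
    # All sense-related information lives inside numbered <sense> elements,
    # each carrying its own xml:id derived from the entry's id.
    for i, definition in enumerate(definitions, start=1):
        sense = ET.SubElement(entry, f"{{{TEI_NS}}}sense", {XML_ID: f"{lemma}-s{i}"})
        ET.SubElement(sense, f"{{{TEI_NS}}}def").text = definition
    return entry

if __name__ == "__main__":
    e = make_entry("bank", "noun",
                   ["sloping land beside a body of water", "a financial institution"])
    print(ET.tostring(e, encoding="unicode"))
```

A uniform target shape of this kind is what lets a generic viewer or profiler address any transformed dictionary with the same queries, regardless of how the source resource was originally encoded.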
Once upon a time, the preservation, study and digitization of books were primarily justified by a social or individual need to read: utilitarian needs (to obtain a qualification, to hold down a job, to share ideas) or a hedonistic need derived from the pleasure of reading. Nowadays, the building of corpora and the digitization of collections could replace books with "data", the consultation of which threatens a close engagement with the linearity of the text. I comment on examples chosen from the library of Le Mans and from my own collection to question this apparent undervaluing of reading in favour of results obtained by means of tools only partly under human control.