More and more cultural institutions use Linked Data principles to share and connect their collection metadata. In the archival field, initiatives emerge to exploit data contained in archival descriptions and adapt encoding standards to the semantic web. In this context, online authority files can be used to enrich metadata. However, relying on a decentralized network of knowledge bases such as Wikidata, DBpedia or even Viaf has its own difficulties. This paper aims to offer a critical view of these linked authority files by adopting a close-reading approach. Through a practical case study, we intend to identify and illustrate the possibilities and limits of RDF triples compared to institutions' less structured metadata. Comment: Workshop "Dariah "Trust and Understanding: the value of metadata in a digitally joined-up world" (14/05/2018, Brussels), preprint of the submission to the journal "Archives et Biblioth\`eques de Belgique"
International audience; This contribution will show how Access play a strong role in the creation and structuring of DARIAH, a European Digital Research Infrastructure in Arts and Humanities.To achieve this goal, this contribution will develop the concept of Access from five examples:_ Interdisciplinarity point of view_ Manage contradiction between national and international perspectives_ Involve different communities (not only researchers stakeholders)_ Manage tools and services_ Develop and use new collaboration toolsWe would like to demonstrate that speaking about Access always implies a selection, a choice, even in the perspective of "Open Access".
Multilingualism is a cultural cornerstone of Europe and firmly anchored in the European treaties including full language equality. However, language barriers impacting business, cross-lingual and cross-cultural communication are still omnipresent. Language Technologies (LTs) are a powerful means to break down these barriers. While the last decade has seen various initiatives that created a multitude of approaches and technologies tailored to Europe's specific needs, there is still an immense level of fragmentation. At the same time, AI has become an increasingly important concept in the European Information and Communication Technology area. For a few years now, AI, including many opportunities, synergies but also misconceptions, has been overshadowing every other topic. We present an overview of the European LT landscape, describing funding programmes, activities, actions and challenges in the different countries with regard to LT, including the current state of play in industry and the LT market. We present a brief overview of the main LT-related activities on the EU level in the last ten years and develop strategic guidance with regard to four key dimensions. Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020). To appear
In this paper we provide a systematic and comprehensive set of modeling principles for representing etymological data in digital dictionaries using TEI. The purpose is to integrate in one coherent framework both digital representations of legacy dictionaries and born-digital lexical databases that are constructed manually or semi-automatically. We provide examples from many different types of etymological phenomena from traditional lexicographic practice, as well as analytical approaches from functional and cognitive linguistics such as metaphor, metonymy, and grammaticalization, which in many lexicographical and formal linguistic circles have not often been treated as truly etymological in nature, and have thus been largely left out of etymological dictionaries. In order to fully and accurately express the phenomena and their structures, we have made several proposals for expanding and amending some aspects of the existing TEI framework. Finally, with reference to both synchronic and diachronic data, we also demonstrate how encoders may integrate semantic web/linked open data information resources into TEI dictionaries as a basis for the sense, and/or the semantic domain, of an entry and/or an etymon.
A defining feature of data and data workflows in the arts and humanities domain is their dependence on cultural heritage sources hosted and curated in museums, libraries, galleries and archives. A major difficulty when scholars interact with heritage data is that the nature of the cooperation between researchers and Cultural Heritage Institutions and the researchers working in CHIs (henceforth CHIs) is often constrained by structural and legal challenges but even more by uncertainties as to the expectations of both parties.This recognition led several European organizations such as APEF, CLARIN, Europeana, E-RIHS to come together and join forces under the governance of DARIAH to set up principles and mechanisms for improving the conditions for the use and re-use of cultural heritage data issued by cultural heritage institutions and studied and enriched by researchers. As a first step of this joint effort is the Heritage Data Reuse Charter (https://datacharter.hypotheses.org/) establishes 6 basic principles for improving the use and re-use of cultural heritage resources by researchers and , to help all the relevant actors to work together to connect and improve access to heritage data. These are: Reciprocity, Interoperability, Citability, Openness, Stewardship and Trustworthiness.As a further step in translating these principles to actual data workflows the survey below serves as a template to frame exchanges around cultural heritage data by enabling both Cultural Heritage Institutions, infrastructure providers and researchers and to clarify their goals at the beginning and the project, to specify access to data, provenance information, preferred citation standards, hosting responsibilities etc. on the basis of which the parties can arrive at mutual reuse agreements that could serve as a starting point for a FAIR-by-construction data management, right from the project planning/application phase. In practice, the survey below can be flexibly applied in platform-independent ways in exchange protocols between Cultural Heritage Institutions and researchers, Institutions who sign the Charter could use it (and expect to use such surveys) in their own exchange protocols. Another direction of future developments is to set up a platform dedicated to such exchanges. On the other hand, researchers are encouraged to contact the CHIs during the initial stages of their project in order to explain their plans and figure details of transaction together. This mutual declaration can later be a powerful component in their Data Management Plans as it shows evidence for responsible and fair conduct of cultural heritage data, and fair (but also FAIR) research data management practices that are based on partnership with the holding institution. As enclosing a Research Data Management Plan to grant applications is becoming a more and more common requirement among research funders, we need to raise the funders’ awareness to the fact that such bi- or trilateral agreements and data reuse declarations among researchers, CHIs and infrastructure providers are crucial domain-specific components of FAIR data management.
International audience; The CENDARI infrastructure is a research-supporting platform designed to provide tools for transnational historical research, focusing on two topics: medieval culture and World War I. It exposes to the end users modern Web-based tools relying on a sophisticated infrastructure to collect, enrich, annotate, and search through large document corpora. Supporting researchers in their daily work is a novel concern for infrastructures. We describe how we gathered requirements through multiple methods to understand historians' needs and derive an abstract workflow to support them. We then outline the tools that we have built, tying their technical descriptions to the user requirements. The main tools are the note-taking environment and its faceted search capabilities; the data integration platform including the Data API, supporting semantic enrichment through entity recognition; and the environment supporting the software development processes throughout the project to keep both technical partners and researchers in the loop. The outcomes are technical together with new resources developed and gathered, and the research workflow that has been described and documented.
There is a growing need to establish domain-or discipline-specific approaches to research data sharing workflows. A defining feature of data and data workflows in the arts and humanities domain is their dependence on cultural heritage sources hosted and curated in museums, libraries, galleries and archives. A major difficulty when scholars interact with heritage data is that the nature of the cooperation between researchers and Cultural Heritage Institutions (henceforth CHIs) is often constrained by structural and legal challenges but even more by uncertainties as to the expectations of both parties. The Heritage Data Reuse Charter aims to address these by designing a common environment that will enable all the relevant actors to work together to connect and improve access to heritage data and make transactions related to the scholarly use of cultural heritage data more visible and transparent. As a first step, a wide range of stakeholders on the Cultural Heritage and research sector agreed upon a set of generic principles, summarized in the Mission Statement of the Charter, that can serve as a baseline governing the interactions between CHIs, researchers and data centres. This was followed by a long and thorough validation process related to these principles through surveys 1 and workshops 2. As a second step, we now put forward a questionnaire template tool that helps researchers and CHIs to translate the 6 core principles into specific research project settings. It contains questions about access to data, provenance information, preferred citation standards, hosting responsibilities etc. on the basis of which the parties can arrive at mutual reuse agreements that could serve as a starting point for a FAIR-by-construction data management, right from the project planning/application phase. The questionnaire template and the resulting mutual agreements can be flexibly applied to projects of different scale and in platform-independent ways. Institutions can embed them into their own exchange protocols while researchers can add them to their Data Management Plans. As such, they can show evidence for responsible and fair conduct of cultural heritage data, and fair (but also FAIR) research data management practices that are based on partnership with the holding institution.
Publisher: Centre pour la Communication Scientifique Directe (CCSD)
This article presents a study of the French-speaking digital humanities. It is based on the experience of two research engineers from the French National Center for Scientific Research (CNRS) who have been studying these issues for the last ten years. They conducted a survey at the École Normale Supérieure (ENS-Paris) which enabled them to draw up an overview of the transformation of the profession of humanities and social sciences research engineers in the context of the digital humanities. The Digit_Hum initiative, which they run in parallel with their respective activities at the ENS, also provided information for this overview thanks to its role as a space for discussion about the digital humanities along with training and structuring of this field at the ENS and the Université Paris Sciences & Lettres (PSL). Cet article est une réflexion sur les humanités numériques en contexte francophone. Elle s’appuie sur l'expérience de deux ingénieures du Centre National de la Recherche Scientifique travaillant sur ces questions depuis une dizaine d'années. À travers l'enquête qu'elles ont menée à l'École normale supérieure (ENS-Paris), elles dressent un panorama de la transformation du métier d'ingénieur(e) en sciences humaines et sociales dans le contexte des humanités numériques. L'initiative Digit_Hum, qu'elles animent en parallèle de leurs activités respectives à l'École, nourrit également ce témoignage en constituant un espace de discussions, de formations et de structuration des humanités numériques au sein de l'ENS et de l’Université Paris Sciences & Lettres.
This paper addresses the integration of a Named Entity Recognition and Disambiguation (NERD) service within a group of open access (OA) publishing digital platforms and considers its potential impact on both research and scholarly publishing. The software powering this service, called entity-fishing, was initially developed by Inria in the context of the EU FP7 project CENDARI and provides automatic entity recognition and disambiguation using the Wikipedia and Wikidata data sets. The application is distributed with an open-source licence, and it has been deployed as a web service in DARIAH's infrastructure hosted by the French HumaNum. In the paper, we focus on the specific issues related to its integration on five OA platforms specialized in the publication of scholarly monographs in the social sciences and humanities (SSH), as part of the work carried out within the EU H2020 project HIRMEOS (High Integration of Research Monographs in the European Open Science infrastructure). In the first section, we give a brief overview of the current status and evolution of OA publications, considering specifically the challenges that OA monographs are encountering. In the second part, we show how the HIRMEOS project aims to face these challenges by optimizing five OA digital platforms for the publication of monographs from the SSH and ensuring their interoperability. In sections three and four we give a comprehensive description of the entity-fishing service, focusing on its concrete applications in real use cases together with some further possible ideas on how to exploit the annotations generated. We show that entity-fishing annotations can improve both research and publishing process. In the last chapter, we briefly present further possible application scenarios that could be made available through infrastructural projects.