This report describes activities and progress towards establishing DARIAH membership in six countries (the Czech Republic, Finland, Israel, Spain, Switzerland, and the UK) between July and December 2019. Previous activities were described in detail in deliverable D3.2, Regularly Monitor Country-Specific Progress in Enabling New DARIAH Membership. During the project lifetime, the Czech Republic joined the DARIAH ERIC; in the other countries, collaboration with DARIAH has been greatly strengthened and significant progress towards DARIAH membership has been achieved. The report also outlines the next steps in the accession processes, building on the results of the DESIR project.
Project: EC | Locus Ludi (741520), EC | DESIR (731081)
The DESIR project set out to strengthen the sustainability of DARIAH and firmly establish it as a long-term leader and partner within arts and humanities communities. The project was designed to address six core infrastructural sustainability dimensions, one of which was dedicated to training and education, also one of the four pillars identified in the DARIAH Strategic Plan 2019-2026. In the framework of Work Package 7: Teaching, DESIR organised dedicated workshops in the six DARIAH accession countries (Czech Republic, Finland, Israel, Spain, Switzerland and the United Kingdom) to introduce them to the DARIAH infrastructure and related services, and to develop methodological research skills. The topic of each workshop was chosen by the accession countries' representatives according to the training needs of the national communities of researchers in the (Digital) Humanities. Training topics varied greatly: some workshops aimed to introduce participants to specific methodological research skills, while others took a different approach and focused on the infrastructural role of training and education. The workshops organised in the context of Work Package 7: Teaching are listed below:
• CZECH REPUBLIC: “A series of fall tutorials 2019 organized by LINDAT/CLARIAH-CZ, tutorial #3 on TEI Training”, 28 November 2019, Prague;
• FINLAND: “Reuse & sustainability: Open Science and social sciences and humanities research infrastructures”, 23 October 2019, Helsinki;
• ISRAEL: “Introduction to Text Encoding and Digital Editions”, 24 October 2019, Haifa;
• SPAIN: “DESIR Workshop: Digital Tools, Shared Data, and Research Dissemination”, 3 July 2019, Madrid;
• SWITZERLAND: “Sharing the Experience: Workflows for the Digital Humanities”, 5-6 December 2019, Neuchâtel;
• UNITED KINGDOM: “Research Software Engineering for Digital Humanities: Role of Training in Sustaining Expertise”, 9 December 2019, London.
Publication. Part of book or chapter of book. 2018
International audience; By way of an afterword, it seemed necessary to us to revisit the collaborative process behind the making of this book and to share the genesis of this project. Everything started from a pragmatic observation drawn from our everyday work situations: the researcher who produces or uses data needs concrete answers to the questions he or she faces in the field and throughout all research work. Producing, exploiting, disseminating, sharing, and editing digital sources is now part of our ordinary work. The disruption brought about by the development of the web and the arrival of the digital format has greatly facilitated the dissemination and sharing of resources (documentary, textual, photographic, sound, or audiovisual...) within the research world and, beyond it, among citizens who are increasingly curious about and interested in the documents produced by scientists.
Countries: Spain, Netherlands, France
Project: FCT | EXPL/BBB-BEP/1356/2013 (EXPL/BBB-BEP/1356/2013), AKA | ELIXIR - Data for Life Eu... (273655), WT, EC | WENMR (261572), EC | EGI-INSPIRE (261323), EC | BIOMEDBRIDGES (284209), FCT | SFRH/BPD/78075/2011 (SFRH/BPD/78075/2011), ...
With the increasingly rapid growth of data in the life sciences, we are witnessing a major transition in the way research is conducted, from hypothesis-driven studies to data-driven simulations of whole systems. Such approaches necessitate the use of large-scale computational resources and e-infrastructures, such as the European Grid Infrastructure (EGI). EGI, one of the key enablers of the digital European Research Area, is a federation of resource providers set up to deliver sustainable, integrated and secure computing services to European researchers and their international partners. Here we aim to present the state of the art of Grid/Cloud computing in EU research as viewed from within the field of life sciences, focusing on key infrastructures and projects within the life sciences community. Rather than focusing purely on the technical aspects underlying the currently provided solutions, we outline the design aspects and key characteristics that can be identified across major research approaches. Overall, we aim to provide significant insight into the road ahead by establishing ever-strengthening connections between EGI as a whole and the life sciences community. AD was supported by Fundação para a Ciência e a Tecnologia, Portugal (SFRH/BPD/78075/2011 and EXPL/BBB-BEP/1356/2013). FP has been supported by the National Grid Infrastructure NGI_GRNET, HellasGRID, as part of the EGI. IFB acknowledges funding from the “National Infrastructures in Biology and Health” call of the French “Investments for the Future” initiative. The WeNMR project has been funded by a European FP7 e-Infrastructure grant, contract no. 261572. AF was supported by a grant from Labex CEBA (Centre d’études de la Biodiversité Amazonienne) from the ANR. MC is supported by the UK’s BBSRC core funding. CSC was supported by Academy of Finland grant no. 273655 for ELIXIR Finland.
The EGI-InSPIRE project (Integrated Sustainable Pan-European Infrastructure for Researchers in Europe) is co-funded by the European Commission (contract number: RI-261323). The BioMedBridges project is funded by the European Commission within the Research Infrastructures of the FP7 Capacities Specific Programme, grant agreement number 284209. This is an open-access, peer-reviewed article.
Publisher: Japanese Association for Digital Humanities
Project: EC | HIRMEOS (731102)
International audience; This paper presents an attempt to provide a generic named-entity recognition and disambiguation (NERD) module called entity-fishing as a stable online service, demonstrating the possible delivery of sustainable technical services within DARIAH, the European digital research infrastructure for the arts and humanities. Deployed as part of the French national infrastructure Huma-Num, this service provides an efficient state-of-the-art implementation coupled with standardised interfaces, allowing easy deployment in a variety of potential digital humanities contexts. The topics of accessibility and sustainability have long been discussed in the attempt to establish best practices within the widely fragmented ecosystem of the DARIAH research infrastructure. The history of entity-fishing has been cited as an example of good practice: initially developed in the context of the FP7 project CENDARI, it was well received by the user community and continued to be developed within the H2020 HIRMEOS project, where several open access publishers have integrated the service into their collections of published monographs as a means to enhance retrieval and access. entity-fishing implements entity extraction as well as disambiguation against Wikipedia and Wikidata entries. The service is accessible through a REST API, which allows easy and seamless integration, a language-independent and stable convention, and a widely used service-oriented architecture (SOA) design. Input and output data are carried over a query data model with a defined structure, providing the flexibility to support the processing of partially annotated text or the repartition of text over several queries. The interface implements a variety of functionalities, such as language recognition, sentence segmentation, and modules for accessing and looking up concepts in the knowledge base.
The API also integrates more advanced contextual parametrisation and ranked outputs, allowing for resilient integration in various possible use cases. The entity-fishing API has been used as a concrete use case to draft the experimental stand-off proposal, which has been submitted for integration into the TEI guidelines. The representation is also compliant with the Web Annotation Data Model (WADM). In this paper we aim to describe the functionalities of the service as a reference contribution to the subject of web-based NERD services. In order to cover all aspects, the architecture is structured to provide two complementary viewpoints. First, we discuss the system from the data angle, detailing the workflow from input to output and unpacking each building block in the processing flow. Secondly, with a more academic approach, we provide a transversal schema of the different components, taking into account non-functional requirements in order to facilitate the discovery of bottlenecks, hotspots and weaknesses. The attempt here is to give a description of the tool and, at the same time, a technical software engineering analysis which will help the reader to understand our choices for the resources allocated in the infrastructure. Thanks to the work of millions of volunteers, Wikipedia has today reached a stability and completeness that leave no usable alternative on the market (considering also the licence aspect). The launch of Wikidata in 2012 completed the picture with a complementary language-independent meta-model which is becoming the scientific reference for many disciplines. After providing an introduction to Wikipedia and Wikidata, we describe the knowledge base: the data organisation, the entity-fishing process for exploiting it, and the way it is built from nightly dumps using an offline process. We conclude the paper by presenting our solution for the service deployment: how and which resources were allocated.
The service has been in production since Q3 2017 and was used extensively by the H2020 HIRMEOS partners during the integration with their publishing platforms. We believe we have strived to provide the best performance with the minimum amount of resources. Thanks to the Huma-Num infrastructure, we still have the possibility to scale up the infrastructure as needed, for example to support an increase in demand or a temporary need to process a huge backlog of documents. In the long term, thanks to this sustainable environment, we plan to keep delivering the service well beyond the end of the H2020 HIRMEOS project.
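The query data model described above can be sketched as follows. This is a minimal illustration, not the service's authoritative schema: the field names (`text`, `language`, `mentions`), the `/disambiguate` endpoint path, and the host are assumptions to be checked against the entity-fishing documentation.

```python
import json

# Minimal sketch of an entity-fishing-style disambiguation query.
# Field names are assumptions; consult the service documentation
# for the authoritative query data model.
def build_query(text, lang="en", mentions=None):
    """Assemble a query carrying raw text, a language hint, and
    optional pre-annotated mentions (partially annotated input)."""
    return {
        "text": text,
        "language": {"lang": lang},
        # Mentions already identified upstream, to be disambiguated only.
        "mentions": mentions or [],
    }

query = build_query("DARIAH is a European research infrastructure "
                    "for the arts and humanities.")
payload = json.dumps(query)

# Sending the query would then be a single POST to the REST API, e.g.
# (hypothetical URL):
#   requests.post("https://<host>/service/disambiguate",
#                 files={"query": payload})
print(payload)
```

Because the query travels as one self-describing JSON object, the same call shape works whether the client submits plain text or text with pre-identified mentions, which is the flexibility the data model is designed for.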
International audience; This article presents an overview of our approaches and results during our participation in the CLEF HIPE 2020 NERC-COARSE-LIT and EL-ONLY tasks for English and French. For these two tasks, we used two systems: 1) DeLFT, a deep learning framework for text processing; 2) entity-fishing, a generic named-entity recognition and disambiguation service deployed in the technical framework of INRIA.
International audience; The CENDARI infrastructure is a research-supporting platform designed to provide tools for transnational historical research, focusing on two topics: medieval culture and World War I. It exposes modern Web-based tools to end users, relying on a sophisticated infrastructure to collect, enrich, annotate, and search through large document corpora. Supporting researchers in their daily work is a novel concern for infrastructures. We describe how we gathered requirements through multiple methods to understand historians' needs and derived an abstract workflow to support them. We then outline the tools that we have built, tying their technical descriptions to the user requirements. The main tools are the note-taking environment and its faceted search capabilities; the data integration platform, including the Data API, supporting semantic enrichment through entity recognition; and the environment supporting the software development processes throughout the project, keeping both technical partners and researchers in the loop. The outcomes are technical, together with the new resources developed and gathered and the research workflow that has been described and documented.
International audience; In recent years, a variety of initiatives have been funded with the aim of producing software tools or environments of a type variously known as virtual research environments, research infrastructures, or cyberinfrastructures. These initiatives vary in their scale, specialization, scope, and level of funding. One issue that they face in common, however, is that of sustainability: how can the continued, and useful, existence of a system or tool be guaranteed, or at least facilitated, once a project's funding has been spent? In this paper, we examine how such sustainability has been enabled, in the particular case of infrastructures for textual scholarship, in the context of three international projects: TextGrid, TEXTvre, and DARIAH. Firstly, we will address the inter-project collaboration and cross-fertilization between TextGrid and TEXTvre, including architectural decisions and shared data infrastructures, and investigate how the projects benefited from the exchange. We will then discuss how this existing collaboration can be taken forward by the loosely-coupled and distributed framework being developed by the DARIAH community, and how it can serve as a model for the sort of collaborations that DARIAH plans to enable.
International audience; This paper explores what is needed to foster an acceptance of digital practices in the humanities beyond the creation of pure infrastructure, specifically in terms of understanding and technically modelling traditional scholarly research within a digital medium while enabling new modes of scholarly work that could only be carried out within a digitally-mediated environment.
International audience; Defining digital humanities might be an endless debate if we stick to discussing the boundaries of this concept as an academic “discipline”. In an attempt to concretely identify this field and its actors, this paper shows that it is possible to analyse them through Twitter, a social media platform widely used by this “community of practice”. Based on a network analysis of 2,500 users identified as members of this movement, the visualisation of the “who’s following who?” graph allows us to highlight the structure of the network’s relationships and to identify users whose position is particular. Specifically, we show that linguistic groups are key factors in explaining clustering within a network whose characteristics resemble those of a small world.
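The kind of measurement behind a small-world claim like this can be sketched with a clustering-coefficient computation; the toy edge list below is invented for illustration and stands in for the real 2,500-user follower network, which is not reproduced here.

```python
# Toy "who's following who?" graph: each user maps to the set of
# accounts they follow. The data is invented for illustration only.
follows = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b", "e"},
    "e": {"d"},
}

def undirected(adj):
    """Symmetrise the follower relation into an undirected graph."""
    g = {u: set() for u in adj}
    for u, vs in adj.items():
        for v in vs:
            g.setdefault(u, set()).add(v)
            g.setdefault(v, set()).add(u)
    return g

def clustering(g, u):
    """Fraction of a node's neighbour pairs that are themselves linked;
    high average values are one signature of a small-world network."""
    nbrs = list(g[u])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in g[nbrs[i]])
    return 2 * links / (k * (k - 1))

g = undirected(follows)
avg = sum(clustering(g, u) for u in g) / len(g)
print(f"average clustering: {avg:.2f}")  # → average clustering: 0.47
```

A real analysis would compare this average against a random graph of the same size and density, and pair it with the average shortest-path length, which is the other half of the small-world signature.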