Advanced search in Research products
The following results are related to DARIAH EU. Are you interested in viewing more results? Visit OpenAIRE - Explore.
21 Research products, page 1 of 3

  • DARIAH EU
  • Publications
  • Research software
  • Conference object
  • European Commission
  • Hal-Diderot
  • Mémoires en Sciences de l'Information et de la Communication
  • INRIA a CCSD electronic archive server
  • ProdInra

  • Open Access English
    Authors: 
    Boukhelifa, Nadia; Giannisakis, Emmanouil; Dimara, Evanthia; Willett, Wesley; Fekete, Jean-Daniel;
    Publisher: HAL CCSD
    Country: France
    Project: EC | CENDARI (284432)

    International audience; In this paper we describe the development and evaluation of a visual analytics tool to support historical research. Historians continuously gather data related to their scholarly research from archival visits and background searches. Organising and making sense of all this data can be challenging, as many historians continue to rely on analog or basic digital tools. We built an integrated note-taking environment for historians that unifies a set of functionalities we identified as important for historical research, including editing, tagging, searching, sharing and visualization. Our approach was to involve users from the initial stage of brainstorming and requirements analysis through to design, implementation and evaluation. We report on the process and results of our work, and conclude by reflecting on our own experience of conducting user-centered visual analytics design for the digital humanities.

  • Publication . Conference object . 2019
    English
    Authors: 
    Bassett, Sheena; Wessels, Leon; Krauwer, Steven; Maegaard, Bente; Hollander, Hella; Admiraal, Femmy; Romary, Laurent; Uiterwaal, Frank;
    Publisher: HAL CCSD
    Country: France
    Project: EC | PARTHENOS (654119)

    International audience; Several Research Infrastructures (RIs) exist in the Humanities and Social Sciences. Some, such as CLARIN, DARIAH and CESSDA, address specific areas of interest, i.e. linguistic studies, digital humanities and social science data archives. RIs are also unique in their scope and application, being largely tailored to the needs of their specific communities. However, commonalities do exist, and it is recognised that benefits can be gained from them, such as efficient use of resources, enabling multi-disciplinary research and sharing good practices. To this end, the bridging project PARTHENOS has worked closely with CLARIN and DARIAH as well as ARIADNE (archaeology), CENDARI (history), EHRI (Holocaust studies) and E-RIHS (heritage science) to identify, develop and promote these commonalities. In this paper, we present some specific examples of cross-discipline and trans-border applications arising from joint RI collaboration, opening entirely new avenues of research.

  • Publication . Conference object . 2019
    English
    Authors: 
    Marlet, Olivier; Francart, Thomas; Markhoff, Béatrice; Rodier, Xavier;
    Publisher: HAL CCSD
    Country: France
    Project: EC | ARIADNEplus (823914)

    International audience; CIDOC CRM is an ontology intended to facilitate the integration, mediation and interchange of heterogeneous cultural heritage information. The Semantic Web, with its Linked Open Data (LOD) cloud, enables scholars and cultural institutions to publish their data in RDF, using CIDOC CRM as an interlingua that allows a semantically consistent re-interpretation of their data. A growing number of projects have mapped legacy datasets to CIDOC CRM, and successful Extract-Transform-Load (ETL) data-integration processes have been carried out in this way. A next step is to enable people and applications to explore autonomous datasets dynamically, using the semantic mediation offered by CIDOC CRM. This is the purpose of OpenArchaeo, a tool for querying archaeological datasets on the LOD cloud. We present its main features: the principles behind its user-friendly query interface and its SPARQL endpoint for programs, together with its overall architecture, designed to be extensible and scalable so that it can handle transparent interconnections with evolving distributed sources while achieving good efficiency.

  • French
    Authors: 
    Puren, Marie;
    Publisher: HAL CCSD
    Country: France
    Project: EC | IPERION CH (654028)

    International audience; With the establishment of large research infrastructures in heritage science such as E-RIHS, a wide range of actors is being brought together, drawn from both the humanities and social sciences and the experimental sciences. The palaeontologist meets the art historian, and the physicist collaborates with the conservator. In this context, research data management is a real challenge, because it must gather, enhance the value of and make accessible data produced by very different protagonists using very different methods. How, indeed, can one manage and exchange experimental data, digitised images and conservation reports all at once? The life cycle of research data, from creation through analysis to dissemination, within this interdisciplinary community calls into question the very definition of this type of data and leads us to examine the practices surrounding it.

  • Authors: 
    Romary, Laurent; Biabiany, Damien; Illmayer, Klaus; Puren, Marie; Riondet, Charles; Seillier, Dorian; Tadjou, Lionel;
    Country: France
    Project: EC | PARTHENOS (654119)

    International audience

  • Publication . Conference object . 2013
    English
    Authors: 
    Medves, Maud; Romary, Laurent;
    Publisher: HAL CCSD
    Country: France
    Project: EC | CENDARI (284432)

    International audience; This paper presents the work carried out within the EU CENDARI project to provide an appropriate customisation of the EAG format that would meet the expectations of researchers in contemporary and medieval history who need to know where they can find collections and documents of specific interest. After describing the general data landscape that we have to deal with in the CENDARI project, we specifically address the data entry and acquisition scenario to identify how it affects the actual data structures to be handled. We then present how we implemented such constraints by means of a full TEI/ODD specification of EAG, and point out the main changes we made, which we think could also contribute to the further evolution of EAG at large. We end by providing a wider picture of what we think could be the future of archival formats (EAG, EAD, EAC) if we want them to be more coherent and more sustainable, at the service of both archives and researchers.

  • Open Access
    Authors: 
    Romary, Laurent; Tasovac, Toma;
    Publisher: Zenodo
    Country: France
    Project: EC | ELEXIS (731015)

    International audience; Achieving consistent encoding within a given community of practice has been a recurrent issue for the TEI Guidelines. The topic is of particular importance for lexical data if we think of the potential wealth of content we could gain from pooling together the information available in a variety of highly structured, historical and contemporary lexical resources. Still, the encoding possibilities offered by the Dictionaries chapter of the Guidelines are too numerous and too flexible to guarantee sufficient interoperability and a coherent model for searching, visualising or enriching multiple lexical resources. Following the spirit of TEI Analytics [Zillig, 2009], developed in the context of the MONK project, TEI Lex-0 aims to establish a target format that facilitates the interoperability of heterogeneously encoded lexical resources. This is important both for building lexical infrastructures as such [Ermolaev and Tasovac, 2012] and for developing generic TEI-aware tools such as dictionary viewers and profilers. The format itself need not be the one used for editing or managing individual resources, but rather one into which they can be unambiguously transformed so that they can be queried, visualised or mined in a uniform way. We also aim to stay as closely aligned as possible with the TEI subset developed in conjunction with the revision of the ISO LMF (Lexical Markup Framework) standard, so that coherent design guidelines can be provided to the community (cf. [Romary, 2015]).

    The paper will provide an overview of the various domains covered by TEI Lex-0 and the main decisions taken over the last 18 months: constraining the general structure of a lexical entry; offering mechanisms to overcome the limits of when used in retro-digitized dictionaries (by allowing, for instance, and as children of ); systematizing the representation of morpho-syntactic information [Bański et al., 2017]; providing a strict -based encoding of sense-related information; deprecating ; dealing with internal and external references in dictionary entries; providing more advanced encodings of etymology (see the submission by Bowers, Herold and Romary); and defining technical constraints on the systematic use of @xml:id at different levels of the dictionary microstructure. The activity of the group has already led to changes in the Guidelines in response to specific GitHub tickets.

  • Open Access
    Authors: 
    Foppiano, Luca; Romary, Laurent;
    Publisher: Japanese Association for Digital Humanities
    Country: France
    Project: EC | HIRMEOS (731102)

    International audience; This paper presents an attempt to provide a generic named-entity recognition and disambiguation module (NERD) called entity-fishing as a stable online service that demonstrates the possible delivery of sustainable technical services within DARIAH, the European digital research infrastructure for the arts and humanities. Deployed as part of the French national infrastructure Huma-Num, this service provides an efficient, state-of-the-art implementation coupled with standardised interfaces that allow easy deployment in a variety of digital humanities contexts. Accessibility and sustainability have long been discussed in attempts to establish best practices in the widely fragmented ecosystem of the DARIAH research infrastructure. The history of entity-fishing has been cited as an example of good practice: initially developed in the context of the FP7 project CENDARI, it was well received by the user community and was further developed within the H2020 HIRMEOS project, where several open access publishers have integrated the service into their collections of published monographs as a means to enhance retrieval and access.

    entity-fishing implements entity extraction as well as disambiguation against Wikipedia and Wikidata entries. The service is accessible through a REST API, which allows easy, seamless integration through a language-independent, stable convention and a widely used service-oriented architecture (SOA) design. Input and output data are exchanged through a query data model with a defined structure, flexible enough to support the processing of partially annotated text or the distribution of a text over several queries. The interface implements a variety of functionalities, such as language recognition, sentence segmentation and modules for accessing and looking up concepts in the knowledge base. The API also supports more advanced contextual parametrisation and ranked outputs, allowing resilient integration in various use cases. The entity-fishing API has been used as a concrete use case to draft the experimental stand-off proposal, which has been submitted for integration into the TEI Guidelines. The representation is also compliant with the Web Annotation Data Model (WADM).

    In this paper we aim to describe the functionalities of the service as a reference contribution to the subject of web-based NERD services. In order to cover all aspects, the description of the architecture is structured around two complementary viewpoints. First, we discuss the system from the data angle, detailing the workflow from input to output and unpacking each building block in the processing flow. Second, with a more academic approach, we provide a transversal schema of the different components, taking into account non-functional requirements in order to facilitate the discovery of bottlenecks, hotspots and weaknesses. The aim is to give a description of the tool and, at the same time, a technical software engineering analysis that will help the reader to understand our choice of resources allocated in the infrastructure.

    Thanks to the work of millions of volunteers, Wikipedia has today reached a level of stability and completeness that leaves no usable alternative on the market (considering also the licence aspect). The launch of Wikidata in 2012 completed the picture with a complementary, language-independent meta-model that is becoming the scientific reference for many disciplines. After providing an introduction to Wikipedia and Wikidata, we describe the knowledge base: its data organisation, the entity-fishing process that exploits it, and the way it is built from nightly dumps using an offline process. We conclude the paper by presenting our solution for the service deployment: how, and which, resources were allocated.

    The service has been in production since Q3 2017 and was used extensively by the H2020 HIRMEOS partners during the integration with their publishing platforms. We believe we have striven to provide the best performance with the minimum amount of resources. Thanks to the Huma-Num infrastructure, we still have the possibility to scale up as needed, for example to support an increase in demand or temporary needs to process a huge backlog of documents. In the long term, thanks to this sustainable environment, we plan to keep delivering the service far beyond the end of the H2020 HIRMEOS project.

  • Open Access English
    Authors: 
    Raciti, Marco; Gabay, Simon; Moranville, Yoann; Jorge, Maria do Rosário; Fernandes, João;
    Publisher: HAL CCSD
    Country: France
    Project: EC | DESIR (731081)

    International audience; Europe has a long and rich tradition as a centre of research and teaching in the arts and humanities. However, the huge digital transformation affecting the arts and humanities research landscape all over the world requires that we set up sustainable research infrastructures, new and refined techniques, state-of-the-art methods and an expanded skills base. In response to these challenges, the Digital Research Infrastructure for the Arts and Humanities (DARIAH) was launched as a pan-European network and research infrastructure. After expansion and consolidation, which included DARIAH’s inclusion in the ESFRI roadmap, DARIAH became a European Research Infrastructure Consortium (ERIC) in 2014. The Horizon 2020 funded project DESIR (DARIAH ERIC Sustainability Refined) sets out to strengthen the sustainability of DARIAH and help establish it as a reliable long-term partner within our communities. In the context of DESIR, sustaining existing digital expertise, tools and resources in Europe involves a goal-oriented set of measures: first, to maintain, expand and develop DARIAH in its capacities as an organisation and technical research infrastructure; secondly, to engage its members further, as well as to measure and increase their trust in DARIAH; thirdly, to expand the network in order to integrate new regions and communities. The DESIR consortium is composed of core DARIAH members, representatives from potential new DARIAH members and external technical experts. The sustainability of a research infrastructure is its capacity to remain operative, effective and competitive over its expected lifetime. In DESIR, this definition is translated into an evolving six-dimensional process, divided into the following challenges: dissemination, growth, technology, robustness, trust and education. With our poster, we would like to show how the project helps to sustain DARIAH.
    Within DESIR, dissemination is the ability to communicate DARIAH’s strategy and benefits effectively within the DARIAH community and in new areas, reaching out to new communities. Through the international workshops held at Stanford University and at the Library of Congress, DARIAH has been introduced to many non-European DH scholars. These events were an important first step in fostering international cooperation between US and European colleagues, as well as a catalyst for future collaborations. A third workshop took place in Canberra at the Australian Research Data Commons in March 2019.

    DARIAH currently has 17 members from all over Europe. Nevertheless, efforts should be made to include as many countries as possible, in order to bring in even more state-of-the-art DH activities and scale them to a European level. Six candidates ready to build strong national consortia have been identified, enabling a substantial expansion of DARIAH’s country coverage. In addition, thematic workshops and tailored training measures are organised in each country.

    DESIR widens the research infrastructure in core areas that are vital for DARIAH’s sustainability but are not yet covered by the existing set-up. As DARIAH expands across Europe, continuously enhancing and further developing the ERIC exceeds DARIAH’s internal technological capacities. Two notable results have been achieved so far: firstly, the publication of a technical reference following a workshop organised in October 2017 with CESSDA and CLARIN. It is a collection of basic guidelines and references for the development and maintenance of infrastructure services within DARIAH and beyond, addressing an ongoing issue for research infrastructures, namely software sustainability.
    Secondly, the organisation of a Code Sprint focusing on bibliographical and citation metadata, which helped shape DARIAH’s profile in four technology areas (visualisation, text analytic services, entity-based search and scholarly content management). Another Code Sprint is expected to take place in summer 2019.

    Another output is the implementation of a centralised helpdesk. This helpdesk is hosted by CLARIN-D, and it was integrated into the existing DARIAH website through a WordPress plugin. The plugin connects the website to the OTRS server and allows users unfamiliar with OTRS to create issues easily.

    Sustaining a research infrastructure also involves two important aspects: trust and education. For DARIAH, it is crucial to increase its users’ trust and confidence. In DESIR, we develop recommendations and strategies accordingly, targeting new cross-disciplinary communities, based on the results of a survey and interviews addressed to the scientific community at three levels: national, institutional and individual. In addition, education is a key area, and the project contributes to the ongoing discussions about the role and modalities of training and education in the development, consolidation and sustainability of digital research infrastructures. We believe that investing time and effort in training and educating users is a way of securing the social sustainability of a research infrastructure.

  • Open Access English
    Authors: 
    Genova, Françoise;
    Publisher: HAL CCSD
    Country: France
    Project: EC | ASTERICS (653477), EC | RDA Europe (653194)

    The situation of data sharing in astronomy is positioned in the current general context of a political push towards, and rapid development of, scientific data sharing. Data is already one of the major infrastructures of astronomy, thanks to the data and service providers and to the International Virtual Observatory Alliance (IVOA). Other disciplines are moving in the same direction. International organisations, in particular the Research Data Alliance (RDA), are developing building blocks and bridges to enable scientific data sharing across borders. The liaisons between RDA and astronomy, and RDA activities relevant to the librarian community, are discussed. To be published in the Proceedings of the Libraries and Information Systems in Astronomy 2018 - LISA VIII conference, held in Strasbourg, France, June 6-9, 2017.
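The OpenArchaeo abstract above describes mapping legacy datasets to CIDOC CRM through Extract-Transform-Load processes and then querying the result over SPARQL. The following minimal Python sketch illustrates both ideas under stated assumptions; it is not OpenArchaeo's code or API. The record fields, the `http://example.org/` base IRI and the slug rule are hypothetical; the class and property names (`E19_Physical_Object`, `E53_Place`, `P53_has_former_or_current_location`) are taken from CIDOC CRM, using the `http://www.cidoc-crm.org/cidoc-crm/` namespace.

```python
# Sketch: map one legacy archaeological record to CIDOC CRM triples
# (N-Triples) and build a SPARQL query over the result.
# Assumptions: the record fields, the example.org base IRI and the slug rule
# are hypothetical; only the CRM class/property names come from CIDOC CRM.

CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
BASE = "http://example.org/"  # hypothetical dataset namespace


def literal(text: str) -> str:
    """Quote a string as an N-Triples/SPARQL literal, escaping \\, " and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def slug(text: str) -> str:
    """Hypothetical IRI slug rule: lower-case, whitespace runs become hyphens."""
    return "-".join(text.lower().split())


def record_to_ntriples(record: dict) -> list:
    """ETL 'transform' step: one legacy row becomes CIDOC CRM statements."""
    obj = f"<{BASE}object/{record['id']}>"
    place = f"<{BASE}place/{slug(record['site'])}>"
    return [
        f"{obj} <{RDF_TYPE}> <{CRM}E19_Physical_Object> .",
        f"{obj} <{RDFS_LABEL}> {literal(record['label'])} .",
        f"{obj} <{CRM}P53_has_former_or_current_location> {place} .",
        f"{place} <{RDF_TYPE}> <{CRM}E53_Place> .",
        f"{place} <{RDFS_LABEL}> {literal(record['site'])} .",
    ]


def objects_at_place_query(place_label: str) -> str:
    """SPARQL query for physical objects located at a place with this label."""
    return f"""PREFIX crm: <{CRM}>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?obj ?label WHERE {{
  ?obj a crm:E19_Physical_Object ;
       rdfs:label ?label ;
       crm:P53_has_former_or_current_location ?place .
  ?place rdfs:label {literal(place_label)} .
}}"""


if __name__ == "__main__":
    row = {"id": "find-001", "label": "Amphora fragment", "site": "Tours"}
    print("\n".join(record_to_ntriples(row)))
    print(objects_at_place_query("Tours"))
```

The generic `E19_Physical_Object` class keeps the example small; a real mapping would likely use more specific classes and properties, chosen from what the legacy data actually records.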
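The TEI Lex-0 abstract above mentions technical constraints on the systematic use of `@xml:id` at different levels of the dictionary microstructure, without spelling them out. As an illustration only, the Python sketch below checks a small TEI fragment against a hypothetical rule (every `<entry>` and `<sense>` must carry `@xml:id`) and reports duplicate identifiers, which XML itself does not allow. The rule set `REQUIRED_ID_LEVELS` and the sample entry are assumptions, not the actual TEI Lex-0 specification.

```python
# Sketch: check @xml:id coverage and uniqueness in a TEI dictionary fragment.
# Assumption: the rule "every <entry> and <sense> must carry @xml:id" is a
# hypothetical stand-in; the actual TEI Lex-0 constraints are not given here.
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree names xml:id
REQUIRED_ID_LEVELS = ("entry", "sense")  # hypothetical rule set

SAMPLE = """<body xmlns="http://www.tei-c.org/ns/1.0">
  <entry xml:id="bank_1">
    <form type="lemma"><orth>bank</orth></form>
    <sense xml:id="bank_1.s1"><def>land beside a river</def></sense>
    <sense><def>financial institution</def></sense>
  </entry>
</body>"""


def missing_ids(xml_text, levels=REQUIRED_ID_LEVELS):
    """Return (element name, position) for each required element lacking @xml:id."""
    root = ET.fromstring(xml_text)
    problems = []
    for level in levels:
        for position, el in enumerate(root.iter(f"{{{TEI_NS}}}{level}")):
            if el.get(XML_ID) is None:
                problems.append((level, position))
    return problems


def duplicate_ids(xml_text):
    """Return sorted xml:id values used more than once (XML requires uniqueness)."""
    root = ET.fromstring(xml_text)
    counts = Counter(el.get(XML_ID) for el in root.iter() if el.get(XML_ID))
    return sorted(value for value, n in counts.items() if n > 1)


if __name__ == "__main__":
    print(missing_ids(SAMPLE))    # [('sense', 1)]
    print(duplicate_ids(SAMPLE))  # []
```

Positions are counted per element name in document order, so `('sense', 1)` points at the second `<sense>` of the fragment.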
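The entity-fishing abstract above says the REST API's query data model can carry partially annotated text and can split a text over several queries. The Python sketch below shows one way a client might prepare such queries: it segments the text naively into sentences, packs sentences into chunks under a size budget, and rebases the offsets of already-known entities into each chunk. The field names (`text`, `language`, `entities`, `rawName`, `offsetStart`, `offsetEnd`) reflect our reading of the entity-fishing query format and should be verified against the service documentation; `MAX_CHARS` and the splitting rule are hypothetical, and the sketch deliberately stops short of sending anything to the service.

```python
# Sketch: split a long text into several entity-fishing-style queries,
# carrying along pre-annotated entities with offsets rebased per chunk.
# Assumptions: the field names ("text", "language", "entities", "rawName",
# "offsetStart", "offsetEnd") follow our reading of the entity-fishing query
# format and should be checked against the service documentation;
# MAX_CHARS and the naive sentence splitter are hypothetical.
import json
import re

MAX_CHARS = 200  # hypothetical per-query size budget


def sentence_spans(text):
    """Naive sentence segmentation: cut after ., ! or ? followed by whitespace."""
    spans, start = [], 0
    for match in re.finditer(r"[.!?]\s+", text):
        spans.append((start, match.end()))
        start = match.end()
    if start < len(text):
        spans.append((start, len(text)))
    return spans


def chunk_spans(text, max_chars=MAX_CHARS):
    """Greedily merge consecutive sentences while a chunk stays within max_chars."""
    chunks = []
    for start, end in sentence_spans(text):
        if chunks and end - chunks[-1][0] <= max_chars:
            chunks[-1] = (chunks[-1][0], end)
        else:
            chunks.append((start, end))
    return chunks


def build_queries(text, entities=(), lang="en", max_chars=MAX_CHARS):
    """One query dict per chunk; entities fully inside a chunk are rebased."""
    queries = []
    for start, end in chunk_spans(text, max_chars):
        local = [
            {**ent,
             "offsetStart": ent["offsetStart"] - start,
             "offsetEnd": ent["offsetEnd"] - start}
            for ent in entities
            if start <= ent["offsetStart"] and ent["offsetEnd"] <= end
        ]
        queries.append({"text": text[start:end],
                        "language": {"lang": lang},
                        "entities": local})
    return queries


if __name__ == "__main__":
    text = "Paris is the capital of France. Rome is in Italy."
    known = [{"rawName": "Rome", "offsetStart": 32, "offsetEnd": 36}]
    for query in build_queries(text, known, max_chars=40):
        print(json.dumps(query))
```

Entities that straddle a chunk boundary are dropped in this sketch, and a single sentence longer than the budget becomes its own oversized chunk; a production client would need a policy for both cases.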

Advanced search in Research products
Research products
arrow_drop_down
Searching FieldsTerms
Any field
arrow_drop_down
includes
arrow_drop_down
Include:
The following results are related to DARIAH EU. Are you interested to view more results? Visit OpenAIRE - Explore.
21 Research products, page 1 of 3
  • Open Access English
    Authors: 
    Boukhelifa , Nadia; Giannisakis , Emmanouil; Dimara , Evanthia; Willett , Wesley; Fekete , Jean-Daniel;
    Publisher: HAL CCSD
    Country: France
    Project: EC | CENDARI (284432)

    International audience; In this paper we describe the development and evaluation of a visual analytics tool to support historical research. Historians continuously gather data related to their scholarly research from archival visits and background search. Organising and making sense of all this data can be challenging as many historians continue to rely on analog or basic digital tools. We built an integrated note-taking environment for historians which unifies a set of func-tionalities we identified as important for historical research including editing, tagging, searching, sharing and visualization. Our approach was to involve users from the initial stage of brainstorming and requirement analysis through to design, implementation and evaluation. We report on the process and results of our work, and conclude by reflecting on our own experience in conducting user-centered visual analytics design for digital humanities.

  • Publication . Conference object . 2019
    English
    Authors: 
    Bassett, Sheena; Wessels, Leon; Krauwer, Steven; Maegaard, Bente; Hollander, Hella; Admiraal, Femmy; Romary, Laurent; Uiterwaal, Frank;
    Publisher: HAL CCSD
    Country: France
    Project: EC | PARTHENOS (654119)

    International audience; Several Research Infrastructures(RIs)exist in the Humanities and Social Sciences, some –such as CLARIN, DARIAH and CESSDA –which address specific areas of interest, i.e. linguistic studies, digital humanities and social science data archives. RIs are also unique in their scope and application, largely tailored to their specific community needs. However, commonalities do exist and it is recognised that benefits are to be gained from these such as efficient use of resources, enabling multi-disciplinary research and sharing good practices. As such,a bridging project PARTHENOS has worked closely with CLARIN and DARIAH as well as ARIADNE (archaeology), CENDARI (history), EHRI (holocaust studies) and E-RIHS (heritage science) to iden-tify, develop and promote these commonalities. In this paper, we present some specif-ic examples of cross-discipline and trans-border applications arising from joint RI collaboration, allowing for entirely new avenues of research

  • Publication . Conference object . 2019
    English
    Authors: 
    Marlet , Olivier; Francart, Thomas; Markhoff, Béatrice; Rodier, Xavier;
    Publisher: HAL CCSD
    Country: France
    Project: EC | ARIADNEplus (823914)

    International audience; CIDOC CRM is an ontology intended to facilitate the integration, mediation and interchange of heterogeneous cultural heritage information. The Semantic Web with its Linked Open Data cloud enables scholars and cultural institutions to publish their data in RDF, using CIDOC CRM as an interlingua that enables a semantically consistent re-interpretation of their data. Nowadays more and more projects have done the task of mapping legacy datasets to CIDOC CRM, and successful Extract-Transform-Load data-integration processes have been performed in this way. A next step is enabling people and applications to actually dynamically explore autonomous datasets using the semantic mediation offered by CIDOC CRM. This is the purpose of OpenArchaeo, a tool for querying archaeological datasets on the LOD cloud. We present its main features: the principles behind its user friendly query interface and its SPARQL Endpoint for programs, together with its overall architecture designed to be extendable and scalable, for handling transparent interconnections with evolving distributed sources while achieving good efficiency.

  • French
    Authors: 
    Puren, Marie;
    Publisher: HAL CCSD
    Country: France
    Project: EC | IPERION CH (654028)

    International audience; Avec la mise en place de grandes infrastructures de recherche en sciences du patrimoine comme E-RIHS, on rassemble des acteurs divers, issus à la fois des sciences humaines et sociales et des sciences expérimentales. Le paléontologue croise l'historien de l'art, et lephysicien collabore avec le restaurateur.Dans ce cadre, la gestion des données de la recherche est un véritable défi, car elle doit rassembler, valoriser et rendre accessibles des données produites par des protagonistes très différents, utilisant des méthodes elles aussi très différentes. Comment en effet gérer et échanger à la fois des données d'expériences, des images numérisées et des rapports de restauration ?Le cycle de vie des données de la recherche, de leur création à leur diffusion en passant par leur analyse, au sein de cette communauté interdisciplinaire interroge la définition même de ce type de données, et nous amène à questionner les pratiques autour de celles-ci.

  • Authors: 
    Romary, Laurent; Biabiany, Damien; Klaus Illmayer; Puren, Marie; Riondet, Charles; Seillier, Dorian; Tadjou, Lionel;
    Country: France
    Project: EC | PARTHENOS (654119)

    International audience

  • Publication . Conference object . 2013
    English
    Authors: 
    Medves, Maud; Romary, Laurent;
    Publisher: HAL CCSD
    Country: France
    Project: EC | CENDARI (284432)

    International audience; This paper presents the work carried out within the EU Cendari project to provide an appropriate customisation of the EAG format that would fulfil the expectations of researchers in contemporary and medieval history describing where they could find collections and documents of specific interests. After describing the general data landscape that we have to deal with in the Cendari project, we specifically address the data entry and acquisition scenario to identify how this impacts on the actual data structures to be handled. We then present how we implemented such constraints by means of a full TEI/ODD specification of EAG and point out the main changes we made, which we think could also contribute to the further evolution of the EAG setting at large. We end up providing a wider picture of what we think could be the future of archival formats (EAG, EAD, EAC) if we want them to be more coherent and more sustainable at the service of both archives and researchers.

  • Open Access
    Authors: 
    Romary, Laurent; Tasovac, Toma;
    Publisher: Zenodo
    Country: France
    Project: EC | ELEXIS (731015)

    International audience; Achieving consistent encoding within a given community of practice has been a recurrent issue for the TEI Guidelines. The topic is of particular importance for lexical data if we think of the potential wealth of content we could gain from pooling together the information available in the variety of highly structured, historical and contemporary lexical resources. Still, the encoding possibilities offered by the Dictionaries Chapter in the Guidelines are too numerous and too flexible to guarantee sufficient interoperability and a coherent model for searching, visualising or enriching multiple lexical resources.Following the spirit of TEI Analytics [Zillig, 2009], developed in the context of the MONK project, TEI Lex-0 aims at establishing a target format to facilitate the interoperability of heterogeneously encoded lexical resources. This is important both in the context of building lexical infrastructures as such [Ermolaev and Tasovac, 2012] and in the context of developing generic TEI-aware tools such as dictionary viewers and profilers. The format itself should not necessarily be one which is used for editing or managing individual resources, but one to which they can be univocally transformed to be queried, visualised, or mined in a uniform way. We are also aiming to stay as aligned as possible with the TEI subset developed in conjunction with the revision of the ISO LMF (Lexical Markup Framework) standard so that coherent design guidelines can be provided to the community (cf. 
[Romary, 2015]).The paper will provide an overview of the various domains covered by TEI Lex- 0 and the main decisions that were taken over the last 18 months: constraining the general structure of a lexical entry; offering mechanisms to overcome the limits of when used in retro-digitized dictionaries (by allowing, for instance, and as children of ); systematizing the representation of morpho-syntactic information [Bański et al., 2017]; providing a strict -based encoding of sense-related information; deprecating ; dealing with internal and external references in dictionary entries, providing more advanced encodings of etymology (see submission by Bowers, Herold and Romary); as well as defining technical constraints on the systematic use of @xml:id at different levels of the dictionary microstructure. The activity of the group has already lead to changes in the Guidelines in response to specific GitHub tickets.

  • Open Access
    Authors: 
    Luca Foppiano; Laurent Romary;
    Publisher: Japanese Association for Digital Humanities
    Country: France
    Project: EC | HIRMEOS (731102)

    International audience; This paper presents an attempt to provide a generic named-entity recognition and disambiguation module (NERD) called entity-fishing as a stable online service that demonstrates the possible delivery of sustainable technical services within DARIAH, the European digital research infrastructure for the arts and humanities. Deployed as part of the national infrastructure Huma-Num in France, this service provides an efficient state-of-the-art implementation coupled with standardised interfaces allowing an easy deployment on a variety of potential digital humanities contexts. The topics of accessibility and sustainability have been long discussed in the attempt of providing some best practices in the widely fragmented ecosystem of the DARIAH research infrastructure. The history of entity-fishing has been mentioned as an example of good practice: initially developed in the context of the FP9 CENDARI, the project was well received by the user community and continued to be further developed within the H2020 HIRMEOS project where several open access publishers have integrated the service to their collections of published monographs as a means to enhance retrieval and access.entity-fishing implements entity extraction as well as disambiguation against Wikipedia and Wikidata entries. The service is accessible through a REST API which allows easier and seamless integration, language independent and stable convention and a widely used service oriented architecture (SOA) design. Input and output data are carried out over a query data model with a defined structure providing flexibility to support the processing of partially annotated text or the repartition of text over several queries. The interface implements a variety of functionalities, like language recognition, sentence segmentation and modules for accessing and looking up concepts in the knowledge base. 
The API itself integrates more advanced contextual parametrisation or ranked outputs, allowing for the resilient integration in various possible use cases. The entity-fishing API has been used as a concrete use case3 to draft the experimental stand-off proposal, which has been submitted for integration into the TEI guidelines. The representation is also compliant with the Web Annotation Data Model (WADM).In this paper we aim at describing the functionalities of the service as a reference contribution to the subject of web-based NERD services. In order to cover all aspects, the architecture is structured to provide two complementary viewpoints. First, we discuss the system from the data angle, detailing the workflow from input to output and unpacking each building box in the processing flow. Secondly, with a more academic approach, we provide a transversal schema of the different components taking into account non-functional requirements in order to facilitate the discovery of bottlenecks, hotspots and weaknesses. The attempt here is to give a description of the tool and, at the same time, a technical software engineering analysis which will help the reader to understand our choice for the resources allocated in the infrastructure.Thanks to the work of million of volunteers, Wikipedia has reached today stability and completeness that leave no usable alternatives on the market (considering also the licence aspect). The launch of Wikidata in 2010 have completed the picture with a complementary language independent meta-model which is becoming the scientific reference for many disciplines. After providing an introduction to Wikipedia and Wikidata, we describe the knowledge base: the data organisation, the entity-fishing process to exploit it and the way it is built from nightly dumps using an offline process.We conclude the paper by presenting our solution for the service deployment: how and which the resources where allocated. 
The service has been in production since Q3 2017 and was used extensively by the H2020 HIRMEOS partners during the integration with their publishing platforms. We have strived to provide the best performance with the minimum amount of resources. Thanks to the Huma-Num infrastructure, we can still scale up as needed, for example to support an increase in demand or a temporary need to process a huge backlog of documents. In the long term, thanks to this sustainable environment, we plan to keep delivering the service well beyond the end of the H2020 HIRMEOS project.

  • Open Access English
    Authors: 
    Raciti, Marco; Gabay, Simon; Moranville, Yoann; Jorge, Maria do Rosário; Fernandes, João;
    Publisher: HAL CCSD
    Country: France
    Project: EC | DESIR (731081)

    International audience; Europe has a long and rich tradition as a centre of research and teaching in the arts and humanities. However, the huge digital transformation affecting the arts and humanities research landscape all over the world requires that we set up sustainable research infrastructures, new and refined techniques, state-of-the-art methods and an expanded skills base. Responding to these challenges, the Digital Research Infrastructure for the Arts and Humanities (DARIAH) was launched as a pan-European network and research infrastructure. After expansion and consolidation, which involved DARIAH's inclusion in the ESFRI roadmap, DARIAH became a European Research Infrastructure Consortium (ERIC) in 2014. The Horizon 2020 funded project DESIR (DARIAH ERIC Sustainability Refined) sets out to strengthen the sustainability of DARIAH and help establish it as a reliable long-term partner within our communities. In the context of DESIR, sustaining existing digital expertise, tools and resources in Europe involves a goal-oriented set of measures: first, to maintain, expand and develop DARIAH in its capacities as an organisation and a technical research infrastructure; secondly, to engage its members further, as well as to measure and increase their trust in DARIAH; and thirdly, to expand the network in order to integrate new regions and communities. The DESIR consortium is composed of core DARIAH members, representatives from potential new DARIAH members and external technical experts. The sustainability of a research infrastructure is its capacity to remain operative, effective and competitive over its expected lifetime. In DESIR, this definition is translated into an evolving six-dimensional process, divided into the following challenges: dissemination, growth, technology, robustness, trust and education. With our poster, we would like to show how the project helps sustain DARIAH.
Within DESIR, dissemination is the ability to communicate DARIAH's strategy and benefits effectively within the DARIAH community and in new areas, reaching out to new communities. Through the international workshops held at Stanford University and at the Library of Congress, DARIAH has been introduced to many non-European DH scholars. These events were an important first step in fostering international cooperation between US and European colleagues, as well as a catalyst for future collaborations. A third workshop took place in Canberra at the Australian Research Data Commons in March 2019. DARIAH currently has 17 members from all over Europe. Nevertheless, efforts should be made to include as many countries as possible, in order to bring in even more state-of-the-art DH activities and scale them to a European level. Six candidates ready to build strong national consortia have been identified, enabling a substantial expansion of DARIAH's country coverage. Additionally, thematic workshops and tailored training measures are organised in each country. DESIR widens the research infrastructure in core areas which are vital for DARIAH's sustainability but are not yet covered by the existing set-up. As DARIAH expands across Europe, continuously enhancing and further developing the ERIC exceeds DARIAH's internal technological capacities. Two notable results have been achieved so far: firstly, the publication of a technical reference following a workshop organised in October 2017 with CESSDA and CLARIN. It is a collection of basic guidelines and references for the development and maintenance of infrastructure services within DARIAH and beyond, addressing an ongoing issue for research infrastructures, namely software sustainability.
Secondly, the organisation of a Code Sprint focusing on bibliographical and citation metadata, which helped shape DARIAH's profile in four technology areas (visualisation, text analytic services, entity-based search and scholarly content management). Another Code Sprint is expected to take place in summer 2019. A further output is the implementation of a centralised helpdesk. This helpdesk is hosted by CLARIN-D, and it was integrated into the existing DARIAH website by means of a WordPress plugin. The plugin connects the website to the OTRS server and lets users unfamiliar with OTRS create issues easily. Sustaining a research infrastructure also involves two important aspects: trust and education. For DARIAH, it is crucial to increase the trust and confidence of its users. In DESIR we develop recommendations and strategies accordingly, targeting new cross-disciplinary communities, based on the results of a survey and interviews addressed to the scientific community at different levels: national, institutional and individual. In addition, education is a key area, and the project contributes to the ongoing discussions about the role and modalities of training and education in the development, consolidation and sustainability of digital research infrastructures. We believe that investing time and effort in training and educating users is a way of securing the social sustainability of a research infrastructure.

  • Open Access English
    Authors: 
    Françoise Genova;
    Publisher: HAL CCSD
    Country: France
    Project: EC | ASTERICS (653477), EC | RDA Europe (653194)

    The situation of data sharing in astronomy is positioned within the current general context of a political push towards, and rapid development of, scientific data sharing. Data is already one of the major infrastructures of astronomy, thanks to the data and service providers and to the International Virtual Observatory Alliance (IVOA). Other disciplines are moving in the same direction. International organisations, in particular the Research Data Alliance (RDA), are developing building blocks and bridges to enable scientific data sharing across borders. The liaisons between RDA and astronomy, and RDA activities relevant to the librarian community, are discussed. To be published in Proceedings of the Library and Information Services in Astronomy 2018 - LISA VIII conference, held in Strasbourg, France, June 6-9, 2017.