Advanced search in Research products
The following results are related to DARIAH EU. More results are available on OpenAIRE - Explore.
57 Research products, page 1 of 6

  • DARIAH EU
  • Publications
  • 2017-2021
  • European Commission
  • EU
  • Mémoires en Sciences de l'Information et de la Communication
  • Hyper Article en Ligne

  • Open Access English
    Authors: 
    Frank Uiterwaal; Franco Niccolucci; Sheena Bassett; Steven Krauwer; Hella Hollander; Femmy Admiraal; Laurent Romary; George Bruseker; Carlo Meghini; Jennifer Edmond; +1 more
    Publisher: Edinburgh University Press for the Association for History and Computing, Edinburgh, United Kingdom
    Countries: France, Italy, Netherlands
    Project: EC | PARTHENOS (654119)

    This article has been accepted for publication by EUP in the IJHAC: International Journal of Humanities and Arts Computing (https://www.euppublishing.com/loi/ijhac). Since the first ESFRI roadmap in 2006, multiple humanities Research Infrastructures (RIs) have been set up across the European continent, supporting archaeologists (ARIADNE), linguists (CLARIN-ERIC), Holocaust researchers (EHRI), cultural heritage specialists (IPERION-CH) and others. These examples only scratch the surface of the breadth of research communities that have benefited from close cooperation in the European Research Area. While each field has developed discipline-specific services over the years, common themes can also be distinguished. All humanities RIs address, to varying degrees, questions around research data management, the use of standards and the desired interoperability of data across disciplinary boundaries. This article sheds light on how the cluster project PARTHENOS developed pooled services and shared solutions for its audience of humanities researchers, RI managers and policymakers. At a time when the convergence of existing infrastructure is becoming ever more important, with the construction of a European Open Science Cloud as an audacious, ultimate goal, we hope that our experiences inform future work and provide inspiration on how to exploit synergies in interdisciplinary, transnational, scientific cooperation.

  • Open Access English
    Authors: 
    Stefan Buddenbohm; Maaike A. de Jong; Jean-Luc Minel; Yoann Moranville;
    Publisher: HAL CCSD
    Country: France
    Project: EC | HaS-DARIAH (675570)

    How can researchers identify suitable repositories for depositing their research data? Which repository best matches the technical and legal requirements of a specific research project? To this end, and from a humanities perspective, the Data Deposit Recommendation Service (DDRS) has been developed as a prototype. It not only serves as a functional service for selecting humanities research data repositories but is above all a technical demonstrator, illustrating the potential of re-using an existing infrastructure (in this case re3data) and the feasibility of setting up this kind of service for other research disciplines. The documentation and code of this project can be found in the DARIAH GitHub repository: https://dariah-eric.github.io/ddrs/.

  • Open Access English
    Authors: 
    Maryl, Maciej; Błaszczyńska, Marta; Zalotyńska, Agnieszka; Taylor, Laurence; Avanço, Karla; Balula, Ana; Buchner, Anna; Caliman, Lorena; Clivaz, Claire; Costa, Carlos; +21 more
    Publisher: HAL CCSD
    Countries: Croatia, France
    Project: EC | OPERAS-P (871069)

    This report discusses the scholarly communication issues in Social Sciences and Humanities that are relevant to the future development and functioning of OPERAS. The outcomes collected here can be divided into two groups of innovations regarding 1) the operation of OPERAS, and 2) its activities. The “operational” issues include the ways in which an innovative research infrastructure should be governed (Chapter 1) as well as the business models for open access publications in Social Sciences and Humanities (Chapter 2). The other group of issues is dedicated to strategic areas where OPERAS and its services may play an instrumental role in providing, enabling, or unlocking innovation: FAIR data (Chapter 3), bibliodiversity and multilingualism in scholarly communication (Chapter 4), the future of scholarly writing (Chapter 5), and quality assessment (Chapter 6). Each chapter provides an overview of the main findings and challenges with emphasis on recommendations for OPERAS and other stakeholders like e-infrastructures, publishers, SSH researchers, research performing organisations, policy makers, and funders. Links to data and further publications stemming from work concerning particular tasks are located at the end of each chapter.

  • Open Access English
    Authors: 
    Luca Foppiano; Laurent Romary;
    Publisher: HAL CCSD
    Country: France
    Project: EC | HIRMEOS (731102)

    This paper presents an attempt to provide a generic named-entity recognition and disambiguation module (NERD), called entity-fishing, as a stable online service that demonstrates the possible delivery of sustainable technical services within DARIAH, the European digital research infrastructure for the arts and humanities. Deployed as part of the French national infrastructure Huma-Num, this service provides an efficient state-of-the-art implementation coupled with standardised interfaces, allowing easy deployment in a variety of digital humanities contexts. Accessibility and sustainability have long been discussed in the effort to establish best practices within the widely fragmented ecosystem of the DARIAH research infrastructure. The history of entity-fishing has been cited as an example of good practice: initially developed in the context of the FP7 project CENDARI, it was well received by the user community and was further developed within the H2020 HIRMEOS project, where several open access publishers integrated the service into their collections of published monographs as a means to enhance retrieval and access. entity-fishing implements entity extraction as well as disambiguation against Wikipedia and Wikidata entries. The service is accessible through a REST API, which allows easy and seamless integration, follows stable, language-independent conventions, and uses a widely adopted service-oriented architecture (SOA) design. Input and output data are exchanged through a query data model with a defined structure, flexible enough to support the processing of partially annotated text or the distribution of a text over several queries. The interface implements a variety of functionalities, such as language recognition, sentence segmentation, and modules for accessing and looking up concepts in the knowledge base.
    The API also offers more advanced contextual parametrisation and ranked outputs, allowing resilient integration into various use cases. The entity-fishing API has been used as a concrete use case to draft the experimental stand-off proposal that has been submitted for integration into the TEI Guidelines. The representation is also compliant with the Web Annotation Data Model (WADM). In this paper we aim to describe the functionalities of the service as a reference contribution to the subject of web-based NERD services. To cover all aspects, the architecture is presented from two complementary viewpoints. First, we discuss the system from the data angle, detailing the workflow from input to output and unpacking each building block in the processing flow. Second, with a more academic approach, we provide a transversal schema of the different components, taking into account non-functional requirements in order to facilitate the discovery of bottlenecks, hotspots and weaknesses. The aim is to give a description of the tool together with a technical software engineering analysis that will help the reader understand our choices regarding the resources allocated in the infrastructure. Thanks to the work of millions of volunteers, Wikipedia has today reached a stability and completeness that leave no usable alternatives on the market (also considering licensing). The launch of Wikidata in 2012 completed the picture with a complementary, language-independent meta-model that is becoming the scientific reference for many disciplines. After providing an introduction to Wikipedia and Wikidata, we describe the knowledge base: its data organisation, the entity-fishing process that exploits it, and the way it is built offline from nightly dumps. We conclude the paper by presenting our solution for the service deployment: how resources were allocated, and which ones.
    The service has been in production since Q3 2017 and was used extensively by the H2020 HIRMEOS partners during integration with their publishing platforms. We believe we have striven to provide the best performance with the minimum amount of resources. Thanks to the Huma-Num infrastructure, we can still scale up as needed, for example to meet increased demand or temporary needs such as processing a large backlog of documents. In the long term, thanks to this sustainable environment, we plan to keep delivering the service well beyond the end of the H2020 HIRMEOS project.

  • English
    Authors: 
    Khemakhem, Mohamed;
    Publisher: HAL CCSD
    Project: ANR | BASNUM (ANR-18-CE38-0003), EC | PARTHENOS (654119)

    Dictionaries could be considered the most comprehensive reservoir of human knowledge: they carry not only the lexical description of words in one or more languages, but also the common awareness of a certain community about every known piece of knowledge within a time frame. Print dictionaries are the principal resources that enable the documentation and transfer of such knowledge. They already exist in abundant numbers, and new ones are continuously compiled, even with the recent strong move to digital resources. However, a majority of these dictionaries, even when available digitally, are still not fully structured, owing to the absence of scalable methods and techniques that can cover the variety of the corresponding material. Moreover, the relatively few existing structured resources offer limited exchange and query alternatives, given the discrepancies between their data models and formats. In this thesis we address the task of parsing lexical information in print dictionaries through the design of computer models that enable their automatic structuring. Solving this task goes hand in hand with finding a standardised output for these models, to guarantee maximum interoperability among resources and usability for downstream tasks. First, we present different classifications of dictionary resources to delimit the category of print dictionaries we aim to process. Second, we introduce the parsing task by providing an overview of the processing challenges and a study of the state of the art. Then, we present a novel approach based on top-down parsing of the lexical information. We also outline the architecture of the resulting system, called GROBID-Dictionaries, and the methodology we followed to close the gap between the conception of the system and its applicability to real-world scenarios. After that, we map the landscape of the leading standards for structured lexical resources.
    In addition, we provide an analysis of two ongoing initiatives, TEI-Lex-0 and LMF, that aim to unify the modelling of lexical information in print and electronic dictionaries. Based on that, we present a serialisation format that is in line with the schemes of the two standardisation initiatives and fits the approach implemented in our parsing system. After presenting the parsing and standardised serialisation facets of our lexical models, we provide an empirical study of their performance and behaviour. The investigation is based on a specific machine learning setup and a series of experiments carried out with a selected pool of varied dictionaries. In this study we present different approaches to feature engineering and exhibit the strengths and limits of the best resulting models. We also dedicate two series of experiments to exploring the scalability of our models with regard to the processed documents and the employed machine learning technique. Finally, we sum up this thesis by presenting the major conclusions and opening new perspectives for extending our investigations in a number of research directions for parsing entry-based documents.

  • Publication . Report . 2020
    English
    Authors: 
    Bertrand, Loïc; Anglos, Demetrios; Castillejo, Marta; Charbonnel, Bénédicte; David, Sophie; de Clercq, Hilde; Dubray, Fanny; Spring, Marika;
    Publisher: HAL CCSD
    Country: France
    Project: EC | E-RIHS PP (739503)

    The study and preservation of tangible cultural and natural heritage is a global challenge for science and society at large. The European Research Infrastructure for Heritage Science (E-RIHS) will play a leading role in research on the interpretation, preservation, documentation and management of heritage. As an interdisciplinary infrastructure, E-RIHS will interconnect knowledge and methodologies to address key scientific questions in the field of heritage as a whole. The infrastructure is built on ten core pillars. It will provide a structured and unified input of large-scale instruments, portable devices, physical and digital archives. Its implementation will focus on scientific excellence, interdisciplinarity and cooperation. In doing so, it will offer unprecedented research opportunities to a wide range of interdisciplinary scientific communities.

  • Publication . Preprint . Conference object . Contribution for newspaper or weekly magazine . Article . 2020
    Open Access English
    Authors: 
    Rehm, Georg; Marheinecke, Katrin; Hegele, Stefanie; Piperidis, Stelios; Bontcheva, Kalina; Hajic, Jan; Choukri, Khalid; Vasiljevs, Andrejs; Backfried, Gerhard; Prinz, Christoph; +37 more
    Publisher: European Language Resources Association
    Countries: Denmark, France
    Project: EC | ELG (825627), EC | BDVe (732630), SFI | ADAPT: Centre for Digital... (13/RC/2106), EC | AI4EU (825619), FCT | PINFRA/22117/2016 (PINFRA/22117/2016), EC | X5gon (761758)

    Multilingualism is a cultural cornerstone of Europe and is firmly anchored in the European treaties, including full language equality. However, language barriers impacting business and cross-lingual and cross-cultural communication are still omnipresent. Language Technologies (LTs) are a powerful means to break down these barriers. While the last decade has seen various initiatives that created a multitude of approaches and technologies tailored to Europe's specific needs, there is still an immense level of fragmentation. At the same time, AI has become an increasingly important concept in the European Information and Communication Technology area. For a few years now, AI, with its many opportunities and synergies but also its misconceptions, has been overshadowing every other topic. We present an overview of the European LT landscape, describing funding programmes, activities, actions and challenges in the different countries with regard to LT, including the current state of play in industry and the LT market. We present a brief overview of the main LT-related activities at the EU level in the last ten years and develop strategic guidance with regard to four key dimensions. Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020), to appear.

  • Open Access English
    Authors: 
    Khan, Fahad; Romary, Laurent; Salgado, Ana; Bowers, Jack; Khemakhem, Mohamed; Tasovac, Toma;
    Publisher: HAL CCSD
    Country: France
    Project: EC | ELEXIS (731015)

    Due to the COVID-19 pandemic, the 12th edition of LREC was cancelled. The LREC 2020 Proceedings are available at http://www.lrec-conf.org/proceedings/lrec2020/index.html. In this article, we introduce two of the new parts of the new multi-part version of the Lexical Markup Framework (LMF) ISO standard: Part 3 (ISO 24613-3), which deals with etymological and diachronic data, and Part 4 (ISO 24613-4), which consists of a TEI serialisation of all of the prior parts of the model. We demonstrate the use of both parts by describing the LMF encoding of a small number of examples taken from a sample conversion of the reference Portuguese dictionary Grande Dicionário Houaiss da Língua Portuguesa, part of a broader experiment comprising the analysis of different, heterogeneously encoded Portuguese lexical resources. We present the examples in the Unified Modeling Language (UML) and, in a couple of cases, also in TEI.

  • English
    Authors: 
    Rollo, Maria Fernanda; Jorge, Maria do Rosário; Fernandes, João; da Silva, Filipe Guimarães; Queiroz, Inês; Lucas, Pedro;
    Publisher: HAL CCSD
    Country: France
    Project: EC | DESIR (731081)

    The European Commission aims to develop a more sustainable environment for the research infrastructure ecosystem and to ensure that its benefits and impacts are widely perceived by research communities and lead to research excellence. This vision is reflected in a range of international and European documents. Recent work conducted by the OECD and the European Commission, particularly by ESFRI and e-IRG, has stated the need to make structural changes in the EU framework for research infrastructures (RIs). In line with this strategic vision, DARIAH intends to establish itself as a sustainable research infrastructure. DESIR (DARIAH ERIC Sustainability Refined) work package 6, TRUST, contributes to DARIAH's long-term sustainability by measuring the acceptance and impact of DARIAH in new cross-disciplinary DARIAH communities and core groups. This was the basis for defining the theoretical and methodological framework that supported the research presented here. This report therefore focuses on the development of recommendations and strategies to support and increase confidence in DARIAH services and infrastructure, aiming to contribute to a major DESIR goal: enlarging DARIAH by engaging new cross-disciplinary communities and considering their specific requirements. The proposed recommendations could set the basis for a broader debate within DARIAH and the wider RI landscape on the actions to be taken at all decision levels in order to achieve a vision of longer-term sustainable RIs. This report is thus intended as a policy document that aims to inspire the future path of DARIAH, contributing to its sustainability and to the fulfilment of the mission for which it was created.
    The recommendations stem from analytical work based on contributions from multiple sources of information: an academically driven multi-country survey (see D6.2); 33 qualitative interviews in three different countries; a workshop with DARIAH national coordinators held in Warsaw; contributions from DESIR partners who lead other work packages within the project; and the DESIR Winter School “Shaping New Approaches to Data Management in Arts and Humanities”. After the full set of recommendations was defined, they were grouped according to three main strategic frameworks (sustainability, scope and the DARIAH Strategic Plan) and displayed visually in a “Recommendations & Community Engagement Tool” (https://dariah.peopleware.pt), an open platform that supports DARIAH by strengthening its link with arts and humanities communities. The new DARIAH Strategic Plan for the next seven years, which will be followed by the publication of a Strategic Action Plan, represents a big opportunity to address sustainability, both at a conceptual level and in terms of organizational and operational configuration. The main findings are therefore summarized in seven key recommendations, linked with the strategic pillars of the recently published DARIAH Strategic Plan:
    1. Promote research excellence with an inclusive, collaborative, bureaucracy-free and community-driven approach.
    2. Ensure the integration of tools, services, data and resources within the DARIAH community and with other Research Infrastructures (e.g. by gathering them on a platform such as the Marketplace).
    3. Foster a collaborative learning environment and anticipate the skills of the future through a joint strategy for education and training (e.g. DARIAH-CAMPUS).
    4. Establish a flexible, participatory and effective governance model with a clear and sustainable business plan.
    5. Strengthen DARIAH's representation in the European and international policy arena, expanding its visibility and cooperation beyond EU borders.
    6. Broaden and extend DARIAH's role, action and benefits towards the strengthening of scientific citizenship in Europe.
    7. Set up means for monitoring and bringing communities together, while respecting diversity on an institutional, scientific, disciplinary and methodological level.
    The work developed in the DESIR project, particularly this set of recommendations, could contribute to fostering the implementation of guidelines and of short- and long-term actions to improve DARIAH's sustainability and firmly establish it as a long-term leader and partner within arts and humanities communities.

  • Publication . Report . 2019
    English
    Authors: 
    Toma Tasovac; Jennifer Edmond; Vicky Garnett; Deborah Ellen Thorpe;
    Publisher: HAL CCSD
    Country: France
    Project: EC | DESIR (731081)

    • To the extent that it has been theorised, work on DH pedagogy has tended to be very strongly tied to the classroom experience. A classroom experience, however, exists within a particular social and institutional framework (students seeking knowledge, experience or qualification from instructors who master a specific body of knowledge), which is quite different from the operational and distributed nature of Research Infrastructures such as DARIAH.
    • Research infrastructures seldom possess the kinds of specialised procedures, staff, resources and expertise needed to deliver formal educational programmes, but the strength of RIs lies in providing, and reflecting upon, the experience of acculturation and professionalization in “real” cross-institutional and often cross-cultural projects in which peer learning, skills transfer and network building are the rule rather than the exception.
    • Research Infrastructures such as DARIAH have a specific role to play in the European educational landscape by complementing rather than replacing the pedagogical models prevalent in HEIs today.
    • RIs such as DARIAH should focus not only on DH, or even on a discipline in which a student or researcher seeks to use DH methodologies, but also on highlighting how these practices engage interdependent communities of practice with intersecting concerns.
    • DARIAH should intensify its efforts to position itself as pedagogically relevant beyond the individual humanities disciplines, in terms of what it can contribute to the development and dissemination of early-career researchers' transferable skills and competences as identified by the Eurodoc 2018 Report.
    • DARIAH should establish an active educational partnership network in order to validate a new approach to the skills needs of humanities students and researchers, looking beyond what is currently available in formal educational programmes.
    • DARIAH should develop a curricular model and, if possible, an internship programme to enable a fluid exchange of knowledge and students between university programmes and the applied contexts of the research infrastructure.
    • DARIAH should continue to create and maintain essential filtering and contextualising layers for training materials, which are now available through DARIAH-Campus, in order to coordinate and enhance open educational resources with other stakeholders in the field.
    • DARIAH should aim to apply and test its learning resources in different HE contexts in order to profit from unforeseen synergies and unexpected outcomes such as, for instance, the initiative to publish young researchers' data papers using the DARIAH-Campus Event Capture Template, which emerged from the DESIR Workshop at the University of Neuchâtel.
    • Building on currently identified needs, DARIAH should develop foresight models to predict future needs within the Higher Education sector.
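
The entity-fishing abstract above (Foppiano and Romary) says that input and output travel through a query data model flexible enough to handle partially annotated text or a text distributed over several queries. The Python sketch below only illustrates that idea under stated assumptions; it is not the entity-fishing implementation. The field names (`text`, `language`, `entities`, `rawName`, `offsetStart`, `offsetEnd`) are assumptions modelled loosely on that kind of JSON query, and `build_queries` and `max_chars` are hypothetical names. No request is sent: each resulting dict would still have to be serialised and posted to the service's REST API, whose exact endpoint the abstract does not give.

```python
import json
from typing import Dict, List


def build_queries(text: str, lang: str, annotations: List[Dict],
                  max_chars: int = 200) -> List[Dict]:
    """Distribute `text` over several query dicts of at most `max_chars` characters.

    Hypothetical helper: the field names below are assumptions, not a verified
    entity-fishing schema. Existing annotations (partially annotated text) are
    copied into the chunk that fully contains them, with offsets rebased to that
    chunk; annotations that straddle a chunk boundary are dropped.
    """
    if max_chars < 1:
        raise ValueError("max_chars must be positive")
    queries = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Prefer to cut right after the last ". " inside the window.
            cut = text.rfind(". ", start, end)
            if cut != -1:
                end = cut + 2
        entities = [
            {
                "rawName": a["rawName"],
                "offsetStart": a["offsetStart"] - start,
                "offsetEnd": a["offsetEnd"] - start,
            }
            for a in annotations
            if start <= a["offsetStart"] and a["offsetEnd"] <= end
        ]
        queries.append({
            "text": text[start:end],
            "language": {"lang": lang},  # assumed shape of the language field
            "entities": entities,
        })
        start = end
    return queries


if __name__ == "__main__":
    sample = "Entity-fishing was built here. The service runs on Huma-Num. It links mentions to Wikidata."
    known = [{"rawName": "Huma-Num", "offsetStart": 51, "offsetEnd": 59}]
    for query in build_queries(sample, "en", known, max_chars=40):
        # Each dict would be serialised to JSON before being sent to the service.
        print(json.dumps(query))
```

Cutting on sentence boundaries keeps each query self-contained, and rebasing offsets lets a pre-annotated mention travel with the chunk that holds it. A real client would also need a policy for annotations that cross a chunk boundary, which this sketch simply drops.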

Advanced search in Research products
Research products
arrow_drop_down
Searching FieldsTerms
Any field
arrow_drop_down
includes
arrow_drop_down
Include:
The following results are related to DARIAH EU. Are you interested to view more results? Visit OpenAIRE - Explore.
57 Research products, page 1 of 6
  • Open Access English
    Authors: 
    Frank Uiterwaal; Franco Niccolucci; Sheena Bassett; Steven Krauwer; Hella Hollander; Femmy Admiraal; Laurent Romary; George Bruseker; Carlo Meghini; Jennifer Edmond; +1 more
    Publisher: Edinburgh University Press for the Association for History and Computing,, Edinburgh , Regno Unito
    Countries: France, France, France, Italy, Italy, Netherlands
    Project: EC | PARTHENOS (654119)

    This article has been accepted for publication by EUP in the IJHAC: International Journal of Humanities and Arts Computing (https://www.euppublishing.com/loi/ijhac); International audience; Since the first ESFRI roadmap in 2006, multiple humanities Research Infrastructures (RIs) have been set up all over the European continent, supporting archaeologists (ARIADNE), linguists (CLARIN-ERIC), Holocaust researchers (EHRI), cultural heritage specialists (IPERION-CH) and others. These examples only scratch the surface of the breadth of research communities that have benefited from close cooperation in the European Research Area.While each field developed discipline-specific services over the years, common themes can also be distinguished. All humanities RIs address, in varying degrees, questions around research data management, the use of standards and the desired interoperability of data across disciplinary boundaries.This article sheds light on how cluster project PARTHENOS developed pooled services and shared solutions for its audience of humanities researchers, RI managers and policymakers. In a time where the convergence of existing infrastructure is becoming ever more important – with the construction of a European Open Science Cloud as an audacious, ultimate goal – we hope that our experiences inform future work and provide inspiration on how to exploit synergies in interdisciplinary, transnational, scientific cooperation.

  • Open Access English
    Authors: 
    Stefan Buddenbohm; Maaike A. de Jong; Jean-Luc Minel; Yoann Moranville;
    Publisher: HAL CCSD
    Country: France
    Project: EC | HaS-DARIAH (675570)

    AbstractHow can researchers identify suitable research data repositories for the deposit of their research data? Which repository matches best the technical and legal requirements of a specific research project? For this end and with a humanities perspective the Data Deposit Recommendation Service (DDRS) has been developed as a prototype. It not only serves as a functional service for selecting humanities research data repositories but it is particularly a technical demonstrator illustrating the potential of re-using an already existing infrastructure - in this case re3data - and the feasibility to set up this kind of service for other research disciplines. The documentation and the code of this project can be found in the DARIAH GitHub repository: https://dariah-eric.github.io/ddrs/.

  • Open Access English
    Authors: 
    Maryl, Maciej; Błaszczyńska, Marta; Zalotyńska, Agnieszka; Taylor, Laurence; Avanço, Karla; Balula, Ana; Buchner, Anna; Caliman, Lorena; Clivaz, Claire; Costa, Carlos; +21 more
    Publisher: HAL CCSD
    Countries: Croatia, France
    Project: EC | OPERAS-P (871069)

    This report discusses the scholarly communication issues in Social Sciences and Humanities that are relevant to the future development and functioning of OPERAS. The outcomes collected here can be divided into two groups of innovations regarding 1) the operation of OPERAS, and 2) its activities. The “operational” issues include the ways in which an innovative research infrastructure should be governed (Chapter 1) as well as the business models for open access publications in Social Sciences and Humanities (Chapter 2). The other group of issues is dedicated to strategic areas where OPERAS and its services may play an instrumental role in providing, enabling, or unlocking innovation: FAIR data (Chapter 3), bibliodiversity and multilingualism in scholarly communication (Chapter 4), the future of scholarly writing (Chapter 5), and quality assessment (Chapter 6). Each chapter provides an overview of the main findings and challenges with emphasis on recommendations for OPERAS and other stakeholders like e-infrastructures, publishers, SSH researchers, research performing organisations, policy makers, and funders. Links to data and further publications stemming from work concerning particular tasks are located at the end of each chapter.

  • Open Access English
    Authors: 
    Luca Foppiano; Laurent Romary;
    Publisher: HAL CCSD
    Country: France
    Project: EC | HIRMEOS (731102)

    This paper presents an attempt to provide a generic named-entity recognition and disambiguation (NERD) module, called entity-fishing, as a stable online service. The service demonstrates that sustainable technical services can be delivered within DARIAH, the European digital research infrastructure for the arts and humanities. Deployed as part of the French national infrastructure Huma-Num, it provides an efficient state-of-the-art implementation coupled with standardised interfaces, which makes it easy to deploy in a variety of digital humanities contexts. Accessibility and sustainability have long been discussed in efforts to establish best practices within the widely fragmented ecosystem of the DARIAH research infrastructure. The history of entity-fishing has been cited as an example of good practice. It was initially developed in the context of the FP7 CENDARI project and was well received by the user community. It was then further developed within the H2020 HIRMEOS project, where several open access publishers integrated the service into their collections of published monographs as a means to enhance retrieval and access. The entity-fishing service implements entity extraction as well as disambiguation against Wikipedia and Wikidata entries. It is accessible through a REST API, which allows easy, seamless integration and follows a stable, language-independent convention and a widely used service-oriented architecture (SOA) design. Input and output data are exchanged through a query data model with a defined structure. This structure provides the flexibility to process partially annotated text or to split a text over several queries. The interface implements a variety of functionalities, such as language recognition, sentence segmentation, and modules for accessing and looking up concepts in the knowledge base.
The API also supports more advanced contextual parametrisation and ranked outputs, which allows resilient integration in a variety of use cases. The entity-fishing API served as a concrete use case for drafting the experimental stand-off proposal, which has been submitted for integration into the TEI Guidelines. The representation is also compliant with the Web Annotation Data Model (WADM). In this paper, we aim to describe the functionalities of the service as a reference contribution to the subject of web-based NERD services. To cover all aspects, we present the architecture from two complementary viewpoints. First, we discuss the system from the data angle, detailing the workflow from input to output and unpacking each building block in the processing flow. Second, with a more academic approach, we provide a transversal schema of the different components, taking non-functional requirements into account in order to facilitate the discovery of bottlenecks, hotspots and weaknesses. The aim is to give a description of the tool and, at the same time, a technical software engineering analysis that will help the reader understand how we chose the resources allocated in the infrastructure. Thanks to the work of millions of volunteers, Wikipedia has today reached a level of stability and completeness that leaves no usable alternative on the market (also considering licensing). The launch of Wikidata in 2012 completed the picture with a complementary, language-independent meta-model that is becoming the scientific reference for many disciplines. After providing an introduction to Wikipedia and Wikidata, we describe the knowledge base: its data organisation, the entity-fishing process that exploits it, and the way it is built from nightly dumps using an offline process. We conclude the paper by presenting our solution for the service deployment: how the resources were allocated and which ones were chosen.
The service has been in production since Q3 2017 and was used extensively by the H2020 HIRMEOS partners during the integration with the publishing platforms. We have strived to provide the best performance with the minimum amount of resources. Thanks to the Huma-Num infrastructure, we can still scale up as needed, for example to support an increase in demand or a temporary need to process a large backlog of documents. In the long term, thanks to this sustainable environment, we plan to keep delivering the service well beyond the end of the H2020 HIRMEOS project.
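The abstract notes that entity-fishing's query data model allows a long text to be split over several queries. The Python sketch below illustrates one way to do this under stated assumptions. It cuts a text into chunks made of whole sentences, builds one JSON query per chunk, and shifts the entity offsets returned for each chunk back to their positions in the full text. The query field names (`text`, `language`, `entities`, `mentions`) and the response offset keys (`offsetStart`, `offsetEnd`) are modelled on entity-fishing's JSON format, but they are assumptions you should check against the service documentation. No request is sent here.

```python
import json
import re


def sentence_spans(text: str):
    """Yield (start, end) spans; a sentence ends at ., ! or ? followed by whitespace."""
    start = 0
    for m in re.finditer(r"[.!?]\s+", text):
        yield start, m.end()
        start = m.end()
    if start < len(text):
        yield start, len(text)


def split_text(text: str, max_chars: int) -> list[tuple[int, str]]:
    """Group whole sentences into chunks of at most max_chars characters.

    Returns (offset, chunk) pairs. A single sentence longer than max_chars
    is kept whole rather than cut mid-sentence.
    """
    chunks = []
    chunk_start = chunk_end = 0
    for s_start, s_end in sentence_spans(text):
        # Close the current chunk if adding this sentence would exceed the limit
        if s_end - chunk_start > max_chars and chunk_end > chunk_start:
            chunks.append((chunk_start, text[chunk_start:chunk_end]))
            chunk_start = s_start
        chunk_end = s_end
    if chunk_end > chunk_start:
        chunks.append((chunk_start, text[chunk_start:chunk_end]))
    return chunks


def build_query(chunk: str, lang: str = "en") -> str:
    """Serialise one chunk as a JSON query (field names are assumptions)."""
    query = {
        "text": chunk,
        "language": {"lang": lang},
        "entities": [],  # slot for pre-annotated entities (partially annotated text)
        "mentions": ["ner", "wikipedia"],
    }
    return json.dumps(query)


def shift_entities(chunk_offset: int, entities: list[dict]) -> list[dict]:
    """Map chunk-relative entity offsets back to positions in the full text."""
    return [
        {**e,
         "offsetStart": e["offsetStart"] + chunk_offset,
         "offsetEnd": e["offsetEnd"] + chunk_offset}
        for e in entities
    ]


text = "Paris is big. Rome is old. Berlin is new."
for offset, chunk in split_text(text, max_chars=30):
    print(offset, build_query(chunk))
```

Because each chunk carries its offset, results from separate queries can be merged without re-aligning the text.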
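The abstract also states that the representation is compliant with the Web Annotation Data Model. As a hedged illustration, and not the service's actual output format, the Python function below wraps one disambiguated mention as a W3C Web Annotation in JSON-LD. The target identifies the character span with both a `TextPositionSelector` and a `TextQuoteSelector`, and the body points at a Wikidata entity IRI. The document URI and the `urn:uuid:` identifier scheme are hypothetical choices.

```python
import json
import uuid


def entity_annotation(doc_uri: str, text: str, start: int, end: int, wikidata_id: str) -> dict:
    """Describe text[start:end] as identifying a Wikidata entity (W3C Web Annotation)."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": f"urn:uuid:{uuid.uuid4()}",  # hypothetical identifier scheme
        "type": "Annotation",
        "motivation": "identifying",
        "body": f"http://www.wikidata.org/entity/{wikidata_id}",
        "target": {
            "source": doc_uri,
            # Two selectors that both pick out the same span of the source text
            "selector": [
                {"type": "TextPositionSelector", "start": start, "end": end},
                {"type": "TextQuoteSelector", "exact": text[start:end]},
            ],
        },
    }


text = "Paris is big."
annotation = entity_annotation("https://example.org/doc/1", text, 0, 5, "Q90")
print(json.dumps(annotation, indent=2))
```

`TextPositionSelector` offsets are zero-based with an exclusive end, which matches Python slicing. That is why `text[start:end]` yields the quoted string.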

  • English
    Authors: 
    Khemakhem, Mohamed;
    Publisher: HAL CCSD
    Project: ANR | BASNUM (ANR-18-CE38-0003), EC | PARTHENOS (654119)

    Dictionaries could be considered the most comprehensive reservoir of human knowledge. They carry not only the lexical description of words in one or more languages, but also the common awareness of a certain community about every known piece of knowledge in a given time frame. Print dictionaries are the principal resources that enable the documentation and transfer of such knowledge. They already exist in abundant numbers, and new ones are continuously compiled, even with the recent strong move to digital resources. However, most of these dictionaries, even when available digitally, are still not fully structured, because there are no scalable methods and techniques that can cover the variety of the corresponding material. Moreover, the relatively few existing structured resources offer limited exchange and query alternatives, given the discrepancies between their data models and formats. In this thesis, we address the task of parsing lexical information in print dictionaries through the design of computer models that enable their automatic structuring. Solving this task goes hand in hand with finding a standardised output for these models, in order to guarantee maximum interoperability among resources and usability for downstream tasks. First, we present different classifications of dictionary resources to delimit the category of print dictionaries we aim to process. Second, we introduce the parsing task by providing an overview of the processing challenges and a study of the state of the art. Then, we present a novel approach based on a top-down parsing of the lexical information. We also outline the architecture of the resulting system, called GROBID-Dictionaries, and the methodology we followed to close the gap between the conception of the system and its applicability to real-world scenarios. After that, we survey the landscape of the leading standards for structured lexical resources.
In addition, we provide an analysis of two ongoing initiatives, TEI-Lex-0 and LMF, that aim to unify the modelling of lexical information in print and electronic dictionaries. On that basis, we present a serialisation format that is in line with the schemes of the two standardisation initiatives and fits the approach implemented in our parsing system. After presenting the parsing and standardised serialisation facets of our lexical models, we provide an empirical study of their performance and behaviour. The investigation is based on a specific machine learning setup and a series of experiments carried out with a selected pool of varied dictionaries. In this study, we present different approaches to feature engineering and show the strengths and limits of the best resulting models. We also dedicate two series of experiments to exploring the scalability of our models with regard to the processed documents and the machine learning technique employed. Finally, we sum up the thesis by presenting the major conclusions and opening new perspectives for extending our investigations in a number of research directions for parsing entry-based documents.
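The top-down (cascading) parsing idea can be illustrated with a toy Python example. Each level segments only the output of the level above: it first cuts a page into entries, then an entry into form, grammatical information and a sense block, and finally the sense block into individual senses. The thesis describes machine-learning models for these steps. The regular expressions and the sample dictionary layout below are hypothetical stand-ins, chosen only to show the cascade. They are not GROBID-Dictionaries' models or label set.

```python
import re


def segment_entries(page: str) -> list[str]:
    """Level 1: cut a page into entries (toy rule: a line starting with an uppercase headword and a comma)."""
    starts = [m.start() for m in re.finditer(r"^[A-Z]+,", page, flags=re.M)]
    bounds = starts + [len(page)]
    # Text before the first headword (e.g. a running head) is ignored
    return [page[a:b].strip() for a, b in zip(bounds, bounds[1:])]


def segment_senses(block: str) -> list[str]:
    """Level 3: split a sense block on numbered markers such as '1.' and '2.'."""
    parts = re.split(r"\s*\d+\.\s+", block)
    return [p.strip() for p in parts if p.strip()]


def segment_entry(entry: str) -> dict:
    """Level 2: split an entry into form, grammatical information and senses."""
    m = re.match(r"(?P<form>[A-Z]+),\s*(?P<gram>[a-z]+\.)\s*(?P<senses>.*)", entry, flags=re.S)
    if m is None:
        # Unrecognised layout: keep the whole entry as a single unparsed block
        return {"form": None, "gram": None, "senses": [entry]}
    return {"form": m["form"].lower(), "gram": m["gram"], "senses": segment_senses(m["senses"])}


def parse_page(page: str) -> list[dict]:
    """Run the cascade top-down: page -> entries -> entry parts -> senses."""
    return [segment_entry(e) for e in segment_entries(page)]


page = (
    "CAT, n. 1. A small domesticated feline. 2. Any member of the cat family.\n"
    "DOG, n. 1. A domesticated canine.\n"
)
for parsed in parse_page(page):
    print(parsed)
```

The point of the cascade is that each level works on a smaller, more homogeneous input. A model for senses, for example, never has to see page headers.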

  • Publication . Report . 2020
    English
    Authors: 
    Bertrand, Loïc; Anglos, Demetrios; Castillejo, Marta; Charbonnel, Bénédicte; David, Sophie; de Clercq, Hilde; Dubray, Fanny; Spring, Marika;
    Publisher: HAL CCSD
    Country: France
    Project: EC | E-RIHS PP (739503)

    The study and preservation of tangible cultural and natural heritage is a global challenge for science and society at large. The European Research Infrastructure for Heritage Science (E-RIHS) will play a leading role in research on the interpretation, preservation, documentation and management of heritage. As an interdisciplinary infrastructure, E-RIHS will interconnect knowledge and methodologies to address key scientific questions in the field of heritage as a whole. The infrastructure is built on ten core pillars. It will bring together large-scale instruments, portable devices, and physical and digital archives in a structured and unified way. Its implementation will focus on scientific excellence, interdisciplinarity and cooperation. In doing so, it will offer unprecedented research opportunities to a wide range of interdisciplinary scientific communities.

  • Publication . Preprint . Conference object . Contribution for newspaper or weekly magazine . Article . 2020
    Open Access English
    Authors: 
    Rehm, Georg; Marheinecke, Katrin; Hegele, Stefanie; Piperidis, Stelios; Bontcheva, Kalina; Hajic, Jan; Choukri, Khalid; Vasiljevs, Andrejs; Backfried, Gerhard; Prinz, Christoph; +37 more
    Publisher: European Language Resources Association
    Countries: Denmark, France
    Project: EC | ELG (825627), EC | BDVe (732630), SFI | ADAPT: Centre for Digital... (13/RC/2106), EC | AI4EU (825619), FCT | PINFRA/22117/2016 (PINFRA/22117/2016), EC | X5gon (761758)

    Multilingualism is a cultural cornerstone of Europe and is firmly anchored in the European treaties, including full language equality. However, language barriers that affect business and cross-lingual and cross-cultural communication are still omnipresent. Language Technologies (LTs) are a powerful means of breaking down these barriers. Over the last decade, various initiatives have created a multitude of approaches and technologies tailored to Europe's specific needs, yet there is still an immense level of fragmentation. At the same time, AI has become an increasingly important concept in the European Information and Communication Technology area. For a few years now, AI, with its many opportunities and synergies but also its misconceptions, has been overshadowing every other topic. We present an overview of the European LT landscape, describing funding programmes, activities, actions and challenges in the different countries with regard to LT, including the current state of play in industry and the LT market. We present a brief overview of the main LT-related activities at the EU level over the last ten years and develop strategic guidance with regard to four key dimensions. Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020), to appear.

  • Open Access English
    Authors: 
    Khan, Fahad; Romary, Laurent; Salgado, Ana; Bowers, Jack; Khemakhem, Mohamed; Tasovac, Toma;
    Publisher: HAL CCSD
    Country: France
    Project: EC | ELEXIS (731015)

    Due to the COVID-19 pandemic, the 12th edition of LREC was cancelled; the LREC 2020 Proceedings are available at http://www.lrec-conf.org/proceedings/lrec2020/index.html. In this article, we introduce two of the new parts of the multi-part version of the Lexical Markup Framework (LMF) ISO standard. Part 3 (ISO 24613-3) deals with etymological and diachronic data, and Part 4 (ISO 24613-4) consists of a TEI serialisation of all of the prior parts of the model. We demonstrate the use of both standards by describing the LMF encoding of a small number of examples. These examples are taken from a sample conversion of the reference Portuguese dictionary Grande Dicionário Houaiss da Língua Portuguesa, which is part of a broader experiment analysing different, heterogeneously encoded Portuguese lexical resources. We present the examples in the Unified Modeling Language (UML) and, in a couple of cases, also in TEI.
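For readers unfamiliar with TEI-encoded dictionary data, the following Python sketch uses the standard `xml.etree.ElementTree` module to serialise a small entry with an etymology. The element names (`entry`, `form`, `orth`, `gramGrp`, `pos`, `etym`, `lang`, `mentioned`, `gloss`) come from the TEI vocabulary. The nesting, however, is a simplified assumption and has not been validated against the TEI schema or the ISO 24613-4 serialisation. The Portuguese example (casa, from Latin casa, "hut") is chosen only for illustration.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serialised as xml:lang
ET.register_namespace("", TEI_NS)  # emit TEI as the default namespace


def tei(tag: str) -> str:
    """Return the Clark-notation name of a TEI element."""
    return f"{{{TEI_NS}}}{tag}"


def etym_entry(lemma, lang, pos, etymon, etymon_lang, etymon_code, gloss):
    """Build a simplified TEI dictionary entry with a one-step etymology."""
    entry = ET.Element(tei("entry"), {XML_LANG: lang})
    form = ET.SubElement(entry, tei("form"), {"type": "lemma"})
    ET.SubElement(form, tei("orth")).text = lemma
    gram_grp = ET.SubElement(entry, tei("gramGrp"))
    ET.SubElement(gram_grp, tei("pos")).text = pos
    # Etymology: source language, the etymon itself and its meaning
    etym = ET.SubElement(entry, tei("etym"))
    ET.SubElement(etym, tei("lang")).text = etymon_lang
    ET.SubElement(etym, tei("mentioned"), {XML_LANG: etymon_code}).text = etymon
    ET.SubElement(etym, tei("gloss")).text = gloss
    return entry


entry = etym_entry("casa", "pt", "noun", "casa", "Latin", "la", "hut")
print(ET.tostring(entry, encoding="unicode"))
```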

  • English
    Authors: 
    Rollo, Maria Fernanda; Jorge, Maria do Rosário; Fernandes, João; da Silva, Filipe Guimarães; Queiroz, Inês; Lucas, Pedro;
    Publisher: HAL CCSD
    Country: France
    Project: EC | DESIR (731081)

    The European Commission aims to develop a more sustainable research infrastructure ecosystem and to ensure that its benefits and impacts are widely perceived by research communities and lead to research excellence. This vision is reflected in a range of international and European documents. Recent work conducted by the OECD and the European Commission, particularly by ESFRI and e-IRG, has stated the need for structural changes in the EU framework for research infrastructures (RIs). In line with this strategic vision, DARIAH intends to establish itself as a sustainable research infrastructure. DESIR (DARIAH ERIC Sustainability Refined) work package 6, TRUST, contributes to DARIAH’s long-term sustainability by measuring the acceptance and impact of DARIAH in new cross-disciplinary DARIAH communities and core groups. This provided the basis for defining the theoretical and methodological framework that supports the research presented here. Accordingly, this report focuses on developing recommendations and strategies to support and increase confidence in DARIAH services and infrastructure. In doing so, it contributes to a major DESIR goal: enlarging DARIAH by engaging new cross-disciplinary communities and considering their specific requirements. The proposed recommendations could lay the groundwork for a broader debate within DARIAH and the RI landscape on the actions to be taken at all decision levels to achieve a longer-term, sustainable RI. This report is thus intended as a policy document to inspire the future path of DARIAH, contributing to its sustainability and to the fulfilment of the mission for which it was created.
The recommendations stem from analytical work drawing on multiple sources of information: an academically driven multi-country survey (see D6.2); 33 qualitative interviews in three different countries; a workshop with DARIAH national coordinators held in Warsaw; contributions from DESIR partners who lead other work packages within the project; and the DESIR Winter School “Shaping New Approaches to Data Management in Arts and Humanities”. Once the entire set of recommendations had been defined, the recommendations were grouped according to three main strategic frameworks (sustainability, scope and the DARIAH Strategic Plan). They were then visually displayed in a “Recommendations & Community Engagement Tool” (https://dariah.peopleware.pt), an open platform that supports DARIAH by strengthening its link with arts and humanities communities. The new DARIAH Strategic Plan for the next seven years, which will be followed by the publication of a Strategic Action Plan, is a major opportunity to address sustainability, both at a conceptual level and in terms of organizational and operational configuration. The main findings are therefore summarised in seven key recommendations, linked with the strategic pillars of the recently published DARIAH Strategic Plan:
    1. Promote research excellence with an inclusive, collaborative, bureaucracy-free and community-driven approach.
    2. Ensure the integration of tools, services, data and resources within the DARIAH community and with other Research Infrastructures (e.g. by gathering them on a platform such as the Marketplace).
    3. Foster a collaborative learning environment and anticipate the skills of the future through a joint strategy for education and training (e.g. DARIAH-CAMPUS).
    4. Establish a flexible, participatory and effective governance model with a clear and sustainable business plan.
    5. Strengthen DARIAH’s representation in the European and international policy arena, expanding its visibility and cooperation outside EU borders.
    6. Broaden and extend DARIAH’s role, action and benefits towards the strengthening of scientific citizenship in Europe.
    7. Set up means for monitoring and bringing communities together, while respecting diversity at the institutional, scientific, disciplinary and methodological levels.
The work developed in the DESIR project, and this set of recommendations in particular, could help foster the implementation of guidelines and of short- and long-term actions to improve DARIAH’s sustainability and firmly establish it as a long-term leader and partner within arts and humanities communities.

  • Publication . Report . 2019
    English
    Authors: 
    Toma Tasovac; Jennifer Edmond; Vicky Garnett; Deborah Ellen Thorpe;
    Publisher: HAL CCSD
    Country: France
    Project: EC | DESIR (731081)

    • To the extent that it has been theorised, work on DH pedagogy has tended to be very strongly tied to the classroom experience. A classroom experience, however, exists within a particular social and institutional framework (students seeking knowledge, experience or qualifications from instructors who master a specific body of knowledge). That framework is quite different from the operational and distributed nature of Research Infrastructures such as DARIAH.
    • Research Infrastructures seldom possess the specialised procedures, staff, resources and expertise needed to deliver formal educational programmes. The strength of RIs lies instead in providing, and reflecting upon, the experience of acculturation and professionalisation in “real” cross-institutional and often cross-cultural projects, in which peer learning, skills transfer and network building are the rule rather than the exception.
    • Research Infrastructures such as DARIAH have a specific role to play in the European educational landscape by complementing rather than replacing the pedagogical models prevalent in HEIs today.
    • RIs such as DARIAH should focus not only on DH, or even on the discipline in which a student or researcher seeks to use DH methodologies, but also on highlighting how these practices engage interdependent communities of practice with intersecting concerns.
    • DARIAH should intensify its efforts to position itself as pedagogically relevant beyond the individual humanities disciplines. This concerns what it can contribute to the development and dissemination of early-career researchers’ transferable skills and competences, as identified by the Eurodoc 2018 Report.
    • DARIAH should establish an active educational partnership network in order to validate a new approach to the skills needs of humanities students and researchers, looking beyond what is currently available in formal educational programmes.
    • DARIAH should develop a curricular model and, if possible, an internship programme to enable a fluid exchange of knowledge and students between university programmes and the applied contexts of the research infrastructure.
    • DARIAH should continue to create and maintain essential filtering and contextualising layers for training materials, which are now available through DARIAH-Campus, in order to coordinate and enhance open educational resources with other stakeholders in the field.
    • DARIAH should aim to apply and test its learning resources in different HE contexts in order to benefit from unforeseen synergies and unexpected outcomes. One example is the initiative to publish young researchers’ data papers using the DARIAH-Campus Event Capture Template, which emerged from the DESIR Workshop at the University of Neuchâtel.
    • Building on currently identified needs, DARIAH should develop foresight models to predict future needs within the Higher Education sector.