The following results are related to DARIAH EU. Are you interested to view more results? Visit OpenAIRE - Explore.
17 Research products, page 1 of 2

  • DARIAH EU
  • Publications
  • Other research products
  • 2012-2021
  • US

  • French
    Authors: 
    Ginouvès, Véronique; Gras, Isabelle;
    Publisher: HAL CCSD
    Country: France

    International audience; By way of an afterword, it seemed necessary to us to return to the collaborative process behind the making of this book and to share with you the genesis of the project. It all began with a pragmatic observation drawn from our everyday working situations: researchers who produce or use data need concrete answers to the questions they face in the field and throughout their research work. Producing, exploiting, disseminating, sharing, or editing digital sources is now part of our ordinary work. The break brought about by the development of the web and the arrival of digital formats has greatly facilitated the dissemination and sharing of resources (documentary, textual, photographic, audio, or audiovisual...) within the research world and, beyond it, among citizens who are increasingly curious about and interested in the documents produced by scientists.

  • Publication . Article . Preprint . Conference object . 2019
    Open Access
    Authors: 
    Lilia Simeonova; Kiril Simov; Petya Osenova; Preslav Nakov;
    Publisher: Incoma Ltd., Shoumen, Bulgaria

    We propose a morphologically informed model for named entity recognition, which is based on LSTM-CRF architecture and combines word embeddings, Bi-LSTM character embeddings, part-of-speech (POS) tags, and morphological information. While previous work has focused on learning from raw word input, using word and character embeddings only, we show that for morphologically rich languages, such as Bulgarian, access to POS information contributes more to the performance gains than the detailed morphological information. Thus, we show that named entity recognition needs only coarse-grained POS tags, but at the same time it can benefit from simultaneously using some POS information of different granularity. Our evaluation results over a standard dataset show sizable improvements over the state-of-the-art for Bulgarian NER. Comment: named entity recognition; Bulgarian NER; morphology; morpho-syntax

  • Publication . Article . 2021
    Open Access
    Authors: 
    John A. Walsh; Peter J. Cobb; Wayne de Fremery; Koraljka Golub; Humphrey Keah; Jeonghyun Kim; Joseph Kiplang'at; Ying-Hsang Liu; Simon Mahony; Sam Gyun Oh; +3 more
    Publisher: Wiley

    The interdisciplinary field known as digital humanities (DH) is represented in various forms in the teaching and research practiced in iSchools. Building on the work of an iSchools organization committee charged with exploring digital humanities curricula, we present findings from a series of related studies exploring aspects of DH teaching, education, and research in iSchools, often in collaboration with other units and disciplines. Through a survey of iSchool programs and an online DH course registry, we investigate the various education models for DH training found in iSchools, followed by a detailed look at DH courses and curricula, explored through analysis of course syllabi and course descriptions. We take a brief look at collaborative disciplines with which iSchools cooperate on DH research projects or in offering DH education. Next, we explore DH careers through an analysis of relevant job advertisements. Finally, we offer some observations about the management and administrative challenges and opportunities related to offering a new iSchool DH program. Our results provide a snapshot of the current state of digital humanities in iSchools which may usefully inform the design and evolution of new DH programs, degrees, and related initiatives.

  • Open Access English
    Authors: 
    Laura Rimell; Thomas Lippincott; Karin Verspoor; Helen L. Johnson; Anna Korhonen;
    Publisher: Elsevier Inc.
    Project: UKRI | Lexical Acquisition for t... (EP/G051070/1)

    Background: Biomedical natural language processing (NLP) applications that have access to detailed resources about the linguistic characteristics of biomedical language demonstrate improved performance on tasks such as relation extraction and syntactic or semantic parsing. Such applications are important for transforming the growing unstructured information buried in the biomedical literature into structured, actionable information. In this paper, we address the creation of linguistic resources that capture how individual biomedical verbs behave. We specifically consider verb subcategorization, or the tendency of verbs to "select" co-occurrence with particular phrase types, which influences the interpretation of verbs and identification of verbal arguments in context. There are currently a limited number of biomedical resources containing information about subcategorization frames (SCFs), and these are the result of either labor-intensive manual collation, or automatic methods that use tools adapted to a single biomedical subdomain. Either method may result in resources that lack coverage. Moreover, the quality of existing verb SCF resources for biomedicine is unknown, due to a lack of available gold standards for evaluation. Results: This paper presents three new resources related to verb subcategorization frames in biomedicine, and four experiments making use of the new resources. We present the first biomedical SCF gold standards, capturing two different but widely-used definitions of subcategorization, and a new SCF lexicon, BioCat, covering a large number of biomedical sub-domains. We evaluate the SCF acquisition methodologies for BioCat with respect to the gold standards, and compare the results with the accuracy of the only previously existing automatically-acquired SCF lexicon for biomedicine, the BioLexicon. Our results show that the BioLexicon has greater precision while BioCat has better coverage of SCFs. Finally, we explore the definition of subcategorization using these resources and its implications for biomedical NLP. All resources are made publicly available. Conclusion: The SCF resources we have evaluated still show considerably lower accuracy than that reported with general English lexicons, demonstrating the need for domain- and subdomain-specific SCF acquisition tools for biomedicine. Our new gold standards reveal major differences when annotators use the different definitions. Moreover, evaluation of BioCat yields major differences in accuracy depending on the gold standard, demonstrating that the definition of subcategorization adopted will have a direct impact on perceived system accuracy for specific tasks.

  • Authors: 
    Georgios Artopoulos; Panayiotis Charalambous; Colter Eugene Wehmeier;
    Publisher: IGI Global

    This article reports on the technical development and testing of the basic components of a virtual environment platform that could be used for the cross-disciplinary study of complex urban realities, such as the historic city of Nicosia, Cyprus - the last divided capital of Europe. This platform captures data of virtual visitors' movements in space, and the article suggests that these data could help better understand the impact of planning scenarios and design interventions in open public spaces that used to be popular among the citizens of the historic city. The article presents how this platform uses interaction and immersion opportunities to engage citizens and stakeholders in the management of public open spaces that are associated with built heritage. Crowd simulation is discussed as a computational technique that, when combined with the presented virtual environment platform and under the right conditions, would contribute to a digital practice for small-scale urban modelling. However, it is beyond the scope of this technical note to provide a full empirical testing and validation of the presented immersive virtual environment.

  • Authors: 
    Mae Velloso-Lyons; Quinn Dombrowski; Kathryn Starkey;
    Publisher: University of Toronto Press Inc. (UTPress)

    This paper introduces the Global Medieval Sourcebook (GMS), an online repository of medieval texts and translations with an open network of contributors. Drawing on our experience developing and building this project in two phases over six years, we reflect on questions regarding editorial, curatorial, and translation practices, and on technical issues involved in the preparation, display, and preservation of texts for online publication. As an example of a scholar-led digital project, the trajectory of the GMS has broad relevance for scholars planning digital curation work of their own. In particular, it offers a salutary example of the infrastructural barriers to sustaining collaborative digital work and a possible model for preserving a concluded project.

  • Open Access English
    Authors: 
    Bridgette Wessels; Rachel Finn; Peter Linde; Paolo Mazzetti; Stefano Nativi; Susan Riley; Rod Smallwood; Mark J. Taylor; Victoria Tsoukala; Kush Wadhwa; +1 more
    Countries: Italy, Netherlands, Sweden
    Project: EC | OPENAIRE (246686), EC | APARSEN (269977), EC | DRIVER II (212147), EC | ETTIS (285593), EC | RECODE (321463)

    This paper explores key issues in the development of open access to research data. The use of digital means for developing, storing and manipulating data is creating a focus on ‘data-driven science’. One aspect of this focus is the development of ‘open access’ to research data. Open access to research data refers to the way in which various types of data are openly available to public and private stakeholders, user communities and citizens. Open access to research data, however, involves more than simply providing easier and wider access to data for potential user groups. The development of open access requires attention to the ways data are considered in different areas of research. We identify how open access is being unevenly developed across the research environment and the consequences this has in terms of generating data gaps. Data gaps refer to the way data becomes detached from published conclusions. To address these issues, we examine four main areas in developing open access to research data: stakeholder roles and values; technological requirements for managing and sharing data; legal and ethical regulations and procedures; institutional roles and policy frameworks. We conclude that problems of variability and consistency across the open access ecosystem need to be addressed within and between these areas to ensure that risks surrounding a data gap are managed in open access. 11 authors. Missing: Sally Wyatt

  • Open Access English
    Authors: 
    Bridget Almas;
    Publisher: Ubiquity Press

    The Perseids project provides a platform for creating, publishing, and sharing research data, in the form of textual transcriptions, annotations and analyses. An offshoot and collaborator of the Perseus Digital Library (PDL), Perseids is also an experiment in reusing and extending existing infrastructure, tools, and services. This paper discusses infrastructure in the domain of digital humanities (DH). It outlines some general approaches to facilitating data sharing in this domain, and the specific choices we made in developing Perseids to serve that goal. It concludes by identifying lessons we have learned about sustainability in the process of building Perseids, noting some critical gaps in infrastructure for the digital humanities, and suggesting some implications for the wider community.

  • Open Access
    Authors: 
    Riccardo Pozzo; Andrea Filippetti; Mario Paolucci; Vania Virgili;
    Publisher: Oxford University Press (OUP)
    Country: Italy

    This article introduces the notion of cultural innovation, which requires adapting our approach to co-creation. The argument opens with a first conceptualization of cultural innovation as an additional and autonomous category of the complex processes of co-creation. The dimensions of cultural innovation are contrasted against other forms of innovation. In a second step, the article makes an unprecedented attempt to describe processes and outcomes of cultural innovation, while showing their operationalization in some empirical case studies. In the conclusion, the article considers policy implications resulting from the novel definition of cultural innovation as the outcome of complex processes that involve the reflection of knowledge flows across the social environment within communities of practices while fostering the inclusion of diversity in society. First and foremost, cultural innovation takes a critical stance against inequalities in the distribution of knowledge and builds innovation for improving the welfare of individuals and communities.

  • English
    Authors: 
    Kristanti, Tanti; Romary, Laurent;
    Publisher: HAL CCSD
    Country: France

    International audience; This article presents an overview of approaches and results during our participation in the CLEF HIPE 2020 NERC-COARSE-LIT and EL-ONLY tasks for English and French. For these two tasks, we use two systems: 1) DeLFT, a Deep Learning framework for text processing; 2) entity-fishing, a generic named entity recognition and disambiguation service deployed in the technical framework of INRIA.
