Advanced search in Research products
The following results are related to DARIAH EU. Are you interested in viewing more results? Visit OpenAIRE - Explore.
46 Research products, page 1 of 5

  • DARIAH EU
  • Publications
  • Research software
  • 2012-2021
  • Conference object
  • Hyper Article en Ligne
  • INRIA a CCSD electronic archive server
  • ProdInra
  • Digital Humanities and Cultural Heritage

  • Open Access English
    Authors: 
    Luca Foppiano; Laurent Romary;
    Publisher: HAL CCSD
    Country: France
    Project: EC | HIRMEOS (731102)

    International audience; This paper presents an attempt to provide a generic named-entity recognition and disambiguation (NERD) module called entity-fishing as a stable online service, demonstrating the possible delivery of sustainable technical services within DARIAH, the European digital research infrastructure for the arts and humanities. Deployed as part of the national infrastructure Huma-Num in France, the service provides an efficient state-of-the-art implementation coupled with standardised interfaces, allowing easy deployment in a variety of digital humanities contexts. The topics of accessibility and sustainability have long been discussed in the attempt to establish best practices in the widely fragmented ecosystem of the DARIAH research infrastructure. The history of entity-fishing has been cited as an example of good practice: initially developed in the context of the FP7 CENDARI project, it was well received by the user community and was further developed within the H2020 HIRMEOS project, where several open access publishers have integrated the service into their collections of published monographs as a means to enhance retrieval and access. entity-fishing implements entity extraction as well as disambiguation against Wikipedia and Wikidata entries. The service is accessible through a REST API, which allows easy and seamless integration, a language-independent and stable convention, and a widely used service-oriented architecture (SOA) design. Input and output data are carried over a query data model with a defined structure, providing the flexibility to support the processing of partially annotated text or the repartition of text over several queries. The interface implements a variety of functionalities, such as language recognition, sentence segmentation, and modules for accessing and looking up concepts in the knowledge base.
    The API also integrates more advanced contextual parametrisation and ranked outputs, allowing for resilient integration in various use cases. The entity-fishing API has been used as a concrete use case to draft the experimental stand-off proposal, which has been submitted for integration into the TEI guidelines. The representation is also compliant with the Web Annotation Data Model (WADM). In this paper we aim to describe the functionalities of the service as a reference contribution to the subject of web-based NERD services. In order to cover all aspects, the description is structured around two complementary viewpoints. First, we discuss the system from the data angle, detailing the workflow from input to output and unpacking each building block in the processing flow. Secondly, with a more academic approach, we provide a transversal schema of the different components, taking into account non-functional requirements in order to facilitate the discovery of bottlenecks, hotspots and weaknesses. The aim here is to give a description of the tool and, at the same time, a technical software-engineering analysis that will help the reader understand our choices for the resources allocated in the infrastructure. Thanks to the work of millions of volunteers, Wikipedia has today reached a stability and completeness that leave no usable alternative on the market (considering also the licence aspect). The launch of Wikidata in 2012 has completed the picture with a complementary language-independent meta-model, which is becoming the scientific reference for many disciplines. After providing an introduction to Wikipedia and Wikidata, we describe the knowledge base: the data organisation, the entity-fishing process to exploit it, and the way it is built from nightly dumps using an offline process. We conclude the paper by presenting our solution for the service deployment: how and which resources were allocated.
    The service has been in production since Q3 2017 and has been used extensively by the H2020 HIRMEOS partners during the integration with their publishing platforms. We believe we have provided the best possible performance with the minimum of resources. Thanks to the Huma-Num infrastructure, we retain the possibility to scale up as needed, for example to support an increase in demand or a temporary need to process a huge backlog of documents. In the long term, thanks to this sustainable environment, we plan to keep delivering the service far beyond the end of the H2020 HIRMEOS project.
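As a concrete illustration of the query data model the abstract describes, the sketch below builds a minimal entity-fishing-style JSON query. The field names follow the public entity-fishing documentation, but treat the exact shape, and the commented request, as illustrative assumptions rather than the service's guaranteed contract.

```python
import json

# Hypothetical sketch of the entity-fishing query data model: a JSON query
# carrying the text to process, the target language, and optional
# pre-annotated entity spans (field names are assumptions for illustration).
query = {
    "text": "Austria invaded and fought the Serbian army at the Battle of Cer.",
    "language": {"lang": "en"},  # supply the language to skip auto-detection
    "entities": [],              # optionally pass partially annotated text
}

payload = json.dumps(query)

# The request itself would be an HTTP POST to the service's disambiguation
# route, e.g. with the `requests` library (base_url is deployment-specific):
#   requests.post(f"{base_url}/service/disambiguate",
#                 files={"query": (None, payload)})

decoded = json.loads(payload)
```

Because the query travels as a single JSON document, partially annotated text or text split over several queries fits the same structure.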

  • Publication . Article . Other literature type . Conference object . 2020
    Open Access English
    Authors: 
    Stefan Bornhofen; Marten Düring;
    Publisher: HAL CCSD
    Country: France
    Project: ANR | BLIZAAR (ANR-15-CE23-0002)

    Abstract: The paper presents Intergraph, a graph-based visual analytics technical demonstrator for the exploration and study of content in historical document collections. The prototype is motivated by a practical use case on a corpus of circa 15,000 digitized resources about European integration since 1945. The corpus allowed generating a dynamic multilayer network which represents different kinds of named entities appearing and co-appearing in the collections. To our knowledge, Intergraph is one of the first interactive tools to visualize dynamic multilayer graphs for collections of digitized historical sources. Graph visualization and interaction methods have been designed based on user requirements for content exploration by non-technical users without a strong background in network science, and to compensate for common flaws in the annotation of named entities. Users work with self-selected subsets of the overall data by interacting with a scene of small graphs which can be added, altered and compared. This allows an interest-driven navigation of the corpus and the discovery of the interconnections of its entities across time.
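The idea of a dynamic multilayer network of co-appearing entities can be sketched in a few lines: edges grouped by (layer, year), where a layer is an entity type. This is a minimal illustration of the concept, not Intergraph's actual data model, and the entity names are examples only.

```python
from collections import defaultdict

# Edges of a dynamic multilayer network, keyed by (layer, year).
# A "layer" is an entity type such as "person" or "institution".
edges = defaultdict(list)

def add_edge(layer, year, source, target):
    """Record a co-appearance of two named entities in a layer and year."""
    edges[(layer, year)].append((source, target))

add_edge("person", 1951, "Robert Schuman", "Jean Monnet")
add_edge("institution", 1951, "ECSC", "High Authority")
add_edge("person", 1957, "Paul-Henri Spaak", "Jean Monnet")

def subgraph(layer, year):
    """A self-selected subset of the data, akin to one of the small graphs
    a user adds to the scene and compares with others."""
    return edges.get((layer, year), [])
```

Slicing by layer and year is what makes interest-driven navigation across time possible: each small graph in the scene is just such a slice.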

  • English
    Authors: 
    Blandine Nouvel; Evelyne Sinigaglia; Véronique HUMBERT;
    Publisher: HAL CCSD
    Country: France

    International audience; The aim of this talk is to present the methodology used to reorganise the PACTOLS thesaurus of Frantiq, launched within the framework of the MASA consortium. PACTOLS is a multilingual and open repository about archaeology from Prehistory to the present, and for Classics. It is organized into the micro-thesauri at the root of its name (Peuples, Anthroponymes, Chronologie, Toponymes, Oeuvres, Lieux, Sujets). The goal is to turn it into a tool interoperable with information systems beyond its original documentary purpose, and usable by archaeologists as a repository for managing scientific data. During the talk, we will describe the choice of tools, the organisation of work within the steering group, and the collaborations with specialists for the upgrading and development of the vocabulary, while showing the strengths and limitations of some experiments. Above all, it will show how the introduction of the conceptual categories of the DARIAH BackBone Thesaurus, modelled on the CIDOC-CRM ontology, through a progressive deconstruction/reconstruction process, eventually had an impact on all the micro-thesauri and called into question the organisation of knowledge proposed so far.

  • Publication . Preprint . Conference object . Contribution for newspaper or weekly magazine . Article . 2020
    Open Access English
    Authors: 
    Rehm, Georg; Marheinecke, Katrin; Hegele, Stefanie; Piperidis, Stelios; Bontcheva, Kalina; Hajic, Jan; Choukri, Khalid; Vasiljevs, Andrejs; Backfried, Gerhard; Prinz, Christoph; +37 more
    Publisher: Zenodo
    Countries: France, Denmark
    Project: EC | X5gon (761758), SFI | ADAPT: Centre for Digital... (13/RC/2106), FCT | PINFRA/22117/2016 (PINFRA/22117/2016), EC | AI4EU (825619), EC | ELG (825627), EC | BDVe (732630)

    Multilingualism is a cultural cornerstone of Europe, firmly anchored in the European treaties, including full language equality. However, language barriers impacting business, cross-lingual and cross-cultural communication are still omnipresent. Language Technologies (LTs) are a powerful means to break down these barriers. While the last decade has seen various initiatives that created a multitude of approaches and technologies tailored to Europe's specific needs, there is still an immense level of fragmentation. At the same time, AI has become an increasingly important concept in the European Information and Communication Technology area. For a few years now, AI, with its many opportunities and synergies but also misconceptions, has been overshadowing every other topic. We present an overview of the European LT landscape, describing funding programmes, activities, actions and challenges in the different countries with regard to LT, including the current state of play in industry and the LT market. We also give a brief overview of the main LT-related activities at the EU level in the last ten years and develop strategic guidance with regard to four key dimensions. Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020). To appear.

  • Open Access English
    Authors: 
    Ivan Kratchanov;

    International audience; The National Library Ivan Vazov in Plovdiv is the second largest library in Bulgaria. It serves as the second national legal depository of Bulgarian printed works. In addition, it has contributed significantly to the preservation and the digital accessibility of the national cultural and historical heritage. This article offers an overview of the library’s history and current developments in the field of automation and digitization.

  • Publication . Article . Other literature type . Conference object . 2020
    Open Access English
    Authors: 
    Martin Grandjean;
    Publisher: HAL CCSD

    International audience; The technicality of network visualization applied to history, and its relative novelty, often result in a superficial use of software, limited to describing a situation immediately extracted from a data set. This approach is justified in the exploratory phase of an analysis in most cases where the network is very explicitly present in the object studied. But the complexity of the entanglement of historical actors, places, institutions or temporal sequences makes finer modeling necessary if we want to go beyond a simplistic "datafication". To encourage curiosity towards other modes of analysis and to put data modeling (and therefore the historical sources) at the center of the research process, this article proposes a short introduction on how to discuss what makes a specific historical network: its components, its relationships, its layers and its different facets. It offers a kind of visual guide to help historians follow a multilayer framework, to think about their research object from another (multidimensional) angle, and to combine these perspectives.

  • Publication . Part of book or chapter of book . 2019
    Open Access
    Authors: 
    Elisa Nury;
    Country: Switzerland

    International audience; This paper describes the workflow of the Grammateus project, from gathering data on Greek documentary papyri to the creation of a web application. The first stage is the selection of a corpus and the choice of metadata to record: papyrology specialists gather data from printed editions, existing online resources and digital facsimiles. In the next step, this data is transformed into the EpiDoc standard of XML TEI encoding, to facilitate its reuse by others, and processed for HTML display. We also reuse existing text transcriptions available on . Since these transcriptions may be regularly updated by the scholarly community, we aim to access them dynamically. Although the transcriptions follow the EpiDoc guidelines, the wide diversity of the papyri as well as small inconsistencies in encoding make data reuse challenging. Currently, our data is available on an institutional GitLab repository, and we will archive our final dataset according to the FAIR principles.
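The EpiDoc workflow described above transforms papyrological data into TEI XML for reuse and HTML display. The fragment below is an illustrative, deliberately minimal TEI/EpiDoc-like document, not the Grammateus project's actual encoding, showing the namespaced structure a downstream consumer would navigate.

```python
import xml.etree.ElementTree as ET

# Minimal illustrative EpiDoc-style fragment: an edition division containing
# a transcription with a line break marker (content is a placeholder).
TEI_NS = "http://www.tei-c.org/ns/1.0"
epidoc = f"""<TEI xmlns="{TEI_NS}">
  <text>
    <body>
      <div type="edition">
        <ab>
          <lb n="1"/>example transcription line
        </ab>
      </div>
    </body>
  </text>
</TEI>"""

root = ET.fromstring(epidoc)
# Namespaced lookup of the edition division, as a reuser preparing
# HTML display might perform it.
edition = root.find(f".//{{{TEI_NS}}}div[@type='edition']")
```

Small encoding inconsistencies across documents, as the abstract notes, are exactly what such namespace- and attribute-based lookups have to tolerate.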

  • Publication . Conference object . 2019
    English
    Authors: 
    Dombrowski, Quinn; Fischer, Frank; Edmond, Jennifer; Tasovac, Toma; Raciti, Marco; Chambers, Sally; Daems, Joke; Hacigüzeller, Piraye; Smith, Kathleen M.; Worthey, Glen; +5 more
    Publisher: HAL CCSD
    Country: France
    Project: EC | DESIR (731081)

    International audience; DARIAH, the digital humanities infrastructure with origins and an organisational home in Europe, is nearing the completion of its implementation phase. The significant investment from the European Commission and member countries has yielded a robust set of technical and social infrastructures, ranging from working groups, various registries, pedagogical materials, and software to support diverse approaches to digital humanities scholarship. While the funding and leadership of DARIAH to date has come from countries in, or contiguous with, Europe, the needs that drive its technical and social development are widely shared within the international digital humanities community beyond Europe. Scholars on every continent would benefit from well-supported technical tools and platforms, directories for facilitating access to information and resources, and support for working groups. The DARIAH Beyond Europe workshop series, organised and financed under the umbrella of the DESIR project (“DARIAH ERIC Sustainability Refined,” 2017–2019, funded by the European Union’s Horizon 2020 Research and Innovation Program), convened three meetings between September 2018 and March 2019 in the United States and Australia. These workshops served as fora for cross-cultural exchange, and introduced many non-European DH scholars to DARIAH; each of the workshops included a significant delegation from various DARIAH bodies, together with a larger number of local presenters and participants. 
    The local contexts for these workshops were significantly different in their embodiment of research infrastructures: on the one hand, in the U.S., a private research university (Stanford) and the de facto national library (the Library of Congress), both in a country with a history of unsuccessful national-scale infrastructure efforts; and in Australia, a system which has invested substantially more in coordinated national research infrastructure in science and technology, but very little on a national scale in the humanities and arts. Europe is in many respects ahead of both host countries in terms of its research infrastructure ecosystem both at the national and pan-European levels. The Stanford workshop had four main topics of focus: corpus management; text and image analysis; geohumanities; and music, theatre, and sound studies. As the first of the workshops, the Stanford group also took the lead in proposing next steps toward exploring actionable “DARIAH beyond Europe” initiatives, including the beginnings of a blog shared among participants from all the workshops, extra-European use of DARIAH’s DH Course Registry, and non-European participation in DARIAH Working Groups. The overall theme of the Library of Congress workshop was “Collections as Data,” building on a number of U.S.-based initiatives exploring how to enhance researcher engagement with digital collections through computationally-driven research. In Washington, D.C., the knowledge exchange sessions focussed on digitised newspapers and text analysis, infrastructural challenges for public humanities, and the use of web-archives in DH research. As at Stanford, interconnecting with DARIAH Working Groups was of core interest to participants, and a new Working Group was proposed to explore global access and use of digitised historical newspapers. 
A further important outcome was the agreement to explore collaboration between the U.S.-based “Collections as Data” initiatives and the Heritage Data Reuse Charter in Europe. The third and final workshop in the series took place in March 2019 in Australia, hosted by the National Library of Australia in Canberra. Convened by the Australian Academy of the Humanities (AAH), together with the Australian Research Data Commons (ARDC) and DARIAH, this event was co-located with the Academy’s second annual Humanities, Arts and Culture Data Summit. The first day of the event, targeted at research leadership and policy makers, was intended to explore new horizons for data-driven humanities and arts research, digital cultural collections and research infrastructure. The two subsequent days focused on engaging with a wide variety of communities, including (digital) humanities researchers and cultural heritage professionals. Organised around a series of Knowledge Exchange Sessions, combined with research-led lightning talks, the participants spoke in detail about how big ideas can be implemented practically on the ground. This poster reflects on the key outcomes and future directions arising from these three workshops, and considers what it might look like for DARIAH to be adopted as a fundamental DH infrastructure in a complex variety of international, national, and regional contexts, with diverse funding models, resources, needs, and expectations. One major outcome of all workshops was the shared recognition that, in spite of extensive funding, planning, and goodwill, these workshops were not nearly global enough in their reach: most importantly they were not inclusive of the Global South. Our new DARIAH beyond Europe community has a strong shared commitment to address this gap.

  • Open Access English
    Authors: 
    Lamé, M.; Pittet, P.; Federico Ponchio; Markhoff, B.; Sanfilippo, E. M.;
    Publisher: HAL CCSD
    Countries: Italy, France

    International audience; In this paper, we present an online, communication-driven decision support system for aligning terms from one dataset with terms from another (whether a standardized controlled vocabulary or not). Heterotoki differs from existing proposals in that it operates at the interface with humans, inviting experts to commit to their definitions so as either to validate the mapping or to propose enrichments to the terminologies. More precisely, unlike most existing proposals that support terminology alignment, Heterotoki sustains the negotiation of meaning through semantic coordination support within its interface design. This negotiation involves the domain experts who produced the datasets.
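The validate-or-enrich workflow the abstract describes can be sketched as a candidate-generation step followed by an explicit expert decision. Everything here is hypothetical, including the term labels, statuses, and the crude matching heuristic; it illustrates the kind of mapping decision Heterotoki mediates, not its actual API.

```python
# Two vocabularies to align: a project dataset and a reference vocabulary.
# Labels and definitions are invented for illustration.
dataset_terms = {"amphore": "Storage vessel with two handles"}
vocab_terms = {"amphora": "A two-handled storage jar"}

def propose_alignments(source, target):
    """Naive candidate generation via a shared prefix; a real system would
    use richer lexical and semantic matching before involving experts."""
    proposals = []
    for s in source:
        for t in target:
            if s[:5] == t[:5]:
                proposals.append({"source": s, "target": t,
                                  "status": "pending"})
    return proposals

def validate(proposal):
    """An expert commits to the definitions and confirms the mapping;
    the alternative path would enrich one of the terminologies instead."""
    proposal["status"] = "validated"
    return proposal

candidates = propose_alignments(dataset_terms, vocab_terms)
```

The key design point is that the system only proposes; the status change from "pending" to "validated" (or to an enrichment request) is always a human commitment.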

  • Open Access English
    Authors: 
    Marlet , Olivier; Francart, Thomas; Markhoff, Béatrice; Rodier, Xavier;
    Publisher: HAL CCSD
    Country: France
    Project: EC | ARIADNEplus (823914)

    International audience; CIDOC CRM is an ontology intended to facilitate the integration, mediation and interchange of heterogeneous cultural heritage information. The Semantic Web, with its Linked Open Data cloud, enables scholars and cultural institutions to publish their data in RDF, using CIDOC CRM as an interlingua that enables a semantically consistent re-interpretation of their data. Nowadays, more and more projects have mapped legacy datasets to CIDOC CRM, and successful Extract-Transform-Load data-integration processes have been performed in this way. A next step is enabling people and applications to dynamically explore autonomous datasets using the semantic mediation offered by CIDOC CRM. This is the purpose of OpenArchaeo, a tool for querying archaeological datasets on the LOD cloud. We present its main features: the principles behind its user-friendly query interface and its SPARQL endpoint for programs, together with its overall architecture, designed to be extendable and scalable, handling transparent interconnections with evolving distributed sources while achieving good efficiency.
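To make the SPARQL-endpoint idea concrete, here is the kind of CIDOC CRM query a program might send to such an endpoint. The CRM prefix is the standard namespace; the specific classes and properties used, and the dataset structure they presuppose, are illustrative assumptions, not OpenArchaeo's documented schema (note that recent CRM versions rename E22 to E22_Human-Made_Object).

```python
# A sketch of a CIDOC CRM query retrieving labelled man-made objects.
query = """
PREFIX crm: <http://www.cidoc-crm.org/cidoc-crm/>
SELECT ?object ?label WHERE {
  ?object a crm:E22_Man-Made_Object ;
          crm:P1_is_identified_by ?appellation .
  ?appellation crm:P190_has_symbolic_content ?label .
}
LIMIT 10
"""

# With the SPARQLWrapper library (if installed) this would run as:
#   from SPARQLWrapper import SPARQLWrapper, JSON
#   sparql = SPARQLWrapper(endpoint_url)   # endpoint_url: assumption
#   sparql.setQuery(query)
#   sparql.setReturnFormat(JSON)
#   results = sparql.query().convert()
```

Because every mapped dataset exposes the same CRM vocabulary, one such query can federate across autonomous sources without knowing their internal schemas.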

Advanced search in Research products
Research products
arrow_drop_down
Searching FieldsTerms
Any field
arrow_drop_down
includes
arrow_drop_down
Include:
The following results are related to DARIAH EU. Are you interested to view more results? Visit OpenAIRE - Explore.
46 Research products, page 1 of 5
  • Open Access English
    Authors: 
    Luca Foppiano; Laurent Romary;
    Publisher: HAL CCSD
    Country: France
    Project: EC | HIRMEOS (731102)

    International audience; This paper presents an attempt to provide a generic named-entity recognition and disambiguation module (NERD) called entity-fishing as a stable online service that demonstrates the possible delivery of sustainable technical services within DARIAH, the European digital research infrastructure for the arts and humanities. Deployed as part of the national infrastructure Huma-Num in France, this service provides an efficient state-of-the-art implementation coupled with standardised interfaces allowing an easy deployment on a variety of potential digital humanities contexts. The topics of accessibility and sustainability have been long discussed in the attempt of providing some best practices in the widely fragmented ecosystem of the DARIAH research infrastructure. The history of entity-fishing has been mentioned as an example of good practice: initially developed in the context of the FP9 CENDARI, the project was well received by the user community and continued to be further developed within the H2020 HIRMEOS project where several open access publishers have integrated the service to their collections of published monographs as a means to enhance retrieval and access.entity-fishing implements entity extraction as well as disambiguation against Wikipedia and Wikidata entries. The service is accessible through a REST API which allows easier and seamless integration, language independent and stable convention and a widely used service oriented architecture (SOA) design. Input and output data are carried out over a query data model with a defined structure providing flexibility to support the processing of partially annotated text or the repartition of text over several queries. The interface implements a variety of functionalities, like language recognition, sentence segmentation and modules for accessing and looking up concepts in the knowledge base. 
The API itself integrates more advanced contextual parametrisation or ranked outputs, allowing for the resilient integration in various possible use cases. The entity-fishing API has been used as a concrete use case3 to draft the experimental stand-off proposal, which has been submitted for integration into the TEI guidelines. The representation is also compliant with the Web Annotation Data Model (WADM).In this paper we aim at describing the functionalities of the service as a reference contribution to the subject of web-based NERD services. In order to cover all aspects, the architecture is structured to provide two complementary viewpoints. First, we discuss the system from the data angle, detailing the workflow from input to output and unpacking each building box in the processing flow. Secondly, with a more academic approach, we provide a transversal schema of the different components taking into account non-functional requirements in order to facilitate the discovery of bottlenecks, hotspots and weaknesses. The attempt here is to give a description of the tool and, at the same time, a technical software engineering analysis which will help the reader to understand our choice for the resources allocated in the infrastructure.Thanks to the work of million of volunteers, Wikipedia has reached today stability and completeness that leave no usable alternatives on the market (considering also the licence aspect). The launch of Wikidata in 2010 have completed the picture with a complementary language independent meta-model which is becoming the scientific reference for many disciplines. After providing an introduction to Wikipedia and Wikidata, we describe the knowledge base: the data organisation, the entity-fishing process to exploit it and the way it is built from nightly dumps using an offline process.We conclude the paper by presenting our solution for the service deployment: how and which the resources where allocated. 
The service has been in production since Q3 of 2017, and extensively used by the H2020 HIRMEOS partners during the integration with the publishing platforms. We believe we have strived to provide the best performances with the minimum amount of resources. Thanks to the Huma-num infrastructure we still have the possibility to scale up the infrastructure as needed, for example to support an increase of demand or temporary needs to process huge backlog of documents. On the long term, thanks to this sustainable environment, we are planning to keep delivering the service far beyond the end of the H2020 HIRMEOS project.

  • Publication . Article . Other literature type . Conference object . 2020
    Open Access English
    Authors: 
    Stefan Bornhofen; Marten Düring;
    Publisher: HAL CCSD
    Country: France
    Project: ANR | BLIZAAR (ANR-15-CE23-0002)

    AbstractThe paper presents Intergraph, a graph-based visual analytics technical demonstrator for the exploration and study of content in historical document collections. The designed prototype is motivated by a practical use case on a corpus of circa 15.000 digitized resources about European integration since 1945. The corpus allowed generating a dynamic multilayer network which represents different kinds of named entities appearing and co-appearing in the collections. To our knowledge, Intergraph is one of the first interactive tools to visualize dynamic multilayer graphs for collections of digitized historical sources. Graph visualization and interaction methods have been designed based on user requirements for content exploration by non-technical users without a strong background in network science, and to compensate for common flaws with the annotation of named entities. Users work with self-selected subsets of the overall data by interacting with a scene of small graphs which can be added, altered and compared. This allows an interest-driven navigation in the corpus and the discovery of the interconnections of its entities across time.

  • English
    Authors: 
    Blandine Nouvel; Evelyne Sinigaglia; Véronique HUMBERT;
    Publisher: HAL CCSD
    Country: France

    International audience; The aim of the talk is to present the methodology used to reorganise the PACTOLS thesaurus of Frantiq, launched within the framework of the MASA consortium. PACTOLS is a multilingual and open repository about archaeology from Prehistory to the present and for Classics. It is organized into six micro-thesaurus at the root of its name (Peuples, Anthroponymes,Chronologie, Toponymes, Oeuvres, Lieux, Sujets). The goal is to turn it into a tool interoperable with information systems beyond its original documentary purpose, and usable by archaeologists as a repository for managing scientific data. During the talk, we will describe the choice of tools, the organisation of work within the steering group and the collaborations with specialists for the upgrading and development of the vocabulary while showing the strengths and limitations of some experiments. Above allit will show how the introduction of the conceptual categories of the BackBone Thesaurus of DARIAH, modelled on the CIDOC-CRM ontology, through a progressive deconstruction/reconstruction process, eventually had an impact on all micro thesauri and questioned the organisation of knowledge so far proposed.

  • Publication . Preprint . Conference object . Contribution for newspaper or weekly magazine . Article . 2020
    Open Access English
    Authors: 
    Rehm, Georg; Marheinecke, Katrin; Hegele, Stefanie; Piperidis, Stelios; Bontcheva, Kalina; Hajic, Jan; Choukri, Khalid; Vasiljevs, Andrejs; Backfried, Gerhard; Prinz, Christoph; +37 more
    Publisher: Zenodo
    Countries: France, Denmark
    Project: EC | X5gon (761758), SFI | ADAPT: Centre for Digital... (13/RC/2106), FCT | PINFRA/22117/2016 (PINFRA/22117/2016), EC | AI4EU (825619), EC | ELG (825627), EC | BDVe (732630)

    Multilingualism is a cultural cornerstone of Europe and firmly anchored in the European treaties including full language equality. However, language barriers impacting business, cross-lingual and cross-cultural communication are still omnipresent. Language Technologies (LTs) are a powerful means to break down these barriers. While the last decade has seen various initiatives that created a multitude of approaches and technologies tailored to Europe's specific needs, there is still an immense level of fragmentation. At the same time, AI has become an increasingly important concept in the European Information and Communication Technology area. For a few years now, AI, including many opportunities, synergies but also misconceptions, has been overshadowing every other topic. We present an overview of the European LT landscape, describing funding programmes, activities, actions and challenges in the different countries with regard to LT, including the current state of play in industry and the LT market. We present a brief overview of the main LT-related activities on the EU level in the last ten years and develop strategic guidance with regard to four key dimensions. Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020). To appear

  • Open Access English
    Authors: 
    Ivan Kratchanov;

    International audience; The National Library Ivan Vazov in Plovdiv is the second largest library in Bulgaria. It serves asthe second national legal depository of Bulgarian printed works. In addition, it has contributedsignificantly to the preservation and the digital accessibility of the national cultural andhistorical heritage. This article offers an overview of the library’s history and currentdevelopments in the field of automation and digitization.

  • Publication . Article . Other literature type . Conference object . 2020
    Open Access English
    Authors: 
    Martin Grandjean;
    Publisher: HAL CCSD

    International audience; The technicality of network visualization applied to history and its relative novelty often result in a superficial use of a software, limited to describing a situation immediately extracted from a data set. This approach is justified in the exploratory phase of an analysis in most cases where the network is very explicitly present in the object studied. But the complexity of the entanglement of historical actors, places, institutions or temporal sequences makes finer modeling necessary if we want to go beyond a simplistic "datafication". To encourage curiosity towards other modes of analysis and put the data modeling (and therefore the historical sources) at the center of the research process, this article proposes a short introduction on how to discuss what makes a specific historical network, its components, its relationships, its layers and its different facets. It offers a kind of visual guide to help historians follow a multilayer framework to think their research object from another (multidimensional) angle and to combine them.

  • Publication . Part of book or chapter of book . 2019
    Open Access
    Authors: 
    Elisa Nury;
    Country: Switzerland

    International audience; This paper describes the workflow of the Grammateus project, from gathering data on Greek documentary papyri to the creation of a web application. The first stage is the selection of a corpus and the choice of metadata to record: papyrology specialists gather data from printed editions, existing online resources and digital facsimiles. In the next step, this data is transformed into the EpiDoc standard of XML TEI encoding, to facilitate its reuse by others, and processed for HTML display. We also reuse existing text transcriptions available on . Since these transcriptions may be regularly updated by the scholarly community, we aim to access them dynamically. Although the transcriptions follow the EpiDoc guidelines, the wide diversity of the papyri as well as small inconsistencies in encoding make data reuse challenging. Currently, our data is available on an institutional GitLab repository, and we will archive our final dataset according to the FAIR principles.

  • Publication . Conference object . 2019
    English
    Authors: 
    Dombrowski, Quinn; Fischer, Frank; Edmond, Jennifer; Tasovac, Toma; Raciti, Marco; Chambers, Sally; Daems, Joke; Hacigüzeller, Piraye; Smith, Kathleen M.; Worthey, Glen; +5 more
    Publisher: HAL CCSD
    Country: France
    Project: EC | DESIR (731081)

    International audience; DARIAH, the digital humanities infrastructure with origins and an organisational home in Europe, is nearing the completion of its implementation phase. The significant investment from the European Commission and member countries has yielded a robust set of technical and social infrastructures, ranging from working groups, registries and pedagogical materials to software supporting diverse approaches to digital humanities scholarship. While the funding and leadership of DARIAH to date have come from countries in, or contiguous with, Europe, the needs that drive its technical and social development are widely shared within the international digital humanities community beyond Europe. Scholars on every continent would benefit from well-supported technical tools and platforms, directories for facilitating access to information and resources, and support for working groups.

    The DARIAH Beyond Europe workshop series, organised and financed under the umbrella of the DESIR project (“DARIAH ERIC Sustainability Refined,” 2017–2019, funded by the European Union’s Horizon 2020 Research and Innovation Program), convened three meetings between September 2018 and March 2019 in the United States and Australia. These workshops served as fora for cross-cultural exchange and introduced many non-European DH scholars to DARIAH; each workshop included a significant delegation from various DARIAH bodies, together with a larger number of local presenters and participants.
    The local contexts for these workshops embodied research infrastructures very differently: on the one hand, in the U.S., a private research university (Stanford) and the de facto national library (the Library of Congress), both in a country with a history of unsuccessful national-scale infrastructure efforts; on the other, in Australia, a system which has invested substantially more in coordinated national research infrastructure in science and technology, but very little on a national scale in the humanities and arts. Europe is in many respects ahead of both host countries in terms of its research infrastructure ecosystem at both the national and pan-European levels.

    The Stanford workshop had four main topics of focus: corpus management; text and image analysis; geohumanities; and music, theatre, and sound studies. As the first of the workshops, the Stanford group also took the lead in proposing next steps toward actionable “DARIAH beyond Europe” initiatives, including the beginnings of a blog shared among participants from all the workshops, extra-European use of DARIAH’s DH Course Registry, and non-European participation in DARIAH Working Groups.

    The overall theme of the Library of Congress workshop was “Collections as Data,” building on a number of U.S.-based initiatives exploring how to enhance researcher engagement with digital collections through computationally driven research. In Washington, D.C., the knowledge exchange sessions focussed on digitised newspapers and text analysis, infrastructural challenges for public humanities, and the use of web archives in DH research. As at Stanford, interconnecting with DARIAH Working Groups was of core interest to participants, and a new Working Group was proposed to explore global access to and use of digitised historical newspapers.
    A further important outcome was the agreement to explore collaboration between the U.S.-based “Collections as Data” initiatives and the Heritage Data Reuse Charter in Europe.

    The third and final workshop in the series took place in March 2019 in Australia, hosted by the National Library of Australia in Canberra. Convened by the Australian Academy of the Humanities (AAH), together with the Australian Research Data Commons (ARDC) and DARIAH, this event was co-located with the Academy’s second annual Humanities, Arts and Culture Data Summit. The first day of the event, targeted at research leadership and policy makers, explored new horizons for data-driven humanities and arts research, digital cultural collections and research infrastructure. The two subsequent days focused on engaging with a wide variety of communities, including (digital) humanities researchers and cultural heritage professionals. Organised around a series of Knowledge Exchange Sessions combined with research-led lightning talks, the participants discussed in detail how big ideas can be implemented practically on the ground.

    This poster reflects on the key outcomes and future directions arising from these three workshops, and considers what it might look like for DARIAH to be adopted as a fundamental DH infrastructure in a complex variety of international, national, and regional contexts, with diverse funding models, resources, needs, and expectations. One major outcome of all the workshops was the shared recognition that, in spite of extensive funding, planning, and goodwill, they were not nearly global enough in their reach: most importantly, they were not inclusive of the Global South. Our new DARIAH beyond Europe community has a strong shared commitment to addressing this gap.

  • Open Access English
    Authors: 
    Lamé, M.; Pittet, P.; Federico Ponchio; Markhoff, B.; Sanfilippo, E. M.;
    Publisher: HAL CCSD
    Countries: Italy, France

    International audience; In this paper, we present an online communication-driven decision support system for aligning terms from one dataset with terms of another dataset (whether a standardized controlled vocabulary or not). Heterotoki differs from existing proposals in that it operates at the interface with humans, inviting experts to commit to their definitions, so that they either agree to validate the mapping or propose enrichments to the terminologies. More precisely, unlike most existing proposals for terminology alignment, Heterotoki sustains the negotiation of meaning through semantic coordination support built into its interface design. This negotiation involves the domain experts who produced the datasets.
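The validate-or-enrich decision described above can be sketched as a small state machine over alignment proposals. The class names, statuses, and example terms below are invented for illustration and are not taken from Heterotoki's actual data model or API.

```python
from dataclasses import dataclass, field
from enum import Enum

class Status(Enum):
    PROPOSED = "proposed"    # mapping suggested, awaiting expert review
    VALIDATED = "validated"  # experts agree the definitions match
    ENRICHED = "enriched"    # no direct match; a terminology is amended instead

@dataclass
class Alignment:
    source_term: str
    target_term: str
    status: Status = Status.PROPOSED
    comments: list = field(default_factory=list)

    def validate(self, expert):
        """Both parties commit to their definitions and accept the mapping."""
        self.status = Status.VALIDATED
        self.comments.append(f"{expert}: definitions match")

    def enrich(self, expert, note):
        """Instead of forcing a mapping, propose enriching a terminology."""
        self.status = Status.ENRICHED
        self.comments.append(f"{expert}: {note}")

a = Alignment("amphora (dataset A)", "storage vessel (vocabulary B)")
a.enrich("expert1", "add narrower term 'amphora' under 'storage vessel'")
print(a.status.value)  # "enriched"
```

The key design point the abstract makes is captured by the `enrich` branch: the outcome of a negotiation is not limited to accept/reject, since experts may instead evolve the terminologies themselves.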

  • Open Access English
    Authors: 
    Marlet , Olivier; Francart, Thomas; Markhoff, Béatrice; Rodier, Xavier;
    Publisher: HAL CCSD
    Country: France
    Project: EC | ARIADNEplus (823914)

    International audience; CIDOC CRM is an ontology intended to facilitate the integration, mediation and interchange of heterogeneous cultural heritage information. The Semantic Web, with its Linked Open Data cloud, enables scholars and cultural institutions to publish their data in RDF, using CIDOC CRM as an interlingua that allows a semantically consistent re-interpretation of their data. By now, many projects have mapped legacy datasets to CIDOC CRM, and successful Extract-Transform-Load data-integration processes have been performed in this way. The next step is to enable people and applications to dynamically explore autonomous datasets using the semantic mediation offered by CIDOC CRM. This is the purpose of OpenArchaeo, a tool for querying archaeological datasets on the LOD cloud. We present its main features: the principles behind its user-friendly query interface and its SPARQL endpoint for programs, together with its overall architecture, designed to be extensible and scalable so as to handle transparent interconnections with evolving distributed sources while achieving good efficiency.
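A program talking to a SPARQL endpoint such as OpenArchaeo's would typically build a query against CIDOC CRM classes and properties. The sketch below only constructs a plausible query string; the endpoint URL is a placeholder, and the chosen classes/properties (E22, P53) are a generic CIDOC CRM pattern, not OpenArchaeo's actual data model.

```python
import textwrap
import urllib.parse

# Placeholder endpoint URL, for illustration only.
ENDPOINT = "https://example.org/openarchaeo/sparql"

# Erlangen CRM (an OWL implementation of CIDOC CRM) namespace.
CRM = "http://erlangen-crm.org/current/"

def find_objects_query(place_label, limit=10):
    """Build a SPARQL query for objects located at a named place, using
    E22 (Man-Made Object) and P53 (has former or current location)."""
    return textwrap.dedent(f"""\
        PREFIX crm: <{CRM}>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT ?obj ?label WHERE {{
          ?obj a crm:E22_Man-Made_Object ;
               rdfs:label ?label ;
               crm:P53_has_former_or_current_location ?place .
          ?place rdfs:label "{place_label}" .
        }} LIMIT {limit}""")

query = find_objects_query("Tours")
# A GET request would carry the query URL-encoded:
url = ENDPOINT + "?query=" + urllib.parse.quote(query)
print(query)
```

This is exactly the kind of request the semantic mediation layer must translate transparently into queries over the distributed source datasets.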