More and more cultural institutions use Linked Data principles to share and connect their collection metadata. In the archival field, initiatives emerge to exploit data contained in archival descriptions and adapt encoding standards to the semantic web. In this context, online authority files can be used to enrich metadata. However, relying on a decentralized network of knowledge bases such as Wikidata, DBpedia or even Viaf has its own difficulties. This paper aims to offer a critical view of these linked authority files by adopting a close-reading approach. Through a practical case study, we intend to identify and illustrate the possibilities and limits of RDF triples compared to institutions' less structured metadata. Comment: Workshop "Dariah "Trust and Understanding: the value of metadata in a digitally joined-up world" (14/05/2018, Brussels), preprint of the submission to the journal "Archives et Biblioth\`eques de Belgique"
We propose a morphologically informed model for named entity recognition, which is based on LSTM-CRF architecture and combines word embeddings, Bi-LSTM character embeddings, part-of-speech (POS) tags, and morphological information. While previous work has focused on learning from raw word input, using word and character embeddings only, we show that for morphologically rich languages, such as Bulgarian, access to POS information contributes more to the performance gains than the detailed morphological information. Thus, we show that named entity recognition needs only coarse-grained POS tags, but at the same time it can benefit from simultaneously using some POS information of different granularity. Our evaluation results over a standard dataset show sizable improvements over the state-of-the-art for Bulgarian NER. named entity recognition; Bulgarian NER; morphology; morpho-syntax
International audience; This contribution will show how Access play a strong role in the creation and structuring of DARIAH, a European Digital Research Infrastructure in Arts and Humanities.To achieve this goal, this contribution will develop the concept of Access from five examples:_ Interdisciplinarity point of view_ Manage contradiction between national and international perspectives_ Involve different communities (not only researchers stakeholders)_ Manage tools and services_ Develop and use new collaboration toolsWe would like to demonstrate that speaking about Access always implies a selection, a choice, even in the perspective of "Open Access".
Multilingualism is a cultural cornerstone of Europe and firmly anchored in the European treaties including full language equality. However, language barriers impacting business, cross-lingual and cross-cultural communication are still omnipresent. Language Technologies (LTs) are a powerful means to break down these barriers. While the last decade has seen various initiatives that created a multitude of approaches and technologies tailored to Europe's specific needs, there is still an immense level of fragmentation. At the same time, AI has become an increasingly important concept in the European Information and Communication Technology area. For a few years now, AI, including many opportunities, synergies but also misconceptions, has been overshadowing every other topic. We present an overview of the European LT landscape, describing funding programmes, activities, actions and challenges in the different countries with regard to LT, including the current state of play in industry and the LT market. We present a brief overview of the main LT-related activities on the EU level in the last ten years and develop strategic guidance with regard to four key dimensions. Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020). To appear
CLARIN (Common Language Resources and Technology Infrastructure) is regarded as one of the most important European research infrastructures, offering and promoting a wide array of useful services for (digital) research in linguistics and humanities. However, the assessment of the users for its core technical development has been highly limited, therefore, it is unclear if the community is thoroughly aware of the status-quo of the growing infrastructure. In addition, CLARIN does not seem to be fully materialised marketing and business plans and strategies despite its strong technical assets. This article analyses the web traffic of the Virtual Language Observatory, one of the main web applications of CLARIN and a symbol of pan-European re-search cooperation, to evaluate the users and performance of the service in a transparent and scientific way. It is envisaged that the paper can raise awareness of the pressing issues on objective and transparent operation of the infrastructure though Open Evaluation, and the synergy between marketing and technical development. It also investigates the "science of web analytics" in an attempt to document the research process for the purpose of reusability and reproducibility, thus to find universal lessons for the use of a web analytics, rather than to merely produce a statistical report of a particular website which loses its value outside its context.
The digital humanities (DH) enrich the traditional fields of the humanities with new practices, approaches and methods. Since the turn of the millennium, the necessary skills to realise these new possibilities have been taught in summer schools, workshops and other alternative formats. In the meantime, a growing number of Bachelor's and Master's programmes in digital humanities have been launched worldwide. The DH Course Registry, which is the focus of this article, was created to provide an overview of the growing range of courses on offer worldwide. Its mission is to gather the rich offerings of different courses and to provide an up-to-date picture of the teaching and training opportunities in the field of DH. The article provides a general introduction to this emerging area of research and introduces the two European infrastructures CLARIN and DARIAH, which jointly operate the DH Course Registry. A short history of the Registry is accompanied by a description of the data model and the data curation workflow. Current data, available through the API of the Registry, is evaluated to quantitatively map the international landscape of DH teaching.Preprint of a publication for LibraryTribune (China) (accepted)
The digital transformation has initiated a paradigm shift in research and scholarly communication practices towards a more open scholarly culture. Although this transformation is slowly happening in the Digital Humanities field, open is not yet default. The article introduces the OpenMethods metablog, a community platform that highlights open research methods, tools, and practices within the context of the Digital Humanities by republishing open access content around methods and tools in various formats and languages. It also describes the platform’s technical infrastructure based on its requirements and main functionalities, and especially the collaborative content sourcing and editorial workflows. The article concludes with a discussion of the potentials of the OpenMethods metablog to overcome barriers towards open practices by focusing on inclusive, community sourced information based around opening up research processes and the challenges that need to be overcome to achieve its goals.
The concept of literary genre is a highly complex one: not only are different genres frequently defined on several, but not necessarily the same levels of description, but consideration of genres as cognitive, social, or scholarly constructs with a rich history further complicate the matter. This contribution focuses on thematic aspects of genre with a quantitative approach, namely Topic Modeling. Topic Modeling has proven to be useful to discover thematic patterns and trends in large collections of texts, with a view to class or browse them on the basis of their dominant themes. It has rarely if ever, however, been applied to collections of dramatic texts. In this contribution, Topic Modeling is used to analyze a collection of French Drama of the Classical Age and the Enlightenment. The general aim of this contribution is to discover what semantic types of topics are found in this collection, whether different dramatic subgenres have distinctive dominant topics and plot-related topic patterns, and inversely, to what extent clustering methods based on topic scores per play produce groupings of texts which agree with more conventional genre distinctions. This contribution shows that interesting topic patterns can be detected which provide new insights into the thematic, subgenre-related structure of French drama as well as into the history of French drama of the Classical Age and the Enlightenment. 11 figures
In this paper we provide a systematic and comprehensive set of modeling principles for representing etymological data in digital dictionaries using TEI. The purpose is to integrate in one coherent framework both digital representations of legacy dictionaries and born-digital lexical databases that are constructed manually or semi-automatically. We provide examples from many different types of etymological phenomena from traditional lexicographic practice, as well as analytical approaches from functional and cognitive linguistics such as metaphor, metonymy, and grammaticalization, which in many lexicographical and formal linguistic circles have not often been treated as truly etymological in nature, and have thus been largely left out of etymological dictionaries. In order to fully and accurately express the phenomena and their structures, we have made several proposals for expanding and amending some aspects of the existing TEI framework. Finally, with reference to both synchronic and diachronic data, we also demonstrate how encoders may integrate semantic web/linked open data information resources into TEI dictionaries as a basis for the sense, and/or the semantic domain, of an entry and/or an etymon.
A defining feature of data and data workflows in the arts and humanities domain is their dependence on cultural heritage sources hosted and curated in museums, libraries, galleries and archives. A major difficulty when scholars interact with heritage data is that the nature of the cooperation between researchers and Cultural Heritage Institutions and the researchers working in CHIs (henceforth CHIs) is often constrained by structural and legal challenges but even more by uncertainties as to the expectations of both parties.This recognition led several European organizations such as APEF, CLARIN, Europeana, E-RIHS to come together and join forces under the governance of DARIAH to set up principles and mechanisms for improving the conditions for the use and re-use of cultural heritage data issued by cultural heritage institutions and studied and enriched by researchers. As a first step of this joint effort is the Heritage Data Reuse Charter (https://datacharter.hypotheses.org/) establishes 6 basic principles for improving the use and re-use of cultural heritage resources by researchers and , to help all the relevant actors to work together to connect and improve access to heritage data. These are: Reciprocity, Interoperability, Citability, Openness, Stewardship and Trustworthiness.As a further step in translating these principles to actual data workflows the survey below serves as a template to frame exchanges around cultural heritage data by enabling both Cultural Heritage Institutions, infrastructure providers and researchers and to clarify their goals at the beginning and the project, to specify access to data, provenance information, preferred citation standards, hosting responsibilities etc. on the basis of which the parties can arrive at mutual reuse agreements that could serve as a starting point for a FAIR-by-construction data management, right from the project planning/application phase. In practice, the survey below can be flexibly applied in platform-independent ways in exchange protocols between Cultural Heritage Institutions and researchers, Institutions who sign the Charter could use it (and expect to use such surveys) in their own exchange protocols. Another direction of future developments is to set up a platform dedicated to such exchanges. On the other hand, researchers are encouraged to contact the CHIs during the initial stages of their project in order to explain their plans and figure details of transaction together. This mutual declaration can later be a powerful component in their Data Management Plans as it shows evidence for responsible and fair conduct of cultural heritage data, and fair (but also FAIR) research data management practices that are based on partnership with the holding institution. As enclosing a Research Data Management Plan to grant applications is becoming a more and more common requirement among research funders, we need to raise the funders’ awareness to the fact that such bi- or trilateral agreements and data reuse declarations among researchers, CHIs and infrastructure providers are crucial domain-specific components of FAIR data management.