The European Holocaust Research Infrastructure (EHRI) started in October 2010 to build on a network that connects both people (Holocaust researchers, archivists, curators, librarians and digital humanists) and dispersed Holocaust source material and collections. EHRI aims to make sources visible in a systematic way in order to counteract their fragmentation and to reveal interconnections. EHRI focuses on archive and collection descriptions, which are available through the EHRI Portal. EHRI is currently in its second phase and is on the ESFRI Roadmap for a more sustainable future. EHRI has developed a set of controlled vocabularies that serves as both a retrieval and a cataloguing tool for the multilingual and highly heterogeneous data of the EHRI Portal. These vocabularies were partly implemented in the first phase of the project. In the current phase of EHRI the vocabularies are undergoing quality improvement: improving and enriching the existing terms, adding new terms, disambiguating terms and removing mistakes (deduplication, merging, adding multilingual labels, consistency checks, multiple parent relations, etc.), and increasing their coverage. In the EHRI Portal the subject terms are currently not available to the public, as they are used only for retrieval purposes.
Abstract: The paper presents Intergraph, a graph-based visual analytics technical demonstrator for the exploration and study of content in historical document collections. The designed prototype is motivated by a practical use case on a corpus of circa 15,000 digitized resources about European integration since 1945. The corpus allowed generating a dynamic multilayer network which represents different kinds of named entities appearing and co-appearing in the collections. To our knowledge, Intergraph is one of the first interactive tools to visualize dynamic multilayer graphs for collections of digitized historical sources. Graph visualization and interaction methods have been designed based on user requirements for content exploration by non-technical users without a strong background in network science, and to compensate for common flaws with the annotation of named entities. Users work with self-selected subsets of the overall data by interacting with a scene of small graphs which can be added, altered and compared. This allows an interest-driven navigation in the corpus and the discovery of the interconnections of its entities across time.
Project: FWF | Arabic in the Middle Atla... (P 21722)
Academic dictionary writing is making greater and greater use of the TEI Guidelines’ dictionary module. And as increasing numbers of TEI dictionaries become available, there is an ever more palpable need to work towards greater interoperability among dictionary writing systems and other language resources that are needed by dictionaries and dictionary tools. In particular this holds true for the crucial role that statistical data obtained from language resources play in lexicographic workflow, a role that also has to be reflected in the model of the data produced in these workflows. Presenting a range of current projects, the authors address two main questions in this area: How can the relationship between a dictionary and other language resources be conceptualized, irrespective of whether they are used in the production of the dictionary or to enrich existing lexicographic data? And how can this be documented using the TEI Guidelines? Discussing a variety of options, this paper proposes a customization of the TEI dictionary module that tries to respond to the emerging requirements in an environment of increasingly intertwined language resources.
Van Der Eycken, Johan; Styven, Dorien; Gheldof, Tom; Depoortere, Rolande;
Publisher: HAL CCSD
Countries: France, Belgium
This article shows that metadata plays a central role in our society and concludes that through collaborative work, it is possible to pool solutions and to establish relationships of cooperation, both at the level of practical tool development and with regard to sharing and creating knowledge and know-how.
Is part of: ABB: Archives et Bibliothèques de Belgique - Archief- en Bibliotheekwezen in België, vol. 106, pages 135-144
Status: published
The National Library Ivan Vazov in Plovdiv is the second largest library in Bulgaria. It serves as the second national legal depository of Bulgarian printed works. In addition, it has contributed significantly to the preservation and the digital accessibility of the national cultural and historical heritage. This article offers an overview of the library’s history and current developments in the field of automation and digitization.
This paper is about data in the humanities. Most of my colleagues in literary and cultural studies would not necessarily speak of their objects of study as “data.” If you ask them what it is they are studying, they would rather speak of books, paintings and movies; of drama and crime fiction, of still lifes and action painting; of German expressionist movies and romantic comedy. They would mention Denis Diderot or Toni Morrison, Chardin or Jackson Pollock, Fritz Lang or Diane Keaton. Maybe they would talk about what they are studying as texts, images, and sounds. But rarely would they consider their objects of study to be “data.” However, in the humanities just as in other areas of research, we are increasingly dealing with “data.” With digitization efforts in the private and public sectors going on around the world, more and more data relevant to our fields of study exists, and, if the data has been licensed appropriately, it is available for research. The digital humanities aim to rise to the challenge and realize the potential of this data for humanistic inquiry. As Christine Borgman has shown in her book on Scholarship in the Digital Age, this is as much a theoretical, methodological and social issue as it is a technical issue. Indeed, the existence of all this data raises a host of questions, some of which I would like to address here. For example: What is the relation between the data we have and our objects of study? – Does data replace books, paintings and movies? In what way can data be said to be representations of them? What difference does it make to analyze the digital representation or version of a novel or a painting instead of the printed book, the manuscript, or the original painting? What types of data are there in the humanities, and what difference does it make? – I will argue that one can distinguish two types of data, “big” data and “smart” data. What, then, does it mean to deal with big data, or smart data, in the humanities?
What new ways of dealing with data do we need to adopt in the humanities? – How are big data and smart data being dealt with in the process of scholarly knowledge generation, that is, when data is being created, enriched, analyzed and interpreted?
We propose a morphologically informed model for named entity recognition, which is based on an LSTM-CRF architecture and combines word embeddings, Bi-LSTM character embeddings, part-of-speech (POS) tags, and morphological information. While previous work has focused on learning from raw word input, using word and character embeddings only, we show that for morphologically rich languages, such as Bulgarian, access to POS information contributes more to the performance gains than the detailed morphological information. Thus, we show that named entity recognition needs only coarse-grained POS tags, but at the same time it can benefit from simultaneously using some POS information of different granularity. Our evaluation results over a standard dataset show sizable improvements over the state-of-the-art for Bulgarian NER.
Keywords: named entity recognition; Bulgarian NER; morphology; morpho-syntax
This contribution will show how Access plays a strong role in the creation and structuring of DARIAH, a European Digital Research Infrastructure in Arts and Humanities. To achieve this goal, this contribution will develop the concept of Access from five examples:
- Taking an interdisciplinary point of view
- Managing the contradiction between national and international perspectives
- Involving different stakeholder communities (not only researchers)
- Managing tools and services
- Developing and using new collaboration tools
We would like to demonstrate that speaking about Access always implies a selection, a choice, even in the perspective of "Open Access".