Publisher: Centre pour la Communication Scientifique Directe (CCSD)
This article presents a study of the French-speaking digital humanities. It is based on the experience of two research engineers from the French National Center for Scientific Research (CNRS) who have been studying these issues for the last ten years. They conducted a survey at the École Normale Supérieure (ENS-Paris) which enabled them to draw up an overview of the transformation of the profession of humanities and social sciences research engineers in the context of the digital humanities. The Digit_Hum initiative, which they run in parallel with their respective activities at the ENS, also provided information for this overview thanks to its role as a space for discussion about the digital humanities along with training and structuring of this field at the ENS and the Université Paris Sciences & Lettres (PSL).
The concept of literary genre is a highly complex one: not only are different genres frequently defined on several, but not necessarily the same, levels of description, but the consideration of genres as cognitive, social, or scholarly constructs with a rich history further complicates the matter. This contribution focuses on thematic aspects of genre with a quantitative approach, namely Topic Modeling. Topic Modeling has proven useful for discovering thematic patterns and trends in large collections of texts, with a view to classifying or browsing them on the basis of their dominant themes. It has rarely, if ever, been applied to collections of dramatic texts, however. In this contribution, Topic Modeling is used to analyze a collection of French drama of the Classical Age and the Enlightenment. The general aim is to discover which semantic types of topics are found in this collection, whether different dramatic subgenres have distinctive dominant topics and plot-related topic patterns, and, inversely, to what extent clustering methods based on topic scores per play produce groupings of texts that agree with more conventional genre distinctions. This contribution shows that interesting topic patterns can be detected which provide new insights into the thematic, subgenre-related structure of French drama as well as into the history of French drama of the Classical Age and the Enlightenment. 11 figures
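The workflow described in this abstract, fitting a topic model to a corpus and then clustering texts by their per-play topic scores, can be sketched minimally as follows. This is our illustration only: the toy "plays", the use of scikit-learn's LDA and KMeans, and all parameter choices are assumptions, not the tools or settings of the study.

```python
# Minimal sketch: fit an LDA topic model to a toy corpus of "plays",
# then cluster the plays on their topic-score vectors as a stand-in
# for subgenre grouping. Texts and parameters are invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.cluster import KMeans

plays = [
    "love heart marriage lover passion love heart",
    "king crown war honor sword king battle",
    "love lover jealousy marriage heart passion",
    "war king duty honor crown sword battle",
]

# Bag-of-words representation of each play
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(plays)

# Fit a two-topic model; transform yields one topic distribution per play
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_scores = lda.fit_transform(counts)  # shape: (n_plays, n_topics)

# Cluster plays on their topic scores
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(topic_scores)
```

Each row of `topic_scores` is a probability distribution over topics, so clustering operates on how much of each theme a play contains rather than on raw vocabulary.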
Digital technologies, such as the Internet and Artificial Intelligence, are part of our daily lives, influencing broader aspects of our way of life, as well as the way we interact with the past. Having dramatically changed the ways in which knowledge is produced and consumed, the algorithmic age has also radically changed the relationship that the general public has with History. Fields of History such as Public and Oral History have particularly benefitted from the rise of digital culture. How does our digital culture affect the way we think, study, research and teach the past, as historical evidence spreads rapidly in the public sphere? How do digital technologies promote the study, writing and teaching of History? What should historians, students of history and pre-service history teachers be critically aware of, when inundated with digitized or born-digital content that is constantly growing on the Internet? And while these changes are now visible globally, how is the discipline of History situated within the digital transformation rapidly advancing in Greece? Finally, what are the consequences of these changes for History as a subject taught at Greek secondary schools? These are some of the issues raised in the text that follows, which is part of the course materials of the undergraduate course 'Pedagogics of History: Theory and Practice', offered during the winter semester 2020-2021 at the School of Philosophy, Pedagogy and Psychology of the University of Athens. Comment: 47 pages, in Greek, 8 figures
How can researchers identify suitable research data repositories for the deposit of their research data? Which repository best matches the technical and legal requirements of a specific research project? To this end, and from a humanities perspective, the Data Deposit Recommendation Service (DDRS) has been developed as a prototype. It not only serves as a functional service for selecting humanities research data repositories, but is above all a technical demonstrator illustrating the potential of re-using an already existing infrastructure, in this case re3data, and the feasibility of setting up this kind of service for other research disciplines. The documentation and the code of this project can be found in the DARIAH GitHub repository: https://dariah-eric.github.io/ddrs/.
In this article, we propose a Category Theory approach to (syntactic) interoperability between linguistic tools. The resulting category consists of textual documents, including any linguistic annotations; NLP tools that analyze texts and add further linguistic information; and format converters. Format converters are necessary to make the tools able both to read and to produce different formats, which is the key to interoperability. The idea behind this work is the parallelism between the concepts of composition and associativity in Category Theory and the structure of NLP pipelines. We show how pipelines of linguistic tools can be modeled within the conceptual framework of Category Theory, and we successfully apply this method to two real-life examples. Paper submitted to Applied Category Theory 2020 and accepted for the Virtual Poster Session
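The composition/associativity parallel this abstract draws can be illustrated with plain function composition: documents as objects, tools and converters as morphisms, and a pipeline as a composite morphism. The tokenizer, dummy tagger, and CoNLL-style converter below are invented stand-ins, not the paper's actual tools or formats.

```python
# Sketch: an NLP pipeline as morphism composition. Objects are document
# states; tools and format converters are morphisms between them.

def compose(f, g):
    """Return the composite morphism g . f (apply f, then g)."""
    return lambda doc: g(f(doc))

# Hypothetical tools (stand-ins for real analyzers and converters)
def tokenize(text): return text.split()
def tag(tokens): return [(t, "NOUN") for t in tokens]  # dummy tagger
def to_conll(pairs): return "\n".join(f"{w}\t{p}" for w, p in pairs)

# A pipeline is just a composite morphism
pipeline = compose(compose(tokenize, tag), to_conll)

# Associativity: (to_conll . tag) . tokenize == to_conll . (tag . tokenize),
# so the bracketing of a pipeline does not matter
left = compose(compose(tokenize, tag), to_conll)
right = compose(tokenize, compose(tag, to_conll))
assert left("a b") == right("a b")
```

Associativity is what licenses treating a pipeline as a flat chain of tools rather than a particular nesting of calls.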
Multilingualism is a cultural cornerstone of Europe and firmly anchored in the European treaties, including full language equality. However, language barriers impacting business, cross-lingual and cross-cultural communication are still omnipresent. Language Technologies (LTs) are a powerful means to break down these barriers. While the last decade has seen various initiatives that created a multitude of approaches and technologies tailored to Europe's specific needs, there is still an immense level of fragmentation. At the same time, AI has become an increasingly important concept in the European Information and Communication Technology area. For a few years now, AI, with its many opportunities and synergies but also its misconceptions, has been overshadowing every other topic. We present an overview of the European LT landscape, describing funding programmes, activities, actions and challenges in the different countries with regard to LT, including the current state of play in industry and the LT market. We present a brief overview of the main LT-related activities on the EU level in the last ten years and develop strategic guidance with regard to four key dimensions. Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020). To appear
We investigated the evolution and transformation of scientific knowledge in the early modern period, analyzing more than 350 different editions of textbooks used for teaching astronomy in European universities from the late fifteenth century to the mid-seventeenth century. These historical sources constitute the Sphaera Corpus. By examining different semantic relations among individual parts of each edition on record, we built a multiplex network consisting of six layers, as well as the aggregated network built from the superposition of all the layers. The network analysis reveals the emergence of five different communities. The contribution of each layer in shaping the communities and the properties of each community are studied. The most influential books in the corpus are found by calculating the average age of all the outgoing and incoming links for each book. A small group of editions is identified as a transmitter of knowledge, as they bridge past knowledge to the future across a long temporal interval. Our analysis, moreover, identifies the most disruptive books. These books introduce new knowledge that is then adopted by almost all the books published afterwards, until the end of the whole period of study. The historical research on the content of the identified books, as an empirical test, finally corroborates the results of all our analyses. 19 pages, 9 figures
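The average-link-age measure described in this abstract can be sketched as follows. The years and links below are invented for illustration and are not the Sphaera data; the point is only the mechanics of the measure.

```python
# Toy sketch of the "average link age" measure: for each book, the mean
# year difference across the links that touch it. A book whose incoming
# links span a long interval bridges past knowledge forward in time.
year = {"A": 1490, "B": 1530, "C": 1600, "D": 1640}
# (later_book, earlier_book): the later edition reuses parts of the earlier one
links = [("B", "A"), ("C", "A"), ("D", "A"), ("D", "C")]

def avg_link_age(book):
    """Mean age (in years) of all links touching `book`."""
    ages = [year[src] - year[dst] for src, dst in links if book in (src, dst)]
    return sum(ages) / len(ages) if ages else 0.0

# "A" keeps being reused 40 to 150 years after publication,
# so it scores high as a knowledge transmitter.
transmitter_score = avg_link_age("A")
```

A high score for old incoming links marks a transmitter; a book whose content is taken up almost universally by later books would instead stand out as disruptive.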
The digital transformation has initiated a paradigm shift in research and scholarly communication practices towards a more open scholarly culture. Although this transformation is slowly happening in the Digital Humanities field, open is not yet the default. The article introduces the OpenMethods metablog, a community platform that highlights open research methods, tools, and practices within the context of the Digital Humanities by republishing open access content around methods and tools in various formats and languages. It also describes the platform's technical infrastructure based on its requirements and main functionalities, especially the collaborative content sourcing and editorial workflows. The article concludes with a discussion of the potential of the OpenMethods metablog to overcome barriers to open practices, by focusing on inclusive, community-sourced information around opening up research processes, and of the challenges that need to be overcome to achieve its goals.
KBSET is an environment that provides support for scholarly editing in two flavors: first, as a practical tool, KBSET/Letters, that accompanies the development of editions of correspondences (in particular from the 18th and 19th century), all the way from source documents to PDF and HTML presentations; second, as a prototypical tool, KBSET/NER, for experimentally investigating novel forms of working on editions that are centered around automated named entity recognition. KBSET can process declarative application-specific markup that is expressed in LaTeX notation and can incorporate large external fact bases that are typically provided in RDF. KBSET includes specially developed LaTeX styles and a core system written in SWI-Prolog, which is used there in many roles, exploiting the potential of Prolog as a unifying language. Comment: To appear in DECLARE 2019 Revised Selected Papers
We present in this work a new dataset of coreference annotations for works of literature in English, covering 29,103 mentions in 210,532 tokens from 100 works of fiction. This dataset differs from previous coreference datasets in containing documents whose average length (2,105.3 words) is four times that of other benchmark datasets (463.7 for OntoNotes), and it contains examples of difficult coreference problems common in literature. The dataset allows for an evaluation of cross-domain performance on the task of coreference resolution, and for analysis of the characteristics of long-distance, within-document coreference.