The present paper describes the etymological component of the TEI Lex-0 initiative, which aims at defining a terser subset of the TEI Guidelines for the representation of etymological features in dictionary entries. Going beyond the basic etymological mechanisms provided in the TEI Guidelines, TEI Lex-0 Etym proposes a systematic representation of etymological and cognate descriptions by means of embedded constructs based on the `<etym>` (for etymologies) and `<cit>` (for etymons and cognates) elements. In particular, since the potential contents of etymons are highly analogous to those of dictionary entries in general, the model presented herein re-uses many of the features and constraints introduced in other components of TEI Lex-0 for the encoding of etymologies and etymons. The TEI Lex-0 Etym model is also closely aligned with ISO 24613-3 on modelling etymological data and the corresponding TEI serialisation available in ISO 24613-4.
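A minimal sketch of the nested construct described above: an `<etym>` block whose etymon is a `<cit type="etymon">` re-using entry-like children such as `<form>`/`<orth>` and `<gloss>`. The sample lemma, language code, and `type` values are illustrative assumptions, not taken from the TEI Lex-0 specification itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical example of an embedded etymological construct: the etymon
# is a <cit type="etymon"> whose contents mirror a miniature entry.
# Lemma, gloss, and attribute values are invented for illustration.
etym = ET.Element("etym", {"type": "borrowing"})
cit = ET.SubElement(etym, "cit", {"type": "etymon", "xml:lang": "la"})
form = ET.SubElement(cit, "form")
orth = ET.SubElement(form, "orth")
orth.text = "pater"
gloss = ET.SubElement(cit, "gloss")
gloss.text = "father"

print(ET.tostring(etym, encoding="unicode"))
```

Because the etymon's children follow the same constraints as ordinary entry content, the same validation machinery can in principle be applied inside and outside `<etym>`.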
Current research in lifelog data has not paid enough attention to the analysis of cognitive activities in comparison to physical activities. We argue that as we look into the future, wearable devices are going to be cheaper and more prevalent and textual data will play a more significant role. Data captured by lifelogging devices will increasingly include speech and text, potentially useful in the analysis of intellectual activities. By analyzing what a person hears, reads, and sees, we should be able to measure the extent of cognitive activity devoted to a certain topic or subject by a learner. Text-based lifelog records can benefit from semantic analysis tools developed for natural language processing. We show how semantic analysis of such text data can be achieved through the use of taxonomic subject facets and how these facets might be useful in quantifying cognitive activity devoted to various topics in a person's day. We are currently developing a method to automatically create taxonomic topic vocabularies that can be applied to this detection of intellectual activity.
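The facet-based quantification described above can be sketched very simply: match a day's captured text against per-topic vocabularies and count the hits. The topics and term lists below are invented placeholders, not the automatically created vocabularies the abstract refers to.

```python
from collections import Counter

# Hypothetical sketch: score captured text against small taxonomic
# topic vocabularies (invented here) to estimate attention per topic.
FACETS = {
    "biology": {"cell", "enzyme", "protein", "genome"},
    "finance": {"bond", "equity", "interest", "inflation"},
}

def topic_activity(text: str) -> Counter:
    """Count how many facet terms from each topic occur in the text."""
    tokens = text.lower().split()
    scores = Counter()
    for topic, vocab in FACETS.items():
        scores[topic] = sum(1 for t in tokens if t in vocab)
    return scores

heard = "the enzyme binds the protein while inflation erodes the bond yield"
print(topic_activity(heard))
```

A real system would of course need lemmatization and disambiguation, but the counting step is the core of turning text-based lifelog records into per-topic activity measures.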
In this article, we propose a Category Theory approach to (syntactic) interoperability between linguistic tools. The resulting category consists of textual documents, including any linguistic annotations; NLP tools that analyze texts and add further linguistic information; and format converters. Format converters enable the tools both to read and to produce different formats, which is the key to interoperability. The idea behind this approach is the parallel between the notions of composition and associativity in Category Theory and the chaining of NLP tools into pipelines. We show how pipelines of linguistic tools can be modeled within the conceptual framework of Category Theory, and we successfully apply this method to two real-life examples. Paper submitted to Applied Category Theory 2020 and accepted for the Virtual Poster Session.
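The categorical view can be sketched concretely: annotated documents are objects, tools and format converters are morphisms (functions), and a pipeline is their composition, which is associative by construction. The tool and converter names below are invented for illustration.

```python
# Minimal sketch of the categorical view: documents are objects, NLP
# tools and format converters are morphisms, and a pipeline is their
# composition. Tool names and formats here are invented.

def compose(f, g):
    """Morphism composition: (g . f)(x) = g(f(x))."""
    return lambda x: g(f(x))

def tokenize(doc):        # tool: raw text -> token list
    return doc.split()

def to_conll(tokens):     # converter: token list -> CoNLL-like lines
    return "\n".join(f"{i + 1}\t{t}" for i, t in enumerate(tokens))

def tag(conll):           # tool: appends a dummy POS column
    return "\n".join(line + "\tX" for line in conll.split("\n"))

# Associativity: (tag . to_conll) . tokenize == tag . (to_conll . tokenize)
left = compose(compose(tokenize, to_conll), tag)
right = compose(tokenize, compose(to_conll, tag))
assert left("a b") == right("a b")
print(left("a b"))
```

Associativity is what lets a pipeline be regrouped freely: inserting a converter between any two tools yields the same overall morphism regardless of how the chain is bracketed.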
Miriam Baglioni; Alessia Bardi; Argiro Kokogiannaki; Paolo Manghi; Katerina Iatropoulou; Pedro Príncipe; André Vieira; Lars Holm Nielsen; Harry Dimitropoulos; Ioannis Foufoulas; Natalia Manola; Claudio Atzori; Sandro La Bruzzo; Emma Lazzeri; Michele Artini; Michele De Bonis; Andrea Dell’Amico
Despite the hype, the effective implementation of Open Science is hindered by several cultural and technical barriers. Researchers have embraced digital science and use "digital laboratories" (e.g. research infrastructures, thematic services) to conduct their research and publish research data, but practices and tools are still far from achieving the expectations of transparency and reproducibility of Open Science. The places where science is performed and the places where science is published are still regarded as different realms. Publishing is still a post-experimental, tedious, manual process, too often limited to articles, in some contexts semantically linked to datasets, rarely to software, and generally disregarding digital representations of experiments. In this work we present the OpenAIRE Research Community Dashboard (RCD), designed to overcome some of these barriers for a given research community, minimizing the technical effort and without renouncing any of the community's services or practices. The RCD flanks the digital laboratories of research communities with scholarly communication tools for discovering and publishing interlinked scientific products such as literature, datasets, and software. The benefits of the RCD are showcased by means of two real-case scenarios: the European Marine Science community and the European Plate Observing System (EPOS) research infrastructure. This work is partly funded by the OpenAIRE-Advance H2020 project (grant number: 777541; call: H2020-EINFRA-2017) and the OpenAIRE-Connect H2020 project (grant number: 731011; call: H2020-EINFRA-2016-1). Moreover, we would like to thank our colleagues Michele Manunta, Francesco Casu, and Claudio De Luca (Institute for the Electromagnetic Sensing of the Environment, CNR, Italy) for their work on the EPOS infrastructure RCD; and Stephane Pesant (University of Bremen, Germany) for his work on the European Marine Science RCD. First Online 30 August 2019
In 2018, the European Strategy Forum on Research Infrastructures (ESFRI) was tasked by the Competitiveness Council, a configuration of the Council of the EU, with developing a common approach for monitoring the performance of Research Infrastructures. To this end, ESFRI established a working group, which proposed 21 Key Performance Indicators (KPIs) to monitor the progress of the Research Infrastructures (RIs) towards their objectives. The RIs were then asked to assess the relevance of these indicators for their institution. This paper aims to identify the relevance of certain indicators for particular groups of RIs by using cluster and discriminant analysis, which could contribute to the development of a monitoring system tailored to particular RIs. To obtain a typology of the RIs, we first performed a cluster analysis of the RIs according to their properties, which revealed clusters of RIs with similar characteristics, based on the domain of operation, such as food, environment or engineering. Then, discriminant analysis was used to study how the relevance of the KPIs differs among the obtained clusters. This analysis revealed that 80% of the RIs are correctly classified into the five clusters using the KPIs. Such a high percentage indicates that there are significant differences in the relevance of certain indicators, depending on the ESFRI domain of the RI. The indicators therefore need to be adapted to the type of infrastructure. It is therefore proposed that the Strategic Working Groups of ESFRI addressing specific domains should be involved in the tailored development of the monitoring of pan-European RIs. Comment: 15 pages, 8 tables, 3 figures
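The two-step analysis can be illustrated with a toy sketch: RIs are already grouped by domain, and a nearest-centroid "discriminant" step then checks how well KPI-relevance scores recover those groups. The RI labels, the three KPI columns, and all scores below are invented, and nearest-centroid classification stands in for the paper's discriminant analysis.

```python
# Toy sketch with invented data: domain clusters of RIs, then a
# nearest-centroid classifier on KPI-relevance vectors measures how
# well the KPIs separate the clusters (cf. the 80% figure above).

# (domain cluster, KPI-relevance vector over 3 hypothetical KPIs)
ris = [
    ("env",  [5, 1, 2]), ("env",  [4, 2, 1]),
    ("food", [1, 5, 4]), ("food", [2, 4, 5]),
]

def centroid(vecs):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vecs) for col in zip(*vecs)]

centroids = {
    label: centroid([v for lbl, v in ris if lbl == label])
    for label in {lbl for lbl, _ in ris}
}

def classify(v):
    """Assign v to the cluster with the nearest centroid (squared distance)."""
    return min(centroids,
               key=lambda lbl: sum((a - b) ** 2
                                   for a, b in zip(v, centroids[lbl])))

correct = sum(classify(v) == lbl for lbl, v in ris)
print(f"{100 * correct / len(ris):.0f}% correctly classified")
```

A high share of correct classifications, as in the paper, means the KPI-relevance profiles carry real information about the domain cluster an RI belongs to.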
More and more cultural institutions use Linked Data principles to share and connect their collection metadata. In the archival field, initiatives are emerging to exploit the data contained in archival descriptions and to adapt encoding standards to the semantic web. In this context, online authority files can be used to enrich metadata. However, relying on a decentralized network of knowledge bases such as Wikidata, DBpedia or even VIAF has its own difficulties. This paper aims to offer a critical view of these linked authority files by adopting a close-reading approach. Through a practical case study, we intend to identify and illustrate the possibilities and limits of RDF triples compared to institutions' less structured metadata. Comment: Workshop DARIAH "Trust and Understanding: the value of metadata in a digitally joined-up world" (14/05/2018, Brussels), preprint of the submission to the journal "Archives et Bibliothèques de Belgique"
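The enrichment step the abstract alludes to can be sketched in miniature: a flat institutional record holds only a name string, while authority-file triples about the same agent add typed, machine-readable facts. The QID, predicates, and values below are invented placeholders, not real Wikidata identifiers.

```python
# Illustrative sketch (invented identifiers): merging a flat
# institutional record with authority-file triples, showing the kind
# of structure RDF adds to a plain string value.

local_record = {"creator": "Doe, Jane"}  # unstructured string value

# Triples as (subject, predicate, object); the QID is made up.
authority_triples = [
    ("Q000000", "rdfs:label", "Jane Doe"),
    ("Q000000", "wdt:P569", "1901-01-01"),  # date of birth
    ("Q000000", "wdt:P106", "archivist"),   # occupation
]

def enrich(record, subject, triples):
    """Attach every triple about `subject` to a copy of the record."""
    enriched = dict(record)
    for s, p, o in triples:
        if s == subject:
            enriched.setdefault(p, o)
    return enriched

print(enrich(local_record, "Q000000", authority_triples))
```

The limits discussed in the paper appear at exactly this seam: the enrichment is only as trustworthy as the link between the local string and the authority identifier.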
AARC (Authentication and Authorisation for Research Communities) is a two-year EC-funded project to develop and pilot an integrated cross-discipline authentication and authorisation framework, building on existing authentication and authorisation infrastructures (AAIs) and production federated infrastructure. AARC also champions federated access and offers tailored training to complement the actions needed to test AARC results and to promote AARC outcomes. This article describes a high-level blueprint architecture for interoperable AAIs. Comment: This text was part of a (public) EU deliverable document. It has a main part and a long appendix with more details about example infrastructures that were taken into account
Article to appear in the proceedings of the 10th ISKO France international conference, Knowledge Organization Systems and Digital Humanities, 5-6 November 2015. In France, the emerging question of research data sits within an abundant yet rigid institutional framework that is difficult to delineate. Research is also funded and evaluated at the European level. This national and European organisation is compounded by an international dimension inherent to research and to the rapid, repeated exchange of information, accelerated by the development of the Internet. The Franco-European institutional labyrinth is thus superimposed on the international and disciplinary layering of the research world. Finally, the proximity of two movements that are nonetheless not synonymous, Open Access and Open Data, further clouds the understanding of this landscape. It is therefore not easy to grasp the role each actor plays with respect to research data. We propose to contribute to a clarification of this landscape by initiating a mapping of the visible initiatives and actors in France concerning data in the humanities and social sciences. Keywords: Open Data, Open Access, research data, France, humanities and social sciences, research valorisation, openness policy, research actors, public research.
This paper describes the achievements of the H2020 project INDIGO-DataCloud. The project has provided e-infrastructures with tools, applications and cloud framework enhancements to manage the demanding requirements of scientific communities, either locally or through enhanced interfaces. The middleware developed makes it possible to federate hybrid resources and to easily write, port and run scientific applications in the cloud. In particular, we have extended existing PaaS (Platform as a Service) solutions, allowing public and private e-infrastructures, including those provided by EGI, EUDAT, and Helix Nebula, to integrate their existing services and make them available through AAI services compliant with GEANT interfederation policies, thus guaranteeing transparency and trust in the provisioning of such services. Our middleware facilitates the execution of applications using containers on Cloud and Grid based infrastructures, as well as on HPC clusters. Our developments are freely downloadable as open source components, and are already being integrated into many scientific applications. Comment: 39 pages, 15 figures. Version accepted in Journal of Grid Computing