Miriam Baglioni; Alessia Bardi; Argiro Kokogiannaki; Paolo Manghi; Katerina Iatropoulou; Pedro Príncipe; André Vieira; Lars Holm Nielsen; Harry Dimitropoulos; Ioannis Foufoulas; Natalia Manola; Claudio Atzori; Sandro La Bruzzo; Emma Lazzeri; Michele Artini; Michele De Bonis; Andrea Dell’Amico
Despite the hype, the effective implementation of Open Science is hindered by several cultural and technical barriers. Researchers have embraced digital science and use “digital laboratories” (e.g. research infrastructures, thematic services) to conduct their research and publish research data, but practices and tools are still far from meeting the Open Science expectations of transparency and reproducibility. The places where science is performed and the places where science is published are still regarded as separate realms. Publishing is still a post-experimental, tedious, manual process, too often limited to articles, in some contexts semantically linked to datasets, rarely to software, and generally disregarding digital representations of experiments. In this work we present the OpenAIRE Research Community Dashboard (RCD), designed to overcome some of these barriers for a given research community with minimal technical effort and without giving up any of the community's services or practices. The RCD flanks the digital laboratories of research communities with scholarly communication tools for discovering and publishing interlinked scientific products such as literature, datasets, and software. The benefits of the RCD are showcased through two real-world scenarios: the European Marine Science community and the European Plate Observing System (EPOS) research infrastructure.
This work is partly funded by the OpenAIRE-Advance H2020 project (grant number: 777541; call: H2020-EINFRA-2017) and the OpenAIRE-Connect H2020 project (grant number: 731011; call: H2020-EINFRA-2016-1). We would also like to thank our colleagues Michele Manunta, Francesco Casu, and Claudio De Luca (Institute for the Electromagnetic Sensing of the Environment, CNR, Italy) for their work on the EPOS infrastructure RCD, and Stephane Pesant (University of Bremen, Germany) for his work on the European Marine Science RCD.
First Online 30 August 2019
Over the course of its four-year project timeline, the CENDARI project collected archival descriptions and metadata in various formats from a broad range of cultural heritage institutions. These data were brought together and are stored in a single repository. The repository contains curated data created manually by the CENDARI team, as well as data acquired from small, ‘hidden’ archives in spreadsheet format or from large aggregators with advanced data exchange tools in place. While acquiring and curating heterogeneous data in a single repository is a technical challenge in itself, ingesting data into the CENDARI repository also makes it possible to process and index them through data extraction, entity recognition, semantic enhancement and other transformations. In this way the CENDARI project was able to act as a bridge between cultural heritage institutions and historical researchers, insofar as it drew together holdings from a broad range of institutions and enabled browsing of this heterogeneous content within a single search space. This paper describes the many ways in which the CENDARI project acquired data from cultural heritage institutions, together with the necessary technical background. By exemplifying diverse data creation and acquisition strategies, multiple formats and technical solutions, and the advantages and drawbacks of a repository, this “White Book” aims to provide guidance, advice and best practices for archivists and cultural heritage institutions that are collaborating, or plan to collaborate, with infrastructure projects.
With the growth of the Open Science movement in the past few years, researchers have been increasingly encouraged by their home institutions, their funders, and the public to share the data they produce. A new model of data sharing is emerging, and the issue is becoming increasingly crucial for the scientific community and for national and international research policy. As shown by the OECD in 2007, public funding agencies hope that publicly funded research projects will give access to the data produced in their work, in order to provide new resources for economic development. With the extension of the Open Research Data Pilot in Horizon 2020, H2020 beneficiaries have to make their research data “findable, accessible, interoperable and reusable (FAIR)”, and are therefore asked to provide a Data Management Plan (DMP) to this end. More than a constraint, this new model of openness brings direct benefits for researchers. Sharing their data allows researchers to organise and retrieve them effectively, ensure their security, collaborate with fellow researchers within the same discipline or from other disciplines, reduce costs by avoiding duplicate data collection, make validation of results easier, and increase the impact and visibility of their research outputs.
Countries: Spain, Netherlands, France
Project: FCT | EXPL/BBB-BEP/1356/2013 (EXPL/BBB-BEP/1356/2013), AKA | ELIXIR - Data for Life Eu... (273655), WT, EC | WENMR (261572), EC | EGI-INSPIRE (261323), EC | BIOMEDBRIDGES (284209), FCT | SFRH/BPD/78075/2011 (SFRH/BPD/78075/2011)
With the increasingly rapid growth of data in the life sciences, we are witnessing a major transition in the way research is conducted, from hypothesis-driven studies to data-driven simulations of whole systems. Such approaches require large-scale computational resources and e-infrastructures, such as the European Grid Infrastructure (EGI). EGI, one of the key enablers of the digital European Research Area, is a federation of resource providers set up to deliver sustainable, integrated and secure computing services to European researchers and their international partners. Here we aim to present the state of the art in Grid/Cloud computing in EU research as viewed from within the life sciences, focusing on key infrastructures and projects within the life sciences community. Rather than focusing purely on the technical aspects underlying currently available solutions, we outline the design aspects and key characteristics that can be identified across major research approaches. Overall, we aim to provide insights into the road ahead by establishing ever-stronger connections between EGI as a whole and the life sciences community.
AD was supported by Fundação para a Ciência e a Tecnologia, Portugal (SFRH/BPD/78075/2011 and EXPL/BBB-BEP/1356/2013). FP has been supported by the National Grid Infrastructure NGI_GRNET, HellasGRID, as part of EGI. IFB acknowledges funding from the “National Infrastructures in Biology and Health” call of the French “Investments for the Future” initiative. The WeNMR project has been funded by a European FP7 e-Infrastructure grant, contract no. 261572. AF was supported by a grant from Labex CEBA (Centre d’études de la Biodiversité Amazonienne) from ANR. MC is supported by the UK’s BBSRC core funding. CSC was supported by Academy of Finland grant No. 273655 for ELIXIR Finland.
The EGI-InSPIRE project (Integrated Sustainable Pan-European Infrastructure for Researchers in Europe) is co-funded by the European Commission (contract number: RI-261323). The BioMedBridges project is funded by the European Commission within Research Infrastructures of the FP7 Capacities Specific Programme, grant agreement number 284209. This is an open-access article.
Peer Reviewed
The assessment of genome function requires a mapping between genome-derived entities and biochemical reactions, and the biomedical literature is a rich source of information about reactions between biological components. However, the increasingly rapid growth in the volume of literature presents both a challenge and an opportunity: researchers need to isolate information about reactions of interest in a timely and efficient manner. In response, recent text mining research in the biology domain has largely focused on identifying and extracting ‘events’, i.e. categorised, structured representations of relationships between biochemical entities, from the literature. Functional genomics analyses necessarily encompass events as defined here. Automatic event extraction systems facilitate the development of sophisticated semantic search applications, allowing researchers to formulate structured queries over extracted events that specify the exact types of reactions to be retrieved. This article provides an overview of recent research into event extraction. We cover the annotated corpora on which systems are trained, systems that achieve state-of-the-art performance, and the community shared tasks that have been instrumental in increasing the quality, coverage and scalability of recent systems. Finally, we cover several concrete applications of event extraction, together with emerging directions of research.
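To make the idea of an event as a "categorised, structured representation" and of a structured query over events more concrete, here is a minimal sketch. It assumes a simplified schema loosely modelled on the BioNLP shared-task style (an event type, a trigger word, and role-labelled arguments such as Theme and Cause); the `Event` class, the `query_events` function and the toy events are hypothetical illustrations, not taken from the article or any corpus. Real schemas also allow an event to serve as the argument of another event (e.g. a regulation of a phosphorylation), which this flat sketch omits.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Event:
    """Hypothetical flat event: a typed trigger plus role-labelled entity arguments."""
    event_type: str   # e.g. "Phosphorylation", "Gene_expression"
    trigger: str      # text span that signals the event
    arguments: Dict[str, str] = field(default_factory=dict)  # role -> entity name


def query_events(events: List[Event], event_type: str,
                 theme: Optional[str] = None,
                 cause: Optional[str] = None) -> List[Event]:
    """Return events of the given type, optionally constrained on Theme and/or Cause."""
    results = []
    for ev in events:
        if ev.event_type != event_type:
            continue
        if theme is not None and ev.arguments.get("Theme") != theme:
            continue
        if cause is not None and ev.arguments.get("Cause") != cause:
            continue
        results.append(ev)
    return results


# Invented toy events for illustration only (not extracted from any text).
events = [
    Event("Phosphorylation", "phosphorylates", {"Theme": "STAT3", "Cause": "JAK2"}),
    Event("Gene_expression", "expression", {"Theme": "IL-6"}),
    Event("Phosphorylation", "phosphorylation", {"Theme": "p53"}),
]

hits = query_events(events, "Phosphorylation", theme="STAT3")
print([e.arguments["Cause"] for e in hits])  # ['JAK2']
```

Unlike a keyword search for "phosphorylation", the query constrains both the reaction type and the participants' roles, which is the kind of precision the semantic search applications described above rely on.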