Publication . Article . 2021

A benchmark of Spanish language datasets for computationally driven research

Gustavo Candela; María-Dolores Sáez; Pilar Escobar; Manuel Marco-Such
Open Access
Published: 13 Dec 2021
Publisher: SAGE Publications
Country: Spain
In the domain of Galleries, Libraries, Archives and Museums (GLAM) institutions, creative and innovative tools and methodologies for content delivery and user engagement have recently gained international attention. New methods have been proposed to publish digital collections as datasets amenable to computational use. Standardised benchmarks can be useful to broaden the scope of machine-actionable collections and to promote cultural and linguistic diversity. In this article, we propose a methodology to select datasets for computationally driven research applied to Spanish text corpora. This work seeks to encourage Spanish and Latin American institutions to publish machine-actionable collections that follow best practices and avoid common mistakes.

This research has been funded by the AETHER-UA (PID2020-112540RB-C43) Project from the Spanish Ministry of Science and Innovation.

Collections as data, Data quality metrics, Digital libraries, GLAM labs, Lenguajes y Sistemas Informáticos, Library and Information Sciences, Information Systems

Journal of Information Science
License: cc-by-sa
Providers: UnpayWall