Filters
- DARIAH EU
- Publications
- Other research products
- Preprint
- ES
- GB
- arXiv.org e-Print Archive
Publication: Article, 2023 (embargo end date: 01 Jan 2023)
Country: Spain
Publisher: arXiv
Funded by: EC | CLS INFRA; EC | LyrAIcs
Authors: Benito-Santos, Alejandro; Ghajari, Adrián; Hernández, Pedro; Fresno, Victor; Ros, Salvador; González-Blanco, Elena
Handle: 10045/137147
In this paper, we present a new dataset and benchmark tailored to the task of semantic similarity in song lyrics. Our dataset, originally consisting of 2,775 pairs of Spanish songs, was annotated in a collective annotation experiment by 63 native-speaker annotators. After collecting and refining the data to ensure a high degree of consensus and data integrity, we obtained 676 high-quality annotated pairs, which we used to evaluate the performance of various state-of-the-art monolingual and multilingual language models. These experiments establish baseline results that we hope will be useful to the community for future academic and industrial applications in this area.

This research was carried out within the framework of the LyrAIcs grant (grant agreement ID 964009), funded by ERC-POCLS, and the CLS INFRA grant (reference 101004984), funded by H2020-INFRAIA-2020-1. It also received funding from the project ISL: Intelligent Systems for Learning (GID2016-39) in the PID 22/23 call, and from FAIRTRANSNLP-DIAGNOSIS: Measuring and quantifying bias and fairness in NLP systems (grant PID2021-124361OB-C32), funded by MCIN/AEI/10.13039/501100011033 and by ERDF, EU, "A way of making Europe". Alejandro Benito-Santos acknowledges support from the "Margarita Salas" postdoctoral grant awarded by the Spanish Ministry of Universities.
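The abstract does not spell out the evaluation protocol. A common way to benchmark models on semantic similarity is to score each text pair with the cosine similarity of the model's embeddings and compare those scores with the averaged human ratings using Spearman's rank correlation. The sketch below illustrates that generic procedure in standard-library Python, together with a simple agreement filter; the thresholds `min_votes` and `max_std`, the function names, and the toy ratings and vectors are hypothetical and are not taken from the paper.

```python
import math
import statistics


def consensus_pairs(annotations, min_votes=3, max_std=1.0):
    """Keep pairs whose ratings agree; return pair_id -> mean rating.

    min_votes and max_std are hypothetical thresholds, not the paper's criteria.
    """
    kept = {}
    for pair_id, scores in annotations.items():
        if len(scores) >= min_votes and statistics.pstdev(scores) <= max_std:
            kept[pair_id] = statistics.mean(scores)
    return kept


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def ranks(xs):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    result = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the rank vectors.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = ranks(x), ranks(y)
    mx, my = statistics.mean(rx), statistics.mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def evaluate(kept, embeddings):
    """Correlate model cosine scores with human mean ratings over kept pairs."""
    ids = sorted(kept)
    gold = [kept[i] for i in ids]
    pred = [cosine(*embeddings[i]) for i in ids]
    return spearman(gold, pred)


if __name__ == "__main__":
    # Toy ratings on a 0-5 scale; "p3" has low agreement and is filtered out.
    annotations = {"p1": [4, 4, 5], "p2": [1, 2, 1], "p3": [0, 5, 2], "p4": [3, 3, 3]}
    # Toy 2-D vectors standing in for a language model's lyric embeddings.
    embeddings = {
        "p1": ([1.0, 0.0], [1.0, 0.0]),
        "p2": ([1.0, 0.0], [0.0, 1.0]),
        "p3": ([1.0, 0.0], [1.0, 0.0]),
        "p4": ([1.0, 0.0], [1.0, 1.0]),
    }
    kept = consensus_pairs(annotations)
    print(sorted(kept), round(evaluate(kept, embeddings), 3))
```

Spearman's correlation is a natural fit for this kind of benchmark because it compares only orderings, so a model's cosine scores need not share the scale of the human ratings.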
Data sources:
- arXiv.org e-Print Archive: Other literature type, Preprint, 2023
- Repositorio Institucional de la Universidad de Alicante: Article, 2023
- Recolector de Ciencia Abierta, RECOLECTA: Article, 2023; full text: https://doi.org/10.26342/2023-71-12
Access routes: Green. 0 citations; popularity, influence and impulse: Average (BIP!); 7 views; 10 downloads.
Publication: Article, 2018
Countries: France, United Kingdom, Germany
Language: English
Publisher: HAL CCSD
Funded by: EC | CENDARI
Authors: Nadia Boukhelifa; Michael Bryant; Natasa Bulatovic; Ivan Čukić; Jean-Daniel Fekete; Milica Knežević; Jörg Lehmann; David I. Stuart; Carsten Thiel
DOI: 10.1145/3092906
The CENDARI infrastructure is a research-supporting platform designed to provide tools for transnational historical research, focusing on two topics: medieval culture and World War I. It offers end users modern Web-based tools built on a sophisticated infrastructure to collect, enrich, annotate, and search large document corpora. Supporting researchers in their daily work is a novel concern for infrastructures. We describe how we gathered requirements through multiple methods to understand historians' needs and derive an abstract workflow to support them. We then outline the tools we built, tying their technical descriptions to the user requirements. The main tools are the note-taking environment and its faceted search capabilities; the data integration platform, including the Data API, which supports semantic enrichment through entity recognition; and the environment supporting software development processes throughout the project, which kept both technical partners and researchers in the loop. The outcomes include the technical tools, the new resources developed and gathered, and the research workflow that has been described and documented.
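The abstract names faceted search as a core feature of the note-taking environment but does not describe how it is implemented. As a generic illustration only, not CENDARI's actual code, faceted search can be sketched as filtering records by the facet values a user has selected and then counting the values that remain in each facet. The function `facet_search`, the fields `topic` and `lang`, and the sample notes are hypothetical.

```python
from collections import Counter


def facet_search(docs, selected, facets):
    """Filter docs by selected facet values and count remaining values per facet.

    selected maps a facet name to a set of accepted values: values are OR-ed
    within one facet and facets are AND-ed together. Docs missing a selected
    facet do not match.
    """
    hits = [
        doc for doc in docs
        if all(doc.get(name) in accepted for name, accepted in selected.items())
    ]
    # Counts are computed over the current hits, not the whole corpus.
    counts = {name: Counter(doc[name] for doc in hits if name in doc) for name in facets}
    return hits, counts


if __name__ == "__main__":
    # Hypothetical notes tagged with a research topic and a language.
    notes = [
        {"id": 1, "topic": "World War I", "lang": "de"},
        {"id": 2, "topic": "World War I", "lang": "en"},
        {"id": 3, "topic": "Medieval culture", "lang": "en"},
    ]
    hits, counts = facet_search(notes, {"lang": {"en"}}, ["topic", "lang"])
    print([n["id"] for n in hits], dict(counts["topic"]))
```

Recomputing the counts over the filtered hits is what lets an interface show how many results each further refinement would leave.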
Data sources:
- arXiv.org e-Print Archive: Other literature type, Preprint, 2016
- Publikationenserver der Georg-August-Universität Göttingen: Article, 2020
- Journal on Computing and Cultural Heritage: Article, 2018, peer-reviewed; license: ACM Copyright Policies (data source: Crossref)
- Hal-Diderot: Article, 2018; license: CC BY; full text: https://hal.inria.fr/hal-01523102v2/document
- HAL - UPEC / UPEM; HAL-Pasteur; HAL-Inserm: Article, 2018; license: CC BY; full text: https://hal.inria.fr/hal-01523102v2/document
Access routes: Green, hybrid.