Powered by OpenAIRE graph
Sources:
- arXiv.org e-Print Archive · Other literature type · Preprint · 2023 (Open Access)
- ZENODO · Article · 2023 · License: CC BY · Data source: ZENODO
- https://doi.org/10.48550/arxiv... · Article · 2023 · License: CC BY-SA · Data source: Datacite

This Research product is the result of merged Research products in OpenAIRE.


LyricSIM: A novel Dataset and Benchmark for Similarity Detection in Spanish Song Lyrics

Authors: Benito-Santos, Alejandro; Ghajari, Adrián; Hernández, Pedro; Fresno, Victor; Ros, Salvador; González-Blanco, Elena

Abstract

In this paper, we present a new dataset and benchmark tailored to the task of semantic similarity in song lyrics. Our dataset, originally consisting of 2775 pairs of Spanish songs, was annotated in a collective annotation experiment by 63 native annotators. After collecting and refining the data to ensure a high degree of consensus and data integrity, we obtained 676 high-quality annotated pairs that were used to evaluate the performance of various state-of-the-art monolingual and multilingual language models. Consequently, we established baseline results that we hope will be useful to the community in all future academic and industrial applications conducted in this context.
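The semantic textual similarity task the benchmark targets can be illustrated with a minimal lexical baseline. This is not the paper's method (the authors evaluate pretrained monolingual and multilingual transformer models); it is only a self-contained sketch, and the lyric fragments below are hypothetical:

```python
from collections import Counter
import math

def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between bag-of-words term-frequency vectors.

    A crude lexical stand-in for the sentence-embedding models the
    paper benchmarks; scores fall in [0, 1] for non-negative counts.
    """
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[tok] * b[tok] for tok in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Two hypothetical lyric fragments sharing most of their vocabulary:
score = cosine_similarity("la vida es un carnaval", "la vida es una fiesta")
print(f"{score:.2f}")  # prints 0.60 (3 shared tokens out of 5 per side)
```

A neural model would replace the token counts with dense sentence embeddings, which is what allows it to score paraphrases that share no vocabulary.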

This research has been carried out in the framework of the grant LyrAIcs (grant agreement ID 964009), funded by ERC-POCLS, and of the grant CLS INFRA (reference 101004984), funded by H2020-INFRAIA-2020-1. It has also received funding from the project ISL: Intelligent Systems for Learning (GID2016-39) in the call PID 22/23, and from FAIRTRANSNLP-DIAGNOSIS: Measuring and quantifying bias and fairness in NLP systems (grant PID2021-124361OB-C32), funded by MCIN/AEI/10.13039/501100011033 and by ERDF, EU "A way of making Europe". Alejandro Benito-Santos acknowledges support from the postdoctoral grant "Margarita Salas", awarded by the Spanish Ministry of Universities.

Country: Spain

Keywords: Semantic textual similarity; Dataset; Benchmark; Song lyrics; Annotation task; Cultural heritage; Computation and Language (cs.CL); Information Retrieval (cs.IR); FOS: Computer and information sciences


Metrics (Impact by BIP!):
- Citations: 0
- Popularity: Average
- Influence: Average
- Impulse: Average

Usage (OpenAIRE UsageCounts):
- Views: 7
- Downloads: 10

Open Access route: Green
Related to Research communities: DARIAH EU