Publication · Conference object · 2020

Creating Expert Knowledge by Relying on Language Learners: a Generic Approach for Mass-Producing Language Resources by Combining Implicit Crowdsourcing and Language Learning

Lionel Nicolas; Verena Lyding; Claudia Borg; Corina Forascu; Karën Fort; Katerina Zdravkova; Iztok Kosem; Jaka Cibej; Špela Holdt; Alice Millour; ...
  • Language: English
  • Published: 11 May 2020
  • Publisher: HAL CCSD
  • Country: France
Abstract
In this paper, we introduce a generic approach that combines implicit crowdsourcing and language learning in order to mass-produce language resources (LRs) for any language for which a crowd of language learners can be involved. We present the approach by explaining its core paradigm, which consists of pairing specific types of LRs with specific exercises, by detailing both its strengths and its challenges, and by discussing to what extent these challenges have been addressed so far. Accordingly, we also report on ongoing proof-of-concept efforts aimed at developing the first prototypical implementation of the approach in order to correct and extend...
Subjects
Free-text keywords: Crowdsourcing, Computer-Assisted Language Learning, Collaborative Resource Construction, COST Action, [INFO.INFO-TT] Computer Science [cs]/Document and Text Processing, [SHS.LANGUE] Humanities and Social Sciences/Linguistics
58 references, page 1 of 4

Affiliations
1 Institute for Applied Linguistics, Eurac Research, Bolzano, Italy
2 Faculty of Information & Communication Technology, University of Malta, Msida, Malta
3 Faculty of Computer Science, Alexandru Ioan Cuza University of Iasi, Romania
4 Sorbonne Université / STIH - EA 4509, France
5 Ss. Cyril and Methodius University, Faculty of Computer Science and Engineering, Macedonia
6 Faculty of Arts, University of Ljubljana, Slovenia
7 CLARIN ERIC, the Netherlands
8 Computational Cognition Lab, Open University of Cyprus, Cyprus
9 Orientale University of Naples, Italy
10 Insight Centre for Data Analytics, National University of Ireland Galway, Ireland
11 Department of Computer Science, University of Helsinki, Finland
12 Human Languages Technologies Lab, INESC-ID, Lisbon, Portugal
13 Dept. of Computer Science, Jerusalem College of Technology (Lev Academic Center), Israel
