
Corpora and concordancers on the nl.ijs.si server

Tomaž Erjavec
Open Access English
  • Published: 01 May 2013. Journal: Slovenščina 2.0: Empirične, aplikativne in interdisciplinarne raziskave, volume 1, issue 1, pages 24-49 (ISSN: 2335-2736)
  • Publisher: Znanstvena založba Filozofske fakultete Univerze v Ljubljani (Ljubljana University Press, Faculty of Arts)
Abstract
The paper presents the monolingual and parallel corpora that can be accessed through two concordancers on the server nl.ijs.si. Of the monolingual corpora, twelve contain Slovene texts, one contains Japanese texts, and one contains English texts. They comprise reference corpora, such as Gigafida for contemporary written Slovene, IMP for historical Slovene, and GOS for spoken Slovene, as well as specialised corpora, such as the corpus of texts from the informatics domain and the corpus of Slovene tweets. The five parallel corpora contain Slovene texts sentence-aligned with, variously, English, Japanese, French, German, and Italian, from domains such as EU law, literature, and journalism....
Subjects
free text keywords: language corpora, concordancers, CWB, CUWI, noSketchEngine, lcsh:Philology. Linguistics, lcsh:P1-1091