research product . Other ORP type . 2016

88milSMS. A corpus of authentic text messages in French

Panckhurst, Rachel; Détrie, Catherine; Lopez, Cédric; Moïse, Claudine; Roche, Mathieu; Verine, Bertrand
French
  • Published: 01 Jan 2016
  • Publisher: HAL CCSD
  • Country: France
Abstract
The first version of the corpus (ISLRN: 024-713-187-947-8) was produced in 2014 as part of the "sud4science LR project". More than 88,000 authentic SMS, sent by hundreds of donors living mainly in the Montpellier area, were collected in 2011 and then anonymised by the researchers, their student interns and a legal adviser-CIL. The initial corpus was then converted to the TEI standard in the CoMeRe project (Communication Médiée par les Réseaux). This project aims to build a kernel corpus assembling existing corpora of different CMC (Computer-Mediated Communication) genres and new corpora built on data extracted from the Internet. These heterogeneous corpora will be s...
Subjects
free text keywords: CMC (computer-mediated communication), NLP, CMO, Annotation, TAL, SMS, Corpus, [INFO]Computer Science [cs]
Communities
DARIAH EU
Download from: HAL-Pasteur; HAL - UPEC / UPEM; Hal-Diderot