Panckhurst R., Détrie C., Lopez C., Moïse C., Roche M., Verine B. (2016).
88milSMS. A corpus of authentic text messages in French (nouvelle version du corpus ISLRN :
024-713-187-947-8). In Chanier T. (ed) Banque de corpus CoMeRe. Ortolang : Nancy.
[cmr-88milsms-tei-v1 ;]
The first version of the corpus (ISLRN : 024-713-187-947-8) was produced in 2014 as part of
the "sud4science LR project". More than 88,000 authentic SMS messages, sent by hundreds of
donors living mainly in the Montpellier area, were collected in 2011 and then anonymised by
the researchers, their student interns and a legal adviser-CIL.
The initial corpus was then converted to the TEI standard in the CoMeRe project (Communication
Médiée par les Réseaux). This project aims to build a kernel corpus assembling existing
corpora of different CMC (Computer-Mediated Communication) genres and new corpora built on
data extracted from the Internet. These heterogeneous corpora will be structured and
processed in a uniform way and complemented with metadata. CoMeRe will be released as open
data through the national infrastructure Ortolang, following constraints which will be reused
for the forthcoming “Corpus de Référence du Français”. The project is supported by the national
consortium Corpus-écrits, a sub-part of Huma-Num, and by Ortolang (French correspondent to
Keywords: Short Message Service; Computer Mediated Communication; CMC;
- Created on: 2016-09-01
- Language: fra
- Coverage: nbparticipants=422 ; nbmessages=88522 ; nbemoticons-emojis=29563
- Time of data collection: name=88milsms ; start=2011-09-15 ; end=2011-12-15
- ConformTo: TEI (Text Encoding Initiative). The TEI structure used is an extension of TEI
for CMC genres. This extension is developed by a European project whose
participants are: Michael Beißwenger (DE), Thierry Chanier (FR), Isabella Chiari (IT),
Maria Ermakova (DE), Maarten van Gompel (NL), Iris Hendrickx (NL), Axel Herold (DE),
Henk van den Heuvel (NL), Lothar Lemnitzer (DE), Angelika Storrer (DE).
- Scientific references:
This corpus contains :
- cmr-88milsms-tei-v1.xml;
- cmr-88milsms-guide.pdf;
- cmr-88milsms-participants_questionnaire_explications.pdf;
- cmr-88milsms-participants_questionnaire_reponses.ods;
- cmr-88milsms-participants_questionnaire_reponses-v2.ods;
- Creators: PANCKHURST Rachel; CHANIER Thierry ;
- compiler: PANCKHURST Rachel
- depositor: PANCKHURST Rachel
- editor: CHANIER Thierry
- developer: LOTIN Paul
- data_inputter: DÉTRIE Catherine
- developer: LOPEZ Cédric
- data_inputter: MOÏSE Claudine
- developer: ROCHE Mathieu
- developer: VERINE Bertrand
- sponsor: Maison des Sciences de l'Homme de Montpellier ;
- sponsor: Délégation générale à la langue française et aux langues de France ;
- sponsor: Consortium CORLI ; ILF (Institut de Linguistique Française) ; TGIR (Très Grande
Infrastructure de Recherche) ; France
This corpus can be freely distributed and shared subject only to attribution.
The way to reference / cite the corpus is given in the bibliographicCitation