Grand corpus de sms, smslareunion, banque de corpus CoMeRe

This page: http://hdl.handle.net/11403/comere/cmr-smslareunion/cmr-smslareunion-tei-v1
Back to corpus: http://hdl.handle.net/11403/comere/cmr-smslareunion

How to cite this resource

Ledegen, G. (2014). Grand corpus de sms smslareunion. In Chanier T. (ed.) Banque de corpus CoMeRe. Ortolang: Nancy. [http://hdl.handle.net/11403/comere/cmr-smslareunion/cmr-smslareunion-tei-v1]

This form has been automatically extracted from the TEI file. For the full contents, see http://hdl.handle.net/11403/comere/cmr-smslareunion/cmr-smslareunion-tei-v1.xml.

Overview of the corpus

The first version of the corpus was established as part of the sms4science operation (Fairon 2006), a research program initiated in 2004 by CENTAL (Centre de Traitement Automatique du Langage, Catholic University of Louvain, Belgium). Conducted first in La Réunion, the project brought together 21,694 SMS messages sent between April and June 2008 by 1,744 users, yielding 12,622 finalized SMS messages. The initial corpus was converted into TEI within the framework of the CoMeRe (Communication médiée par les réseaux) project. This project aims to assemble different network-mediated communication corpora in French (Internet, telecommunication), to structure them in a standard format, and to release them in open access for research purposes. The CoMeRe project has received support from ORTOLANG and the national consortium Corpus-écrits.

Keywords: applied_linguistics ; discourse_analysis ; text_and_corpus_linguistics ; primary_text ; dialogue ; Communication Médiée par les Réseaux ; CoMeRe ; texto ; Computer Mediated Communication ; CMC ; Short Message Service

References

Ledegen, G. (2010). Contact de langues à La Réunion : « On ne débouche pas des cadeaux. Ben i fé qoué alors ? ». Langues et Cité, ‘Langues en contact’, n° 16, 9-10.

Ledegen, G. (2011). Résonance SMS : « Jc c koi mé javé pa rèalizé sur le coup! ». LINX, n° 57, Gadet, F. Guérin, E. (Dirs), ‘Français parlé/français hors de France/créoles à base française d'un point de vue syntaxique’, 101-112.

Ledegen, G., M. Blondel, J. Gonac’h et J. Seeli. (2011). « Contacts de langues dans les SMS ‘sourds’ ». Langues et cité Bulletin de l’observatoire des pratiques linguistiques, n° 19, ‘Parler (avec) plusieurs langues : l’alternance codique’, 10.


Rationale for this corpus

This corpus is part of the CoMeRe corpus databank. The CoMeRe (Communication Médiée par les Réseaux) project aims to build a kernel corpus assembling existing corpora of different CMC (Computer-Mediated Communication) genres and new corpora built on data extracted from the Internet. These heterogeneous corpora will be structured and processed in a uniform way and complemented with metadata. CoMeRe will be released as OpenData through the national infrastructure Ortolang, following constraints that will be reused for the forthcoming “Corpus de Référence du Français”. The project is supported by the national consortium Corpus-écrits, a sub-part of Huma-Num, and by Ortolang (the French counterpart of DARIAH).

The TEI structure used is an extension of TEI for CMC genres. This extension is being developed by a European project whose participants are: Michael Beißwenger (DE), Thierry Chanier (FR), Isabella Chiari (IT), Maria Ermakova (DE), Maarten van Gompel (NL), Iris Hendrickx (NL), Axel Herold (DE), Henk van den Heuvel (NL), Lothar Lemnitzer (DE), Angelika Storrer (DE).


Description of the Interaction Space

CMC Environment

  • sms : Definition of the modality SMS. Type of messages used in SMS.
  • Structure of interactions
    post: one post corresponds to one SMS.


    reg: This element appears inside the
    add: This element appears inside
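Since one post corresponds to one SMS, counting post elements gives the number of messages. The following is a minimal sketch of that idea, not the project's own tooling. It runs on a tiny invented sample rather than on cmr-smslareunion-tei-v1.xml, and the TEI namespace, the "who" attribute and the sample message text are all assumptions to be checked against the actual file and its manual.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented stand-in for the real TEI file. The namespace, the "who"
# attribute and the message texts are assumptions, not corpus data.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <post who="#cmr-slr-c001-p0011">message one</post>
    <post who="#cmr-slr-c001-p0011">message two</post>
    <post who="#cmr-slr-c001-p0012">message three</post>
  </body></text>
</TEI>"""


def local_name(tag):
    """Strip a '{namespace}' prefix so the code works whatever the namespace is."""
    return tag.rsplit("}", 1)[-1]


def count_posts(xml_text):
    """Return a Counter mapping each sender reference to its number of posts."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for elem in root.iter():
        if local_name(elem.tag) == "post":
            # Posts without the assumed attribute are grouped under "unknown".
            counts[elem.get("who", "unknown")] += 1
    return counts


counts = count_posts(SAMPLE)
print(sum(counts.values()), "posts from", len(counts), "senders")
```

For the full file, ET.parse on the downloaded XML followed by the same loop over getroot().iter() would avoid loading the text into a string first.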

    Data Collection

    Data collected : From 2008-04-10 to 2008-06-30
    location: A private company collected the messages and sent them to the Laboratoire de recherche sur les espaces Créolophones et Francophones, Université de la Réunion. La Réunion, France 1000184

    Language of the data: French

    Types of interaction

    channel: mode: w , Short Message Service
    constitution: Collecting the SMS messages required a technical partner, Cirrus Informatique, which handled receiving the messages and transferring them to the LCF.
    derivation: type: original ,
    domain: domain of a message : business or domestic
    factuality: type: fact ,
    interaction: type: complete , active: single ,
    preparedness: type: spontaneous ,
    purpose: open, i.e. several possible purposes

    Participants (extract)

    Questionnaire: Some participants answered a questionnaire. The questionnaire is detailed in the document cmr-smslareunion-tei-v1-questionnaire.pdf, and the answers are in the document cmr-smslareunion-tei-v1-answers.csv. Please note that people who filled in the questionnaire may not have sent any SMS and are therefore not listed here as participants (e.g. cmr-slr-c001-p005). Conversely, many participants listed here did not fill in the questionnaire.

    Person ID= cmr-slr-c001-p0011

    Person ID= cmr-slr-c001-p0012

    Person ID= cmr-slr-c001-p0017

    Person ID= cmr-slr-c001-p0021


    Extracts of Interactions


    Composition of the corpus


    http://hdl.handle.net/11403/comere/cmr-smslareunion/cmr-smslareunion-tei-v1.xml

    http://hdl.handle.net/11403/comere/cmr-smslareunion/cmr-smslareunion-tei-v1-manuel.pdf

    http://hdl.handle.net/11403/comere/cmr-smslareunion/cmr-smslareunion-tei-v1-questionnaire.pdf

    http://hdl.handle.net/11403/comere/cmr-smslareunion/cmr-smslareunion-tei-v1-answers.csv


    Download the whole corpus: http://hdl.handle.net/11403/comere/cmr-smslareunion/cmr-smslareunion-tei-v1.zip (ZIP file, 4.5 MB)

    nbparticipants=884 ; nbmessages=12622


    Credits

    principal : Ledegen Gudrun, Chanier Thierry.
    compiler : Ledegen Gudrun.
    editor : Chanier Thierry.
    data inputter : Hriba Linda, Jin Kun, Caron Gauthier, Corré Gaëlle, Guillemain Marie-Caroline.
    developer : Lotin Paul.
    participant : Longhi Julien.
    publisher : ORTOLANG (Outils et Ressources pour un Traitement Optimisé de la LANGue), Nancy, France.

    Publication Statement and Rights

    Publisher(s)

    Date: 2014-05-01

    Identifier(s)

    uri: cmr-smslareunion-tei-v1
    short-uri: cmr-slr-c001
    url: http://hdl.handle.net/11403/comere/cmr-smslareunion/cmr-smslareunion-tei-v1

    Licence

    http://creativecommons.org/licenses/by/4.0/

    Rights holders of this corpus are: LCF ; Gudrun Ledegen ; Thierry Chanier

    This corpus can be freely distributed and shared subject only to attribution. The way to reference or cite the corpus is given in the titleStmt.