Alpes4science : corpus de SMS réels dans les Alpes, smsalpes, banque de corpus CoMeRe

This page: http://hdl.handle.net/11403/comere/cmr-smsalpes/cmr-smsalpes-tei-v1
Back to corpus: http://hdl.handle.net/11403/comere/cmr-smsalpes

How to cite this resource

Antoniadis, G. (2014). Corpus de SMS réels dans les Alpes, smsalpes. In Chanier T. (ed.) Banque de corpus CoMeRe. Ortolang: Nancy. [cmr-smsalpes-tei-v1 ; http://hdl.handle.net/11403/comere/cmr-smsalpes/cmr-smsalpes-tei-v1]

This form has been automatically extracted from the TEI file. For the full contents, see http://hdl.handle.net/11403/comere/cmr-smsalpes/cmr-smsalpes-tei-v1.xml.

Overview of the corpus

The first version of the corpus was built as part of the "SMS of the Alps" operation, conducted by LIDILEM, Stendhal University. Some 22,000 authentic SMS messages, sent mostly by hundreds of donors living in the Alpine departments, were collected by the researchers in 2011. The initial corpus was then converted to the TEI standard within the CoMeRe project (Communication Médiée par les Réseaux). This project aims to build a kernel corpus that assembles existing corpora of different CMC (Computer-Mediated Communication) genres together with new corpora built from data extracted from the Internet. These heterogeneous corpora will be structured and processed in a uniform way and complemented with metadata. CoMeRe will be released as open data through the national infrastructure Ortolang, following constraints that will be reused for the forthcoming “Corpus de Référence du Français”. The project is supported by the national consortium Corpus-écrits, a sub-part of Huma-Num, and by Ortolang (the French counterpart of DARIAH).

Keywords : applied_linguistics ; discourse_analysis ; text_and_corpus_linguistics ; primary_text ; dialogue ; Communication Médiée par les Réseaux ; CoMeRe ; texto ; Computer Mediated Communication ; CMC ; Short Message Service ;

References

Antoniadis G., Chabert G., Zampa V. (2011). Alpes4science : Constitution d’un corpus de SMS réels en France métropolitaine. Colloque TEXTOS : dimensions culturelles, linguistiques et pragmatiques. Congrès annuel de l'ACFAS, 9 et 10 mai 2011, Sherbrooke, Canada

Chabert G., Zampa V., Antoniadis G., Mallen M. (2012). Des SMS Alpins, Éditions de la Bibliothèque départementale des Hautes-Alpes, Gap, ISBN 9782953719628


Rationale for this corpus

The “SMS of the Alps” project follows the collection of 22,000 authentic SMS messages, sent mostly by donors living in the Alpine departments. The project has three objectives:

1. Building a corpus of anonymised SMS, structured in XML, to facilitate the reuse and interoperability of the data. The corpus, free of rights, will be available to researchers interested in this mode of communication.
2. Transcribing the SMS into “standard French” and compiling a French / SMS “language” dictionary. The transcription will be done semi-automatically, using a transcription interface that is under development. The dictionary will come with a query interface to make it easier to use. The interface should link the dictionary data to the corpus data so that, for example, users can search for the contexts (SMS) containing one or several terms they supply. Such a tool should be very useful for the study of “SMS language” and a starting point for several applications, in particular educational ones.
3. Exploiting and studying the data from the questionnaire. To this end, a query interface will be created; it should allow data to be extracted according to a set of criteria chosen by the user.

The initial corpus has been converted to the TEI standard within the CoMeRe project (Communication Médiée par les Réseaux).
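
The context search mentioned in objective 2 can be illustrated with a minimal Python sketch. It is not the project's query interface: the function name `find_contexts`, its `require_all` parameter, and the sample messages are all hypothetical, and the messages are assumed to be already available as plain strings.

```python
import re


def find_contexts(messages, terms, require_all=True):
    """Return the messages that contain the given terms as whole words.

    Hypothetical helper: with require_all=True every term must occur,
    otherwise a single matching term is enough.
    """
    patterns = [
        re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in terms
    ]
    combine = all if require_all else any
    return [m for m in messages if combine(p.search(m) for p in patterns)]


# Invented sample messages, not taken from the corpus.
sample = ["slt cv ?", "cv bien et toi", "a demain"]
hits = find_contexts(sample, ["cv"])
```

Matching on word boundaries keeps a query such as `dem` from matching `demain`; a real interface would probably also need to handle the spelling variants typical of SMS.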

The TEI structure used is an extension of TEI for CMC genres. This extension is being developed by a European project whose participants are: Michael Beißwenger (DE), Thierry Chanier (FR), Isabella Chiari (IT), Maria Ermakova (DE), Maarten van Gompel (NL), Iris Hendrickx (NL), Axel Herold (DE), Henk van den Heuvel (NL), Lothar Lemnitzer (DE), Angelika Storrer (DE).


Description of the Interaction Space

CMC Environment

  • sms : Definition of the modality SMS. Type of messages used in SMS.
  • Structure of interactions
    post: one post corresponds to one SMS. SMS longer than 162 characters may have been truncated on arrival at the server.
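
Because of this possible truncation, it can be useful to flag messages that may be incomplete before analysing them. The sketch below is only a heuristic and rests on an assumption not stated in the corpus description: that a truncated message is stored with a length at or above the 162-character limit. The names `TRUNCATION_LIMIT` and `may_be_truncated` are hypothetical.

```python
TRUNCATION_LIMIT = 162  # character limit quoted in the corpus description


def may_be_truncated(text, limit=TRUNCATION_LIMIT):
    """Heuristic: True when a stored message is at least `limit` characters long.

    Assumption: a truncated SMS is stored with exactly (or at least) `limit`
    characters. Shorter messages are taken to be complete.
    """
    return len(text) >= limit
```

A message flagged this way is not necessarily truncated; it is only a candidate worth checking by hand.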

    Data Collection

    Data collected : From 2010-10-01 to 2011-01-16
    location: A private company collected the messages and sent them to Laboratoire de linguistique et didactique des langues étrangères et maternelles, Université Grenoble 3. Grenoble, France 7008759

    Language of the data: French

    Types of interaction

    channel: mode: w, Short Message Service
    constitution: Collecting the SMS required a technical partner, Orange Informatique, which handled the reception of the messages and their transfer to LIDILEM. SMS longer than 162 characters may have been truncated on arrival at this server.
    derivation: type: original
    domain: domain of a message: business or domestic
    factuality: type: fact
    interaction: type: complete, active: single
    preparedness: type: spontaneous
    purpose: open, i.e. several possible purposes

    Participants (extract)

    No information is available on participants other than their IDs and the fact that they live in Rhône-Alpes. Before donating their SMS, each participant accepted the consent form described in the availability section.

    Person ID= cmr-smsalpes-c001-p1000000060189758

    Person ID= cmr-smsalpes-c001-p316245434975

    Person ID= cmr-smsalpes-c001-p268574346157245

    Person ID= cmr-smsalpes-c001-p220400343194668
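
The four IDs above share a visible shape: a corpus prefix (`cmr-smsalpes-c001`) followed by `-p` and a run of digits. The sketch below splits an ID on that basis. The pattern is inferred only from these examples and is not a documented specification; `ID_PATTERN` and `parse_person_id` are hypothetical names.

```python
import re

# Pattern inferred from the example IDs only (an assumption, not a spec):
# "cmr-<corpus name>-c<digits>" then "-p<digits>".
ID_PATTERN = re.compile(r"^(?P<corpus>cmr-[a-z0-9]+-c\d+)-p(?P<number>\d+)$")


def parse_person_id(person_id):
    """Split a person ID into (corpus prefix, participant number as a string)."""
    match = ID_PATTERN.match(person_id)
    if match is None:
        raise ValueError(f"unexpected person ID format: {person_id!r}")
    return match.group("corpus"), match.group("number")


corpus, number = parse_person_id("cmr-smsalpes-c001-p316245434975")
```

The participant number is kept as a string because the examples vary in length and nothing in the description says it should be read as an integer.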


    Extracts of Interactions


    Composition of the corpus


    http://hdl.handle.net/11403/comere/cmr-smsalpes/cmr-smsalpes-tei-v1.xml

    http://hdl.handle.net/11403/comere/cmr-smsalpes/cmr-smsalpes-tei-v1-manuel.pdf


    Download the whole corpus: http://hdl.handle.net/11403/comere/cmr-smsalpes/cmr-smsalpes-tei-v1.zip (ZIP file, 3.6 MB)

    Number of participants: 359 ; number of messages: 22052
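
Counts of this kind could in principle be recomputed from the TEI file. The sketch below is a rough illustration under two assumptions that should be checked against the actual file and its manual: that each SMS is encoded as an element whose local name is `post`, and that its author is given in a `who` attribute. The function name `corpus_counts` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def corpus_counts(tei_xml):
    """Return (number of distinct authors, number of posts) for a TEI string.

    Assumptions: one SMS per element with local name "post", and the
    author reference stored in its "who" attribute.
    """
    root = ET.fromstring(tei_xml)
    # Compare local names so the check works with or without a namespace.
    posts = [el for el in root.iter() if el.tag.rsplit("}", 1)[-1] == "post"]
    authors = {el.get("who") for el in posts if el.get("who")}
    return len(authors), len(posts)


# Invented miniature document, not an extract of the corpus.
sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<post who="#p1">slt</post>
<post who="#p2">cv ?</post>
<post who="#p1">a demain</post>
</body></text></TEI>"""

participants, messages = corpus_counts(sample)
```

With the published figures, 22052 messages for 359 participants works out to about 61 messages per participant on average.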


    Credits

    principal : Antoniadis Georges, Chanier Thierry.
    compiler : Antoniadis Georges.
    editor : Chanier Thierry.
    data inputter : Hriba Linda, Jin Kun.
    developer : Lotin Paul.
    participant : Ledegen Gudrun.
    publisher : ORTOLANG (Outils et Ressources pour un Traitement Optimisé de la LANGue), Nancy, France.

    Publication Statement and Rights

    Publisher(s)

    Date: 2014-04-30

    Identifier(s)

    uri: cmr-smsalpes-tei-v1
    short-uri: cmr-smsalpes-c001
    url: http://hdl.handle.net/11403/comere/cmr-smsalpes/cmr-smsalpes-tei-v1

    Licence

    http://creativecommons.org/licenses/by-nc-sa/4.0/

    Rights holders of this corpus are: Antoniadis Georges ; Chanier Thierry

    This corpus can be freely distributed and shared, subject only to attribution, non-commercial use, and share-alike. The way to reference / cite the corpus is given in the titleStmt.