Channel "cmr-getalp_org-edelweiss" of the French chat corpus (Corpus de français tchaté)

This page: http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-edelweiss-tei-v1
Back to corpus: http://hdl.handle.net/11403/comere/cmr-getalp_org

How to cite this resource

Falaise, A. (2014). Corpus de français tchaté getalp_org. In Chanier T. (ed.) Banque de corpus CoMeRe Ortolang/CoMeRe. [http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-edelweiss-tei-v1]

This form has been automatically extracted from the TEI file. For the full contents, see http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-edelweiss-tei-v1.xml.

Overview of the corpus

This sub-corpus corresponds to an individual channel within cmr-getalp_org. This is a textchat corpus, in French, from the EpikNet Internet Relay Chat (IRC) network. The corpus was collected in 2004 and automatically encoded (Falaise 2005). It includes 4 million messages from 105 channels that are heterogeneous in their thematic and pragmatic nature. Topics vary from channel to channel and range from general chat, talking about everything and nothing, to specialised chat where, for example, programming problems or current affairs are discussed. Channels also differ on a pragmatic level: certain channels are dedicated to games (hangman, quizzes), whilst others carry press releases from the Agence France Presse (AFP - French Press Agency) or are dedicated to technical discussions that take a question-answer form, for example the channel dedicated to programming questions. The initial corpus was converted into TEI within the framework of the CoMeRe (Communication médiée par les réseaux) project. This project aims to assemble different network-mediated communication corpora in French (Internet, telecommunication), to structure them in a standard format and to release the corpora in an open access format for research purposes. The CoMeRe project has received support from ORTOLANG and the national consortium Corpus-écrits.

Keywords : applied_linguistics ; discourse_analysis ; text_and_corpus_linguistics ; primary_text ; dialogue ; Communication Médiée par les Réseaux ; CoMeRe ; clavardage ; Computer Mediated Communication ; CMC ; textchat ; IRC ;

References

Falaise, A. (2005). Constitution d'un corpus de français tchaté. Actes de RECITAL 2005, Dourdan. oai:hal.archives-ouvertes.fr:hal-00909667


Rationale for this corpus

This corpus is a subpart of the CoMeRe corpus databank

The CoMeRe (Communication Médiée par les Réseaux) project aims to build a kernel corpus assembling existing corpora of different CMC (Computer-Mediated Communication) genres and new corpora built on data extracted from the Internet. These heterogeneous corpora will be structured and processed in a uniform way, complemented with metadata. CoMeRe will be released as OpenData through the national infrastructure Ortolang, following constraints which will be reused for the forthcoming “Corpus de Référence du Français”. The project is supported by the national consortium Corpus-écrits, a sub-part of Huma-Num, and Ortolang (French correspondent to DARIAH).

The TEI structure used is an extension of TEI for CMC genres. This extension is developed by a European project whose participants are: Michael Beißwenger (DE), Thierry Chanier (FR), Isabella Chiari (IT), Maria Ermakova (DE), Maarten van Gompel (NL), Iris Hendrickx (NL), Axel Herold (DE), Henk van den Heuvel (NL), Lothar Lemnitzer (DE), Angelika Storrer (DE).


Description of the Interaction Space

CMC Environment

  • texchat-epiknet : Definition of the modality textchat. Type of messages used in cmr-getalp_org. Textchat features are those coming from EpikNet
  • Structure of interactions
    post: One post corresponds to one textchat turn, i.e. one participant's utterance.

    Data Collection

    Data collected : From 2004-02-03 to 2004-04-09
    rs: Blanquefort, France
    rs: 7008161
    rs: http://www.botstats.com
    rs: http://www.epiknet.org

    Language of the data: French

    Types of interaction

    channel: mode: w, textchat
    constitution: Messages typed by participants inside EpikNet IRC channels and then collected by Botstats.com
    derivation: type: original
    domain: type: public
    factuality: type: fact
    interaction: type: complete, active: plural, passive: many
    preparedness: type: spontaneous
    purpose: degree: high, general-purpose channel (Canal généraliste).

    Participants (extract)

    As explained in the tagUsage of the element post, the system offers no unambiguous way of identifying a participant interacting in a given channel (possibly over several weeks). Tracking the use of aliases is one way of approaching this identification, but it is not completely reliable. Hence it is not possible to give a full list of participants here. This identification may be a topic of investigation for future analyses.

    Person ID= cmr-get-c003-p102
    persName: WilloW_HaWay

    Person ID= cmr-get-c003-p383
    persName: Anonyme7052611 Anonyme7052627 clochette

    Person ID= cmr-get-c003-p630
    persName: CyberDJ Morpheus

    Person ID= cmr-get-c003-p1294
    persName: Anonyme5168093 Anonyme5168100 Anonyme5168105 Anonyme5168111 Anonyme5168116 Anonyme5168122 Anonyme5168142 Anonyme5168155 Anonyme5168161 Anonyme5168167 Anonyme5168173 Anonyme5168180 Anonyme5168187 Anonyme5168193 Anonyme5168198 Anonyme5168206 Anonyme5168214 Anonyme5168220 Anonyme5168227 Anonyme5168233 Anonyme5168242 Anonyme5168247 Anonyme5168254 Anonyme5168260 Anonyme5168267 Anonyme5168273 Anonyme5168279 Anonyme5168283 Anonyme5168290 Anonyme5168296 Anonyme5168301 Anonyme5168306 Anonyme5168313 Anonyme5168319 Anonyme5168325 Anonyme5168331 Anonyme5168337 Anonyme5168345 Anonyme5168352 Anonyme5168359 Anonyme5168363 Anonyme5168372 Doudou_ZEN[^_^] _Doudou_ZEN[^_^]
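
    The alias groups above can be turned into a lookup from alias to Person ID. The following is only a minimal sketch: it assumes that each persName value is a whitespace-separated list of aliases, and the names PARTICIPANTS and build_alias_index are illustrative, not part of the corpus tooling. Because alias tracking is not completely reliable, each alias maps to a set of IDs rather than a single one.

```python
# Alias groups copied from the participant extract above (short entries only).
# Assumption: each persName value is a whitespace-separated list of aliases.
PARTICIPANTS = {
    "cmr-get-c003-p102": "WilloW_HaWay",
    "cmr-get-c003-p383": "Anonyme7052611 Anonyme7052627 clochette",
    "cmr-get-c003-p630": "CyberDJ Morpheus",
}


def build_alias_index(participants):
    """Map each alias to the set of Person IDs it has been attached to."""
    index = {}
    for person_id, pers_name in participants.items():
        for alias in pers_name.split():
            index.setdefault(alias, set()).add(person_id)
    return index


if __name__ == "__main__":
    index = build_alias_index(PARTICIPANTS)
    print(index["clochette"])
```

    An alias that maps to more than one Person ID in such an index would be a candidate for manual checking.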


    Extracts of Interactions


    Composition of the corpus

    Collection cmr-getalp_org: list of files

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org---quizz---tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org--p-u-r--tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-18-25ans-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-actu-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-allsoluces-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-anaisgirl-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-angel-corp-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-blondin-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-botstats-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-cplusplus-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-caline-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-cocktails-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-cstrike-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-darkcloud-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-dbz_legend-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-debian-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-deejays-world-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-deglingo-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-dragon-ball-z-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-edelweiss-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-edensensuelcam-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-enjoy-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-fac-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-ffmaniac-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-ffparadise-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-ffx-2-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-fikx-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-foldingathome-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-france1-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-francophone-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-funkycops-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-g-faction-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-games-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-gck-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-greatnothing-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-hikago-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-hinatalove-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-hokutoteam-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-humour-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-iquotes-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-irpg-chat-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-irpg-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-ishtar-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-japanimation-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-jump-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-koma-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-kyo-music-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-le-monde-des-reptiles-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-leseigneurdesanneaux-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-linux-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-madness-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-magnapoke-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-manga-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-manga4ever-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-manganimation-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-mew-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-mixi-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-nemo-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-ninou-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-nintendojofr-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-nokiagame.fr-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-php-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-planete-gundam-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-pokelord-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-politique-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-princedelu-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-programmation-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-qcradio-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-quebec-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-radioabf-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-radiofrhub-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-ragnarok-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-ragot-chan-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-raysanctuary-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-rhone-alpes-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-rien-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-sc-team-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-scripts-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-slackware-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-[dmb]dreamchan-tei-v1


    Download the whole corpus: http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-tei-v1.zip (ZIP file, 118.8 MB)

    nbMotsMessages=433471 ; nbevenements=41740 ; nbcommandes=3364 ; nbmessages=223688 ; nbmots=652686 ; nbparticipants=1235 ; nbconnectes=3401 ; nbformes2=8895 ; nbformes1=14672

    Information computed and described by A. Falaise according to the following definitions:

    nbconnectes: the number of unique users connecting or disconnecting, determined from connection logins (not nicknames); that is, roughly the number of users connected to the channel at one time or another.
    nbparticipants: the number of unique contributors, determined from nicknames (a single user may have several nicknames, in which case several contributors are counted); that is, the number of users sending messages and/or commands.
    nbmessages: the number of "chat-message" tags (see that tag).
    nbevenements: the number of "chat-event" tags (see that tag).
    nbcommandes: the number of "chat-command" tags (see that tag).
    nbMotsMessages: the number of words in the contributions of "chat-message" tags, excluding paratext (date, time, author's nickname). A word is defined as anything between two blanks or any of the characters .:/\'"+;!,?(){}[]
    nbmots: the number of words in all contributions, excluding paratext (date, time, author's nickname).
    nbformes1: the number of unique forms occurring at least once.
    nbformes2: the number of unique forms occurring at least twice.
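
    The word and form counts defined above can be illustrated with a short Python sketch. It is not the script used by A. Falaise: the function names and the sample messages are hypothetical, and it assumes that forms are compared case-sensitively, which the definitions do not specify. Only the separator set (blanks plus the listed characters) is taken from the definition of a word.

```python
import re
from collections import Counter

# Separators from the word definition: whitespace plus . : / \ ' " + ; ! , ? ( ) { } [ ]
SEPARATORS = re.compile(r"""[\s.:/\\'"+;!,?(){}\[\]]+""")


def tokenize(message):
    """Split one message body (paratext already removed) into words."""
    return [word for word in SEPARATORS.split(message) if word]


def corpus_counts(messages):
    """Compute word and form counts over an iterable of message bodies."""
    forms = Counter()
    for message in messages:
        forms.update(tokenize(message))
    return {
        "nbmots": sum(forms.values()),  # total number of words
        "nbformes1": len(forms),  # forms occurring at least once
        "nbformes2": sum(1 for n in forms.values() if n >= 2),  # at least twice
    }


if __name__ == "__main__":
    # Hypothetical sample messages, not taken from the corpus.
    sample = ["salut tout le monde !", "salut, ça va ?", "lol"]
    print(corpus_counts(sample))
```

    Note that, under this definition, the apostrophe is a separator, so an elided form such as "c'est" counts as two words.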


    Credits

    principal: Falaise Achille, Chanier Thierry.
    compiler: Falaise Achille.
    editor: Chanier Thierry.
    data inputter: Hriba Linda, Jin Kun.
    developer: Lotin Paul.
    participant: Wigham Ciara.
    publisher: ORTOLANG (Outils et Ressources pour un Traitement Optimisé de la LANGue), Nancy, France.

    Publication Statement and Rights

    Publisher(s)

    Date: 2014-05-01

    Identifier(s)

    uri: cmr-getalp_org-edelweiss-tei-v1
    short-uri: cmr-get-c003
    url: http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-edelweiss-tei-v1

    Licence

    http://creativecommons.org/licenses/by-nc-sa/4.0/

    This corpus can be freely distributed and shared subject only to attribution, non-commercial use and share-alike. The way to reference / cite the corpus is given in the titleStmt.

    Rights holders of this corpus are: Kévin Labécot ; Achille Falaise ; Thierry Chanier