Channel "cmr-getalp_org-actu" of the French chat corpus (Corpus de français tchaté)

This page: http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-actu-tei-v1
Back to corpus: http://hdl.handle.net/11403/comere/cmr-getalp_org

How to cite this resource

Falaise, A. (2014). Corpus de français tchaté getalp_org. In Chanier T. (ed) Banque de corpus CoMeRe Ortolang/CoMeRe. [http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-actu-tei-v1]

This form has been automatically extracted from the TEI file. For the full contents, see http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-actu-tei-v1.xml.

Overview of the corpus

This sub-corpus corresponds to an individual channel within cmr-getalp_org. This is a textchat corpus, in French, from the EpikNet network of Internet Relay Chat. The corpus was collected in 2004 and automatically encoded (Falaise 2005). It includes 4 million messages from 105 channels that are heterogeneous in their thematic and pragmatic nature. Topics vary from channel to channel and range from general chat, talking about everything and nothing, to specialised chat where, for example, programming problems or current affairs are discussed. Channels also differ on a pragmatic level. Certain channels are dedicated to games (hangman, quizzes), whilst others include press releases from the Agence France Presse (AFP - French Press Agency) or are dedicated to technical discussions that take a question-answer form, for example the channel dedicated to programming questions. The initial corpus was converted into TEI within the framework of the CoMeRe (Communication médiée par les réseaux) project. This project aims to assemble different network-mediated communication corpora in French (Internet, telecommunication), to structure them in a standard format and to release the corpora in an open access format for research purposes. The CoMeRe project has received support from ORTOLANG and the national consortium Corpus-écrits.

Keywords : applied_linguistics ; discourse_analysis ; text_and_corpus_linguistics ; primary_text ; dialogue ; Communication Médiée par les Réseaux ; CoMeRe ; clavardage ; Computer Mediated Communication ; CMC ; textchat ; IRC ;

References

Falaise, A. (2005). Constitution d'un corpus de français tchaté. Actes de RECITAL 2005, Dourdan. oai:hal.archives-ouvertes.fr:hal-00909667


Rationale for this corpus

This corpus is a subpart of the CoMeRe corpus databank

The CoMeRe (Communication Médiée par les Réseaux) project aims to build a kernel corpus assembling existing corpora of different CMC (Computer-Mediated Communication) genres and new corpora built on data extracted from the Internet. These heterogeneous corpora will be structured and processed in a uniform way and complemented with metadata. CoMeRe will be released as OpenData through the national infrastructure Ortolang, following constraints which will be reused for the forthcoming “Corpus de Référence du Français”. The project is supported by the national consortium Corpus-écrits, a sub-part of Huma-Num, and by Ortolang (the French counterpart of DARIAH).

The TEI structure used is an extension of TEI for CMC genres. This extension is being developed by a European project whose participants are: Michael Beißwenger (DE), Thierry Chanier (FR), Isabella Chiari (IT), Maria Ermakova (DE), Maarten van Gompel (NL), Iris Hendrickx (NL), Axel Herold (DE), Henk van den Heuvel (NL), Lothar Lemnitzer (DE), Angelika Storrer (DE).


Description of the Interaction Space

CMC Environment

  • texchat-epiknet : Definition of the modality textchat. Type of messages used in cmr-getalp_org. Textchat features are those coming from EpikNet
  • Structure of interactions
    post: One post corresponds to one textchat turn, i.e. one participant's utterance.

    Data Collection

    Data collected : From 2004-02-03 to 2004-04-09
    rs: Blanquefort, France
    rs: 7008161
    rs: http://www.botstats.com
    rs: http://www.epiknet.org

    Language of the data: French

    Types of interaction

    channel: mode: w, textchat
    constitution: Messages typed by participants inside EpikNet IRC channels and then collected by Botstats.com
    derivation: type: original
    domain: type: public
    factuality: type: fact
    interaction: type: complete, active: plural, passive: many
    preparedness: type: spontaneous
    purpose: degree: high, general-interest channel, AFP dispatches and commentary on current events.

    Participants (extract)

    As explained in the tagUsage of the element post, the system offers no unambiguous way of identifying a participant interacting in a given channel (possibly over several weeks). Tracking the use of aliases is one way of approaching this identification, but it is not completely reliable. Hence it is not possible to give the full list of participants here. This identification may be a topic of investigation for future analyses.

    Person ID= cmr-get-c027-p9
    persName: Baku Siddhartha

    Person ID= cmr-get-c027-p17
    persName: Gck|ILJ|Away Gck|ILJ|school

    Person ID= cmr-get-c027-p127
    persName: delinquant45

    Person ID= cmr-get-c027-p129
    persName: Absence_De_Cerveau ELendil Elendil absence_de_cerveau
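
The participant extract above can be read programmatically as a first step towards the alias tracking mentioned earlier. The sketch below is only illustrative: the function names `parse_participants` and `fold_case` are hypothetical, and reading each persName value as a space-separated list of aliases is an assumption about this listing, not something the TEI file guarantees. Collapsing aliases that differ only by letter case is one possible heuristic; it does not resolve the identification problem described above.

```python
def parse_participants(lines):
    """Map each person ID to the aliases found on the persName line that follows it."""
    participants = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("Person ID="):
            current_id = line[len("Person ID="):].strip()
            participants[current_id] = []
        elif line.startswith("persName:") and current_id is not None:
            # Assumption: aliases are separated by whitespace.
            participants[current_id] = line[len("persName:"):].split()
    return participants


def fold_case(aliases):
    """Collapse aliases that differ only by letter case, keeping first-seen spelling."""
    seen = {}
    for alias in aliases:
        seen.setdefault(alias.lower(), alias)
    return list(seen.values())


extract = [
    "Person ID= cmr-get-c027-p129",
    "persName: Absence_De_Cerveau ELendil Elendil absence_de_cerveau",
]
people = parse_participants(extract)
print(fold_case(people["cmr-get-c027-p129"]))
```

With the extract shown, the four listed spellings reduce to two case-insensitive aliases.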


    Extracts of Interactions


    Composition of the corpus

    Collection cmr-getalp_org: list of files

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org---quizz---tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org--p-u-r--tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-18-25ans-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-actu-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-allsoluces-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-anaisgirl-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-angel-corp-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-blondin-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-botstats-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-cplusplus-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-caline-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-cocktails-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-cstrike-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-darkcloud-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-dbz_legend-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-debian-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-deejays-world-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-deglingo-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-dragon-ball-z-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-edelweiss-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-edensensuelcam-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-enjoy-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-fac-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-ffmaniac-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-ffparadise-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-ffx-2-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-fikx-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-foldingathome-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-france1-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-francophone-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-funkycops-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-g-faction-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-games-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-gck-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-greatnothing-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-hikago-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-hinatalove-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-hokutoteam-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-humour-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-iquotes-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-irpg-chat-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-irpg-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-ishtar-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-japanimation-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-jump-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-koma-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-kyo-music-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-le-monde-des-reptiles-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-leseigneurdesanneaux-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-linux-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-madness-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-magnapoke-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-manga-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-manga4ever-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-manganimation-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-mew-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-mixi-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-nemo-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-ninou-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-nintendojofr-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-nokiagame.fr-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-php-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-planete-gundam-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-pokelord-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-politique-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-princedelu-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-programmation-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-qcradio-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-quebec-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-radioabf-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-radiofrhub-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-ragnarok-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-ragot-chan-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-raysanctuary-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-rhone-alpes-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-rien-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-sc-team-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-scripts-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-slackware-tei-v1

    http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-[dmb]dreamchan-tei-v1
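
All handles in the list above share the same prefix and end in -tei-v1, so the channel label can be recovered from a handle by stripping both. The following is a minimal sketch under that assumption; the helper names `channel_from_handle` and `tei_xml_url` are hypothetical. Appending .xml to reach the full TEI file follows what this page states for the actu channel; that the same rule holds for every other channel is an assumption.

```python
PREFIX = "http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-"
SUFFIX = "-tei-v1"


def channel_from_handle(url):
    """Return the channel label embedded in a per-channel handle."""
    well_formed = (
        url.startswith(PREFIX)
        and url.endswith(SUFFIX)
        and len(url) >= len(PREFIX) + len(SUFFIX)
    )
    if not well_formed:
        raise ValueError(f"not a per-channel handle: {url}")
    return url[len(PREFIX):len(url) - len(SUFFIX)]


def tei_xml_url(url):
    """Return the assumed location of the full TEI file for a channel handle."""
    channel_from_handle(url)  # raises ValueError if the handle is malformed
    return url + ".xml"


handle = PREFIX + "actu" + SUFFIX
print(channel_from_handle(handle))
print(tei_xml_url(handle))
```

Labels are returned exactly as they appear in the file name, so a handle such as the one ending in ---quizz---tei-v1 yields a label with its surrounding hyphens kept.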


    Download the whole corpus: http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-tei-v1.zip (ZIP file, 118.8 MB)

    nbMotsMessages=48188 ; nbevenements=19042 ; nbcommandes=76 ; nbmessages=15022 ; nbmots=133929 ; nbparticipants=253 ; nbconnectes=1522 ; nbformes2=4055 ; nbformes1=4093

    Information computed and described by A. Falaise according to the following definitions:

    nbconnectes: the number of unique users connecting or disconnecting, identified by their connection logins (not their nicknames); in other words, roughly the number of users connected to the channel at one time or another.
    nbparticipants: the number of unique contributors, identified by nickname (a single user with several nicknames is counted as several contributors); in other words, the number of users sending messages and/or commands.
    nbmessages: the number of "chat-message" tags (see that tag).
    nbevenements: the number of "chat-event" tags (see that tag).
    nbcommandes: the number of "chat-command" tags (see that tag).
    nbMotsMessages: the number of words in the contributions of "chat-message" tags, excluding paratext (date, time, author's nickname). A word is defined as anything between two whitespace characters or any of the characters .:/\'"+;!,?(){}[]
    nbmots: the number of words in all contributions, excluding paratext (date, time, author's nickname).
    nbformes1: the number of unique forms occurring at least once.
    nbformes2: the number of unique forms occurring at least twice.
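
The word definition above (anything delimited by whitespace or by one of the listed separator characters) can be turned into a small counting routine. This is only a sketch of that definition, not the script used by A. Falaise: the names `tokenize` and `form_counts` are hypothetical, forms are compared case-sensitively (the definition does not say), and the input is assumed to be message text with the paratext already removed.

```python
import re
from collections import Counter

# Separators named in the definition: whitespace plus . : / \ ' " + ; ! , ? ( ) { } [ ]
SEPARATORS = re.compile(r"""[\s.:/\\'"+;!,?(){}\[\]]+""")


def tokenize(text):
    """Split text on the separators and drop the empty strings left at the edges."""
    return [tok for tok in SEPARATORS.split(text) if tok]


def form_counts(messages):
    """Return (word count, forms seen at least once, forms seen at least twice)."""
    counts = Counter()
    for message in messages:
        counts.update(tokenize(message))
    nb_words = sum(counts.values())
    nb_forms_once = len(counts)
    nb_forms_twice = sum(1 for n in counts.values() if n >= 2)
    return nb_words, nb_forms_once, nb_forms_twice


print(tokenize("salut! ca va? l'actu (AFP)"))
print(form_counts(["bonjour tout le monde", "bonjour, le canal"]))
```

Because the apostrophe is a separator, an elided form such as l'actu counts as two words under this definition.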


    Credits

    principal: Falaise Achille, Chanier Thierry.
    compiler: Falaise Achille.
    editor: Chanier Thierry.
    data inputter: Hriba Linda, Jin Kun.
    developer: Lotin Paul.
    participant: Wigham Ciara.
    publisher: ORTOLANG (Outils et Ressources pour un Traitement Optimisé de la LANGue), Nancy, France.

    Publication Statement and Rights

    Publisher(s)

    Date: 2014-05-01

    Identifier(s)

    uri: cmr-getalp_org-actu-tei-v1
    short-uri: cmr-get-c027
    url: http://hdl.handle.net/11403/comere/cmr-getalp_org/cmr-getalp_org-actu-tei-v1

    Licence

    http://creativecommons.org/licenses/by-nc-sa/4.0/

    This corpus can be freely distributed and shared subject only to attribution, non-commercial use and share-alike. The way to reference / cite the corpus is given in the titleStmt.

    Rights holders of this corpus are: Kévin Labécot ; Achille Falaise ; Thierry Chanier