This site: http://hdl.handle.net/11403/comere
Open Resources and TOols for LANGuage
CoMeRe Repository: Corpora of Computer-Mediated Communication in French
This repository includes corpora of mono- or multimodal interactions mediated through networks (Internet, phone, etc.). Three fundamental principles underlie CoMeRe: variety, standards, openness.
Reference: Chanier, T., Poudat, C., Sagot, B., Antoniadis, G., Wigham, C. R., Hriba, L., Longhi, J. & Seddah, D. (2014). "The CoMeRe corpus for French: structuring and annotating heterogeneous CMC genres". Special issue on "Building And Annotating Corpora Of Computer-Mediated Discourse: Issues and Challenges at the Interface of Corpus and Computational Linguistics". JLCL (Journal of Language Technology and Computational Linguistics), pp. 1-31. http://www.jlcl.org/2014_Heft2/Heft2-2014.pdf
SMS
- cmr-smslareunion
- cmr-smsalpes
Wiki discussions
- cmr-wikiconflits
Tweets
- cmr-polititweets
Weblog
- cmr-infral
Email
- cmr-simuligne
Discussion forum
- cmr-simuligne
Text chat
- cmr-getalp_org
- cmr-favi
- cmr-simuligne
Multimodal
- cmr-copeas
- cmr-tridem06
Multimodal + 3D
- cmr-archi21
1 200 blog messages ; 273 546 tokens ; 26 participants.
22 000 messages / SMS ; 449 000 tokens ; 359 participants.
62 participants + 12 groups (tridems) ; This corpus contains a total of 4 894 acts classified as follows: 2 809 audio acts, 248 chat acts, 1 058 production acts, 779 blog messages. It includes 184 594 tokens.
18 participants + 4 groups ; This corpus contains a total of 4 811 acts classified as follows: 1 690 audio acts, 669 chat acts, 2 452 production (non-verbal) acts. It includes 27 912 tokens.
16 participants + 2 groups ; This corpus contains a total of 15 074 acts classified as follows: 7 718 audio acts, 1 566 chat acts, 5 790 production acts. It includes 127 228 tokens.
5 million (M) text chat turns ; 72 M tokens ; 53 000 participants.
12 622 messages / SMS ; 357 192 tokens ; 884 participants.
34 273 messages / tweets ; 567 851 tokens ; 205 accounts.
7 conflictual topics ; 3 971 contributors ; 4 456 posts / contributions in discussions ; 489 000 tokens in discussions (articles not counted) ; 330 MB (7 zipped sub-corpora).
11 506 messages (emails, discussion forum, text chat) ; 600 348 tokens ; 67 participants.
7 780 text chat turns ; 77 605 tokens ; 31 participants.
Université Blaise Pascal, Clermont: Thierry Chanier, Paul Lotin
Université de Nice: Céline Poudat
Ortolang: Kun Jin
Consortium Corpus-écrits: Linda Hriba
Université Cergy-Pontoise: Julien Longhi
Université Rennes 2: Gudrun Ledegen
Université Stendhal, Grenoble: Georges Antoniadis
Université Paris 7 / Inria: Benoit Sagot
Université Lyon 2: Ciara Wigham
CNAM, Paris: Camille Paloques-Berges
Université Paris 3: Georgeta Cislaru
Consortium Corpus-écrits
National Infrastructure for Digital Humanities
Research Unit LRL, Clermont Université
European SIG: TEI for CMC
Digital Research Infrastructure for the Arts and Humanities
Unless otherwise stated, this site is licensed as follows:
The CoMeRe Repository web site at http://hdl.handle.net/11403/comere is made available
under the terms of the Creative Commons
Attribution 4.0 International licence.
Based on a work at http://comere.org.