Overview of a CoMeRe corpus

>The initial aim of the researchers collecting these data was to build a corpus that would support research centred on political vocabulary, based on analyses of observables coming from new communication methods. The document cmr-polititweets-tei-v1-manuel.pdf details the way the accounts of 200 persons were selected. Here is an extract:

1) we started with 7 personalities from 6 different French political groups: JLMelenchon, Bayrou, Copé, Fillon, Lepen, Ayraut, Cohn-Bendit;
2) we gathered all the lists mentioning them => 7087 lists;
3) among these lists, we selected the ones that had at least 6 user accounts / twittos and that contained the character string *politic* in the name or description of the list => 120 lists (11K lines);
4) on these 120 lists, we selected 2934 messages / tweets;
5) to be sure to select only political twittos (and not journalistic ones...), we worked by levels. By selecting only the accounts listed on more than 12 lists, we obtained 205 political twittos.

For these 205 accounts, we retrieved the last 200 tweets of each person as of 27 March 2014, that is 34273 tweets. This yields a corpus centred on the period between the two rounds of the 2014 local elections or, for the less active accounts, on the period leading up to these elections or on earlier ones (because the time span covered by each account depends on how frequently it publishes tweets: the oldest tweet is dated 2009-03-04 11:59:49).
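The list filter of step 3 and the account filter of step 5 can be sketched as follows. This is a minimal illustration only, not the tooling actually used by the corpus compilers: the dictionary layout (`name`, `description`, `members`) and the function names are assumptions made for the example, and the toy data uses lower thresholds than the ones quoted above.

```python
from collections import Counter

def select_political_lists(lists, keyword="politic", min_members=6):
    """Keep lists with enough members whose name or description contains the keyword."""
    selected = []
    for lst in lists:
        text = (lst["name"] + " " + lst["description"]).lower()
        if len(lst["members"]) >= min_members and keyword in text:
            selected.append(lst)
    return selected

def select_accounts(lists, min_lists=13):
    """Keep accounts present on at least min_lists lists (13 means 'more than 12')."""
    counts = Counter(account for lst in lists for account in set(lst["members"]))
    return sorted(account for account, n in counts.items() if n >= min_lists)

# Hypothetical toy data, with thresholds lowered so the effect is visible.
toy_lists = [
    {"name": "Politics FR", "description": "", "members": ["a", "b", "c"]},
    {"name": "News", "description": "French politics", "members": ["a", "b"]},
    {"name": "Sport", "description": "football", "members": ["a", "d", "e"]},
    {"name": "politic", "description": "", "members": ["a"]},
]
political = select_political_lists(toy_lists, min_members=2)
frequent = select_accounts(political, min_lists=2)
print(frequent)  # ['a', 'b']
```

Note that a literal substring test for `politic` does not match the French spelling `politique`; the description above does not say exactly how the pattern was applied, so the matching rule here is a guess.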

This corpus is a subpart of the CoMeRe corpus databank. The CoMeRe (Communication Médiée par les Réseaux) project aims to build a kernel corpus assembling existing corpora of different CMC (Computer-Mediated Communication) genres and new corpora built on data extracted from the Internet. These heterogeneous corpora will be structured and processed in a uniform way, and complemented with metadata. CoMeRe will be released as OpenData through the national infrastructure Ortolang, following constraints which will be reused for the forthcoming “Corpus de Référence du Français”. The project is supported by the national consortium Corpus-écrits, a sub-part of Huma-Num, and by Ortolang (French correspondent of DARIAH).

The TEI structure used is an extension of TEI for CMC genres. This extension is developed by a European project whose participants are: Michael Beißwenger (DE), Thierry Chanier (FR), Isabella Chiari (IT), Maria Ermakova (DE), Maarten van Gompel (NL), Iris Hendrickx (NL), Axel Herold (DE), Henk van den Heuvel (NL), Lothar Lemnitzer (DE), Angelika Storrer (DE).

CMC Environment

tweet : Definition of the modality Tweet. Type of messages used in Tweets.

Structure of interactions
text: each text corresponds to the set of tweets coming from the same Twitter account
post: one post corresponds to one tweet.

xml:id: ID of the posting.
when: date of the message on Twitter.
who: ID of the Twitter account, see listPerson.
type: type of message, cf. taxonomy. Not displayed here. Default value for all tweets.
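As an illustration of these attributes, the sketch below reads one post with Python's standard library. The XML fragment is hand-written from one of the extracts shown on this page and is an assumption about the markup: the actual corpus files use the TEI namespace and may carry further attributes and child elements.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# xml:id and xml:lang live in the predefined XML namespace.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

# Hypothetical, simplified fragment (no TEI namespace, no trailer).
SAMPLE = """<post xml:id="p_ts-a448580952366010368" who="s-p80820758"
      when="2014-03-25T23:03:27" xml:lang="fra">
  <p>En librairie depuis vendredi ! http://t.co/0czbsee8wC</p>
</post>"""

def read_post(element):
    """Return the attributes described above plus the text of the p element."""
    return {
        "id": element.get(XML_NS + "id"),
        "who": element.get("who"),
        "when": datetime.fromisoformat(element.get("when")),
        "lang": element.get(XML_NS + "lang"),
        "text": "".join(element.find("p").itertext()).strip(),
    }

post = read_post(ET.fromstring(SAMPLE))
print(post["id"], post["who"], post["when"].isoformat(), post["lang"])
```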

p: This element appears inside the
distinct: This element appears inside

twitter-hashtag. Then the element contains ident with #, and ref with the URL of discussion topic
twitter-retweet. Then the element contains ident with RT
twitter-via. Then the element contains ident with via

addressingTerm: An addressing term addresses an utterance to a particular interlocutor / twitto, or refers to a twitto. It includes:

addressMarker with @
addressee refers to a Twitter account

trailer: This element appears inside
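To make the distinct types and addressing terms more concrete, here is a rough regular-expression sketch that labels the corresponding tokens in a tweet's text. It is a heuristic written for this page, not the annotation procedure used to build the corpus; the function name and the returned tuple layout are assumptions, and a plain word match on "via" will also catch ordinary uses of that word.

```python
import re

# One named alternative per category described above.
TOKEN_RE = re.compile(
    r"(?P<retweet>\bRT\b)"
    r"|(?P<via>\bvia\b)"
    r"|(?P<hashtag>#\w+)"
    r"|(?P<address>@\w+)"
)

LABELS = {
    "retweet": "twitter-retweet",
    "via": "twitter-via",
    "hashtag": "twitter-hashtag",
    "address": "addressingTerm",
}

def annotate(text):
    """Return (label, token) pairs in order of appearance."""
    return [(LABELS[m.lastgroup], m.group()) for m in TOKEN_RE.finditer(text)]

tweet = "RT @LePG: A 07h50, @JLMelenchon est l'invité des #4vérités sur #France2."
for label, token in annotate(tweet):
    print(label, token)
```

For an addressingTerm, the leading `@` corresponds to the addressMarker and the rest of the token to the addressee; for a hashtag, the leading `#` corresponds to ident.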

Data Collection

Data collected: From 2009-03-04 to 2014-03-27
location: Twitter website. For each of the 30 Twitter accounts selected from French politicians, the last 200 tweets have been extracted. Since most of the messages were sent between the end of 2013 and the end of March 2014, they mainly refer to discussions between the two rounds of voting in the municipal elections of March 2014 in France.

Language of the data: French

Types of interaction

channel: mode: w, Message sent through a Twitter account
constitution: Selected through automatic processing. See projectDesc for more information
derivation: type: original
domain: type: public, domain of a message: politics
factuality: type: fact
interaction: type: complete, active: many
preparedness: type: spontaneous
purpose: political local elections

Participants (extract)

The list of participants, i.e. twittos, is given in sourceDesc.

head:Tweets de Jean-Luc Mélenchon

POST: xml:id: p_ts-a449071343175471104 | who: s-p80820758 | when: 2014-03-27T07:32:06 | xml:lang: fra
p: RT @LePG: A 07h50, @JLMelenchon est l'invité des #4vérités sur #France2. Nous live-tweeterons. #Chômage #Municipales2014
trailer:

POST: xml:id: p_ts-a448882215209144320 | who: s-p80820758 | when: 2014-03-26T19:00:34 | xml:lang: fra
p: Ce jeudi à 07h50, je suis l'invité des #4vérités sur @France2tv. Live-tweet sur @LePG. #Chômage #Municipales2014
trailer:

POST: xml:id: p_ts-a448846582491123712 | who: s-p80820758 | when: 2014-03-26T16:38:59 | xml:lang: fra
p: RT @LePG: .@JLMelenchon est dans le 20e, à #Paris, pour soutenir @simonnet2014 qui se maintient au 2e tour. #Paris2014 http://t.co/0FUqIjZg…
trailer:

POST: xml:id: p_ts-a448839614825259008 | who: s-p80820758 | when: 2014-03-26T16:11:17 | xml:lang: fra
p: Dans le 20e arrondissement de #Paris pour soutenir @simonnet2014, candidate du #FDG. #Paris2014 #Municipales2014 http://t.co/rbmyOOc8Bm
trailer:

POST: xml:id: p_ts-a448580952366010368 | who: s-p80820758 | when: 2014-03-25T23:03:27 | xml:lang: fra
p: En librairie depuis vendredi ! http://t.co/0czbsee8wC
trailer:

POST: xml:id: p_ts-a448560305640325120 | who: s-p80820758 | when: 2014-03-25T21:41:25 | xml:lang: fra
p: Avec Damien Vidal et Laurent Galandon, auteurs de la BD "Lip, des héros ordinaires". http://t.co/NogBFhAPQa
trailer:

POST: xml:id: p_ts-a448491380948885504 | who: s-p80820758 | when: 2014-03-25T17:07:32 | xml:lang: fra
p: RT @LePG: A #Bruxelles, @JLMelenchon conclue la rencontre-débat sur le #GMT. Nous live-tweetons - #UE #USA #Europe - http://t.co/hNGmnAagIM
trailer:

POST: xml:id: p_ts-a448489732725809152 | who: s-p80820758 | when: 2014-03-25T17:00:59 | xml:lang: fra
p: "L'accord sera mauvais, même si les tribunaux d'arbitrage en sont exclus" Paul Murphy - @SocialistParty - #GMT #UE #USA #Europe
trailer:

POST: xml:id: p_ts-a448488368666861568 | who: s-p80820758 | when: 2014-03-25T16:55:34 | xml:lang: fra
p: "La mobilisation se lève, en #Europe, contre le #GMT." Pia Eberhardt - @corporateeurope - #UE #USA #Europe
trailer:

head:Tweets de Chantal Jouanno

POST: xml:id: p_ts-a448872049034141697 | who: s-p102397481 | when: 2014-03-26T18:20:10 | xml:lang: fra
p: Si le journaliste sur @RTLFrance pouvait organiser la prise de parole, on comprendrait mieux le débat ....
trailer:

POST: xml:id: p_ts-a448771540658978816 | who: s-p102397481 | when: 2014-03-26T11:40:47 | xml:lang: fra
p: Réunion de travail @Les_Europeens sur notre projet. Une vraie chance pour notre pays. http://t.co/W39cSzcO2L
trailer:

POST: xml:id: p_ts-a448752194045878272 | who: s-p102397481 | when: 2014-03-26T10:23:55 | xml:lang: fra
p: RT @JeanMarieCAVADA: Nouvelle réunion de travail sur les européennes @Les_Europeens #EP2014 @Alternative_fr @desarnez @Chantal_Jouanno htt…
trailer:

POST: xml:id: p_ts-a448735470995116033 | who: s-p102397481 | when: 2014-03-26T09:17:28 | xml:lang: fra
p: Belle itw ce matin de @nk_m sur @Europe1 Tu as raison la dynamique est chez toi. Et nous a @UDI_off on t'M ;-)
trailer:

POST: xml:id: p_ts-a448731197095833600 | who: s-p102397481 | when: 2014-03-26T09:00:29 | xml:lang: eng
p: ;-) “@FabriqueSpinoza: @Chantal_Jouanno at @Brainpool_EU "NGOs like @FabriqueSpinoza useful to promote #BeyondGDP agenda across parties"”
trailer:

POST: xml:id: p_ts-a448391458123710464 | who: s-p102397481 | when: 2014-03-25T10:30:28 | xml:lang: fra
p: A l'@UDI_off au moins, il n'y a pas d'ambiguïté. http://t.co/cFDiQFlyBS
trailer:

POST: xml:id: p_ts-a448059449463558144 | who: s-p102397481 | when: 2014-03-24T12:31:11 | xml:lang: fra
p: RT @UDIjeunes: Vice-présidente de l’@UDI_off @Chantal_Jouanno salue un « bon résultat pour la droite » notamment de l’#UDI http://t.co/lQn…
trailer:

POST: xml:id: p_ts-a447861026042953729 | who: s-p102397481 | when: 2014-03-23T23:22:44 | xml:lang: fra
p: Nos candidats @UDI_off sont plébiscités : @ASantini_UDI @yvesjego @jclagarde @Herve_Morin @VigierPhilippe @SoniaLagarde ...;-)
trailer:

POST: xml:id: p_ts-a447832913342910465 | who: s-p102397481 | when: 2014-03-23T21:31:01 | xml:lang: fra
p: On marche sur la tête. On a le sentiment de faire des émissions sur la gloire du FN....
trailer:
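Each POST line in the extract above follows a regular "key: value | key: value" layout. Assuming that layout holds throughout, a line can be read into a dictionary as sketched below; the function name is hypothetical, and this parses the rendered listing on this page, not the TEI files themselves.

```python
from datetime import datetime

def parse_post_line(line):
    """Split a rendered 'POST: k: v | k: v' line into a dict, parsing 'when' as a datetime."""
    prefix = "POST: "
    if not line.startswith(prefix):
        raise ValueError("not a POST line")
    fields = {}
    for part in line[len(prefix):].split(" | "):
        # Split on the first ': ' so that keys such as 'xml:id' stay intact.
        key, _, value = part.partition(": ")
        fields[key.strip()] = value.strip()
    fields["when"] = datetime.fromisoformat(fields["when"])
    return fields

line = ("POST: xml:id: p_ts-a448731197095833600 | who: s-p102397481 "
        "| when: 2014-03-26T09:00:29 | xml:lang: eng")
fields = parse_post_line(line)
print(fields["xml:id"], fields["who"], fields["when"].date(), fields["xml:lang"])
```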

Publisher(s)

ORTOLANG (Outils et Ressources pour un Traitement Optimisé de la LANGue), Nancy:France http://www.ortolang.fr
CoMeRe (Communication Médiée par les Réseaux) contact@comere.org http://comere.org

Date: 2014-05-02

Identifier(s)

uri: cmr-polititweets-c001-tei-v1
url: https://hdl.handle.net/11403/comere/cmr-polititweets/cmr-polititweets-c001-tei-v1

Licence

http://creativecommons.org/licenses/by/4.0/

Rights holders of this corpus are: Julien Longhi ; Thierry Chanier

This corpus can be freely distributed and shared subject only to attribution. The way to reference / cite the corpus is given in the titleStmt.

https://hdl.handle.net/11403/comere/cmr-polititweets/cmr-polititweets-c001-tei-v1
https://hdl.handle.net/11403/comere/cmr-polititweets/cmr-polititweets-c002-tei-v1
https://hdl.handle.net/11403/comere/cmr-polititweets/cmr-polititweets-c003-tei-v1
https://hdl.handle.net/11403/comere/cmr-polititweets/cmr-polititweets-c004-tei-v1
https://hdl.handle.net/11403/comere/cmr-polititweets/cmr-polititweets-c005-tei-v1
https://hdl.handle.net/11403/comere/cmr-polititweets/cmr-polititweets-c006-tei-v1
https://hdl.handle.net/11403/comere/cmr-polititweets/cmr-polititweets-c007-tei-v1
https://hdl.handle.net/11403/comere/cmr-polititweets/cmr-polititweets-tei-v1-manuel.pdf

Polititweets : corpus de tweets provenant de comptes politiques influents 1
