Polititweets : corpus de tweets provenant de comptes politiques influents 1
This page: https://hdl.handle.net/11403/comere/cmr-polititweets/cmr-polititweets-c001-tei-v1
Back to corpus: https://hdl.handle.net/11403/comere/cmr-polititweets
This form has been automatically extracted from the TEI file. For the full contents, see https://hdl.handle.net/11403/comere/cmr-polititweets/cmr-polititweets-c001-tei-v1.xml.
Keywords : applied_linguistics
Longhi J. (2013). "Essai de caractérisation du tweet politique", L’Information grammaticale, n°136, p. 25-32
The researchers' initial aim in collecting these data was to obtain a corpus that would support research centred on political vocabulary, based on analyses of observables drawn from new communication methods. The document
1) We started with 7 personalities from 6 different French political groups: JLMelenchon, Bayrou, Copé, Fillon, Lepen, Ayraut, Cohn-Bendit.
2) We gathered all the lists mentioning them => 7087 lists.
3) Among these lists, we selected those that had at least 6 user accounts (twittos) and that contained the character string *politic* in the name or description of the list => 120 lists (11K lines).
4) From these 120 lists, we selected 2934 messages / tweets.
5) To be sure to select only political twittos (and not journalistic ones...), we worked by levels. By selecting only the accounts listed on more than 12 lists, we obtained 205 political twittos.
For these 205 accounts, we retrieved the last 200 tweets of each person as of 27 March 2014, that is, 34273 tweets. This yields a corpus centred on the period between the two ballots of the 2014 local elections or, for the less active accounts, one that reaches back to the run-up to those elections or to earlier ones (because the time span covered by each account depends on how densely it publishes tweets: the oldest tweet is dated 2009-03-04 11:59:49).
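The list and account filtering in steps 3 and 5 can be illustrated with a minimal sketch. This is not the authors' code: the function name, the dictionary keys (`name`, `description`, `members`) and the case-insensitive matching are assumptions made for illustration. Only the thresholds (at least 6 accounts per list, the string *politic*, more than 12 lists per account) come from the description above.

```python
from collections import Counter

def select_political_accounts(lists, keyword="politic",
                              min_members=6, min_list_count=12):
    """Filter lists, then keep accounts that appear on enough of them.

    `lists` is an iterable of dicts with the hypothetical keys
    'name', 'description' and 'members' (a list of account names).
    """
    # Step 3: keep lists with at least `min_members` accounts whose
    # name or description contains the keyword (case-insensitive here,
    # which is an assumption).
    kept = [
        lst for lst in lists
        if len(lst["members"]) >= min_members
        and keyword in (lst["name"] + " " + lst["description"]).lower()
    ]
    # Step 5: count on how many kept lists each account appears
    # (an account is counted once per list).
    counts = Counter(m for lst in kept for m in set(lst["members"]))
    # Keep only accounts listed on MORE than `min_list_count` lists.
    selected = sorted(a for a, n in counts.items() if n > min_list_count)
    return kept, selected
```

Calling `select_political_accounts(lists)` returns both the retained lists and the sorted account names, so each threshold can be varied and its effect on the number of retained twittos inspected.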
This corpus is a subpart of the CoMeRe corpus databank. The CoMeRe (Communication Médiée par les Réseaux) project aims to build a kernel corpus assembling existing corpora of different CMC (Computer-Mediated Communication) genres and new corpora built on data extracted from the Internet. These heterogeneous corpora will be structured and processed in a uniform way and complemented with metadata. CoMeRe will be released as OpenData through the national infrastructure Ortolang, following constraints which will be reused for the forthcoming “Corpus de Référence du Français”. The project is supported by the national consortium Corpus-écrits, a sub-part of Huma-Num, and by Ortolang (the French counterpart of DARIAH).
The TEI structure used is an extension of TEI for CMC genres. This extension is being developed by a European project whose participants are: Michael Beißwenger (DE), Thierry Chanier (FR), Isabella Chiari (IT), Maria Ermakova (DE), Maarten van Gompel (NL), Iris Hendrickx (NL), Axel Herold (DE), Henk van den Heuvel (NL), Lothar Lemnitzer (DE), Angelika Storrer (DE).
Structure of interactions
text: each text corresponds to the set of tweets coming from the same Twitter account.
post: one post corresponds to one tweet.
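The text/post structure described above can be read with Python's standard `xml.etree.ElementTree`, as in the following minimal sketch. The sample document is invented for illustration: the `xml:id`, `who` and `when` attributes, the `body` wrapper and the tweet contents are assumptions, and the actual corpus file may be organised differently. Only the `text` and `post` element names come from the description above.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"                 # standard TEI namespace
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"    # expanded name of xml:id

# Invented sample: one <text> per account, one <post> per tweet.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text xml:id="account-1">
    <body>
      <post who="#user1" when="2014-03-25T10:00:00">First tweet</post>
      <post who="#user1" when="2014-03-26T11:30:00">Second tweet</post>
    </body>
  </text>
  <text xml:id="account-2">
    <body>
      <post who="#user2" when="2014-03-27T09:15:00">Only tweet</post>
    </body>
  </text>
</TEI>"""

def posts_per_text(xml_string):
    """Map each <text> (one account) to the list of its <post> contents."""
    root = ET.fromstring(xml_string)
    result = {}
    for text in root.iter(f"{{{TEI_NS}}}text"):
        key = text.get(XML_ID)  # assumed identifier attribute
        result[key] = [
            (post.text or "").strip()
            for post in text.iter(f"{{{TEI_NS}}}post")
        ]
    return result

tweets_by_account = posts_per_text(SAMPLE)
```

Here `tweets_by_account` groups tweet contents by account, mirroring the rule that one text holds all tweets from one Twitter account.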
Data Collection
Data collected: from 2009-03-04 to 2014-03-27
Types of interaction
channel: mode: w
Participants (extract)
The list of participants, i.e. twittos, is given in sourceDesc
Rights holders of this corpus are: Julien Longhi ; Thierry Chanier
This corpus can be freely distributed and shared subject only to
attribution. The way to reference / cite the corpus is given in the