This page: http://hdl.handle.net/11403/comere/cmr-wikiconflits
Open Resources and TOols for LANGuage
Corpus Wikiconflits : Conflits dans le Wikipédia francophone (cmr-wikiconflits-tei-v1)
Poudat, C., Grabar, N., Kun, J. & Paloque-Berges, C. (2015). Corpus wikiconflits : Conflits dans le Wikipédia francophone. In Chanier, T. (ed.) Banque de corpus CoMeRe. Ortolang.fr : Nancy. [cmr-wikiconflits-tei-v1 ; http://hdl.handle.net/11403/comere/cmr-wikiconflits]
The corpus "Wikiconflits (cmr-wikiconflits-tei-v1) : Conflits dans le Wikipédia francophone" gathers conflictual discussions around a set of 7 (pseudo-)scientific topics: "Quotient Intellectuel", "Igor et Grichka Bogdanoff", "Organismes génétiquement modifiés", "Chiropratique", "Histoire de la Logique", "Eolienne", "Psychanalyse" (see cmr-wikiconflits-tei-v4.1-manuel.pdf in the references for the selection criteria). For each topic: 1) versions of the article have been transformed into TEI; 2) talk / discussion pages have been reorganized (taking archived discussions into account), alongside pages related to conflicts and neutral points of view, all formatted in TEI-CMC; 3) history pages have been extracted as-is in Wikipedia's HTML format. In the talk pages, every contribution (TEI-CMC post element) has been automatically delimited and identified; some identification errors (due to authors who did not follow Wikipedia instructions) have been corrected manually. Alongside these 7 sets of data, the pages and talk pages of the most important contributors are also included, left in wikicode format. Lastly, a single TEI file lists all 3971 authors / contributors to the 7 topics.
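As an illustration of how the talk-page files might be explored, the following Python sketch counts contributions per author. Only the `post` element name comes from the description above; the `who` attribute, the sample document, and the function names are assumptions made for illustration and should be checked against the corpus manual before use on the real files.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def count_posts_by_author(xml_text, author_attr="who"):
    """Count <post> elements per author in a TEI-CMC document.

    author_attr is an assumption: the attribute that actually carries
    the author reference in the corpus may be named differently.
    """
    root = ET.fromstring(xml_text)
    counts = Counter()
    for elem in root.iter():
        # Match on the local name so the sketch works whatever the namespace.
        if local_name(elem.tag) == "post":
            counts[elem.get(author_attr, "unknown")] += 1
    return counts


# Hypothetical miniature document, not taken from the corpus.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <div type="thread">
      <post who="#userA">First remark.</post>
      <post who="#userB">A reply.</post>
      <post who="#userA">A rejoinder.</post>
    </div>
  </body></text>
</TEI>"""

if __name__ == "__main__":
    for author, n in count_posts_by_author(SAMPLE).most_common():
        print(author, n)
```

For a real file, the text would be read from disk first (for example with `pathlib.Path(...).read_text(encoding="utf-8")`), and the author identifiers could then be matched against the single TEI file that lists all contributors.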
The Wikiconflits corpus was created by the CoMeRe project, which aims to gather corpora representing the forms of communication in French over different networks (Internet, phone, etc.), all structured and documented in the same way and distributed in open-access formats for research purposes. The CoMeRe project has received the support of ORTOLANG (the French equivalent of DARIAH) and of the national consortium Written-Corpus ('Corpus-écrits', http://corpusecrits.corpus-ir.fr), a subsection of Huma-Num.
Keywords: Wikipedia discussions and conflicts; Computer-Mediated Communication; CMC
To download the Wikiconflits corpus, please access each of the following 7 topics and, for each one, download the corresponding sub-corpus ZIP file.
This corpus can be freely distributed and shared, subject only to attribution
and share-alike conditions (the licence recommended by Wikipedia.fr). The way to reference / cite
the corpus is given in the bibliographicCitation.
Rights holders of this corpus are:
Céline Poudat ; Thierry Chanier
http://creativecommons.org/licenses/by-sa/3.0/