************************************************
ACRONYMS
************************************************

- IGO = Intergovernmental Organization. In our file descriptions, "IGO" refers to the various intergovernmental organizations involved in international climate governance (see below for the full list of institutions considered); spreadsheet names and several descriptions below use the French abbreviation "OIG". Our corpus is composed of reports published by these organizations on the topic of climate change.

- MEDIA/Press = Designates data belonging to the journalistic corpus, described below ("Presse" in spreadsheet names).

- NGO = Non-Governmental Organization ("ONG" in spreadsheet names)

- COP = Conference of the Parties

- CT = Candidate Term

- UC = Compound Unit

- US = Simple Unit

- FR = Frequency

- FR_rel = Relative Frequency

- FR_ex = Expected Frequency

***********************************************
FILE FORMAT
***********************************************

The statistical data are provided in two complementary formats:

- .csv (Comma-Separated Values): open, machine-readable format, encoded in UTF-8.
  ➤ This format is recommended for any reuse, automatic analysis, computational processing, and interoperability.

- .xlsx (Microsoft Excel): proprietary format, included as a supplement for easier human reading. It preserves certain presentation features (filters, colors, multiple sheets) that help with a first understanding of the files.

Note: The content of the .csv and .xlsx files is identical, except for formatting.
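
For readers working in code rather than in Excel, the sketch below loads one of the .csv files with Python's standard csv module. It assumes that the first row holds the column headers; the function name load_table is hypothetical. Because spreadsheet exports sometimes use semicolons instead of commas, the delimiter is detected with csv.Sniffer rather than hard-coded.

```python
import csv


def load_table(path):
    """Read a UTF-8 .csv file into a list of dicts keyed by column header.

    Assumes the first row holds the headers. The delimiter is sniffed among
    comma, semicolon and tab instead of being hard-coded.
    """
    with open(path, encoding="utf-8-sig", newline="") as f:  # utf-8-sig also accepts a BOM
        sample = f.read(4096)
        f.seek(0)
        try:
            dialect = csv.Sniffer().sniff(sample, delimiters=",;\t")
        except csv.Error:
            dialect = csv.excel  # fall back to standard comma-separated values
        return list(csv.DictReader(f, dialect=dialect))
```

For example, rows = load_table("UC_Experts.csv") would return one dictionary per row, keyed by that file's column names (the file name is assumed from the spreadsheet names below).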

***********************************************
HOW WERE THE DATA COLLECTED?
***********************************************

All data were collected from the various corpora available in this repository. For term extraction, we used the “specificities” function of the textometry software TXM and retained candidate terms with a specificity score higher than 3.09 (Drouin 2003). For the spreadsheets “IDFxspé_ONG”, “IDFxspé_OIG”, and “TGCC_ONGvs.OIG”, terminological units were extracted with the TermoStat software (ibid.).

The reference corpus used to calculate specificity scores was an extract of the Corpus of Contemporary American English (COCA), containing over 12 million words and representing diverse genres (popular magazines, blogs and other web pages, fiction, news articles, subtitles from films and TV shows, and academic texts). This corpus was chosen because it was designed to be representative of general American English and because the selected extract is freely downloadable from the COCA website.
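
TXM's “specificities” function rests on Lafon's hypergeometric model: how likely is it to observe a word at least f times in the analysis corpus, given its frequency in the analysis and reference corpora combined? The sketch below computes that probability and, as an assumption made for illustration, converts it into a standard-normal score, under which the 3.09 threshold corresponds to p ≈ 0.001. It is not the code of TXM or TermoStat, whose exact scaling of the score is not described here, and all function names are hypothetical.

```python
import math
from statistics import NormalDist


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k), stable for large n."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def upper_tail(f, n, K, N):
    """P(X >= f) for X ~ Hypergeometric(N tokens, K occurrences of the word, n draws).

    N: size of analysis corpus + reference corpus (in tokens)
    K: frequency of the word in both corpora together
    n: size of the analysis corpus
    f: frequency of the word in the analysis corpus
    """
    log_total = log_comb(N, n)
    start = max(f, n - (N - K), 0)  # smallest k that is actually possible
    stop = min(K, n)
    p = 0.0
    for k in range(start, stop + 1):
        p += math.exp(log_comb(K, k) + log_comb(N - K, n - k) - log_total)
    return min(p, 1.0)


def specificity_z(f, n, K, N):
    """Express the over-representation probability as a standard-normal score."""
    p = upper_tail(f, n, K, N)
    p = min(max(p, 1e-300), 1 - 1e-16)  # keep inv_cdf's argument inside (0, 1)
    return -NormalDist().inv_cdf(p)
```

A candidate term would then be kept when specificity_z(f, n, K, N) > 3.09. For very frequent words the summation loop is long; a production implementation would stop once the remaining terms become negligible.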

We then excluded candidate terms that did not meet the criteria described by Bureau (2023) and categorized the remaining units according to whether they belong to terminological layers 1, 2, or 3 (ibid.).

To extract co-occurrences, we used the dedicated function in TXM. The specificity and frequency thresholds for selecting co-occurrents were both set to 2, and the window around the pivot term was set to 10 words on each side (left and right).
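
The window-based counting can be sketched as follows. This is a minimal illustration on a list of lemmas, not TXM's implementation: the function name and the way overlapping windows are counted are assumptions, and TXM's additional specificity filter (also set to 2) is left out, so only the frequency threshold is applied.

```python
from collections import Counter


def cooccurrents(lemmas, pivot, window=10, min_cofreq=2):
    """Count lemmas occurring within `window` positions left and right of `pivot`.

    Returns {co-occurrent: co-frequency} for co-occurrents seen at least
    `min_cofreq` times. A lemma inside two overlapping windows is counted
    once for each pivot occurrence.
    """
    counts = Counter()
    for i, lemma in enumerate(lemmas):
        if lemma != pivot:
            continue
        start = max(0, i - window)
        end = min(len(lemmas), i + window + 1)
        for j in range(start, end):
            if j != i:  # the pivot occurrence itself is not its own co-occurrent
                counts[lemmas[j]] += 1
    return {w: c for w, c in counts.items() if c >= min_cofreq}
```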


**********************************************************************
Spreadsheets: “UC_Experts”, “US_Experts”, “UC_Presse”, “US_Presse”
**********************************************************************


These spreadsheets provide a diachronic perspective on the terms used in climate expertise (represented by intergovernmental organizations, the “OIG” corpus, and international NGOs, the “ONG” corpus) and in the press (press corpus).

Terms were extracted from the diachronic sub-corpora (COP 15, COP 21, and COPs 25/26). The expert sub-corpora (OIG and ONG) were merged to allow observation of terminological variation over the represented period.

The columns “X2” and “X2 COP15-21” indicate the significance of the frequency variation over the entire period and between COP15 and COP21, respectively. The column “FILTRE X2+FR” identifies terms whose X2 score over the full period exceeds 9.21 (p < 0.01, i.e. less than a 1% probability of observing such a variation if the term's frequency simply followed the sizes of the sub-corpora) and that occur at least 30 times across the corpus, a threshold used by Picton (2009: 116–117) to ensure that the X2 score is reliable.
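
One standard way to compute such a score is Pearson's chi-square on a 2 × k contingency table (occurrences of the term vs. all other tokens, in each of the k sub-corpora). With the three sub-corpora (COP15, COP21, COPs 25/26) the table has 2 degrees of freedom, and 9.21 is the critical value of the χ² distribution at p = 0.01 for 2 degrees of freedom. The sketch below assumes this formulation; the function names are hypothetical and the exact computation behind the spreadsheets is not documented here.

```python
def chi_square(freqs, sizes):
    """Pearson chi-square for a 2 x k table: the term vs. all other tokens,
    in each sub-corpus (freqs[i] occurrences out of sizes[i] tokens)."""
    total_f, total_n = sum(freqs), sum(sizes)
    if total_f == 0:
        return 0.0  # term absent everywhere: no variation to test
    x2 = 0.0
    for f, n in zip(freqs, sizes):
        e_term = total_f * n / total_n               # expected frequency of the term
        e_other = (total_n - total_f) * n / total_n  # expected count of other tokens
        x2 += (f - e_term) ** 2 / e_term
        if e_other > 0:
            x2 += ((n - f) - e_other) ** 2 / e_other
    return x2


def passes_filter(freqs, sizes, critical=9.21, min_freq=30):
    """Mimic "FILTRE X2+FR": X2 above the 1% critical value and >= 30 occurrences."""
    return chi_square(freqs, sizes) > critical and sum(freqs) >= min_freq
```

For the “X2 COP15-21” column, only two sub-corpora are compared (1 degree of freedom); the corresponding 1% critical value would be 6.63.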

The column “RUPTURE FR” flags terms that are absent from one or more sub-corpora of the diachronic corpus. These may be potential neologisms (green: emerged around COP21; blue: emerged around COPs 25/26) or potentially obsolete units (pink: declined from COP21; purple: declined from COPs 25/26). Potential neologisms are absent from the older sub-corpora and potentially obsolete units from the more recent ones; in both cases, their frequency in those sub-corpora is zero (Picton 2009).
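
The colour legend can be expressed as a small decision rule over the frequencies in the three sub-corpora. The sketch below is one hypothetical reading of it (for instance, it treats a term as “emerged around COP21” only if it is absent from COP15 and present in both later sub-corpora); irregular patterns, such as a term absent only from COP21, are left unlabelled.

```python
def rupture(freqs):
    """Hypothetical reading of the RUPTURE FR colour legend.

    freqs: frequencies in (COP15, COP21, COPs 25/26).
    Returns a colour name, or None when no rupture pattern applies.
    """
    f15, f21, f2526 = freqs
    if f15 == 0 and f21 > 0 and f2526 > 0:
        return "green"   # potential neologism, emerged around COP21
    if f15 == 0 and f21 == 0 and f2526 > 0:
        return "blue"    # potential neologism, emerged around COPs 25/26
    if f15 > 0 and f21 == 0 and f2526 == 0:
        return "pink"    # potentially obsolete, declined from COP21
    if f15 > 0 and f21 > 0 and f2526 == 0:
        return "purple"  # potentially obsolete, declined from COPs 25/26
    return None          # present throughout, or an irregular pattern
```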


**********************************************
Spreadsheets "IDFxspé_ONG" and "IDFxspé_OIG"
**********************************************

These two spreadsheets present all the units extracted respectively from the ONG and OIG corpora and include:
1/ The inverse document frequency (IDF) score of the units as a percentage (scores were inverted so that higher scores correspond to more widespread units) – column: “IDF (out of 100)”.
2/ The specificity index (as a percentage) and frequency of units with a specificity score greater than 3.14 (the default specificity threshold in TermoStat) – column: “INDICE (out of 100)”.
3/ The product of the IDF score and the specificity index – column: “scoreIDFxspé”.
4/ The terminological layer of the extracted units based on Bureau (2023).

To obtain IDF scores, we used R to divide each of our two sub-corpora into blocks of 500 sentences. This produced balanced excerpts (unlike a segmentation into the original reports, whose lengths vary) and prevented document size from skewing term presence and IDF scores. The ONG corpus yielded 221 sentence blocks and the OIG corpus 174. We then calculated IDF scores for all words in R and inverted them so that higher values indicate more widespread usage.
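
The segmentation and IDF computation were done in R; the Python sketch below reproduces the idea under stated assumptions. Each sentence is a list of tokens, IDF is computed as log(N / df), where N is the number of blocks and df the number of blocks containing the word, and the inversion to a 0–100 scale uses 100 × (1 − IDF / log N), so that a word present in every block scores 100 and a word found in a single block scores 0. That inversion formula is an assumption: the description only states that scores were inverted and expressed out of 100.

```python
import math


def split_blocks(sentences, size=500):
    """Group consecutive sentences into blocks of `size`; the last block may be shorter."""
    return [sentences[i:i + size] for i in range(0, len(sentences), size)]


def inverted_idf(blocks, vocabulary):
    """Return an inverted IDF score out of 100 for each word found in at least one block."""
    n = len(blocks)
    # The set of distinct tokens in each block, for fast membership tests.
    block_vocab = [{tok for sentence in block for tok in sentence} for block in blocks]
    scores = {}
    for word in vocabulary:
        df = sum(word in vocab for vocab in block_vocab)  # block ("document") frequency
        if df == 0:
            continue  # word never occurs: IDF undefined, so it is skipped
        idf = math.log(n / df)
        # Hypothetical inversion: 100 when df == n, 0 when df == 1.
        scores[word] = 100.0 * (1 - idf / math.log(n)) if n > 1 else 100.0
    return scores
```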


***************************
Spreadsheet "TGCC_ONGvs.OIG"
***************************

This spreadsheet compares diastratic terminological variation between two communities recognized as experts in climate change: UN institutions and international NGOs, represented by the “OIG” and “ONG” corpora, respectively.

The data focus on the General Terminology of Climate Change (Bureau 2023, Drouin et al. 2018) used by each community. The extracted terms are both specific and widespread within their respective corpora.

To reflect these two parameters, we cross-referenced the specificity scores of the terms with an IDF-type dispersion index (see above). The terms with the highest “scoreIDFxspé” values are those that are both specific and widespread in each corpus. Only terms with a specificity score greater than 3.14 were retained. Finally, each unit is assigned to a terminological layer (Bureau 2023), allowing for a qualitative interpretation of the types of terms that differ between these two expert communities.
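
The ranking can be sketched as a filter followed by a sort. The field names below (specificity for the raw specificity score, idf_100 and index_100 for the “IDF (out of 100)” and “INDICE (out of 100)” columns) are hypothetical, and the score is assumed to be the plain product of the two 0–100 values, as the “scoreIDFxspé” description suggests.

```python
def rank_terms(rows, min_spec=3.14):
    """Keep terms above the specificity threshold and sort them by IDF x specificity.

    rows: list of dicts with hypothetical keys "specificity", "idf_100", "index_100".
    Returns new dicts (the input is not modified) with an added "scoreIDFxspe" key.
    """
    ranked = []
    for row in rows:
        if row["specificity"] <= min_spec:
            continue  # below the TermoStat default threshold
        ranked.append({**row, "scoreIDFxspe": row["idf_100"] * row["index_100"]})
    ranked.sort(key=lambda r: r["scoreIDFxspe"], reverse=True)  # specific AND widespread first
    return ranked
```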


**********************************************
Spreadsheet "Circulation_Termes_Experts-Presse"
**********************************************

This spreadsheet aims to identify, among the units extracted from the different diachronic “Press” sub-corpora, those that also appear anywhere in the expert corpus (highlighted in green in the “Termes_Experts” column). It can thus help account for the circulation of terms between experts and the press, and thereby for the diffusion of knowledge from the former to the latter.

The “Experts” term list contains 2,013 distinct units if terminological layer 3 is included (Bureau 2023), and 1,939 if only terms from layers 1 and 2, which have a higher degree of termicity (Humbley 2018), are considered.

Excel formulas automate this comparison: terms from the expert list that are picked up by the press are highlighted in green, which makes them easy to identify. The units highlighted in orange in the various “MEDIA-[COP]” columns are specific to the press: they do not appear in the list of terms representing climate expertise.
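
Outside Excel, the same comparison amounts to a set membership test. In the sketch below, expert_terms is a hypothetical mapping from each expert term to its terminological layer, so restricting layers to (1, 2) gives the stricter expert list. Terms are compared as exact strings, which assumes that both lists were normalized (case, lemmatization) in the same way.

```python
def circulation(press_terms, expert_terms, layers=(1, 2, 3)):
    """Split press terms into those found in the expert list and press-only ones.

    expert_terms: hypothetical mapping {term: terminological layer}.
    layers: which layers make up the expert list, e.g. (1, 2) for the stricter list.
    """
    experts = {term for term, layer in expert_terms.items() if layer in layers}
    shared = [t for t in press_terms if t in experts]          # green in the spreadsheet
    press_only = [t for t in press_terms if t not in experts]  # orange in the spreadsheet
    return shared, press_only
```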

********************************************************
Spreadsheets "COOC_ONGvs.OIG" and "COOC_Presse_OIG_ONG"
********************************************************

These spreadsheets compare the main lemmatized co-occurrents of several key climate change domain terms across different corpora. The “COOC_ONGvs.OIG” spreadsheet compares the most specific co-occurrents in the “NGO” corpus to those in the “IGO” corpus. The “COOC_Presse_OIG_ONG” spreadsheet compares the most specific co-occurrents in the “Press” corpus to those in the “IGO” and “NGO” corpora, respectively.

For co-occurrence extraction, we used the dedicated function in TXM (a textometry software). The thresholds for specificity and frequency were set at 2, and the word window to the left and right of the pivot term was set at 10. We also selected “lemma” as the criterion for the query.

We then cleaned the top of each extracted co-occurrence list (sorted by descending specificity score), since we only wanted to analyze the 30 most specific co-occurrents in each corpus: prepositions, articles, and punctuation marks were removed.
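
The cleaning step can be sketched as follows, assuming each extracted row is a (co-occurrent, co-frequency, specificity) tuple. The stop list is hypothetical: the description names the categories removed (prepositions, articles, punctuation) but not the exact words, and punctuation is detected here through Unicode character categories.

```python
import unicodedata

# Hypothetical stop list of English articles and prepositions; the actual
# list of removed words is not given in the description.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "at", "to", "for", "from", "with", "by"}


def is_punctuation(token):
    """True if every character of the token is a Unicode punctuation mark."""
    return all(unicodedata.category(ch).startswith("P") for ch in token)


def top_cooccurrents(rows, k=30):
    """Sort rows by descending specificity, drop function words and punctuation, keep k."""
    ranked = sorted(rows, key=lambda row: row[2], reverse=True)
    kept = [row for row in ranked
            if row[0].lower() not in STOP_WORDS and not is_punctuation(row[0])]
    return kept[:k]
```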

In these spreadsheets, the “CoFrequence” column indicates how many times the co-occurrent appears with the pivot term in the respective corpus, while the “Index” column gives the specificity score of the co-occurrent in that corpus.


****************************
Spreadsheet "Cosine_ONG_OIG"
****************************

The data in this spreadsheet correspond to cosine similarity scores between the “IGO” and “NGO” versions of the most specific and well-distributed terms in the climate change discourse produced by these two communities.

The scores were obtained using the word2vec_similarity() function from the “word2vec” package in R, after building word embeddings from each of the two corpora, one per community (the “IGO” and “NGO” corpora in the “climate-discourse” folder).

The “rank” column indicates the term's rank based on its cosine similarity score relative to all other terms in the list: for example, a rank of 2 for the term “temperature rise” means that the IGO version of this term is the second-closest semantic neighbor of its NGO version.
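
Cosine similarity is the dot product of two vectors divided by the product of their norms. The sketch below computes it in plain Python and derives the rank described above: it compares the NGO vector of a term with the IGO vectors of every term in the list and reports where the term's own IGO version falls (rank 1 = closest). It assumes both sets of vectors are directly comparable (same dimensions, same space); how the original workflow made the two corpora's embeddings comparable is not stated here, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def counterpart_rank(term, ngo_vectors, igo_vectors):
    """Return (rank, similarity) of the IGO version of `term` among the IGO
    vectors closest to its NGO version; rank 1 means it is the nearest one."""
    target = ngo_vectors[term]
    sims = {t: cosine(target, vec) for t, vec in igo_vectors.items()}
    ordered = sorted(sims, key=sims.get, reverse=True)
    return ordered.index(term) + 1, sims[term]
```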


************
References
************

Bureau, Pauline. 2023. "Variation terminologique et néologie dans le domaine du changement climatique". PhD thesis in applied linguistics, University of Grenoble Alpes.

Drouin, Patrick. 2003. "Term Extraction Using Non-Technical Corpora as a Point of Leverage". Terminology 9(1), 99–117.

Drouin, Patrick, Marie-Claude L'Homme & Benoît Robichaud. 2018. "Lexical Profiling of Environmental Corpora". Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018) [Online]. Miyazaki (Japan): European Language Resources Association (ELRA), 3419–3425. URL: <https://aclanthology.org/L18-1539.pdf>.