Corpus Processing Protocol for the CLIMATE Repository and Term Lists Generated through Lexicometric Analysis

This repository contains a reproducible protocol for converting PDF files into plain text files (.txt) using the pdftotext tool, integrated within an R Markdown script.

Contents

- conversion_pdf.Rmd
- conversion_pdf.md
- nettoyage_corpus.Rmd
- nettoyage_corpus.md
- factorisation_listes_termes.Rmd
- factorisation_listes_termes.md

Note: The .Rmd files contain the main documented script, while the .md files provide a readable output without requiring R.

License

These protocols are published under the Creative Commons CC-BY 4.0 license.

Author

Pauline Bureau, 2025. This protocol may be freely used and modified, provided that the author is credited.
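
Illustrative sketch

The PDF-to-text conversion step described above can be sketched roughly as follows. This is not the code from conversion_pdf.Rmd: it is a minimal Python illustration of calling pdftotext once per PDF file, and the function names (build_command, convert_folder), the folder layout, and the choice of the -enc UTF-8 option are assumptions made for the example. It assumes pdftotext is installed and available on the PATH.

```python
from pathlib import Path
import shutil
import subprocess


def build_command(pdf_path: Path, out_dir: Path) -> list[str]:
    """Build the pdftotext call for one PDF (hypothetical helper).

    The output file keeps the PDF's base name with a .txt extension.
    """
    txt_path = out_dir / (pdf_path.stem + ".txt")
    # -enc UTF-8 asks pdftotext to write UTF-8 text (an assumed choice).
    return ["pdftotext", "-enc", "UTF-8", str(pdf_path), str(txt_path)]


def convert_folder(pdf_dir: Path, out_dir: Path) -> list[Path]:
    """Convert every .pdf file in pdf_dir to a .txt file in out_dir."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext was not found on the PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf_path in sorted(pdf_dir.glob("*.pdf")):
        cmd = build_command(pdf_path, out_dir)
        # check=True raises CalledProcessError if pdftotext fails on a file.
        subprocess.run(cmd, check=True)
        written.append(Path(cmd[-1]))
    return written
```

For example, convert_folder(Path("pdf"), Path("txt")) would write one .txt file per PDF found in a folder named pdf; both folder names are placeholders. Keeping the command construction in its own function makes it possible to inspect or log the exact call before anything is executed.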