SPPAS
Automatic Annotation of Speech
Brigitte Bigi - Laboratoire Parole et Langage - Aix-en-Provence - France

SPPAS Documentation

Brigitte Bigi

Version 1.7.5


References

Publications about SPPAS

PDF versions of the publications are available in the SPPAS package (folder documentation, sub-folder references).

By using SPPAS, you agree to cite one of these references.

Brigitte Bigi, Christine Meunier, Irina Nesterenko, Roxane Bertrand (2010). Automatic detection of syllable boundaries in spontaneous speech, Language Resources and Evaluation Conference, pages 3285-3292, Valletta (Malta).

Summary: This paper presents the outline and performance of an automatic syllable boundary detection system. The syllabification of phonemes is performed with a rule-based system. The proposed phonemes, classes and rules are listed in an external configuration file of the tool.

Brigitte Bigi (2012). The SPPAS participation to Evalita 2011, Working Notes of EVALITA 2011, Rome (Italy), ISSN: 2240-5186.

Summary: EVALITA is an initiative devoted to the evaluation of Natural Language Processing and Speech tools for Italian. In Evalita 2011 the "Forced Alignment on Spontaneous Speech" task was added. The training data consist of about 15 map-task dialogues recorded by pairs of speakers exhibiting a wide variety of Italian variants.

Brigitte Bigi (2012). SPPAS: a tool for the phonetic segmentations of Speech, The eighth International Conference on Language Resources and Evaluation, Istanbul (Turkey), pages 1748-1755, ISBN 978-2-9517408-7-7.

Summary: This paper is a detailed presentation of SPPAS. It presents all the features of the software: an overview, annotation steps, resources and the architecture of the tool.

Brigitte Bigi, Daniel Hirst (2012). SPeech Phonetization Alignment and Syllabification (SPPAS): a tool for the automatic analysis of speech prosody, Speech Prosody, Tongji University Press, ISBN 978-7-5608-4869-3, pages 19-22, Shanghai (China).

Summary: This paper is an overview of SPPAS annotation steps and resources.

Brigitte Bigi, Daniel Hirst (2013). What's new in SPPAS 1.5?, Tools and Resources for the Analysis of Speech Prosody, Aix-en-Provence, France, pp. 62-65.

Summary: During Speech Prosody 2012, we presented SPPAS, SPeech Phonetization Alignment and Syllabification, a tool to automatically produce annotations which include utterance, word, syllabic and phonemic segmentations from a recorded speech sound and its transcription. SPPAS is open source software issued under the GNU Public License. SPPAS is multi-platform (Linux, MacOS and Windows) and it is specifically designed to be used directly by linguists in conjunction with other tools for the automatic analysis of speech prosody. This paper presents various improvements implemented since the previously described version.

Brigitte Bigi (2013). A phonetization approach for the forced-alignment task, 3rd Less-Resourced Languages workshop, 6th Language & Technology Conference, Poznan (Poland).

Dafydd Gibbon (2013). TGA: a web tool for Time Group Analysis, Tools and Resources for the Analysis of Speech Prosody, Aix-en-Provence, France, pp. 66-69.

Summary: Speech timing analysis in linguistic phonetics often relies on annotated data in de facto standard formats, such as Praat TextGrids, and much of the analysis is still done largely by hand, with spreadsheets, or with specialised scripting (e.g. Praat scripting), or relies on cooperation with programmers. The TGA (Time Group Analyser) tool provides efficient ubiquitous web-based computational support for those without such computational facilities. The input module extracts a specified tier (e.g. phone, syllable, foot) from inputs in common formats; user-defined settings permit selection of sub-sequences such as inter-pausal groups, and duration difference thresholds. Tabular outputs provide descriptive statistics (including modified deviation models like PIM, PFD, nPVI, rPVI), linear regression, and novel structural information about duration patterns, including difference n-grams and Time Trees (temporal parse trees).
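The two pairwise variability indices named in this summary (rPVI and nPVI) can be illustrated with a minimal Python sketch. It uses the standard textbook definitions and is not taken from the TGA tool itself; the function names rpvi and npvi and the example durations are assumptions made for illustration only.

```python
from typing import Sequence


def rpvi(durations: Sequence[float]) -> float:
    """Raw PVI: mean absolute difference between successive durations.

    Hypothetical helper for illustration; not part of TGA or SPPAS.
    """
    if len(durations) < 2:
        raise ValueError("at least two durations are required")
    diffs = [abs(a - b) for a, b in zip(durations, durations[1:])]
    return sum(diffs) / len(diffs)


def npvi(durations: Sequence[float]) -> float:
    """Normalised PVI: each successive difference is divided by the
    mean of the two durations, then the average is scaled by 100.

    Hypothetical helper for illustration; durations must be positive.
    """
    if len(durations) < 2:
        raise ValueError("at least two durations are required")
    terms = [abs(a - b) / ((a + b) / 2.0)
             for a, b in zip(durations, durations[1:])]
    return 100.0 * sum(terms) / len(terms)


if __name__ == "__main__":
    # Made-up syllable durations in seconds, e.g. read from a syllable tier.
    syllables = [0.12, 0.25, 0.10, 0.31]
    print("rPVI:", rpvi(syllables))
    print("nPVI:", npvi(syllables))
```

Perfectly regular durations give 0 for both indices; larger values indicate stronger alternation between neighbouring units.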

Brigitte Bigi (2014). Automatic Speech Segmentation of French: Corpus Adaptation. 2nd Asian Pacific Corpus Linguistics Conference, p. 32, Hong Kong.

Brigitte Bigi, Roxane Bertrand, Mathilde Guardiola (2014). Automatic detection of other-repetition occurrences: application to French conversational speech, 9th International conference on Language Resources and Evaluation (LREC), Reykjavik (Iceland), pages 2648-2652. ISBN: 978-2-9517408-8-4.

Summary: This paper investigates the discursive phenomenon called other-repetitions (OR), particularly in the context of spontaneous French dialogues. It focuses on their automatic detection and characterization. A method is proposed to automatically retrieve OR: this detection is based on rules that are applied to the lexical material only. This automatic detection process has been used to label other-repetitions in 8 dialogues of CID - Corpus of Interactional Data. Evaluations performed on one speaker are good, with an F1-measure of 0.85. Retrieved OR occurrences are then statistically described: number of words, distance, etc.

Brigitte Bigi, Tatsuya Watanabe, Laurent Prévot (2014). Representing Multimodal Linguistics Annotated Data, 9th International conference on Language Resources and Evaluation (LREC), Reykjavik (Iceland), pages 3386-3392. ISBN: 978-2-9517408-8-4.

Summary: The question of interoperability for linguistic annotated resources requires covering several aspects. First, it requires a representation framework making it possible to compare, and potentially merge, different annotation schemas. In this paper, a general description level representing the multimodal linguistic annotations is proposed. It focuses on time and data content representation: this paper reconsiders and enhances the current and generalized representation of annotations. An XML schema of such annotations is proposed. A Python API is also proposed. This framework is implemented in a multi-platform software and distributed under the terms of the GNU Public License.

Brigitte Bigi (2014). A Multilingual Text Normalization Approach, Human Language Technologies Challenges for Computer Science and Linguistics LNAI 8387, Springer, Heidelberg. ISBN: 978-3-319-14120-6. Pages 515-526.

Summary: The creation of text corpora requires a sequence of processing steps in order to constitute, normalize, and then directly exploit them in a given application. This paper presents a generic approach for text normalization and concentrates on the aspects of methodology and linguistic engineering, which serve to develop a multi-purpose multilingual text corpus. This approach was applied to written texts of French, English, Spanish, Vietnamese, Khmer and Chinese and to speech transcriptions of French, English, Italian, Chinese and Taiwanese. It consists in splitting the text normalization problem into a set of smaller sub-problems that are as language-independent as possible. A set of text corpus normalization tools with linked resources and a document structuring method are proposed and distributed under the terms of the GPL license.

Brigitte Bigi, Caterina Petrone, Leonardo Lancia (2014). Automatic Syllabification of Italian: adaptation from French. Laboratory Approaches to Romance Phonology VII, Aix-en-Provence (France).

SPPAS in research projects

MULTIPHONIA

SPPAS was used to annotate the MULTIPHONIA corpus (MULTImodal database of PHONetics teaching methods in classroom InterActions) created by Charlotte ALAZARD, Corine ASTESANO, Michel BILLIÈRES.

This database consists of audio-video classroom recordings comparing two methods of phonetic correction (the "traditional" articulatory method, and the Verbo-Tonal Method). This database is composed of 96 hours of pronunciation classes with beginners and advanced students of French as a Foreign Language. Every class lasted approximately 90 minutes. This multimodal database constitutes an important resource for Second Language Acquisition researchers. Hence, MULTIPHONIA will be enriched at many different levels, to allow for segmental, prosodic, morphosyntactic, syntactic, lexical and gestural analyses of L2 speech.

Charlotte Alazard, Corine Astésano, Michel Billières (2012). MULTIPHONIA: a MULTImodal database of PHONetics teaching methods in classroom InterActions, Language Resources and Evaluation Conference, Istanbul (Turkey), May 2012.

MULTIPHONIA: http://www.sldr.org/sldr000780/en

Amennpro

SPPAS was used for the annotation of the French part of the AixOx corpus.

Download the AixOx corpus: http://www.sldr.fr/sldr000784/

Remark: Some examples are located in the samples-fra directory (files F_F_*.*) and in the samples-eng directory (files E_E*.*).

Evalita 2011: Italian phonetization and alignment

Evalita 2011 was the third evaluation campaign of Natural Language Processing and Speech tools for Italian, supported by the NLP working group of AI*IA (Associazione Italiana per l'Intelligenza Artificiale/Italian Association for Artificial Intelligence) and AISV (Associazione Italiana di Scienze della Voce/Italian Association of Speech Science).

SPPAS participated in both tasks of the Forced Alignment on Spontaneous Speech evaluation.

The corpus was a set of map-task dialogues.

Orthographic Transcription: Impact on Phonetization

SPPAS Phonetization was evaluated on a French corpus (MARC-Fr). SPPAS Alignment was also evaluated on this corpus.

Results are reported in the following publications:

B. Bigi, P. Péri, R. Bertrand (2012). Orthographic Transcription: Which Enrichment is required for Phonetization?, Language Resources and Evaluation Conference, Istanbul (Turkey), May 2012.

Brigitte Bigi (2014). Automatic Speech Segmentation of French: Corpus Adaptation. 2nd Asian Pacific Corpus Linguistics Conference, p. 32, Hong Kong.

After registration, MARC-Fr can be freely downloaded at: http://www.sldr.fr/sldr000786/fr

Cofee: Conversational Feedback

In a conversation, feedback is mostly performed through short utterances produced by a participant other than the current main speaker. These utterances are among the most frequent in conversational data. They are also considered crucial communicative tools for achieving coordination in dialogue. They have been the topic of various descriptive studies and are often given a central role in applications such as dialogue systems. The Cofee project addresses this issue from a linguistic viewpoint and combines fine-grained corpus analyses of semi-controlled data with formal and statistical modelling.

Cofee is managed by Laurent Prévot: http://cofee.hypotheses.org/

Cofee corpora and the use of SPPAS on such corpora are presented in:

Jan Gorisch, Corine Astésano, Ellen Gurman Bard, Brigitte Bigi, Laurent Prévot (2014). Aix Map Task corpus: The French multimodal corpus of task-oriented dialogue, 9th International conference on Language Resources and Evaluation (LREC), Reykjavik (Iceland).

Variamu: Variations in Action: a MUltilingual approach

Variamu is an international collaborative joint project co-funded by Amidex. The scientific object of the collaboration is the issue of language variation, addressed from a comparative perspective. Speech and language knowledge supports a growing number of strategic domains such as Human Language Technologies (HLT), Language Learning (LL), and Clinical Linguistics (CL). A crucial issue for these domains is language variation, which can result from dysfunction, proficiency, dialectal specificities, communicative contexts or even inter-individual differences. Variation is often an important part of the question itself.

This network will be structured around 3 research axes, all centered on the variation concept:

  1. Language technologies,
  2. Linguistics and Phonetics,
  3. Speech and Language Pathologies.

SPPAS is mainly concerned with the first axis.

The first result of this project is the participation of SPPAS in the Evalita 2014 campaign, for the FACS task: Forced-Alignment on Children Speech.

The second result of this project is the support for Cantonese in SPPAS, thanks to a collaboration with Prof. Tan Lee of the University of Hong Kong.