SPPAS Documentation

Brigitte Bigi

Version 1.7.5

Introduction

What is SPPAS?

Main features

SPPAS - Automatic Annotation of Speech is a scientific computer software package written and maintained by Brigitte Bigi of the Laboratoire Parole et Langage, in Aix-en-Provence, France.

SPPAS is available for free, with open source code; there is simply no other package that linguists can use so simply for the automatic segmentation of speech. SPPAS is under continuous development, with the aim of providing robust and reliable software for the automatic annotation of speech and for the exploitation of annotated data.

This documentation will assume that you are using a relatively recent version of SPPAS.

There's no reason not to download the latest version as soon as it is released: it's easy and fast!

Copyright and Licenses

SPPAS software is distributed under the terms of the GNU GENERAL PUBLIC LICENSE.

SPPAS resources are distributed:

under the terms of the GNU GENERAL PUBLIC LICENSE, or
under the terms of the "Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License"

A copy of both licenses is available in the package. See the "Resources" chapter for details about the license of each proposed resource.

User engagement

By using SPPAS, you agree to cite references in your publications.

See the "References" section of this documentation and/or see PDF files included in the package.

Need help

When looking for more detail about a subject, you can search this documentation. It is available online (see the SPPAS website), it is included in the package (in PDF format), and it can also be explored from the Graphical User Interface by clicking on the 'Help' button.
Many problems can be solved by updating the version of SPPAS.
There is a SPPAS Users discussion group where queries and related topics are discussed, with responses from colleagues or from the author. Topics can range from elementary "how do I" queries to advanced issues in scriptwriting. There's (or there will be) something there for everybody. It is recommended to sign up to become a member on the website: https://groups.google.com/forum/#!forum/sppas-users (neither spam nor e-mails will be sent directly to members).
If none of the above helps, you may send an e-mail to the author. It is very important to indicate clearly:

1/ your operating system and its version, 2/ the version of SPPAS (which should be the latest one), and 3/ for automatic annotations, the log file and a sample of the data on which the problem occurs.
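The first two items can be gathered with a few lines of Python. This is only a minimal sketch: the function name support_request_header is hypothetical, the SPPAS version is typed in by the user rather than read from the package, and the log file and data sample still have to be attached by hand.

```python
import platform


def support_request_header(sppas_version):
    """Build the first lines of a support e-mail.

    sppas_version is given by the user (for example "1.7.5"); this sketch
    does not try to read it from the SPPAS package.
    """
    lines = [
        # Operating system name and release, as reported by Python.
        "Operating system: {} {}".format(platform.system(), platform.release()),
        "SPPAS version: {}".format(sppas_version),
        "Attachments: log file of the automatic annotation, sample of the data",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(support_request_header("1.7.5"))
```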

In addition, if you have any questions, if you want to contribute to SPPAS, either by improving the quality of the resources or by helping with development, or for anything else, do not hesitate to contact the author by e-mail at: brigitte.bigi@gmail.com.

Supports

2011-2012:

Partly supported by ANR OTIM project (Ref. Nr. ANR-08-BLAN-0239), Tools for Multimodal Information Processing.

Read more at: http://www.lpl-aix.fr/~otim/

2013-2015:

Partly supported by ORTOLANG (Ref. Nr. ANR-11-EQPX-0032) funded by the « Investissements d'Avenir » French Government program managed by the French National Research Agency (ANR).

2014-2015:

The development of SPPAS is also partly carried out thanks to the support of the following projects or groups:

CoFee - Conversational Feedback http://cofee.hypotheses.org
Variamu - Variations in Action: a MUltilingual approach http://variamu.hypotheses.org
Team C3i of LPL http://www.lpl-aix.fr/~c3i

Contributors

Here is the list of contributors:

Since January 2011: Brigitte Bigi is the main author;
April 2012-June 2012: Alexandre Ranson;
April 2012-July 2012: Cazembé Henry;
April 2012-June 2013: Bastien Herbaut;
March 2013-March 2014: Tatsuya Watanabe;
April 2015-June 2015: Nicolas Chazeau;
April 2015-June 2015: Jibril Saffi.

Getting and installing

Websites

In the past, SPPAS - Automatic Annotation of Speech was hosted by the "Laboratoire Parole et Langage" (see http://www.lpl-aix.fr). Since January 2015, SPPAS has been hosted by the Speech and Language Data Repository (SLDR), at the following URL:

http://sldr.org/sldr000800/preview

The source code, with recent stable releases, has now been migrated to GitHub:

https://github.com/brigittebigi/

From this website, anyone can download the development version, contribute, send comments and/or report an issue.

Dependencies

On the main website, you will find information about the software requirements. Other programs are required for SPPAS to operate. They must be installed before using SPPAS, and only once. This operation takes from 5 to 15 minutes depending on the operating system. The following software is required:

Python, version 2.7.x
wxPython >= 3.0
julius >= 4.1

An installation guide for each operating system is available on the website. Please follow the instructions closely. Administrator rights are required to perform these installations.
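Once the dependencies are installed, a short script can give a first indication of whether they are visible from Python. The following is a minimal sketch and not part of SPPAS: the helper names are hypothetical, the executable name "julius" is an assumption, and the script only checks that julius is found on the PATH, not its version.

```python
import os
import re
import sys


def version_tuple(text):
    """Turn '3.0.2' into (3, 0, 2), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def at_least(found, required):
    """Return True if version string 'found' is >= 'required'."""
    a, b = version_tuple(found), version_tuple(required)
    width = max(len(a), len(b))
    # Pad with zeros so that '3' and '3.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def find_on_path(program):
    """Return the full path of an executable found on PATH, or None."""
    extensions = [""]
    if os.name == "nt":
        extensions += os.environ.get("PATHEXT", ".EXE").split(os.pathsep)
    for folder in os.environ.get("PATH", "").split(os.pathsep):
        for ext in extensions:
            candidate = os.path.join(folder, program + ext)
            if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
                return candidate
    return None


def check_dependencies():
    """Report what was found for each of the three requirements."""
    report = {"python": "{}.{}.{}".format(*sys.version_info[:3])}
    try:
        import wx
        report["wxPython"] = getattr(wx, "__version__", "unknown")
    except ImportError:
        report["wxPython"] = None
    # Assumption: the julius executable is simply named "julius".
    report["julius"] = find_on_path("julius")
    return report


if __name__ == "__main__":
    for name, found in sorted(check_dependencies().items()):
        print("{}: {}".format(name, found if found else "NOT FOUND"))
```

The reported versions can then be compared with the requirements listed above, for example with at_least(report["wxPython"], "3.0") when wxPython was found.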

Download and install SPPAS

From the website, you can go to the Download Page to download a new version, or subscribe to the Users group.

SPPAS is ready to run, so it does not need an elaborate installation, except for its dependencies (other software required for SPPAS to work properly). All you need to do is to copy the SPPAS package from the website to somewhere on your computer. Preferably, choose a location without spaces or accented characters in the path name.
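The recommendation about the location can be checked before copying the package. The sketch below is an illustration only: the function name is hypothetical, and it is stricter than the recommendation, since it rejects every non-ASCII character and not only accented ones.

```python
def is_safe_install_path(path):
    """Return True when path contains no space and only ASCII characters."""
    if " " in path:
        return False
    try:
        path.encode("ascii")
    except UnicodeError:
        # An accented (or any other non-ASCII) character was found.
        return False
    return True


if __name__ == "__main__":
    for candidate in ("/home/user/sppas", "/home/user/my data/sppas"):
        print(candidate, is_safe_install_path(candidate))
```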

The SPPAS package is compressed and zipped, so you will need to decompress and unpack it once you've got it.

There is a single version of SPPAS, which does not depend on your operating system. The only obvious difference between systems is how it looks on the computer screen.

The SPPAS package

Unlike much other software, SPPAS is not what is called a "black box". Instead, everything is done so that users can check and change how it operates. This is particularly the case for automatic annotations: any user can adapt them to their own needs.

The SPPAS package is therefore a folder that contains files and sub-folders.

The SPPAS package contains:

the README.txt file, which is meant to be read by users!
the files sppas.bat and sppas.command to execute the Graphical User Interface of SPPAS
the resources used by automatic annotations (lexicons, dictionaries, ...)
the samples, which are sets of freely distributed annotations for testing SPPAS
the sppas directory, which contains the program itself
the documentation, which contains:
- the file CHANGES.txt, a release history that gives an overview of the differences between successive versions of SPPAS
- the copyright and a copy of the licenses
- the documentation in PDF
- the slides of the document SPPAS for Dummies
- the references sub-folder includes PDF files of some publications about SPPAS
- the solutions of the exercises proposed in the chapter "Scripting with Python and SPPAS"
- the etc directory is for internal use: never modify or remove it!

Update

SPPAS is constantly being improved and new packages are published frequently (about 10 versions a year). It is important to update regularly in order to get the latest functions and corrections.

Updating SPPAS is very (very very!) easy and fast:

Optionally, put the old package into the Trash,
Download and unpack the new version.

Capabilities

What can SPPAS do?

Here is the list of functionalities available to automatically annotate speech data and to analyse annotated files:

Automatic Annotations
- Momel/INTSINT: modelling melody
- IPUs segmentation: utterance level segmentation
- Tokenization: text normalization
- Phonetization: grapheme to phoneme conversion
- Alignment: phonetic segmentation
- Syllabification: group phonemes into syllables
- Repetitions: detect self-repetitions, and other-repetitions (not in the GUI).
Components
- IPUScribe: Manual orthographic transcription
- SndPlayer: Play sounds (mono wav) and display main information
- Statistics: Estimates/Save statistics on annotated files
- DataRoamer: Manipulate annotated files
- DataFilter: Extract data from annotated files
- SppasEdit: Display sound and annotated files (development version, unstable)
Plugins
- TierMapping-plugin: Create tier by mapping annotation labels
- MarsaTag-plugin: Use the POS-Tagger MarsaTag from SPPAS (French only)

How to use SPPAS?

There are three main ways to use SPPAS:

The Graphical User Interface (GUI) is as user-friendly as possible:
- double-click on the sppas.bat file, under Windows;
- double-click on the sppas.command file, under MacOS or Linux.
The Command-line User Interface (CLI), with a set of programs, each one essentially independent of the others, that can be run on its own at the level of the shell.
Scripting with Python and SPPAS provides the most powerful way.
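For the first of these ways, the choice of the launcher file can be written down as a small helper. This is a sketch only: the function name and the package_dir argument are hypothetical, and only the two file names come from the SPPAS package.

```python
import os
import sys


def gui_launcher(package_dir, platform_name=sys.platform):
    """Return the path of the file that starts the GUI on this system."""
    if platform_name.startswith("win"):
        name = "sppas.bat"       # Windows
    else:
        name = "sppas.command"   # MacOS or Linux
    return os.path.join(package_dir, name)


if __name__ == "__main__":
    # "sppas-package" stands for the folder where the package was unpacked.
    print(gui_launcher("sppas-package"))
```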

Interoperability and compatibility

SPPAS is able to open/save files from the following software, with the expected extension:

Praat: TextGrid, PitchTier, IntensityTier
Elan: eaf
Sclite: ctm, stm
HTK: lab, mlf
Subtitles: srt, sub
Phonedit: mrk
Signaix: hz
Excel/OpenOffice/R: csv

It can also import and export data from:

Annotation Pro: antx

And it can also import data from:

ANVIL: anvil
Transcriber: trs

File formats SPPAS can open/save and import
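The lists above can be summarized as a lookup table. The sketch below is not the SPPAS API: the names FORMATS and format_support are hypothetical, and treating extensions as case-insensitive is an assumption made for the example.

```python
import os

# Extension -> (software, capability), transcribed from the lists above.
FORMATS = {
    ".textgrid": ("Praat", "open/save"),
    ".pitchtier": ("Praat", "open/save"),
    ".intensitytier": ("Praat", "open/save"),
    ".eaf": ("Elan", "open/save"),
    ".ctm": ("Sclite", "open/save"),
    ".stm": ("Sclite", "open/save"),
    ".lab": ("HTK", "open/save"),
    ".mlf": ("HTK", "open/save"),
    ".srt": ("Subtitles", "open/save"),
    ".sub": ("Subtitles", "open/save"),
    ".mrk": ("Phonedit", "open/save"),
    ".hz": ("Signaix", "open/save"),
    ".csv": ("Excel/OpenOffice/R", "open/save"),
    ".antx": ("Annotation Pro", "import/export"),
    ".anvil": ("ANVIL", "import only"),
    ".trs": ("Transcriber", "import only"),
}


def format_support(filename):
    """Return (software, capability) for a file name, or None if unlisted."""
    extension = os.path.splitext(filename)[1].lower()
    return FORMATS.get(extension)


if __name__ == "__main__":
    for name in ("track.TextGrid", "interview.trs", "song.mp3"):
        print(name, format_support(name))
```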

Main and important recommendations

About files

There is a list of important things to keep in mind while using SPPAS. They are summarized as follows and detailed in chapters of this documentation:

Speech files:
- only wav, aiff and au audio files
- only mono (= one channel)
- frame rates preferably: 16000 Hz, 32000 Hz, 48000 Hz
- bit depth (sample width) preferably: 16 bits
- good recording quality is expected. It is, of course, important to never convert from a compressed file (such as mp3).
Annotated files:
- UTF-8 encoding only
- it is recommended NOT to use accented characters in file names or in paths
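Some of these recommendations can be verified automatically before running SPPAS. The following is a minimal sketch that uses only the Python standard library and not SPPAS itself: the function names are hypothetical, only wav files are handled (not aiff or au), and recording quality cannot be checked this way.

```python
import wave

PREFERRED_RATES = (16000, 32000, 48000)


def check_wav(path):
    """Return a list of warnings; an empty list means nothing to report."""
    reader = wave.open(path, "rb")
    try:
        channels = reader.getnchannels()
        rate = reader.getframerate()
        width = reader.getsampwidth()  # in bytes per sample
    finally:
        reader.close()
    warnings = []
    if channels != 1:
        warnings.append("{} channels: only mono is supported".format(channels))
    if rate not in PREFERRED_RATES:
        warnings.append("frame rate {} Hz is not a preferred rate".format(rate))
    if width != 2:
        warnings.append("{}-bit samples: 16 bits preferred".format(8 * width))
    return warnings


def is_utf8(path):
    """Return True if the whole file decodes as UTF-8."""
    with open(path, "rb") as handle:
        data = handle.read()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

A call such as check_wav("sample.wav"), where sample.wav is a placeholder name, returns one message per recommendation that the file does not follow.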

About automatic annotations

The quality of the results for some annotations is highly influenced by the quality of the data the annotation takes as input (this is a politically correct way of saying: Garbage in, garbage out!).

Annotations are based on the use of resources that are freely available in the SPPAS package. The quality of the automatic annotations is also largely influenced by such resources. However... all the resources can be modified by any user!