SPPAS is a scientific computer software package written and maintained by Brigitte Bigi of the Laboratoire Parole et Langage, in Aix-en-Provence, France.
Operating systems:
GNU General Public License, version 3
Web site: http://sldr.org/sldr00800/preview/
Update SPPAS regularly:
Only wav, aiff and au files
channels: 1 (mono)
sample width: 16 bits
frame rate: 16000 Hz
NEVER convert from a compressed file (mp3, ...)
Good recording quality is expected
Open speech file(s) with the SndRoamer component for a diagnosis
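The audio requirements above (mono, 16 bits, 16000 Hz) can be checked before loading a file into SPPAS. The following is a minimal sketch using only Python's standard `wave` module; the function name `check_wav` is hypothetical and is not part of SPPAS. It reads only the wav header, so it says nothing about recording quality or about whether the file was converted from a compressed source.

```python
import wave

def check_wav(path):
    """Return a list of problems found in the wav header (empty if none)."""
    problems = []
    with wave.open(path, "rb") as w:
        channels = w.getnchannels()
        width_bits = w.getsampwidth() * 8   # getsampwidth() is in bytes
        rate = w.getframerate()
    if channels != 1:
        problems.append(f"channels: {channels} (expected 1, mono)")
    if width_bits != 16:
        problems.append(f"sample width: {width_bits} bits (expected 16)")
    if rate != 16000:
        problems.append(f"frame rate: {rate} Hz (expected 16000)")
    return problems
```

An empty list means the header matches the expected settings; otherwise each entry names one setting to fix.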
UTF-8 encoding only
No accented characters in file names (nor in the path)
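Both constraints above can be tested in a few lines. This is a sketch with hypothetical helper names (`has_only_ascii`, `is_utf8`), not SPPAS functions: the first flags any non-ASCII character in a path, the second reports whether raw bytes decode as UTF-8.

```python
def has_only_ascii(path):
    """True if the path contains no accented or other non-ASCII character."""
    return all(ord(ch) < 128 for ch in path)

def is_utf8(data):
    """True if the given bytes are valid UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

For a transcription file, read it in binary mode (`open(name, "rb").read()`) and pass the result to `is_utf8`.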
Supported file formats to open/save (software, extension):
Brigitte Bigi (2012). “SPPAS: a tool for the phonetic segmentations of Speech”, Language Resources and Evaluation Conference, ISBN 978-2-9517408-7-7, pages 1748-1755, Istanbul (Turkey).
Brigitte Bigi, Daniel Hirst (2012). “SPeech Phonetization Alignment and Syllabification (SPPAS): a tool for the automatic analysis of speech prosody”, Speech Prosody, Tongji University Press, ISBN 978-7-5608-4869-3, pages 19-22, Shanghai (China).
Automatic Annotations:
... and many other things!
IPUScribe: Manual transcription
SndRoamer: Play sound (mono wav)
Statistics: Estimate/Save statistics of tiers
DataRoamer: Manipulate annotated files
DataFilter: Select/Filter annotations of tiers
SppasEdit: Display wav and annotated files
TierMapping-plugin: Create tier by mapping annotations
MarsaTag-plugin: Use the POS-Tagger MarsaTag from SPPAS (French only)
Open the file explorer of your system
Go to the SPPAS folder location
sppas.bat file
sppas.command file
Click on the 'Add File' button
Explore the samples folder and choose as many wav files as you want
All files with the same name as the selected wav files will be added into the list
Click (and/or ctrl+click) on some files in this list
Choose what you want to do with your selection (a component, automatic annotations, plugin)
All the automatic annotations are based on language-independent approaches
This means:
The process of taking the text transcription of an audio speech segment and determining where in time particular phonemes occur in the speech segment
Enriched Orthographic transcription:
It must include:
Audio: mono wav file, 16 kHz, 16 bits
Tokenization requires a list of words (lexicon)
To create/edit a lexicon:
Input example:
Et euh donc donc du coup c'est toi c'est un peu toi q(ui) a les premiers contacts avec le avec le gosse quoi + et puis là ils te demandent le prénom donc faut ce soit prêt là @ parce que putain.
Output:
et euh donc donc du coup c' est toi c'est un_peu toi qui a les premiers contacts avec le avec le gosse quoi + et puis là ils te demandent le prénom donc faut ce soit prêt là @ parce_que putain
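To make the role of the lexicon concrete, here is a toy normalizer in the spirit of the example above. It is only an illustrative sketch, not the SPPAS algorithm: the function name `normalize`, the `compounds` set and every rule in it (lowercasing, keeping elided letters such as "q(ui)", dropping punctuation, detaching apostrophes, joining two-word lexicon entries with an underscore) are assumptions, and it will not reproduce SPPAS output in every case.

```python
import re

def normalize(text, compounds):
    """Toy text normalization; `compounds` holds two-word lexicon entries."""
    text = text.lower()
    # Keep elided material written in parentheses: "q(ui)" -> "qui"
    text = re.sub(r"\((\w+)\)", r"\1", text)
    # Drop punctuation but keep the "+" and "@" symbols
    text = re.sub(r"[.,;:!?]", " ", text)
    # Detach an elided word from the next one: "c'est" -> "c' est"
    text = re.sub(r"(\w')(\w)", r"\1 \2", text)
    tokens = text.split()
    out, i = [], 0
    while i < len(tokens):
        pair = " ".join(tokens[i:i + 2])
        if pair in compounds:
            # A two-word lexicon entry becomes a single token
            out.append(pair.replace(" ", "_"))
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)

print(normalize("C'est un peu toi q(ui) a le gosse + parce que.",
                {"un peu", "parce que"}))
# prints: c' est un_peu toi qui a le gosse + parce_que
```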
Phonetization requires a pronunciation dictionary
To create/edit a dictionary:
In the phonetization output, by convention, spaces separate words, dots separate phones and pipes separate phonetic variants of a word. Example:
the flight
dh.ax|dh.ah|dh.iy f.l.ay.t
If a word is missing from the dictionary, SPPAS generates a pronunciation.
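The output convention described above (spaces between words, pipes between variants, dots between phones) maps directly onto nested lists. A minimal sketch, with the hypothetical function name `parse_phonetization`:

```python
def parse_phonetization(line):
    """Split a phonetization string into words -> variants -> phones."""
    return [
        [variant.split(".") for variant in word.split("|")]
        for word in line.split()
    ]

words = parse_phonetization("dh.ax|dh.ah|dh.iy f.l.ay.t")
print(len(words[0]))   # prints: 3   (three variants for "the")
print(words[1][0])     # prints: ['f', 'l', 'ay', 't']
```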
Each automatic annotation generates a file and...
Open such file(s) in the SppasEdit component, or Praat, or Elan, ...
Save/Export any file into any format (XRA, TextGrid, EAF, CSV) with one of the 'Export' buttons
You are now ready to test SPPAS with the proposed set of samples...
... and do not forget to read the documentation: it contains most of the answers to your questions!