An annotation workflow

Information

Which annotations (in general)?

A very large number of dimensions have been annotated in the past on mono and multimodal corpora. To quote only a few, some frequent speech or language based annotations are speech transcript, segmentation into words, utterances, turns, or topical episodes, labeling of dialogue acts, and summaries; among video-based ones are gesture, posture, facial expression [...]. (Popescu-Belis, 2010)

Which annotations (in this tutorial)?

In this tutorial, we will report on:

  1. IPUs segmentation
  2. Speech transcript (manual)
  3. Phonemes and words time-alignment
  4. Syllables segmentation
  5. Repetitions detection
  6. Morpho-syntax
  7. Momel and INTSINT
  8. Gestures (manual)

The annotation workflow

The main principle is...

Garbage in, Garbage out.

Record

Capturing and recording multimodal data

The capture of multimodal corpora requires complex settings such as instrumented lecture and meetings rooms, containing capture devices for each of the modalities that are intended to be recorded, but also, most challengingly, requiring hardware and software for digitizing and synchronizing the acquired signals. (Popescu-Belis, 2010)

IPUs Segmentation

IPUs Segmentation: definition

Example of IPUs segmentation: silences are annotated with # and speech intervals are filled with the IPU number
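The labeling convention of this example can be sketched in a few lines of Python. This is a minimal illustration, not SPPAS code: the function name `label_ipus`, the `ipu_N` label format, and the silence boundaries are all assumptions made for the example, and the silences are supposed to be already detected.

```python
def label_ipus(silences, duration):
    """Return (start, end, label) intervals covering [0, duration].

    silences: sorted, non-overlapping (start, end) pairs in seconds.
    Silences are labeled "#"; speech intervals get "ipu_1", "ipu_2", ...
    (hypothetical label format chosen for this sketch).
    """
    intervals = []
    cursor = 0.0
    ipu_count = 0
    for start, end in silences:
        if start > cursor:
            # Speech between the previous silence and this one
            ipu_count += 1
            intervals.append((cursor, start, f"ipu_{ipu_count}"))
        intervals.append((start, end, "#"))
        cursor = end
    if cursor < duration:
        # Trailing speech after the last silence
        ipu_count += 1
        intervals.append((cursor, duration, f"ipu_{ipu_count}"))
    return intervals


# Made-up silence boundaries for a 7-second recording
tier = label_ipus([(0.0, 0.5), (2.1, 2.8), (5.0, 5.4)], duration=7.0)
for start, end, label in tier:
    print(f"{start:5.2f} {end:5.2f} {label}")
```

Each silence becomes a "#" interval and each stretch of speech between two silences receives the next IPU number, so the resulting tier covers the whole recording without gaps.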

Orthographic Transcription

Orthographic Transcription

Speech may be annotated for:

 ⇒ Enriched Orthographic Transcription

Enriched Orthographic Transcription

Enriched Orthographic Transcription: convention

First train yourself to transcribe and to use the annotation software!

SPPAS transcription convention

Transcription example 1 (Conversational speech)

Transcription example 2 (Conversational speech)

Transcription example 3 (GrenelleII)

Orthographic Transcription... to sum up

Automatic systems must be adapted to deal with the Enriched Orthographic Transcription (EOT).

Phonemes/Tokens time-alignment

Phonemes and Tokens time-alignment

A problem divided into 3 sub-tasks:

  1. tokenization: text normalization, word segmentation
  2. phonetization: grapheme to phoneme conversion
  3. alignment: speech segmentation

Tokenization

Tokenization is also known as "Text Normalization".
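To give an idea of what text normalization involves, here is a generic sketch in Python. It is not the SPPAS tokenizer: the function name `normalize` and the tiny `NUMBER_WORDS` lookup are hypothetical, and a real system needs language-specific resources (lexicon, number-to-words conversion, handling of the transcription convention).

```python
import re

# Hypothetical, deliberately tiny lookup used only for this illustration
NUMBER_WORDS = {"1": "one", "2": "two", "3": "three", "10": "ten"}


def normalize(text):
    """Lowercase, remove punctuation, split into words, spell out known numbers."""
    text = text.lower()
    # Replace punctuation with spaces, keeping apostrophes and hyphens
    text = re.sub(r"[^\w\s'-]", " ", text)
    tokens = text.split()
    return [NUMBER_WORDS.get(tok, tok) for tok in tokens]


tokens = normalize("Well, I bought 2 books... and 10 pens!")
print(tokens)
```

The output is a list of word tokens with no punctuation and with digits replaced by words, which is the kind of input a phonetization step expects.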


The main steps in SPPAS are:

Tokenization in SPPAS

Phonetization

Phonetization is also known as grapheme-phoneme conversion.

Converting written text into actual sounds, for any language, causes several problems that originate in the relative lack of correspondence between the spelling of lexical items and their sound content.

Phonetization in SPPAS

SPPAS implements: (Bigi 2013)

Convention: spaces separate words, dots separate phones and pipes separate phonetic variants
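The convention above is easy to read programmatically. The following Python sketch parses a phonetized line into words, variants, and phones; the function name `parse_phonetization` and the sample string (a SAMPA-like rendering of two words) are made up for illustration.

```python
def parse_phonetization(line):
    """Parse a phonetized line into a nested list: words > variants > phones.

    Spaces separate words, pipes separate phonetic variants,
    and dots separate phones.
    """
    return [
        [variant.split(".") for variant in word.split("|")]
        for word in line.split()
    ]


# Illustrative string: first word has two variants, second word has one
parsed = parse_phonetization("D.@|D.i k.{.t")
for variants in parsed:
    print(variants)
```

Keeping every variant in the phonetization lets the alignment step choose the pronunciation that best matches the speech signal.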

Impact of the Orthographic Transcription on automatic phonetization

Alignment

Time-alignment process

Manual alignment has been reported to take between 11 and 30 seconds per phoneme. (Leung and Zue, 1984)
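To see what these figures imply in practice, the small Python calculation below converts them into hours of manual work. The 11 and 30 seconds per phoneme come from the citation above; the speech rate of 12 phonemes per second is only an assumption for illustration and varies with speaker, language, and speaking style.

```python
SECONDS_PER_PHONEME = (11, 30)  # range reported by Leung and Zue (1984)
PHONEMES_PER_SECOND = 12        # assumption for illustration only


def manual_alignment_hours(speech_seconds, rate=PHONEMES_PER_SECOND):
    """Return the (low, high) estimate, in hours, of manual alignment time."""
    n_phonemes = speech_seconds * rate
    low, high = SECONDS_PER_PHONEME
    return n_phonemes * low / 3600, n_phonemes * high / 3600


# Estimate for one minute of speech
low_h, high_h = manual_alignment_hours(60)
print(f"{low_h:.1f} to {high_h:.1f} hours")
```

Under this assumption, one minute of speech already requires a few hours of manual work, which is the motivation for automatic alignment.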

How to perform speech segmentation?

  1. Many freely available toolboxes
    • HTK - Hidden Markov Model Toolkit
    • CMU Sphinx
    • Open Source Large Vocabulary CSR Engine Julius
  2. Wrappers for such toolboxes:
    • Prosodylab-Aligner: python+HTK
    • P2FA: python+HTK
  3. Web-services:
    • WebMAUS
    • Train&Align
  4. Packaged software:
    • SPPAS (python+Julius), available for English, French, Italian, Spanish, Catalan, Polish, Japanese, Mandarin Chinese, Taiwanese, Cantonese

Alignment results in SPPAS

Results on vowels of French conversational speech

Syllables segmentation

Syllabification by SPPAS

(Bigi et al. 2010)

Repetitions detection

Repetitions

(Bigi et al. 2014)

Repetitions with SPPAS

Morpho-syntax

Example of Morpho-syntax in CID

Example of time-aligned morpho-syntax on conversational speech

Momel and INTSINT

INTSINT

Example of Momel and INTSINT

Momel and INTSINT: software

(Hirst and Espesser, 1993)

Gestures

(Tellier 2014)

Summary

  1. Introduction
  2. Software
  3. An annotation workflow
  4. Exploring
  5. Sharing
  6. References