PENTAtrainer ---- A Praat script for extracting pitch targets from vocal signals (Version 1.1)

by Yi Xu and Santitham Prom-on

An interactive Praat script that allows you to:

Automatically extract pitch target parameters (slope, height, strength) based on qTA (Prom-on, Xu & Thipakorn, 2009)
Resynthesize F0 contours based on the extracted target parameters
Specify target location and restrict direction of target slope
Manually rectify vocal pulse markings for accurate f0 tracking
Exhaustively process all wav files in a folder
Perform the same f0 analysis as ProsodyPro
Collect extracted parameters of all sounds in a folder and save them in ensemble files

[Examples: Original and Resynthesis]

Explanation

This script is for automatic extraction of pitch target parameters. A pitch target is the ideal f0 trajectory associated with a segmental unit, which is defined by three parameters: slope, height and strength. The target notion is the core of the PENTA model (Parallel Encoding and Target Approximation, cf. Xu, 2005). The current script is based on the qTA implementation of PENTA (Prom-on, Xu & Thipakorn, 2009).

In qTA, a target is defined by the linear equation x(t) = mt + b, where x(t) is the target value at time t, m is the slope of the target and b is the height of the target, defined as the intercept of the target offset with the y-axis. The surface f0 is the outcome of sequential asymptotic approximation of successive pitch targets based on a critically damped 3rd-order linear system.
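
As an illustration of the target approximation described above, the following Python sketch evaluates a contour from a sequence of targets. It assumes the closed-form response of a critically damped third-order system, f0(t) = (mt + b) + (c1 + c2 t + c3 t^2) exp(-lam t), where lam is the strength and the coefficients are chosen so that f0, its velocity and its acceleration are continuous across interval boundaries. The function names, the choice of measuring t from the interval onset, and the example values are assumptions made for illustration; this is not the code of learnqta.

```python
import math

def qta_state(t, m, b, lam, state0):
    """Return (f0, velocity, acceleration) at time t within one interval.

    Assumption: t is measured from the interval onset, and state0 is the
    (f0, velocity, acceleration) triple at t = 0, inherited from the
    previous interval.
    """
    y0, v0, a0 = state0
    # Coefficients of the transient, fixed by the initial state.
    c1 = y0 - b
    c2 = v0 + c1 * lam - m
    c3 = (a0 + 2.0 * c2 * lam - c1 * lam ** 2) / 2.0
    decay = math.exp(-lam * t)
    p = c1 + c2 * t + c3 * t * t   # transient polynomial
    dp = c2 + 2.0 * c3 * t         # its first derivative
    ddp = 2.0 * c3                 # its second derivative
    f0 = m * t + b + p * decay
    vel = m + (dp - lam * p) * decay
    acc = (ddp - 2.0 * lam * dp + lam ** 2 * p) * decay
    return f0, vel, acc

def synthesize(targets, state0, step=0.01):
    """Chain intervals given as (m, b, lam, duration) tuples.

    Each interval starts from the state in which the previous one ended.
    Returns a list of (time, f0) pairs.
    """
    contour, onset, state = [], 0.0, state0
    for m, b, lam, duration in targets:
        n = int(round(duration / step))
        for i in range(n):
            f0 = qta_state(i * step, m, b, lam, state)[0]
            contour.append((onset + i * step, f0))
        state = qta_state(duration, m, b, lam, state)
        onset += duration
    return contour

# Hypothetical two-interval example: a static target followed by a rising one.
contour = synthesize([(0.0, 100.0, 40.0, 0.2), (20.0, 95.0, 40.0, 0.2)],
                     state0=(90.0, 0.0, 0.0))
```

With a larger strength the transient decays faster, so the surface contour reaches the target line earlier within the interval.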

The extraction of target parameters in this script is done by analysis-by-synthesis. For each target interval, the script generates f0 contours with qTA for all combinations of the three parameters within the search range, sampled at a given step size, and computes the difference between each synthesized contour and the original as the sum of squared errors (SSE). The parameter set with the smallest SSE is chosen as the target of the interval.
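
The search described above can be sketched as an exhaustive grid search. The Python example below is a simplified illustration, not the implementation in learnqta: the function names and grid values are hypothetical, and the initial velocity and acceleration of the interval are assumed to be zero to keep the example short.

```python
import itertools
import math

def qta_f0(t, m, b, lam, y0):
    """qTA-style f0 at time t from interval onset, starting at f0 = y0.

    Assumption: initial velocity and acceleration are zero.
    """
    c1 = y0 - b
    c2 = c1 * lam - m
    c3 = (2.0 * c2 * lam - c1 * lam ** 2) / 2.0
    return m * t + b + (c1 + c2 * t + c3 * t * t) * math.exp(-lam * t)

def grid(lo, hi, step):
    """Evenly spaced values from lo to hi inclusive."""
    n = int(round((hi - lo) / step))
    return [lo + i * step for i in range(n + 1)]

def search_target(times, f0, y0, slopes, heights, strengths):
    """Try every (slope, height, strength) combination and keep the one
    whose synthesized contour has the smallest SSE against f0."""
    best = None
    for m, b, lam in itertools.product(slopes, heights, strengths):
        sse = sum((qta_f0(t, m, b, lam, y0) - y) ** 2
                  for t, y in zip(times, f0))
        if best is None or sse < best[0]:
            best = (sse, m, b, lam)
    return best

# Hypothetical check: recover known parameters from a synthetic contour.
times = [i * 0.01 for i in range(21)]
observed = [qta_f0(t, 10.0, 95.0, 30.0, 92.0) for t in times]
best = search_target(times, observed, 92.0,
                     slopes=grid(-20.0, 20.0, 10.0),
                     heights=grid(90.0, 100.0, 1.0),
                     strengths=grid(10.0, 50.0, 10.0))
```

The cost grows with the product of the three grid sizes, which is why narrowing the search ranges or enlarging the step size shortens the search.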

The target intervals are defined by the user, who marks their boundaries and enters a label in the top tier of the TextGrid. No targets are extracted from unlabeled intervals.

The target search ranges can be restricted by the user in a number of ways:

In the startup window, users can change the global search ranges defined by the maximum and minimum parameter values.
For each sound file, blank intervals in the target tier (2) are given full search ranges defined by the maximum and minimum parameter values.
Tier 2 intervals labeled as H, M, L, h, m or l are given a fixed 0 slope.
Tier 2 intervals labeled as R or r are searched only for positive slopes.
Tier 2 intervals labeled as F or f are searched only for negative slopes.
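
The labeling rules above amount to a mapping from a Tier 2 label to a slope search range. The Python sketch below shows one way to express it. The function name is hypothetical, and two details are assumptions not stated above: zero is kept as the boundary of the rising and falling ranges, and any other label is treated like a blank interval.

```python
def slope_range(label, min_slope, max_slope):
    """Return the (lowest, highest) slope to search for a Tier 2 label."""
    label = label.strip()
    if label in ("H", "M", "L", "h", "m", "l"):
        return (0.0, 0.0)                      # static target: slope fixed at 0
    if label in ("R", "r"):
        return (max(0.0, min_slope), max_slope)  # rising: positive slopes only
    if label in ("F", "f"):
        return (min_slope, min(0.0, max_slope))  # falling: negative slopes only
    # Blank interval: full global range. Treating unrecognized labels the
    # same way is an assumption of this sketch.
    return (min_slope, max_slope)
```

For example, with a global range of -50 to 50, an interval labeled R would be searched between 0 and 50 under these assumptions.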

PENTAtrainer is useful not only for resynthesizing f0 contours of individual sentences, but also as a research tool. Here are some examples:

Determining pitch targets corresponding to specific communicative functions, e.g., lexical contrast marked by tone. Targets can be determined by extracting target parameters from many tokens of a functional unit, and the average values of the parameters can be considered as characteristic of the target (cf. Prom-on, Xu & Thipakorn, 2009).
Identifying contributions of different communicative functions by varying functional specificity when averaging the target parameters, e.g., based on focus condition, position in sentence or phrase, or sentence type (statement vs. question), etc. (cf. Prom-on, Xu & Thipakorn, 2009).
Exploring which unit makes the best target interval, e.g., voiced section, syllable, word, accent or phrase. Our initial testing shows that the syllable is the best target interval for English.
Testing hypotheses about pitch targets of a language. For example, to determine whether a tone is high or rising, one may compare the rmse and correlation values obtained when restricting the target slope to 0 versus positive (reported in output files X.means and targets.txt).

Instructions

PENTAtrainer consists of two files: PENTAtrainer.praat, a Praat script, and learnqta.exe (learnqta for Mac), an executable called by the script. See Download.
Put both files in the folder containing the sound files to be analyzed, and launch Praat;
Select Open Praat Script... from the top menu;
Locate PENTAtrainer.praat in the dialogue window and select it;
When the script window opens in Praat, select Run from the Run menu (or type the key shortcut command-r or control-r);
In the startup window, check or uncheck the boxes according to your need, and set appropriate values in the text fields or simply use the default values. Select the task by checking the appropriate radio button.
Click OK and three windows will appear. The first window (PointProcess) displays the waveform together with vocal cycle marks (vertical lines) generated by Praat. This is where you can manually add the missing marks and delete the redundant ones. You need to do this only for the named intervals, as explained next.
The second window (TextGrid) displays the waveform and spectrogram of the current sound together with optional pitch track and formant tracks in the spectrogram panel, and vocal pulse marks in the waveform panel. (These tracks and marks cannot be changed manually, so you can hide them through the corresponding menu to reduce processing time.)
At the bottom of this window are three TextGrid tiers, where you can insert interval boundaries (Tier 1) and define search restrictions (Tier 2). For any interval that you want to have results saved, a label in Tier 1 is required. The label can be as simple as a, b, c or 1, 2, 3.
The third window (PENTAtrainer) displays pitch targets (grey straight lines) and synthesized f0 (red curve) against the original f0 (blue curve). When there are no labeled intervals, only the original f0 is displayed. After labeling the intervals, press "Replot" on the left side of the window and you will see both synthesized and original f0 contours. The green vertical lines indicate interval boundaries.
The PENTAtrainer window allows you to inspect the f0 contours in various ways: zooming in and out, scrolling left and right, and playing part or the whole of the original or resynthesized signal. The window also allows you to move to the next or previous sound file.
When you click "Next" or "Previous" in the PENTAtrainer window, the TextGrid and PointProcess windows will be refreshed, displaying the spectrogram, waveform and vocal cycle marks of the next sound. You can repeat this process until all the sounds in the folder are processed. Or you can finish any time by clicking "Exit".

Output

Each time you press "Next" in the PENTAtrainer window, various analysis results are saved for the current sound as text files (Red ones are directly relevant for target extraction):

X.rawf0 -- raw f0 with real time computed directly from the pulse markings
X.f0 -- smoothed f0 with the trimming algorithm (Xu, 1999)
X.samplef0 -- f0 values at fixed time intervals specified by "f0 sample rate"
X.timenormf0 -- time-normalized f0. The f0 in each interval is divided into the same number of points (default = 10).
X.actutimenormf0 -- time-normalized f0 with each interval divided into the same number of points (default = 10). But the time scale is the original, except that the onset time of interval 1 is set to 0, unless the "Set initial time to 0" box in the startup window is unchecked.
X.f0velocity -- velocity profile (instantaneous rates of F0 change) of f0 contour in semitone/s at fixed time intervals specified by "f0 sample rate"
X.means -- Containing the following values (in the order of the columns):
1. maxf0
2. minf0
3. excursion size
4. finalf0 -- Indicator of target height (taken at a point specified by "Final offset" in the startup window)
5. mean intensity
6. duration
7. max_velocity
8. final_velocity -- Indicator of target slope (also taken at a point preceding the interval offset by the time specified by "Final offset" in the startup window)
9. initialf0 -- Initial f0 of the first labeled interval (used in target search)
10. target_slope
11. target_height
12. strength
13. duration (in seconds)
14. rmse (root mean squared error between original and synthesized f0)
15. correlation (between original and synthesized f0)
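
The last two columns measure goodness of fit between the original and synthesized contours. The Python sketch below shows the standard definitions of root mean squared error and Pearson correlation. The function names are hypothetical, and the script may differ in details such as which f0 samples are compared.

```python
import math

def rmse(original, synthesized):
    """Root mean squared error between two equally long f0 sequences."""
    n = len(original)
    return math.sqrt(sum((o - s) ** 2
                         for o, s in zip(original, synthesized)) / n)

def correlation(original, synthesized):
    """Pearson correlation between two equally long f0 sequences.

    Undefined (division by zero) if either sequence is constant.
    """
    n = len(original)
    mean_o = sum(original) / n
    mean_s = sum(synthesized) / n
    cov = sum((o - mean_o) * (s - mean_s)
              for o, s in zip(original, synthesized))
    var_o = sum((o - mean_o) ** 2 for o in original)
    var_s = sum((s - mean_s) ** 2 for s in synthesized)
    return cov / math.sqrt(var_o * var_s)
```

The two values are complementary: a synthesized contour that is shifted by a constant keeps a perfect correlation but has a nonzero rmse, so both are worth checking when comparing target hypotheses.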

If you want to change certain analysis parameters after processing all the sound files, you can rerun the script, set the "Input File No" to 1 in the startup window and check the button "Process all sounds without pause" before pressing "OK". The script will then run by itself and cycle through all the sound files in the folder one by one.

After the analysis of all the individual sound files is done, you can gather the analysis results into a number of ensemble files by running the script again and checking the button "Get ensemble results" in the startup window. The following ensemble files will be saved:

targets.txt
means.txt
normf0.txt
normactutime.txt
samplef0.txt
f0velocity.txt
maxf0.txt
minf0.txt
excursionsize.txt
meanf0.txt
duration.txt
maxvelocity.txt
finalf0.txt
finalvelocity.txt
meanintensity.txt

Note that you can generate the ensemble files only if you have analyzed at least one sound following the steps described earlier.

Need more help?

Detailed instructions can also be found at the beginning of the script. If you are still stuck, please contact me at yi.xu at ucl.ac.uk.

How to cite

Xu, Y. & Prom-on, S. (2010). PENTAtrainer.praat. Available from: http://crdo.fr/crdo000721