Audio/Visual Acquisition Environment
To collect the articulatory data and the synchronous acoustic data, ELITE (see Figure), a fully automatic optotracking movement analyzer for 3D kinematics data acquisition, was used.
This system ensures high accuracy and minimal discomfort to the subject. Only small, unobtrusive passive markers, 2 mm in diameter and made of reflective paper, are attached to the speaking subject's face, as illustrated in Figure 2. The subject is placed in the field of view of two CCD TV cameras at a distance of 1.5 meters. The cameras illuminate the markers with an infrared stroboscope, which is invisible so as not to disturb the subject.
ELITE is characterized by a two-level architecture. The first level includes an interface to the environment and a fast processor for shape recognition (FPSR). The outputs of the TV cameras are sent at a frame rate of 100 Hz to the FPSR, which recognizes the markers by means of a cross-correlation algorithm implemented in real time on pipelined parallel hardware. This algorithm allows the system to be used even in adverse lighting conditions, since it can discriminate between markers and reflections of different shapes, even when the reflections are brighter. Furthermore, since several pixels are recognized for each marker, the cross-correlation algorithm allows the weighted center of mass to be computed, increasing the accuracy of the system up to 0.1 mm for a field of view of 28x28x28 cm.
The coordinates of the recognized markers are sent to the second level, which consists of a general-purpose personal computer. This level performs 3D coordinate reconstruction from the 2D perspective projections by means of a stereophotogrammetric procedure that allows free positioning of the TV cameras. The 3D coordinates are then used to calculate and evaluate the parameters described below.
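The weighted center-of-mass step mentioned above can be illustrated with a minimal sketch. It is not the FPSR implementation, which runs on dedicated hardware. It only shows how several recognized pixels per marker can yield a sub-pixel position. The function name `weighted_centroid`, the `(x, y, weight)` pixel format, and the sample values are assumptions made for illustration. The weight could be, for example, a per-pixel correlation score.

```python
from typing import Sequence, Tuple

Pixel = Tuple[float, float, float]  # (x, y, weight); format is hypothetical


def weighted_centroid(pixels: Sequence[Pixel]) -> Tuple[float, float]:
    """Return the weighted center of mass of the pixels belonging to one marker."""
    total = sum(w for _, _, w in pixels)
    if total <= 0:
        raise ValueError("total weight must be positive")
    cx = sum(x * w for x, _, w in pixels) / total
    cy = sum(y * w for _, y, w in pixels) / total
    return cx, cy


# Hypothetical 2x2 blob: the right-hand column matched the marker shape better,
# so the estimated center is pulled toward x = 11 without reaching it.
blob = [(10, 20, 1.0), (11, 20, 3.0), (10, 21, 1.0), (11, 21, 3.0)]
centroid = weighted_centroid(blob)
print(centroid)
```

Because the result is a weighted average, the estimated position is not restricted to the pixel grid, which is why recognizing several pixels per marker can improve accuracy.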
Two different configurations have been adopted for articulatory data collection. The first, specifically designed for the analysis of labial movements, uses a simple scheme with only 8 reflecting markers (the bigger grey markers in the Figure), while the second, adapted to the analysis of expressive and emotive speech, uses the complete set of 28 reflecting markers.
Position of the reflecting markers and of the reference planes for the articulatory movement data collection.
The movements of all 8 or 28 markers, depending on the adopted acquisition pattern, are recorded together with their velocity and acceleration, simultaneously with the co-produced speech.
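The text does not say how velocity and acceleration are derived from the marker trajectories. One common option, shown here purely as an assumption, is central finite differencing of the positions sampled at the 100 Hz frame rate. The function name `central_difference` and the synthetic trajectory are hypothetical.

```python
from typing import List, Sequence

FRAME_RATE_HZ = 100.0  # frame rate stated in the text


def central_difference(samples: Sequence[float],
                       rate_hz: float = FRAME_RATE_HZ) -> List[float]:
    """Approximate the time derivative of a uniformly sampled signal.

    Interior points use central differences; the two end points fall back
    to one-sided differences.
    """
    n = len(samples)
    if n < 3:
        raise ValueError("need at least three samples")
    dt = 1.0 / rate_hz
    deriv = [0.0] * n
    deriv[0] = (samples[1] - samples[0]) / dt
    deriv[-1] = (samples[-1] - samples[-2]) / dt
    for i in range(1, n - 1):
        deriv[i] = (samples[i + 1] - samples[i - 1]) / (2.0 * dt)
    return deriv


# Synthetic one-axis trajectory (mm): constant acceleration of 1000 mm/s^2.
position_mm = [0.5 * 1000.0 * (i / FRAME_RATE_HZ) ** 2 for i in range(6)]
velocity = central_difference(position_mm)   # mm/s
acceleration = central_difference(velocity)  # mm/s^2
print(velocity[2], acceleration[2])
```

In practice, measured trajectories are usually smoothed before differentiation, since differencing amplifies measurement noise. No smoothing is applied in this sketch.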
For the analysis of labial movements, the parameters most commonly selected to quantify changes in the labial configuration, as illustrated in the Figure, are the following:
Speech signal and time evolution of some labial kinematic parameters (LO, LR, ULP, LLP, UL, LL, ASYMX and ASYMY) associated with the sequence /'aba/ expressing disgust.
• Lip Opening (LO), calculated as the distance between the markers placed on the central points of the upper and lower lip vermilion borders; this parameter correlates with the HIGH-LOW phonetic dimension.
• Lip Rounding (LR), corresponding to the distance between the left and right corners of the lips, which correlates with the ROUNDED-UNROUNDED phonetic dimension: negative values correspond to lip spreading.
• Anterior/posterior movements (Protrusion) of the Upper Lip and Lower Lip (ULP and LLP), calculated as the distance between the marker placed on the central point of either the upper or the lower lip and the frontal plane D, which contains the line crossing the markers placed on the lobes of the ears and is perpendicular to the W plane. These parameters correlate with the feature PROTRUDED-RETRACTED: negative values quantify lip retraction.
• Upper and Lower Lip vertical displacements (UL, LL), calculated as the distance between the marker placed on the central point of either the upper or the lower lip and the transversal plane W passing through the tip of the nose and the markers on the ear lobes. Hence, positive values correspond to a reduction of the displacement of the markers from the W plane. As noted earlier, these parameters are normalized relative to the lip resting position.
• Left and Right Corner horizontal displacements (LCX and RCX), calculated as the distance between the markers placed on the left and right lip corners and the sagittal plane S passing through the tip of the nose and perpendicular to the W plane (these parameters are not visualized in Fig. 3).
• Left and Right Corner vertical displacements (LCY and RCY), calculated as the distance between the markers placed on the left and right lip corners and the transversal plane W, containing the line crossing the markers placed on the lobes of the ears and on the nose (these parameters are not visualized in Fig. 3).
• The asymmetry parameters (ASYMX and ASYMY), calculated as the difference between the right and left corner positions along the x (RCX-LCX) and y (RCY-LCY) axes. For both ASYMX and ASYMY, values different from zero indicate the presence of an asymmetry. Positive values of ASYMY mean that the right lip corner moves to an asymmetrically higher position along the vertical axis than the left corner. Positive values of ASYMX indicate that the lips are displaced asymmetrically to the right along the horizontal axis.
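The parameters listed above reduce to point-to-point and point-to-plane distances, so they can be sketched in a few lines. Several things here are assumptions rather than the original procedure: the marker coordinates are invented (in mm, with x to the subject's right, y up, z forward), the sagittal plane S is taken to be perpendicular to both W and D, every parameter is normalized by subtracting its resting value, and the sign of the vertical parameters is flipped so that positive values mean a smaller distance from W. The names `labial_parameters` and `normalize` are hypothetical.

```python
import math
from typing import Dict, Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(v: Vec) -> Vec:
    n = math.sqrt(dot(v, v))
    return (v[0] / n, v[1] / n, v[2] / n)


def distance(a: Vec, b: Vec) -> float:
    d = sub(a, b)
    return math.sqrt(dot(d, d))


def plane_distance(p: Vec, origin: Vec, normal: Vec) -> float:
    """Unsigned distance from p to the plane through origin with unit normal."""
    return abs(dot(sub(p, origin), normal))


# Hypothetical reference markers (mm).
EAR_L, EAR_R, NOSE = (-70.0, 0.0, 0.0), (70.0, 0.0, 0.0), (0.0, 0.0, 100.0)
EAR_AXIS = unit(sub(EAR_R, EAR_L))
N_W = unit(cross(sub(EAR_R, EAR_L), sub(NOSE, EAR_L)))  # W: nose tip + ear lobes
N_D = unit(cross(EAR_AXIS, N_W))  # D: contains the ear line, perpendicular to W
N_S = EAR_AXIS                    # S: assumed perpendicular to both W and D


def labial_parameters(m: Dict[str, Vec]) -> Dict[str, float]:
    """Raw (not yet normalized) distances for one frame of lip markers."""
    return {
        "LO": distance(m["upper_lip"], m["lower_lip"]),
        "LR": distance(m["left_corner"], m["right_corner"]),
        "ULP": plane_distance(m["upper_lip"], EAR_L, N_D),
        "LLP": plane_distance(m["lower_lip"], EAR_L, N_D),
        "UL": plane_distance(m["upper_lip"], NOSE, N_W),
        "LL": plane_distance(m["lower_lip"], NOSE, N_W),
        "LCX": plane_distance(m["left_corner"], NOSE, N_S),
        "RCX": plane_distance(m["right_corner"], NOSE, N_S),
        "LCY": plane_distance(m["left_corner"], NOSE, N_W),
        "RCY": plane_distance(m["right_corner"], NOSE, N_W),
    }


VERTICAL = {"UL", "LL", "LCY", "RCY"}  # positive = closer to W (assumed for LCY/RCY)


def normalize(current: Dict[str, float], rest: Dict[str, float]) -> Dict[str, float]:
    """Express each parameter relative to rest and add the asymmetry parameters."""
    out = {}
    for key, value in current.items():
        delta = value - rest[key]
        out[key] = -delta if key in VERTICAL else delta
    out["ASYMX"] = out["RCX"] - out["LCX"]
    out["ASYMY"] = out["RCY"] - out["LCY"]
    return out


rest = labial_parameters({"upper_lip": (0.0, -20.0, 95.0), "lower_lip": (0.0, -30.0, 95.0),
                          "left_corner": (-25.0, -25.0, 85.0), "right_corner": (25.0, -25.0, 85.0)})
frame = labial_parameters({"upper_lip": (0.0, -20.0, 95.0), "lower_lip": (0.0, -34.0, 95.0),
                           "left_corner": (-25.0, -25.0, 85.0), "right_corner": (25.0, -23.0, 85.0)})
params = normalize(frame, rest)
print(params["LO"], params["LL"], params["ASYMY"])
```

In this invented frame the lower lip drops by 4 mm and the right corner rises by 2 mm, so LO increases, LL becomes negative, and ASYMY is positive, which matches the sign conventions described in the list.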
The speech signal is recorded synchronously with the lip movements and is usually segmented and analyzed with voice analysis software (PRAAT), which also computes intensity, duration, spectrograms, formants, pitch-synchronous F0, and, in the case of emotive and expressive speech, various voice quality parameters.
For more information please contact:
Piero Cosi
Istituto di Scienze e Tecnologie della Cognizione - Sezione di Padova "Fonetica e Dialettologia", CNR di Padova (e-mail: cosi@csrf.pd.cnr.it).