Computer-vision analysis reveals facial movements made during Mandarin tone production align with pitch trajectories

Publication date: Available online 17 August 2019
Source: Speech Communication
Author(s): Saurabh Garg, Ghassan Hamarneh, Allard Jongman, Joan A. Sereno, Yue Wang

Abstract

Using computer-vision and image-processing techniques, we aim to identify specific visual cues induced by facial movements made during Mandarin tone production and examine how they are associated with each of the four Mandarin tones. Audio-video recordings of 20 native Mandarin speakers producing Mandarin words involving the vowel /ɜ/ with each of the four tones were analyzed. Four facial points of interest were detected automatically: the medial point of the left eyebrow, the nose tip (a proxy for head movement), and the midpoints of the upper and lower lips. The detected points were then automatically tracked in the subsequent video frames. Critical features such as the distance, velocity, and acceleration describing local facial movements with respect to the resting face of each speaker were extracted from the positional profiles of each tracked point. Analysis of variance and feature importance analysis based on random forest were performed to examine the significance of each feature for representing each tone and how well these features can individually and collectively characterize each tone. Results suggest alignments between articulatory movements and pitch trajectories, with downward or upward head and eyebrow movements following the dipping and rising tone trajectories respectively, lip closing movement being associ...
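As a rough illustration of the feature-extraction step described in the abstract, the sketch below derives distance, velocity, and acceleration profiles for one tracked facial point relative to its resting position. The function name `kinematic_features`, its parameters, and the use of simple forward differences are assumptions made for illustration; the abstract does not specify how the authors computed these features.

```python
import math


def kinematic_features(track, rest, fps):
    """Hypothetical helper: per-frame kinematics of one tracked point.

    track: list of (x, y) positions, one per video frame
    rest:  (x, y) position of the same point on the speaker's resting face
    fps:   video frame rate in frames per second
    """
    dt = 1.0 / fps
    # Euclidean distance of the point from its resting position in each frame
    distance = [math.hypot(x - rest[0], y - rest[1]) for x, y in track]
    # Forward differences approximate the rate of change of that distance
    # (an assumed scheme, not necessarily the one used in the paper)
    velocity = [(b - a) / dt for a, b in zip(distance, distance[1:])]
    # Differencing the velocity profile once more approximates acceleration
    acceleration = [(b - a) / dt for a, b in zip(velocity, velocity[1:])]
    return distance, velocity, acceleration
```

Calling `kinematic_features(track, rest, fps)` on the positional profile of, say, the nose tip would yield three sequences that could then be summarized (for example by their extrema) and passed to a statistical test or a classifier. Each differencing step shortens the sequence by one frame.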
Source: Speech Communication - Category: Speech-Language Pathology - Source Type: research