Prosodic stress detection for fixed stress languages using formal atom decomposition and a statistical hidden Markov hybrid

Publication date: September 2018
Source: Speech Communication, Volume 102
Author(s): György Szaszák, Máté Ákos Tündik, Branislav Gerazov

Abstract
The detection of prosodic events and prosodic stress, as well as prosody-based speech segmentation, has received much attention from the research community over the past decades. Prosody is relevant to both main areas of speech technology, text-to-speech synthesis and automatic speech recognition and understanding, and it is increasingly exploited: besides providing redundancy, prosody is recognized as carrying information unavailable from other sources, and it also contributes to the naturalness of perceived speech. This paper addresses a recently proposed intonation analysis technique called Weighted Correlation based Atom Decomposition (WCAD). The WCAD approach is inspired by the physiology of speech production and by the Fujisaki model used in speech synthesis; however, it is employed analytically rather than generatively: a pattern matching algorithm decomposes the intonation contour into a set of elementary components called atoms. The resulting atom decomposition is used for prosodic stress detection and automatic phonological phrasing. We compare the WCAD approach with, and also combine it with, a phonological approach that relies on automatic segmentation into phonological phrases using a Gaussian Mixture Model (GMM) / Hidden Markov Model (HMM) and Viterbi alignment. Results show comparable performance of the physiologically i...
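The abstract only names the technique, so the following is a minimal sketch of the general idea of atom decomposition it describes, not the WCAD algorithm as published. It uses a greedy, matching-pursuit-style loop that slides one gamma-shaped atom along an F0 contour, scores every position with a weighted correlation, and subtracts the best-scoring match from the residual. The atom shape and its parameters (`k`, `theta`, `fs`, `length`), the per-sample weights (standing in for something like voicing confidence), the stopping threshold, and the function names `gamma_atom`, `weighted_dot` and `decompose` are all hypothetical choices made for illustration.

```python
import math


def gamma_atom(length=80, k=6, theta=0.02, fs=200.0):
    """Hypothetical gamma-shaped atom t**(k-1) * exp(-t/theta), scaled to unit peak.

    length, k, theta and fs are illustrative values, not the paper's settings.
    """
    raw = [((n / fs) ** (k - 1)) * math.exp(-(n / fs) / theta) for n in range(length)]
    peak = max(raw)
    return [r / peak for r in raw]


def weighted_dot(x, y, w):
    """Weighted inner product: sum_i w[i] * x[i] * y[i]."""
    return sum(wi * xi * yi for xi, yi, wi in zip(x, y, w))


def decompose(contour, weights, atom, max_atoms=5, min_score=1e-3):
    """Greedy decomposition of `contour` into shifted, scaled copies of `atom`.

    Each candidate position is scored by a normalised weighted correlation;
    samples with weight 0 (e.g. unvoiced frames, an assumption) are ignored.
    Returns (position, amplitude) pairs in the order they were picked.
    """
    residual = list(contour)  # work on a copy; the input is not modified
    m = len(atom)
    found = []
    for _ in range(max_atoms):
        best = None  # (score, position, amplitude)
        for p in range(len(residual) - m + 1):
            w = weights[p:p + m]
            energy = weighted_dot(atom, atom, w)
            if energy <= 0.0:
                continue  # atom lies entirely in a zero-weight region
            corr = weighted_dot(residual[p:p + m], atom, w)
            score = abs(corr) / math.sqrt(energy)
            if best is None or score > best[0]:
                # corr / energy is the weighted least-squares amplitude
                best = (score, p, corr / energy)
        if best is None or best[0] < min_score:
            break  # nothing left that correlates with the atom
        _, p, amp = best
        for n in range(m):
            residual[p + n] -= amp * atom[n]
        found.append((p, amp))
    return found


if __name__ == "__main__":
    # Synthetic contour: two non-overlapping atoms at samples 10 and 120.
    atom = gamma_atom()
    contour = [0.0] * 250
    for pos, amp in [(10, 1.0), (120, 0.5)]:
        for n, a in enumerate(atom):
            contour[pos + n] += amp * a
    print(decompose(contour, [1.0] * 250, atom))
```

With all weights set to 1, the two synthetic atoms come back as (position, amplitude) pairs at samples 10 and 120; setting the weights to 0 from sample 100 onward hides the second atom from the search, which is the role a weighting plays in a weighted correlation. A practical system would presumably use atoms of several widths and a slower phrase-level component, which this sketch omits.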