Improving Children’s Mismatched ASR using Structured Low-Rank Feature Projection

Publication date: Available online 8 November 2018
Source: Speech Communication
Author(s): S. Shahnawazuddin, Hemant K. Kathania, Abhishek Dey, Rohit Sinha

Abstract
The work presented in this paper explores the issues in automatic speech recognition (ASR) of children’s speech using acoustic models trained on adults’ speech. In such contexts, the large acoustic mismatch between training and test data severely degrades recognition rates. Even with vocal tract length normalization (VTLN), recognition performance in the mismatched case remains well below that of the matched case. Our earlier studies have shown that, for commonly used mel-filterbank-based cepstral features, the acoustic mismatch is exacerbated by insufficient smoothing of pitch harmonics for child speakers. To address this problem, this paper proposes a structured low-rank projection of the feature vectors, applied both before learning the acoustic models and before decoding. To this end, a low-rank transform is first learned on the training data (adults’ speech). Any dimensionality reduction technique that depends on the variance of the training data may be used for this purpose. In this work, principal component analysis (PCA) and heteroscedastic linear discriminant analysis (HLDA) are explored. When the derived low-rank projection is applied in the mismatched testing case, it alleviates the pitch-dependent mismatch. The proposed approach provides a relative recognition per...
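The core idea of learning a variance-based low-rank transform on adult training features and then applying the same transform to mismatched (child) test features can be illustrated with a minimal PCA sketch. This assumes frame-level feature vectors stored as NumPy arrays; the function names, the 39-dimensional synthetic features, and the rank of 20 are hypothetical choices for illustration, not the authors' implementation or settings.

```python
import numpy as np


def learn_pca_projection(train_feats: np.ndarray, rank: int):
    """Learn a rank-`rank` PCA projection from training features.

    train_feats: (num_frames, dim) array, e.g. cepstral feature vectors
    from the adult training set (hypothetical input).
    Returns the training mean and a (dim, rank) matrix whose columns are
    the principal directions with the largest variance, in descending order.
    """
    mean = train_feats.mean(axis=0)
    centered = train_feats - mean
    # Sample covariance of the training features (dim x dim).
    cov = centered.T @ centered / (train_feats.shape[0] - 1)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:rank]
    return mean, eigvecs[:, order]


def project(feats: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Apply the projection learned on training data to any feature set."""
    return (feats - mean) @ basis


# Synthetic stand-ins for 39-dimensional feature vectors (not real speech data).
rng = np.random.default_rng(0)
adult_train = rng.normal(size=(2000, 39)) * np.linspace(3.0, 0.1, 39)
child_test = rng.normal(loc=0.5, size=(300, 39))

# The transform is estimated on the training (adult) data only ...
mean, basis = learn_pca_projection(adult_train, rank=20)
# ... and the same transform is applied before training and before decoding.
train_low = project(adult_train, mean, basis)
test_low = project(child_test, mean, basis)
print(train_low.shape, test_low.shape)  # (2000, 20) (300, 20)
```

The key design point is that the test data never influence the transform: the mean and basis come from the adult training set, so the retained low-variance-discarding subspace is the one the acoustic models were trained on. HLDA, the other technique mentioned above, additionally requires class labels (e.g. HMM state alignments) and is not shown here.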