Training of Reduced-Rank Linear Transformations for Multi-layer Polynomial Acoustic Features for Speech Recognition
Publication date: Available online 8 April 2019 Source: Speech Communication Author(s): Muhammad Ali Tahir, Heyun Huang, Albert Zeyer, Ralf Schlüter, Hermann Ney
Abstract: The use of higher-order polynomial acoustic features can improve the performance of automatic speech recognition (ASR). However, the dimensionality of the polynomial representation can be prohibitively large, making acoustic model training using polynomial features infeasible for large vocabulary ASR systems. This paper presents a multi-layer polynomial training framework for acoustic modeling, which recursively expands the acoustic features into their second-order ...
Source: Speech Communication - April 9, 2019 Category: Speech-Language Pathology Source Type: research
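
The dimensionality problem mentioned in the abstract above can be made concrete with a small sketch. It assumes a plain second-order expansion (the original features plus all pairwise products) followed by a linear projection to a lower dimension. The function names and the matrix `W` are hypothetical illustrations, not the authors' implementation, and the sketch does not cover how the transformation is trained.

```python
import numpy as np

def second_order_expand(x):
    """Return x followed by all products x[i] * x[j] with i <= j."""
    x = np.asarray(x, dtype=float)
    i, j = np.triu_indices(x.shape[0])  # index pairs of the upper triangle
    return np.concatenate([x, x[i] * x[j]])

def expanded_dim(d):
    """Size of the expansion of a d-dimensional vector: d + d(d+1)/2."""
    return d + d * (d + 1) // 2

def expand_and_project(x, W):
    """Expand x, then map it to a lower dimension with the matrix W.

    W is a hypothetical (k, expanded_dim(d)) projection; here it is only
    a placeholder for a learned reduced-rank transformation.
    """
    return W @ second_order_expand(x)

if __name__ == "__main__":
    d = 40  # assumed feature dimension, for illustration only
    print(expanded_dim(d))                # size after one expansion layer
    print(expanded_dim(expanded_dim(d)))  # size after two layers, no projection
```

Under these assumptions, a 40-dimensional input grows to 860 dimensions after one expansion and to 371,090 after a second, which is why a dimension-reducing projection between layers matters.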

Dysarthric speech classification from coded telephone speech using glottal features
Publication date: Available online 8 April 2019 Source: Speech Communication Author(s): N.P. Narendra, Paavo Alku
Abstract: This paper proposes a new dysarthric speech classification method from coded telephone speech using glottal features. The proposed method utilizes glottal features, which are efficiently estimated from coded telephone speech using a recently proposed deep neural net-based glottal inverse filtering method. Two sets of glottal features were considered: (1) time- and frequency-domain parameters and (2) parameters based on principal component analysis (PCA). In addition, acoustic features are extracted from co...
Source: Speech Communication - April 9, 2019 Category: Speech-Language Pathology Source Type: research

Speech-Driven Animation with Meaningful Behaviors
This study proposes to bridge the gap between these two approaches, overcoming their limitations. The approach builds a dynamic Bayesian network (DBN), where a discrete variable is added to constrain the behaviors on the underlying constraint. The study implements and evaluates the approach with two constraints: discourse functions and prototypical behaviors. By constraining on the discourse functions (e.g., questions), the model learns the characteristic behaviors associated with a given discourse class, learning the rules from the data. By constraining on prototypical behaviors (e.g., head nods), the approach can be embedd...
Source: Speech Communication - April 5, 2019 Category: Speech-Language Pathology Source Type: research

Speech Enhancement using ultrasonic doppler sonar
This study validated the use of ultrasonic Doppler frequency shifts caused by facial movements for enhancing audio speech contaminated by high levels of acoustic noise. A 40 kHz ultrasonic beam was directed at a speaker’s face. The received signals were first demodulated and converted to a spectral feature parameter. The spectral feature derived from the ultrasonic Doppler signal (UDS) was concatenated with spectral features from noisy speech, which were then used to estimate the magnitude of the spectrum of clean speech. A nonlinear regression approach was employed in this estimation where the relationship between audio-U...
Source: Speech Communication - April 4, 2019 Category: Speech-Language Pathology Source Type: research

Improving multilingual speech emotion recognition by combining acoustic features in a three-layer model
This study presents a scheme for multilingual speech emotion recognition. Determining the emotion of speech in general relies upon specific training data, and a different target speaker or language may present significant challenges. In this regard, we first explore 215 acoustic features from emotional speech. Second, we carry out speaker normalization and feature selection to develop a shared standard acoustic parameter set for multiple languages. Third, we use a three-layer model composed of acoustic features, semantic primitives, and emotion dimensions to map acoustics into emotion dimensions. Finally, we classify the c...
Source: Speech Communication - April 4, 2019 Category: Speech-Language Pathology Source Type: research

Analysis of phonation onsets in vowel production, using information from glottal area and flow estimate
Publication date: Available online 1 April 2019 Source: Speech Communication Author(s): Tiina Murtola, Jarmo Malinen, Ahmed Geneid, Paavo Alku
Abstract: A multichannel dataset comprising high-speed videoendoscopy images, and electroglottography and free-field microphone signals, was used to investigate phonation onsets in vowel production. Use of the multichannel data enabled simultaneous analysis of the two main aspects of phonation, glottal area, extracted from the high-speed videoendoscopy images, and glottal flow, estimated from the microphone signal using glottal inverse filtering. Pulse-wise parameterization of the glotta...
Source: Speech Communication - April 2, 2019 Category: Speech-Language Pathology Source Type: research

Speaker recognition using PCA-based feature transformation
Publication date: Available online 2 April 2019 Source: Speech Communication Author(s): Ahmed Isam Ahmed, John Chiverton, David Ndzi, Victor Becerra
Abstract: This paper introduces a Weighted-Correlation Principal Component Analysis (WCR-PCA) for efficient transformation of speech features in speaker recognition. A Recurrent Neural Network (RNN) technique is also introduced to perform the weighted PCA. The weights are taken as the log-likelihood values from a fitted Single Gaussian-Background Model (SG-BM). For speech features, we show that there are large differences between feature variances which makes covariance based PCA l...
Source: Speech Communication - April 2, 2019 Category: Speech-Language Pathology Source Type: research

Output-based Speech Quality Assessment Using Autoencoder and Support Vector Regression
Publication date: Available online 2 April 2019 Source: Speech Communication Author(s): Jing Wang, Yahui Shan, Xiang Xie, Jingming Kuang
Abstract: The output-based speech quality assessment method has been widely used and received increasing attention since it does not need undistorted signals as reference. In order to obtain a high correlation between the predicted scores and subjective results, this paper presents a new speech quality assessment method to estimate the quality of degraded speech without the reference speech. Bottleneck features are extracted with autoencoder and support vector regression is chosen as mapping m...
Source: Speech Communication - April 2, 2019 Category: Speech-Language Pathology Source Type: research

New insights on the optimality of parameterized Wiener filters for speech enhancement applications
Publication date: Available online 27 March 2019 Source: Speech Communication Author(s): Rafael Attili Chiea, Márcio Holsbach Costa, Guillaume Barrault
Abstract: This work presents a unified framework for defining a family of noise reduction techniques for speech enhancement applications. The proposed approach provides a unique theoretical foundation for some widely-applied soft and hard time-frequency masks, which encompasses the well-known Wiener filter and the heuristically-designed Binary mask. These techniques can now be considered as optimal solutions of the same minimization problem. The proposed cost function is define...
Source: Speech Communication - March 28, 2019 Category: Speech-Language Pathology Source Type: research
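
As a rough illustration of the soft and hard time-frequency masks named in the abstract above, the sketch below uses common textbook forms: a parametric Wiener gain computed from a linear-scale SNR estimate, and a thresholded binary mask. The parameter names `alpha`, `beta`, and `threshold` are assumptions made for this example. The sketch does not reproduce the cost function proposed in the paper, whose definition is cut off in this listing.

```python
import numpy as np

def parametric_wiener_gain(snr, alpha=1.0, beta=1.0):
    """Soft mask: (snr / (snr + alpha)) ** beta, with snr on a linear scale.

    alpha = beta = 1 gives the classical Wiener gain snr / (1 + snr).
    """
    snr = np.asarray(snr, dtype=float)
    return (snr / (snr + alpha)) ** beta

def binary_mask(snr, threshold=1.0):
    """Hard mask: 1 where the linear SNR exceeds the threshold, else 0."""
    return (np.asarray(snr, dtype=float) > threshold).astype(float)

if __name__ == "__main__":
    snr = np.array([0.1, 1.0, 3.0, 10.0])  # hypothetical per-bin SNR values
    print(parametric_wiener_gain(snr))      # gains between 0 and 1
    print(binary_mask(snr))                 # gains of exactly 0 or 1
```

Both functions map a per-bin SNR to a gain in [0, 1] that multiplies the noisy spectrum; the soft mask attenuates gradually, while the hard mask keeps or discards each bin.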

Low-rank and Sparse Subspace Modeling of Speech for DNN Based Acoustic Modeling
Publication date: Available online 26 March 2019 Source: Speech Communication Author(s): Pranay Dighe, Afsaneh Asaei, Hervé Bourlard
Abstract: Towards the goal of improving acoustic modeling for automatic speech recognition (ASR), this work investigates the modeling of senone subspaces in deep neural network (DNN) posteriors using low-rank and sparse modeling approaches. While DNN posteriors are typically very high-dimensional, recent studies have shown that the true class information is actually embedded in low-dimensional subspaces. Thus, a matrix of all posteriors belonging to a particular senone class is expected to have a...
Source: Speech Communication - March 27, 2019 Category: Speech-Language Pathology Source Type: research

Temporal envelope cues and simulations of cochlear implant signal processing
Publication date: Available online 21 March 2019 Source: Speech Communication Author(s): Raymond L. Goldsworthy
Abstract: Conventional signal processing implemented on clinical cochlear implant (CI) sound processors is based on envelope signals extracted from overlapping frequency regions. Conventional strategies do not encode temporal envelope or temporal fine-structure cues with high fidelity. In contrast, several research strategies have been developed recently to enhance the encoding of temporal envelope and fine-structure cues. The present study examines the salience of temporal envelope cues when encoded into vocoder repr...
Source: Speech Communication - March 22, 2019 Category: Speech-Language Pathology Source Type: research

Editorial Board
Publication date: April 2019 Source: Speech Communication, Volume 108
Source: Speech Communication - March 22, 2019 Category: Speech-Language Pathology Source Type: research

Speech Reverberation Suppression for Time-Varying Environments Using Weighted Prediction Error Method With Time-Varying Autoregressive Model
Publication date: Available online 11 March 2019 Source: Speech Communication Author(s): Mahdi Parchami, Hamidreza Amindavar, Wei-Ping Zhu
Abstract: In this paper, a novel approach for the task of speech reverberation suppression in non-stationary (changing) acoustic environments is proposed. The suggested approach is based on the popular weighted prediction error (WPE) method, yet, instead of considering fixed reverberation prediction weights, our method takes into account the more generic time-varying autoregressive (TV-AR) model which allows dynamic estimation and updating for the prediction weights over time. We use an init...
Source: Speech Communication - March 11, 2019 Category: Speech-Language Pathology Source Type: research

Why listening in background noise is harder in a non-native language than in a native language: A review
Publication date: Available online 8 March 2019 Source: Speech Communication Author(s): Odette Scharenborg, Marjolein van Os
Abstract: There is ample evidence that recognising words in a non-native language is more difficult than in a native language, even for those with a high proficiency in the non-native language involved, and particularly in the presence of background noise. Why is this the case? To answer this question, this paper provides a systematic review of the literature on non-native spoken-word recognition in the presence of background noise, and posits an updated theory on the effect of background noise on native ...
Source: Speech Communication - March 9, 2019 Category: Speech-Language Pathology Source Type: research

Multiple Description Coding Technique to Improve the Robustness of ACELP Based Coders AMR-WB
Publication date: Available online 2 March 2019 Source: Speech Communication Author(s): Hocine Chaouch, Fatiha Merazka, Philippe Marthon
Abstract: In this paper, a concealment method based on multiple-description coding (MDC) is presented to mitigate the speech quality deterioration caused by packet loss for algebraic code-excited linear prediction (ACELP) based coders. We apply a packet loss concealment (PLC) technique, which uses packetization schemes based on MDC, to the ITU-T G.722.2 coder. The latter is used with two newly designed modes, which are modes 5 and 6 (18.25 and 19.85 kbps, respectively). We introduce our new second-...
Source: Speech Communication - March 4, 2019 Category: Speech-Language Pathology Source Type: research