Estimation of the glottal source from coded telephone speech using deep neural networks
Publication date: Available online 8 December 2018. Source: Speech Communication. Author(s): NP Narendra, Manu Airaksinen, Brad Story, Paavo Alku.
Abstract: Estimation of glottal source information can be performed non-invasively from speech by using glottal inverse filtering (GIF) methods. However, the existing GIF methods are sensitive even to slight distortions in speech signals under different realistic scenarios, for example, in coded telephone speech. Therefore, there is a need for robust GIF methods which could accurately estimate glottal flows from coded telephone speech. To address the issue of robust GIF, this paper prop...
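Although the paper's own method is DNN-based, the GIF idea it builds on can be illustrated with a conventional LPC-based sketch: estimate an all-pole vocal-tract filter and apply its inverse to recover the glottal flow. Everything below (the function names, the leaky-integration constant 0.99, the fs/1000 + 2 order rule of thumb, the toy signal) is an illustrative assumption, not the authors' algorithm.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc(frame, order):
    """Autocorrelation-method LPC: returns inverse-filter coefficients A(z)."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    return np.concatenate(([1.0], -a))  # A(z) = 1 - sum a_k z^-k

def inverse_filter(speech, fs):
    order = int(fs / 1000) + 2                 # common rule of thumb
    a = lpc(speech * np.hamming(len(speech)), order)
    dglottal = lfilter(a, [1.0], speech)       # glottal flow derivative
    return lfilter([1.0], [1.0, -0.99], dglottal)  # leaky integration -> flow

fs = 8000
t = np.arange(fs // 10) / fs                   # 100 ms toy "speech" signal
rng = np.random.default_rng(0)
speech = (np.sin(2 * np.pi * 120 * t) + 0.3 * np.sin(2 * np.pi * 720 * t)
          + 0.01 * rng.standard_normal(len(t)))  # noise floor for conditioning
flow = inverse_filter(speech, fs)
print(flow.shape)  # (800,)
```

Real GIF methods such as IAIF add pre-emphasis and iterate the vocal-tract and glottal estimates; this sketch shows only the core inverse-filtering step that coding distortions perturb.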
Source: Speech Communication - December 8, 2018 Category: Speech-Language Pathology Source Type: research

Segmental contributions to cochlear implant speech perception
Publication date: Available online 3 December 2018. Source: Speech Communication. Author(s): Fei Chen, Yi Hu.
Abstract: The present work assessed segmental contributions to speech perception by listeners who had been bilaterally fitted with cochlear implants (CIs). TIMIT sentences were edited to contain vowels (Vs) (replacing consonants with silence) or consonants (Cs) (replacing vowels with silence) and vowel-consonant (V-C) transitions, then presented to unilaterally or bilaterally fitted CI listeners for recognition. Experimental results showed that segmental interruption had a significant influence on CI speech perception. Vow...
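The stimulus-editing step described above (replacing consonants or vowels with silence) can be sketched as follows; the segment labels and timings are made up for illustration, not taken from the TIMIT annotations.

```python
import numpy as np

def keep_segments(signal, fs, segments, keep="V"):
    """segments: list of (start_sec, end_sec, label) with label 'V' or 'C'.
    Returns a copy where everything outside the kept label is silence."""
    out = np.zeros_like(signal)
    for start, end, label in segments:
        if label == keep:
            i, j = int(start * fs), int(end * fs)
            out[i:j] = signal[i:j]
    return out

fs = 16000
rng = np.random.default_rng(0)
sig = rng.standard_normal(fs)  # 1 s of noise as a stand-in for speech
segs = [(0.0, 0.3, "C"), (0.3, 0.6, "V"), (0.6, 1.0, "C")]
vowels_only = keep_segments(sig, fs, segs, keep="V")  # V-only stimulus
print(np.count_nonzero(vowels_only))  # 4800: only the 0.3-0.6 s span survives
```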
Source: Speech Communication - December 4, 2018 Category: Speech-Language Pathology Source Type: research

Joint Dictionary Learning Using a New Optimization Method for Single-channel Blind Source Separation
Publication date: Available online 30 November 2018. Source: Speech Communication. Author(s): Linhui Sun, Keli Xie, Ting Gu, Jia Chen.
Abstract: Cross-projection often arises between sub-dictionaries when a mixed speech signal is represented over a joint dictionary in single-channel blind source separation (SCBSS), which leads to poor separation performance. To solve this problem, we introduce a new optimization function for joint dictionary learning in SCBSS, which trains the identity sub-dictionaries and a common sub-dictionary simultaneously. The existence of a common sub-dictionary can effectively avoid one source signal be...
Source: Speech Communication - December 1, 2018 Category: Speech-Language Pathology Source Type: research

Voice Conversion with SI-DNN and KL Divergence Based Mapping without Parallel Training Data
Publication date: Available online 30 November 2018. Source: Speech Communication. Author(s): Feng-Long Xie, Frank K. Soong, Haifeng Li.
Abstract: We propose a Speaker Independent Deep Neural Net (SI-DNN) and Kullback-Leibler Divergence (KLD) based mapping approach to voice conversion without using parallel training data. The acoustic difference between source and target speakers is equalized with the SI-DNN via its estimated output posteriors, which serve as a probabilistic mapping from acoustic input frames to the corresponding symbols in the phonetic space. KLD is chosen as an ideal distortion measure to find an appropriate mappin...
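The KLD-based frame mapping can be sketched as follows: each source frame's phonetic posterior vector is matched to the target frame whose posterior minimizes the KL divergence. The frame counts, class count, and random posteriors below are illustrative only, not the paper's configuration.

```python
import numpy as np

def kld(p, q, eps=1e-10):
    """KL divergence D(p || q) between two discrete posteriors."""
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)))

def map_frames(src_post, tgt_post):
    """For each source posterior row, index of the minimum-KLD target row."""
    return np.array([np.argmin([kld(p, q) for q in tgt_post])
                     for p in src_post])

rng = np.random.default_rng(0)
src = rng.dirichlet(np.ones(5), size=3)   # 3 source frames, 5 phonetic classes
tgt = rng.dirichlet(np.ones(5), size=4)   # 4 target frames
idx = map_frames(src, tgt)
print(idx.shape)  # (3,): one matched target frame per source frame
```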
Source: Speech Communication - December 1, 2018 Category: Speech-Language Pathology Source Type: research

An Iterative Mask Estimation Approach to Deep Learning Based Multi-Channel Speech Recognition
Publication date: Available online 26 November 2018. Source: Speech Communication. Author(s): Yan-Hui Tu, Jun Du, Lei Sun, Feng Ma, Hai-Kun Wang, Jing-Dong Chen, Chin-Hui Lee.
Abstract: We propose a novel iterative mask estimation (IME) framework that improves the state-of-the-art complex Gaussian mixture model (CGMM)-based beamforming approach in an iterative manner by leveraging the complementary information obtained from different deep models. Although CGMM has recently been demonstrated to be quite effective for multi-channel automatic speech recognition (ASR) in operational scenarios, the corresponding mask estimation, ho...
Source: Speech Communication - November 27, 2018 Category: Speech-Language Pathology Source Type: research

DNN-based performance measures for predicting error rates in automatic speech recognition and optimizing hearing aid parameters
In this study, we look at different performance measures to estimate the word error rates of simulated behind-the-ear hearing-aid signals and to detect the azimuth angle of the target source in 180-degree spatial scenes. These measures are derived from phoneme posterior probabilities produced by a deep neural network acoustic model. However, the more complex the model, the more computationally expensive these measures become to obtain; we therefore assess how model size affects prediction performance. Our findings suggest that smaller networks can predict the error rates of more complex models reliably enough to be impl...
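One plausible measure of this kind, shown purely as an illustration rather than the paper's exact definition, is the mean entropy of the per-frame phoneme posteriors: a confident acoustic model yields low-entropy rows, and higher entropy tends to accompany higher word error rates.

```python
import numpy as np

def mean_posterior_entropy(posteriors, eps=1e-12):
    """posteriors: (frames, phonemes) array whose rows sum to 1. Returns
    the mean per-frame entropy in bits."""
    h = -np.sum(posteriors * np.log2(posteriors + eps), axis=1)
    return float(np.mean(h))

confident = np.array([[0.97, 0.01, 0.01, 0.01]] * 10)  # peaked posteriors
uniform = np.full((10, 4), 0.25)                       # maximally uncertain
print(mean_posterior_entropy(confident) < mean_posterior_entropy(uniform))
# True: the confident model scores well below the 2-bit uniform ceiling
```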
Source: Speech Communication - November 27, 2018 Category: Speech-Language Pathology Source Type: research

Editorial Board
Publication date: December 2018. Source: Speech Communication, Volume 105.
Source: Speech Communication - November 27, 2018 Category: Speech-Language Pathology Source Type: research

VoiceHome-2, an extended corpus for multichannel speech processing in real homes
We describe the corpus specifications and annotations and the data recorded so far, and we report baseline results. (Source: Speech Communication)
Source: Speech Communication - November 22, 2018 Category: Speech-Language Pathology Source Type: research

Enhanced Feature Network for Monaural Singing Voice Separation
Publication date: Available online 19 November 2018. Source: Speech Communication. Author(s): Weitao Yuan, Boxin He, Shengbei Wang, Jianming Wang, Masashi Unoki.
Abstract: Deep Recurrent Neural Network (DRNN) based monaural singing voice separation (MSVS) methods have recently obtained impressive separation results. Most DRNN-based methods directly take the magnitude spectra of the mixture signal as the input feature, which is high-dimensional and contains redundant information; such models, however, cannot extract effective low-dimensional, redundancy-free representations from the magnitude spectra. In this p...
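The baseline input feature the abstract refers to, magnitude spectra of the mixture, is computed with a short-time Fourier transform; the frame length and overlap below are illustrative choices, not the paper's settings.

```python
import numpy as np
from scipy.signal import stft

fs = 16000
rng = np.random.default_rng(0)
mixture = rng.standard_normal(2 * fs)        # 2 s stand-in for a song mixture
f, t, Z = stft(mixture, fs=fs, nperseg=1024, noverlap=768)
mag = np.abs(Z)                              # (freq_bins, frames) feature
print(mag.shape[0])  # 513 frequency bins per frame (nperseg // 2 + 1)
```

The 513-dimensional frames illustrate why the paper seeks lower-dimensional, less redundant representations.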
Source: Speech Communication - November 21, 2018 Category: Speech-Language Pathology Source Type: research

Features and Results of a Speech Improvement Experiment on Hard of Hearing Children
Publication date: Available online 16 November 2018. Source: Speech Communication. Author(s): László Czap, Judit Mária Pintér, Erika Baksa-Varga.
Abstract: In this paper we present a two-year speech-training experiment in which we studied the improvement attributable to multimodal visual support compared with traditional methods. We also tested the hypothesis that children with more severe hearing impairment benefit more from visual assistance. Thirty children had extracurricular lessons with the visual support of the Speech Assistant (SA) system, which provides complex services. The control group (CG) – that was comp...
Source: Speech Communication - November 16, 2018 Category: Speech-Language Pathology Source Type: research

Improving Children’s Mismatched ASR using Structured Low-Rank Feature Projection
Publication date: Available online 8 November 2018. Source: Speech Communication. Author(s): S. Shahnawazuddin, Hemant K. Kathania, Abhishek Dey, Rohit Sinha.
Abstract: The work presented in this paper explores the issues in automatic speech recognition (ASR) of children’s speech using acoustic models trained on adults’ speech. In such contexts, the large acoustic mismatch between training and test data severely degrades recognition rates. Even with vocal tract length normalization (VTLN), recognition performance in the mismatched case remains well below that of the matched case. Our earlier studies have s...
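VTLN, the conventional mismatch-reduction technique the paper compares against, is typically realized as a piecewise-linear warping of the frequency axis; the knee position and warp factor below are illustrative choices, not the paper's.

```python
import numpy as np

def vtln_warp(freqs, alpha, f_nyq, knee_frac=0.85):
    """Piecewise-linear warp: scale by alpha below the knee, then map the
    remaining band linearly so that f_nyq stays fixed."""
    knee = knee_frac * f_nyq
    freqs = np.asarray(freqs, dtype=float)
    upper = alpha * knee + (f_nyq - alpha * knee) * (freqs - knee) / (f_nyq - knee)
    return np.where(freqs <= knee, alpha * freqs, upper)

grid = np.linspace(0, 8000, 5)
warped = vtln_warp(grid, alpha=1.1, f_nyq=8000)  # alpha > 1: child-like warp
print(warped)  # warps 0, 2000, 4000, 6000, 8000 Hz to 0, 2200, 4400, 6600, 8000
```

In an ASR front end this warp is usually applied to the filterbank center frequencies, with alpha chosen per speaker by maximum likelihood.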
Source: Speech Communication - November 9, 2018 Category: Speech-Language Pathology Source Type: research

Editorial Board
Publication date: November 2018Source: Speech Communication, Volume 104Author(s): (Source: Speech Communication)
Source: Speech Communication - November 8, 2018 Category: Speech-Language Pathology Source Type: research

Multi-domain adversarial training of neural network acoustic models for distant speech recognition
Publication date: Available online 3 November 2018. Source: Speech Communication. Author(s): Seyedmahdad Mirsamadi, John H.L. Hansen.
Abstract: Building deep neural network acoustic models directly on far-field speech from multiple recording environments with different acoustic properties is an increasingly popular approach to distant speech recognition. The currently common approach to building such multi-condition (multi-domain) models is to compile the available data from all environments into a single training set, discarding information about the specific environment to which each utterance...
Source: Speech Communication - November 3, 2018 Category: Speech-Language Pathology Source Type: research

Time-varying Sinusoidal Demodulation for Non-stationary Modeling of Speech
Publication date: Available online 1 November 2018. Source: Speech Communication. Author(s): Neeraj Kumar Sharma, Thippur V. Sreenivas.
Abstract: Speech signals contain a fairly rich time-evolving spectral content, and accurate analysis of this time-evolving spectrum is an open challenge in signal processing. Towards this, we revisit time-varying sinusoidal modeling of speech and propose an alternative model-estimation approach. The estimation operates on the whole signal, without any short-time analysis, and proceeds by extracting the fundamental frequency sinusoid (FFS) from the speech signal. The instantaneous amplitude (IA) of the...
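The first stage described above, extracting the fundamental-frequency sinusoid and its instantaneous amplitude, can be sketched with a band-pass filter and the analytic signal; the toy signal, filter order, and band edges are illustrative assumptions, not the paper's estimator.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 8000
t = np.arange(fs) / fs                     # 1 s toy signal
f0 = 120.0                                 # assumed fundamental frequency
# Toy "speech": f0 component with slow amplitude modulation, plus a harmonic.
speech = ((1 + 0.5 * np.sin(2 * np.pi * 2 * t)) * np.sin(2 * np.pi * f0 * t)
          + 0.3 * np.sin(2 * np.pi * 3 * f0 * t))

b, a = butter(2, [0.5 * f0, 1.5 * f0], btype="band", fs=fs)
ffs = filtfilt(b, a, speech)               # fundamental-frequency sinusoid
ia = np.abs(hilbert(ffs))                  # instantaneous amplitude envelope
print(ia.shape)  # (8000,): one envelope value per sample, no framing
```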
Source: Speech Communication - November 2, 2018 Category: Speech-Language Pathology Source Type: research

Binaural speech intelligibility through personal and non-personal HRTF via headphones, with added artificial noise and reverberation
Publication date: Available online 1 November 2018. Source: Speech Communication. Author(s): Felipe Orduña-Bustamante, A.L. Padilla-Ortiz, Edgar A. Torres-Gallegos.
Abstract: Subjective intelligibility tests were carried out by processing speech through personal and non-personal Head-Related Transfer Functions (HRTFs) for an azimuth angle θ = +30° (sound source to the right), presented through headphones under simulated adverse listening conditions. Tests with noise disturbance were also conducted at azimuth angles of θ = 0°, 15°, and 45°. Phonetically balanced bi-syllable words in Spanish, uttered by a Mexican female speaker, we...
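The presentation stage can be sketched as convolving mono speech with a left/right head-related impulse response (HRIR) pair; the toy impulse responses below merely mimic the interaural time and level differences of a source at +30°, and are not measured HRTFs.

```python
import numpy as np
from scipy.signal import fftconvolve

fs = 16000
rng = np.random.default_rng(0)
speech = rng.standard_normal(fs)           # 1 s stand-in for a test word

# Toy HRIRs for a source at +30 deg: the right (nearer) ear gets an earlier,
# louder impulse; the left ear a ~0.3 ms later, attenuated one.
hrir_right = np.zeros(64); hrir_right[0] = 1.0
hrir_left = np.zeros(64);  hrir_left[5] = 0.7

binaural = np.stack([fftconvolve(speech, hrir_left),
                     fftconvolve(speech, hrir_right)], axis=1)
print(binaural.shape)  # (16063, 2): left/right headphone channels
```

With measured personal or non-personal HRIRs substituted for the toy pair, the same convolution produces the headphone stimuli used in such intelligibility tests.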
Source: Speech Communication - November 2, 2018 Category: Speech-Language Pathology Source Type: research