Audio-visual speech comprehension in noise with real and virtual speakers.
Publication date: Available online 20 November 2019
Source: Speech Communication
Author(s): Jens Nirme, Birgitta Sahlén, Viveka Lyberg Åhlander, Jonas Brännström, Magnus Haake
Abstract: This paper presents a study in which a 3D motion-capture animated ‘virtual speaker’ is compared to a video of a real speaker with regard to how it facilitates children's speech comprehension of narratives in background multitalker babble noise. As secondary measures, the children self-assess the listening and attentional effort demanded by the task and associate words describing positive or negative social traits with the speaker. The results show t...
Source: Speech Communication - November 21, 2019 Category: Speech-Language Pathology Source Type: research

Detection of glottal closure instant and glottal open region from speech signals using spectral flatness measure
Publication date: Available online 18 November 2019
Source: Speech Communication
Author(s): Sudarsana Reddy Kadiri, RaviShankar Prasad, B. Yegnanarayana
Abstract: This paper proposes an approach using spectral flatness measure to detect the glottal closure instant (GCI) and the glottal open region (GOR) within each glottal cycle in voiced speech. The spectral flatness measure is derived from the instantaneous spectra obtained in the analysis of speech using single frequency filtering (SFF) and zero time windowing (ZTW) methods. The Hilbert envelope of the numerator of group delay (HNGD) spectrum at each instant of time is obtai...
Source: Speech Communication - November 19, 2019 Category: Speech-Language Pathology Source Type: research
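For readers unfamiliar with the term, spectral flatness is commonly defined as the ratio of the geometric mean to the arithmetic mean of a power spectrum. The sketch below illustrates only that textbook definition. It is not the authors' method: the SFF, ZTW and HNGD steps named in the abstract are not reproduced, and the function name `spectral_flatness` and the `eps` floor are assumptions made for this illustration.

```python
import math


def spectral_flatness(power_spectrum, eps=1e-12):
    """Return geometric mean / arithmetic mean of a power spectrum.

    Generic textbook definition only (hypothetical helper, not taken
    from the paper). `eps` is an assumed small floor that keeps the
    logarithm finite when a bin is zero.
    """
    if not power_spectrum:
        raise ValueError("power_spectrum must be non-empty")
    # Clamp negative values to zero and add the floor.
    values = [max(p, 0.0) + eps for p in power_spectrum]
    # Geometric mean computed in the log domain for numerical stability.
    log_mean = sum(math.log(v) for v in values) / len(values)
    geometric_mean = math.exp(log_mean)
    arithmetic_mean = sum(values) / len(values)
    return geometric_mean / arithmetic_mean


# A flat (noise-like) spectrum gives a value close to 1.
flat = spectral_flatness([1.0] * 8)
# A spectrum dominated by a single bin gives a value close to 0.
peaky = spectral_flatness([8.0] + [1e-6] * 7)
```

By the inequality of arithmetic and geometric means, the result always lies between 0 and 1, so values near 1 indicate a noise-like spectrum and values near 0 indicate energy concentrated in a few bins.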

Harmonic Beamformers for Speech Enhancement and Dereverberation in the Time Domain
Publication date: Available online 9 November 2019
Source: Speech Communication
Author(s): J.R. Jensen, S. Karimian-Azari, M.G. Christensen, J. Benesty
Abstract: This paper presents a framework for parametric broadband beamforming that exploits the frequency-domain sparsity of voiced speech to achieve more noise reduction than traditional nonparametric broadband beamforming without introducing additional distortion. In this framework, the harmonic model is used to parametrize the signal of interest by a single parameter, the fundamental frequency, whereby both speech enhancement and dereverberation can be performed. This framewo...
Source: Speech Communication - November 10, 2019 Category: Speech-Language Pathology Source Type: research

Golden speaker builder – An interactive tool for pronunciation training
We describe the overall system design, including the web application with its user interface, and the underlying speech analysis/synthesis algorithms. Next, we present results from a series of listening tests, which show that GSB is capable of synthesizing such golden-speaker voices. Finally, we present results from a user study in a language-instruction setting, which show that practising with GSB leads to improved fluency and comprehensibility. We suggest reasons for why learners improved as they did and recommendations for the next iteration of the training.
Source: Speech Communication - November 10, 2019 Category: Speech-Language Pathology Source Type: research

A low-complexity permutation alignment method for frequency-domain blind source separation
Publication date: Available online 6 November 2019
Source: Speech Communication
Author(s): Fang Kang, Feiran Yang, Jun Yang
Abstract: Frequency-domain blind source separation is an effective way to separate the signals from convolutive mixtures. Independent component analysis (ICA) is commonly employed to separate signals in each frequency bin, resulting in the well-known permutation problem. To resolve this problem, we present a low-complexity permutation alignment method based on the inter-frequency dependence of the signal power ratio. A bin-wise permutation alignment is first carried out across all the frequency bins by me...
Source: Speech Communication - November 7, 2019 Category: Speech-Language Pathology Source Type: research

Speech Enhancement Using a Risk Estimation Approach
Publication date: Available online 6 November 2019
Source: Speech Communication
Author(s): Jishnu Sadasivan, Chandra Sekhar Seelamantula, Nagarjuna Reddy Muraka
Abstract: The goal in speech enhancement is to obtain an estimate of clean speech starting from the noisy signal by minimizing a chosen distortion measure (risk). Often, this results in an estimate that depends on the unknown clean signal or its statistics. Since access to such priors is limited or impractical, one has to rely on an estimate of the clean signal statistics. In this paper, we develop a risk estimation framework for speech enhancement, in which one optimiz...
Source: Speech Communication - November 6, 2019 Category: Speech-Language Pathology Source Type: research

Individual Differences in Acoustic-Prosodic Entrainment in Spoken Dialogue
Publication date: Available online 1 November 2019
Source: Speech Communication
Author(s): Andreas Weise, Sarah Ita Levitan, Julia Hirschberg, Rivka Levitan
Abstract: The tendency of conversation partners to adjust to each other to become similar, known as entrainment, has been studied for many years. Several studies have linked differences in this behavior to gender, but with inconsistent results. We analyze individual differences in two forms of local, acoustic-prosodic entrainment in two large corpora between English and Chinese native speakers conversing in English. The few previous studies of the effect of non-nativeness o...
Source: Speech Communication - November 3, 2019 Category: Speech-Language Pathology Source Type: research

Mechanisms of Tone Sandhi Rule Application by Tonal and Non-tonal Non-native Speakers
This study is the first comprehensive acoustic study to examine the acquisition of two Mandarin tone sandhi rules: the third tone sandhi and the more phonetically motivated, half-third sandhi rule by both tonal (Cantonese) and non-tonal (American English) speakers using a Wug Test. Participants were asked to form disyllables from two monosyllabic morphemes. To test for the operation of the lexical versus the computation mechanisms in sandhi rule application, both real and various types of wug (nonsense) morphemes were included. Functional data analysis revealed that Cantonese and American speakers apply the two rules simil...
Source: Speech Communication - November 1, 2019 Category: Speech-Language Pathology Source Type: research

Deep-Learning-Based Audio-Visual Speech Enhancement in Presence of Lombard Effect
Publication date: Available online 30 October 2019
Source: Speech Communication
Author(s): Daniel Michelsanti, Zheng-Hua Tan, Sigurdur Sigurdsson, Jesper Jensen
Abstract: When speaking in presence of background noise, humans reflexively change their way of speaking in order to improve the intelligibility of their speech. This reflex is known as Lombard effect. Collecting speech in Lombard conditions is usually hard and costly. For this reason, speech enhancement systems are generally trained and evaluated on speech recorded in quiet to which noise is artificially added. Since these systems are often used in situations where Lom...
Source: Speech Communication - November 1, 2019 Category: Speech-Language Pathology Source Type: research

Editorial Board
Publication date: November 2019
Source: Speech Communication, Volume 114
Source: Speech Communication - October 26, 2019 Category: Speech-Language Pathology Source Type: research

Speech Emotion Recognition Based on DNN-Decision Tree SVM Model
Publication date: Available online 19 October 2019
Source: Speech Communication
Author(s): Linhui Sun, Bo Zou, Sheng Fu, Jia Chen, Fu Wang
Abstract: Motivated by the development of DNN technology, a speech emotion recognition method based on DNN-decision tree SVM model is proposed. The proposed method can not only excavate the deep emotion information of the speech signal, but also extract more distinctive emotion features from the easily confused emotions. In this method, the decision tree SVM structure is firstly constructed by computing the confusion degree of emotion, and then different DNN networks are trained for diverse ...
Source: Speech Communication - October 20, 2019 Category: Speech-Language Pathology Source Type: research

Automatic Depression Classification Based on Affective Read Sentences: Opportunities for Text-Dependent Analysis
Publication date: Available online 14 October 2019
Source: Speech Communication
Author(s): Brian Stasak, Julien Epps, Roland Goecke
Abstract: In the future, automatic speech-based analysis of mental health could become widely available to help augment conventional healthcare evaluation methods. For speech-based patient evaluations of this kind, protocol design is a key consideration. Read speech provides an advantage over other verbal modes (e.g. automatic, spontaneous) by providing a clinically stable and repeatable protocol. Further, text-dependent speech helps to reduce phonetic variability and delivers controllable linguist...
Source: Speech Communication - October 15, 2019 Category: Speech-Language Pathology Source Type: research

Perceptual motivation for rhotics as a class
Publication date: Available online 10 October 2019
Source: Speech Communication
Author(s): Phil J. Howson, Philip J. Monahan
Abstract: Finding phonetic correlates of rhotics as a natural class has been elusive, leading to the suggestion that any class-based relationship between different rhotic categories is purely phonological in nature. This paper examines native English speakers’ perception of three different non-native rhotics (i.e., /r ɻ ʀ/) compared to non-native sounds from four other manners of articulation (stops, nasals, fricatives, and laterals). The results revealed that speakers cannot reliably discriminate bet...
Source: Speech Communication - October 11, 2019 Category: Speech-Language Pathology Source Type: research

Nonlinear Kronecker Product Filtering for Multichannel Noise Reduction
We present a modified optimization criterion according to which the proposed filters may be derived, and compare their performances to conventional multichannel noise reduction filters. We show that the new approach is preferable, in particular when the input signal-to-noise ratio (SNR) is low or the number of sensors is small.
Source: Speech Communication - October 4, 2019 Category: Speech-Language Pathology Source Type: research