Voice conversion for emotional speech: Rule-based synthesis with degree of emotion controllable in dimensional space
Publication date: Available online 19 July 2018. Source: Speech Communication. Author(s): Yawen Xue, Yasuhiro Hamada, Masato Akagi. Abstract: This paper proposes a rule-based voice conversion system for emotion which is capable of converting neutral speech to emotional speech using dimensional space (arousal and valence) to control the degree of emotion on a continuous scale. We propose an inverse three-layered model with acoustic features as output at the top layer, semantic primitives at the middle layer and emotion dimension as input at the bottom layer; an adaptive-based fuzzy inference system acts as connectors to extract the...
Source: Speech Communication - July 19, 2018 Category: Speech-Language Pathology Source Type: research

Editorial Board
Publication date: July 2018. Source: Speech Communication, Volume 101.
Source: Speech Communication - July 19, 2018 Category: Speech-Language Pathology Source Type: research

On the issues of intra-speaker variability and realism in speech, speaker, and language recognition tasks
This study surveys several challenging domains in formulating effective solutions in realistic speech data, and in particular the notion of using naturalistic data to better reflect the potential effectiveness of new algorithms. Our main focus is on intra-speaker mismatch and speech variability issues due to (i) differences in noisy speech with and without Lombard effect and a communication factor, (ii) realistic field data in noisy and increased cognitive load conditions, (iii) speech variability introduced by whispered speech, and (iv) dialect identification using found data. Finally, we study speaker–environment and s...
Source: Speech Communication - July 11, 2018 Category: Speech-Language Pathology Source Type: research

Unsupervised visualization of Under-resourced speech prosody
Publication date: July 2018. Source: Speech Communication, Volume 101. Author(s): Moses Ekpenyong, Udoinyang Inyang, EmemObong Udoh. Abstract: In this paper, an unsupervised visualization framework for analyzing under-resourced speech prosody is proposed. An experiment was carried out for Ibibio, a Lower Cross Language of the New Benue Congo family, spoken in the Southeast coastal region of Nigeria, West Africa. The proposed methodology adopts machine learning, with a semi-automated procedure for extracting prosodic features from a translated prosodically stable corpus ‘The Tiger and the Mouse’, a text corpus that demonstrates...
Source: Speech Communication - July 11, 2018 Category: Speech-Language Pathology Source Type: research

Speech enhancement in spectral envelop and details subspaces
In this study, we address this challenge through a combination strategy of spectral modulation decoupling and low-rank and sparsity oriented decomposition. Specifically, supervised low-rank and sparse decompositions with energy thresholding are developed in the spectral envelope subspace. In the spectral details subspace, an unsupervised robust principal component analysis is utilized to extract the fine structure. The validation results show that, compared with five speech enhancement algorithms, including MMSE-SPP, NMF-RPCA, RPCA, LARC and BNMF, the proposed algorithms achieve satisfactory performance on improving both p...
Source: Speech Communication - July 11, 2018 Category: Speech-Language Pathology Source Type: research

Non-intrusive codebook-based intelligibility prediction
Publication date: July 2018. Source: Speech Communication, Volume 101. Author(s): Charlotte Sørensen, Mathew Shaji Kavalekalam, Angeliki Xenaki, Jesper Bünsow Boldt, Mads Græsbøll Christensen. Abstract: In recent years, there has been an increasing interest in objective measures of speech intelligibility in the speech processing community. Important progress has been made in intrusive measures of intelligibility, where the Short-Time Objective Intelligibility (STOI) method has become the de facto standard. Online adaptation of signal processing in, for example, hearing aids, in accordance with the listening conditions, require...
Source: Speech Communication - July 11, 2018 Category: Speech-Language Pathology Source Type: research

Speaker models for monitoring Parkinson’s disease progression considering different communication channels and acoustic conditions
Publication date: July 2018. Source: Speech Communication, Volume 101. Author(s): T. Arias-Vergara, J.C. Vásquez-Correa, J.R. Orozco-Arroyave, E. Nöth. Abstract: Symptoms of Parkinson’s disease vary from patient to patient. Additionally, the progression of those symptoms also differs among patients. Most of the studies on the analysis of speech of people with Parkinson’s disease do not consider such an individual variation. This paper presents a methodology for the automatic and individual monitoring of speech disorders developed by PD patients. The neurological state and dysarthria level of the patients are evaluated. The p...
Source: Speech Communication - July 11, 2018 Category: Speech-Language Pathology Source Type: research

Influence of visual cues on head and eye movements during listening tasks in multi-talker audiovisual environments with animated characters
Publication date: July 2018. Source: Speech Communication, Volume 101. Author(s): Maartje M.E. Hendrikse, Gerard Llorach, Giso Grimm, Volker Hohmann. Abstract: Recent studies of hearing aid benefits indicate that head movement behavior influences performance. To systematically assess these effects, movement behavior must be measured in realistic communication conditions. For this, the use of virtual audiovisual environments with animated characters as visual stimuli has been proposed. It is unclear, however, how these animations influence the head- and eye-movement behavior of subjects. Here, two listening tasks were carried out w...
Source: Speech Communication - July 11, 2018 Category: Speech-Language Pathology Source Type: research

Automatic context window composition for distant speech recognition
Publication date: July 2018. Source: Speech Communication, Volume 101. Author(s): Mirco Ravanelli, Maurizio Omologo. Abstract: Distant speech recognition is being revolutionized by deep learning, which has contributed to significantly outperforming previous HMM-GMM systems. A key aspect behind the rapid rise and success of DNNs is their ability to better manage large time contexts. In this regard, asymmetric context windows that embed more past than future frames have been recently used with feed-forward neural networks. This context configuration turns out to be useful not only to address low-latency speech recognition, but also to...
Source: Speech Communication - July 11, 2018 Category: Speech-Language Pathology Source Type: research
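
As an illustration of the asymmetric context windows mentioned in the abstract above, the following is a minimal sketch of frame splicing with more past than future frames. It is not the authors' method: the function name `splice_frames`, the window sizes, and the edge padding by frame repetition are all assumptions made for this example.

```python
def splice_frames(frames, past, future):
    """Concatenate each frame with `past` preceding and `future` following frames.

    Assumption: at the edges of the utterance, the first or last frame is
    repeated so that every spliced vector has the same length.
    """
    if not frames:
        return []
    last = len(frames) - 1
    spliced = []
    for t in range(len(frames)):
        window = []
        for offset in range(-past, future + 1):
            idx = min(max(t + offset, 0), last)  # clamp index to the valid range
            window.extend(frames[idx])
        spliced.append(window)
    return spliced


# Toy input: six one-dimensional "feature frames" with values 0.0 to 5.0.
frames = [[float(t)] for t in range(6)]

# Asymmetric window: three past frames, one future frame (illustrative sizes).
asymmetric = splice_frames(frames, past=3, future=1)
print(asymmetric[0])  # [0.0, 0.0, 0.0, 0.0, 1.0]
print(asymmetric[4])  # [1.0, 2.0, 3.0, 4.0, 5.0]
```

With `future=1`, the spliced vector for frame `t` needs only one look-ahead frame, which is why such a configuration is associated with low-latency recognition.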

Orthographic effects on the perception and production of L2 Mandarin tones
Publication date: July 2018. Source: Speech Communication, Volume 101. Author(s): Peggy Pik Ki Mok, Albert Lee, Joanne Jingwen Li, Robert Bo Xu. Abstract: Recent studies on orthographic effects on L2 phonology have typically investigated alphabetic writing systems and segmental contrasts with novice learners. The current study extends such investigation to compare orthographic effects of an opaque logographic system (Chinese characters) and a transparent schematic system (pinyin) on a suprasegmental feature (lexical tones) with experienced learners. A perception experiment of Mandarin tones by Cantonese L2 learners shows that piny...
Source: Speech Communication - July 11, 2018 Category: Speech-Language Pathology Source Type: research

Where has all the power gone? Energy production and loss in vocalization
Publication date: July 2018. Source: Speech Communication, Volume 101. Author(s): Ingo R. Titze. Abstract: Human voice production for speech is an inefficient process in terms of energy expended to produce acoustic output. A traditional measure of vocal efficiency relates acoustic power radiated from the mouth to aerodynamic power produced in the trachea. This efficiency ranges between 0.001% and 1.0% in speech-like vocalization. Simplified Navier–Stokes equations for non-steady compressible airflow from trachea to lips were used to calculate steady aerodynamic power, acoustic power, and combined total power at seven strategic l...
Source: Speech Communication - July 11, 2018 Category: Speech-Language Pathology Source Type: research

The sound of im/politeness
Publication date: Available online 2 July 2018. Source: Speech Communication. Author(s): Jonathan Caballero, Nikos Vergis, Xiaoming Jiang, Marc D. Pell. Abstract: Until recently, research on im/politeness has primarily focused on the role of linguistic strategies while neglecting the contributions of prosody and acoustic cues for communicating politeness. Here, we analyzed a large set of recordings of verbal requests spoken in a direct manner (Lend me a nickel), preceded by the word “Please”, or in a conventionally-indirect manner (Can you), which were known to convey polite or rude impressions on the listener. The pragmat...
Source: Speech Communication - July 11, 2018 Category: Speech-Language Pathology Source Type: research

Bone-Conducted Speech Enhancement Using Deep Denoising Autoencoder
In this study, we propose a novel deep-denoising autoencoder (DDAE) approach to bridge BCM and ACM in order to improve speech quality and intelligibility, so that current ASR systems can be employed directly without building a new system. Experimental results first demonstrated that the DDAE approach can effectively improve speech quality and intelligibility based on standardized evaluation metrics. Moreover, our proposed system can significantly improve the ASR performance with a notable 48.28% relative character error rate (CER) reduction (from 14.50% to 7.50%) under quiet conditions. In an actual noisy environment (sound pr...
Source: Speech Communication - July 11, 2018 Category: Speech-Language Pathology Source Type: research
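
The relative CER reduction quoted in the abstract above can be checked from the two reported error rates. The sketch below only reproduces that arithmetic; the helper name `relative_reduction` is hypothetical and introduced for this example.

```python
def relative_reduction(baseline, improved):
    """Return the relative reduction from `baseline` to `improved`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline * 100.0


# CER values reported in the abstract: 14.50% before, 7.50% after.
reduction = relative_reduction(14.50, 7.50)
print(f"{reduction:.2f}%")  # 48.28%
```

The absolute reduction is 7.00 percentage points; dividing by the 14.50% baseline gives the 48.28% relative figure.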

Prosodic stress detection for fixed stress languages using formal atom decomposition and a statistical hidden Markov hybrid
Publication date: September 2018. Source: Speech Communication, Volume 102. Author(s): György Szaszák, Máté Ákos Tündik, Branislav Gerazov. Abstract: The detection of prosodic events, prosodic stress, and speech segmentation based on prosody have received much attention in the research community in the past decades. Prosody is relevant for both main areas of speech technology, text-to-speech synthesis and automatic speech recognition and understanding, and is exploited increasingly: besides providing redundancy, prosody is recognized to carry information unavailable from other sources and also contributes to the naturalness ...
Source: Speech Communication - July 11, 2018 Category: Speech-Language Pathology Source Type: research

Refinement and validation of the binaural short time objective intelligibility measure for spatially diverse conditions
Publication date: September 2018. Source: Speech Communication, Volume 102. Author(s): Asger Heidemann Andersen, Jan Mark de Haan, Zheng-Hua Tan, Jesper Jensen. Abstract: Speech intelligibility prediction methods have recently gained popularity in the speech processing community as supplements to time consuming and costly listening experiments. Such methods can be used to objectively quantify and compare the advantage of different speech enhancement algorithms, in a way that correlates well with actual speech intelligibility. One such method is the short-time objective intelligibility (STOI) measure. In a recent publication, we pr...
Source: Speech Communication - July 11, 2018 Category: Speech-Language Pathology Source Type: research