Semi-parametric joint detection and estimation for speech enhancement based on minimum mean square error
Publication date: September 2018
Source: Speech Communication, Volume 102
Author(s): Van-Khanh Mai, Dominique Pastor, Abdeldjalil Aïssa-El-Bey, Raphaël Le Bidan
Abstract: We propose a novel estimator of the amplitude of speech coefficients in the time-frequency domain. To avoid having to estimate the phase spectrum of the complex coefficients produced by the Fourier transform, we consider the discrete cosine transform (DCT). This estimator aims at minimizing the mean square error of the absolute values of the speech DCT coefficients. In order to take advantage of both parametric and non-parametric approaches, the propose...
Source: Speech Communication - July 11, 2018 Category: Speech-Language Pathology Source Type: research
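The abstract's reason for preferring the DCT can be illustrated with a short sketch: for a real-valued frame, DCT-II coefficients are real, so each one reduces to an amplitude and a sign, whereas DFT coefficients are complex and carry a continuous phase. This is a generic pure-Python illustration assuming the unnormalised DCT-II definition and made-up sample values; it is not the authors' estimator and implements neither their MMSE amplitude estimate nor their detection step.

```python
import cmath
import math


def dct_ii(frame):
    """Unnormalised DCT-II: X[k] = sum_n x[n] * cos(pi / N * (n + 0.5) * k)."""
    n_len = len(frame)
    return [
        sum(x * math.cos(math.pi / n_len * (n + 0.5) * k) for n, x in enumerate(frame))
        for k in range(n_len)
    ]


def dft(frame):
    """Plain DFT: X[k] = sum_n x[n] * exp(-2j * pi * k * n / N)."""
    n_len = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * n / n_len) for n, x in enumerate(frame))
        for k in range(n_len)
    ]


# Hypothetical 8-sample frame of a real signal (illustrative values only).
frame = [0.2, -0.5, 0.9, 0.1, -0.3, 0.4, -0.8, 0.6]

dct_coeffs = dct_ii(frame)  # real numbers
dft_coeffs = dft(frame)     # complex numbers

# A DFT coefficient carries a continuous phase angle in (-pi, pi] ...
dft_phases = [cmath.phase(c) for c in dft_coeffs]

# ... whereas a DCT coefficient is fully described by an amplitude and a sign.
dct_amplitudes = [abs(c) for c in dct_coeffs]
dct_signs = [math.copysign(1.0, c) for c in dct_coeffs]
rebuilt = [s * a for s, a in zip(dct_signs, dct_amplitudes)]
```

In this picture, an amplitude estimator in the DCT domain only leaves a binary sign to be handled, rather than a phase angle.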

On the Issues of Intra-Speaker Variability and Realism in Speech, Speaker, and Language Recognition Tasks
This study surveys several challenging domains for formulating effective solutions with realistic speech data, and in particular the notion of using naturalistic data to better reflect the potential effectiveness of new algorithms. Our main focus is on intra-speaker mismatch and speech variability issues due to (i) differences in noisy speech with and without the Lombard effect and a communication factor, (ii) realistic field data in noisy and increased cognitive load conditions, (iii) speech variability introduced by whispered speech, and (iv) dialect identification using found data. Finally, we study speaker–environment and s...
Source: Speech Communication - July 5, 2018 Category: Speech-Language Pathology Source Type: research

Entrainment profiles: Comparison by gender, role, and feature set
Publication date: June 2018
Source: Speech Communication, Volume 100
Author(s): Uwe D. Reichel, Štefan Beňuš, Katalin Mády
Abstract: We examine prosodic entrainment in cooperative game dialogs for new feature sets describing register, pitch accent shape, and rhythmic aspects of utterances. For these, as well as for established features, we present entrainment profiles to detect within- and across-dialog entrainment by the speakers’ gender and role in the game. It turned out that feature sets undergo entrainment in quantitatively and qualitatively different ways, which can partly be attributed to their different functions. Furt...
Source: Speech Communication - July 5, 2018 Category: Speech-Language Pathology Source Type: research

Evaluation of Batvox 3.1 under conditions reflecting those of a real forensic voice comparison case (forensic_eval_01)
Publication date: June 2018
Source: Speech Communication, Volume 100
Author(s): Cuiling Zhang, Chang Tang
Abstract: The present paper reports on an evaluation of Batvox 3.1 as part of the Speech Communication virtual special issue: Multi-laboratory evaluation of forensic voice comparison systems under conditions reflecting those of a real forensic case (forensic_eval_01). We were interested in the effect of the amount of training data on the performance of the system. We therefore tested Batvox 3.1 using differently sized sets of data randomly selected from the forensic_eval_01 training data: one known-speaker-condition recording...
Source: Speech Communication - July 5, 2018 Category: Speech-Language Pathology Source Type: research

The impact of the Lombard effect on audio and visual speech recognition systems
Publication date: June 2018
Source: Speech Communication, Volume 100
Author(s): Ricard Marxer, Jon Barker, Najwa Alghamdi, Steve Maddock
Abstract: When producing speech in noisy backgrounds, talkers reflexively adapt their speaking style in ways that increase speech-in-noise intelligibility. This adaptation, known as the Lombard effect, is likely to have an adverse effect on the performance of automatic speech recognition systems that have not been designed to anticipate it. However, previous studies of this impact have used very small amounts of data and recognition systems that lack modern adaptation strategies. This paper aim...
Source: Speech Communication - July 5, 2018 Category: Speech-Language Pathology Source Type: research

Automatic quantitative analysis of spontaneous aphasic speech
We describe our acoustic modeling method that sets a new recognition benchmark on AphasiaBank, a large-scale aphasic speech corpus. We propose a set of clinically relevant quantitative measures that are shown to be highly robust to automatic transcription errors. Finally, we demonstrate that these measures can be used to accurately predict the revised Western Aphasia Battery (WAB-R) Aphasia Quotient (AQ) without the need for manual transcripts. The results and techniques presented in our work will help advance the state of the art in aphasic speech processing and make ASR-based technology for aphasia treatment more feasibl...
Source: Speech Communication - July 5, 2018 Category: Speech-Language Pathology Source Type: research

Measuring communication difficulty through effortful speech production during conversation
This study describes the use of a novel conversation elicitation framework to collect fluent, dynamic conversational speech in simulated realistic acoustic environments of varying complexities. Our aim is to quantify speech modifications during conversation, which characterize effortful speech, as a function of the difficulty of the acoustic environment. We report speech production data at the acoustic-phonetic level (vocal level, mid-frequency emphasis, formant frequencies and formant bandwidths), as well as at higher levels of analysis including utterance duration and turn overlap durations. The sensitivity and test-rete...
Source: Speech Communication - July 5, 2018 Category: Speech-Language Pathology Source Type: research

Acoustic classification of Russian plain and palatalized sibilant fricatives: Spectral vs. cepstral measures
This study compares two methods for classifying voiceless sibilant fricatives forming a 4-way phonemic contrast found in Russian but otherwise cross-linguistically rare. One method uses spectral measures, i.e. vowel formants, spectral centre of gravity (COG), and the duration and intensity of frication. The second method uses cepstral coefficients extracted from different regions inside fricatives and neighboring vowels. The corpus comprises 1,431 plain and palatalized fricatives from two places of articulation, produced by 10 speakers. Logistic regression was used to classify the productions of males and females together and separately. The productions of f...
Source: Speech Communication - July 5, 2018 Category: Speech-Language Pathology Source Type: research
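The spectral centre of gravity (COG) listed among the spectral measures is commonly computed as the power-weighted mean frequency of the frication spectrum. A minimal pure-Python sketch follows, assuming a one-sided |DFT|² power spectrum with no windowing and no frequency band limits; the analysis settings actually used in the study are not given in this excerpt, and the cepstral and logistic-regression parts are not shown.

```python
import cmath
import math


def one_sided_power_spectrum(frame):
    """|DFT|^2 for bins 0..N//2 of a real frame (no windowing, for simplicity)."""
    n_len = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * n / n_len) for n, x in enumerate(frame))) ** 2
        for k in range(n_len // 2 + 1)
    ]


def spectral_cog(frame, sample_rate):
    """Power-weighted mean frequency in Hz: sum(f_k * P_k) / sum(P_k)."""
    powers = one_sided_power_spectrum(frame)
    n_len = len(frame)
    freqs = [k * sample_rate / n_len for k in range(len(powers))]
    total = sum(powers)
    if total == 0:
        raise ValueError("silent frame: centre of gravity is undefined")
    return sum(f * p for f, p in zip(freqs, powers)) / total


# Toy check: a pure tone exactly on bin 8 of a 64-sample frame at 16 kHz
# lies at 8 * 16000 / 64 = 2000 Hz, so its COG is (numerically) 2000 Hz.
n_samples = 64
tone = [math.cos(2 * math.pi * 8 * t / n_samples) for t in range(n_samples)]
print(spectral_cog(tone, 16000))  # close to 2000.0
```

Real fricative analyses typically apply a window and restrict the band before computing COG; those choices shift the value and would need to match the study's protocol.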

Editorial Board
Publication date: June 2018
Source: Speech Communication, Volume 100
Source: Speech Communication - July 5, 2018 Category: Speech-Language Pathology Source Type: research

Using language cluster models in hierarchical language identification
Publication date: June 2018
Source: Speech Communication, Volume 100
Author(s): Saad Irtza, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Haizhou Li
Abstract: Hierarchical language identification systems can be employed to take advantage of similarities and disparities between languages to organize them into clusters and decompose the language identification problem into a tree of potentially simpler sub-problems of language group identifications. In this paper, a novel approach is proposed to incorporate knowledge of the language clusters into the front-ends of the classification systems employed in each node of a hierarchical...
Source: Speech Communication - July 5, 2018 Category: Speech-Language Pathology Source Type: research
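The tree decomposition described above can be pictured with a generic greedy top-down classifier: each internal node scores its child clusters, the best-scoring child is followed, and the leaf reached is the identified language. This is a minimal sketch only; the `Node` type, `identify` function, toy language groupings, scorers, and 2-D feature vectors are all hypothetical stand-ins, and the paper's actual contribution (incorporating cluster knowledge into each node's front-end) is not modelled here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# A scorer maps an utterance-level feature vector to one score per child node.
Scorer = Callable[[List[float]], Dict[str, float]]


@dataclass
class Node:
    name: str
    scorer: Optional[Scorer] = None  # None for leaf (single-language) nodes
    children: Dict[str, "Node"] = field(default_factory=dict)


def identify(root: Node, features: List[float]) -> List[str]:
    """Greedy top-down decision: at each cluster node, follow the best-scoring child."""
    node = root
    path = [node.name]
    while node.children:
        scores = node.scorer(features)
        best = max(node.children, key=lambda child: scores[child])
        node = node.children[best]
        path.append(node.name)
    return path


# Toy two-level tree with hypothetical linear scorers on a 2-D feature vector.
germanic = Node(
    "Germanic",
    scorer=lambda f: {"English": f[1], "German": -f[1]},
    children={"English": Node("English"), "German": Node("German")},
)
romance = Node(
    "Romance",
    scorer=lambda f: {"Spanish": f[1], "Italian": -f[1]},
    children={"Spanish": Node("Spanish"), "Italian": Node("Italian")},
)
root = Node(
    "root",
    scorer=lambda f: {"Germanic": f[0], "Romance": -f[0]},
    children={"Germanic": germanic, "Romance": romance},
)

print(identify(root, [0.7, -0.2]))  # ['root', 'Germanic', 'German']
```

A consequence of purely greedy descent is that an error at an upper (cluster) node cannot be corrected further down the tree.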

Semi-Parametric Joint Detection and Estimation for Speech Enhancement based on Minimum Mean Square Error
Publication date: Available online 15 June 2018
Source: Speech Communication
Author(s): Van-Khanh Mai, Dominique Pastor, Abdeldjalil Aissa-El-Bey, Raphaël Le Bidan
Abstract: We propose a novel estimator of the amplitude of speech coefficients in the time-frequency domain. To avoid having to estimate the phase spectrum of the complex coefficients produced by the Fourier transform, we consider the discrete cosine transform (DCT). This estimator aims at minimizing the mean square error of the absolute values of the speech DCT coefficients. In order to take advantage of both parametric and non-parametric approaches, the propo...
Source: Speech Communication - July 5, 2018 Category: Speech-Language Pathology Source Type: research

Non-intrusive codebook-based intelligibility prediction
Publication date: Available online 21 June 2018
Source: Speech Communication
Author(s): Charlotte Sørensen, Mathew Shaji Kavalekalam, Angeliki Xenaki, Jesper Bünsow Boldt, Mads Græsbøll Christensen
Abstract: In recent years, there has been an increasing interest in objective measures of speech intelligibility in the speech processing community. Important progress has been made in intrusive measures of intelligibility, where the Short-Time Objective Intelligibility (STOI) method has become the de facto standard. Online adaptation of signal processing in, for example, hearing aids, in accordance with the listening conditions,...
Source: Speech Communication - July 5, 2018 Category: Speech-Language Pathology Source Type: research

Refinement and validation of the binaural short time objective intelligibility measure for spatially diverse conditions
Publication date: Available online 25 June 2018
Source: Speech Communication
Author(s): Asger Heidemann Andersen, Jan Mark de Haan, Zheng-Hua Tan, Jesper Jensen
Abstract: Speech intelligibility prediction methods have recently gained popularity in the speech processing community as supplements to time-consuming and costly listening experiments. Such methods can be used to objectively quantify and compare the advantage of different speech enhancement algorithms, in a way that correlates well with actual speech intelligibility. One such method is the short-time objective intelligibility (STOI) measure. In a recent publication, we...
Source: Speech Communication - July 5, 2018 Category: Speech-Language Pathology Source Type: research

Prosodic Stress Detection for Fixed Stress Languages Using Formal Atom Decomposition and a Statistical Hidden Markov Hybrid
Publication date: Available online 28 June 2018
Source: Speech Communication
Author(s): György Szaszák, Máté Ákos Tündik, Branislav Gerazov
Abstract: The detection of prosodic events, prosodic stress, and speech segmentation based on prosody have received much attention in the research community in the past decades. Prosody is relevant to both main areas of speech technology (text-to-speech synthesis, and automatic speech recognition and understanding) and is increasingly exploited: besides providing redundancy, prosody is recognized to carry information unavailable from other sources and also contributes to the naturalne...
Source: Speech Communication - July 5, 2018 Category: Speech-Language Pathology Source Type: research

Orthographic effects on the perception and production of L2 Mandarin tones
Publication date: July 2018
Source: Speech Communication, Volume 101
Author(s): Peggy Pik Ki Mok, Albert Lee, Joanne Jingwen Li, Robert Bo Xu
Abstract: Recent studies on orthographic effects on L2 phonology have typically investigated alphabetic writing systems and segmental contrasts with novice learners. The current study extends such investigation to compare orthographic effects of an opaque logographic system (Chinese characters) and a transparent schematic system (pinyin) on a suprasegmental feature (lexical tones) with experienced learners. A perception experiment of Mandarin tones by Cantonese L2 learners shows that piny...
Source: Speech Communication - July 5, 2018 Category: Speech-Language Pathology Source Type: research