Text Normalization using Memory Augmented Neural Networks
Publication date: Available online 28 February 2019. Source: Speech Communication. Author(s): Subhojeet Pramanik, Aman Hussain. Abstract: We perform text normalization, i.e. the transformation of words from the written to the spoken form, using a memory-augmented neural network. With the addition of a dynamic memory access and storage mechanism, we present a neural architecture that can serve as a language-agnostic text normalization system while avoiding the kind of unacceptable errors made by LSTM-based recurrent neural networks. By successfully reducing the frequency of such mistakes, we show that this novel architecture is ...
Source: Speech Communication - March 1, 2019 Category: Speech-Language Pathology Source Type: research
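To make the text-normalization task concrete, here is a minimal hand-written sketch of the written-to-spoken mapping itself. This is illustrative lookup code only, not the paper's memory-augmented network; the abbreviation table and digit-by-digit reading rule are assumptions chosen for the example.

```python
# Toy illustration of text normalization (written form -> spoken form).
# Hypothetical rules for the example, NOT the paper's neural model.

ABBREVIATIONS = {"Dr.": "doctor", "St.": "street", "kg": "kilograms"}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]

def normalize_token(token: str) -> str:
    """Map a single written token to a plausible spoken form."""
    if token in ABBREVIATIONS:
        return ABBREVIATIONS[token]
    if token.isdigit():
        # Read digit strings out digit by digit (e.g. "42" -> "four two").
        return " ".join(DIGITS[int(d)] for d in token)
    return token

def normalize(sentence: str) -> str:
    """Normalize a whitespace-tokenized sentence token by token."""
    return " ".join(normalize_token(t) for t in sentence.split())

print(normalize("Dr. Smith lives at 221 Baker St."))
# -> "doctor Smith lives at two two one Baker street"
```

Rule systems like this are exactly where LSTM models make "unacceptable" errors (e.g. reading a number as the wrong words), which motivates the memory-augmented approach.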

Editorial Board
Publication date: February 2019. Source: Speech Communication, Volume 107.
Source: Speech Communication - February 20, 2019 Category: Speech-Language Pathology Source Type: research

Robust binaural speech separation in adverse conditions based on deep neural network with modified spatial features and training target
Publication date: Available online 20 February 2019. Source: Speech Communication. Author(s): Paria Dadvar, Masoud Geravanchizadeh. Abstract: In this paper, a robust binaural speech separation system based on a deep neural network (DNN) is introduced. The proposed system has three main processing stages. In the spectral processing stage, the multiresolution cochleagram (MRCG) feature is extracted from the beamformed signal. In the spatial processing stage, a novel, reliable spatial feature, smITD+smILD, is obtained by soft missing-data masking of binaural cues. In the final stage, a deep neural network takes the combined spectral an...
Source: Speech Communication - February 20, 2019 Category: Speech-Language Pathology Source Type: research
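The two classic binaural cues underlying smITD+smILD can be sketched in a few lines: the interaural time difference (ITD) as the cross-correlation lag between the ears, and the interaural level difference (ILD) as an energy ratio in dB. This is a toy stdlib version under assumed toy signals; the paper's contribution is the soft missing-data masking applied on top of such cues, which is not shown here.

```python
# Toy computation of binaural cues: ITD via cross-correlation lag, ILD in dB.
import math

def itd_samples(left, right, max_lag=10):
    """Lag (in samples) maximizing cross-correlation between the two ears."""
    best_lag, best_corr = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        corr = sum(left[n] * right[n - lag]
                   for n in range(max(0, lag), min(len(left), len(right) + lag)))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag

def ild_db(left, right):
    """Interaural level difference (left vs. right energy) in decibels."""
    e_l = sum(x * x for x in left) + 1e-12
    e_r = sum(x * x for x in right) + 1e-12
    return 10.0 * math.log10(e_l / e_r)

# Right-ear signal: delayed by 3 samples and attenuated by half.
left = [math.sin(2 * math.pi * 0.05 * n) for n in range(100)]
right = [0.5 * math.sin(2 * math.pi * 0.05 * (n - 3)) for n in range(100)]
print(itd_samples(left, right), round(ild_db(left, right), 1))
```

A real system computes these per time-frequency unit of the cochleagram rather than over whole signals.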

A network-modeling approach to investigating individual differences in articulatory-to-acoustic relationship strategies
This study represents an exploratory analysis of a novel method of investigating variation among individual speakers with respect to the articulatory strategies used to modify acoustic characteristics of their speech. Articulatory data (nasalization, tongue height, breathiness) and acoustic data (F1 frequency) related to the distinction of three nasal-oral vowel contrasts in French were co-registered. Data were collected first from four Southern French (FR) speakers and, subsequently, from nine naïve Australian English listeners who imitated the FR productions. Articulatory measurements were mapped to F1 measurements usin...
Source: Speech Communication - February 6, 2019 Category: Speech-Language Pathology Source Type: research

The relative contribution of computer assisted prosody training vs. instructor based prosody teaching in developing speaking skills by interpreter trainees: an experimental study
Publication date: Available online 2 February 2019. Source: Speech Communication. Author(s): Mahmood Yenkimaleki, Vincent J. van Heuven, Hossein Moradimokhles. Abstract: The present study investigates the relative contribution of computer assisted prosody training (CAPT) vs. instructor based prosody teaching (IBPT) to developing speaking skills in interpreter trainees. Three groups of student interpreters were formed. All were native speakers of Farsi who studied English translation and interpreting at the BA level at the University of Applied Sciences in Tehran, Iran. Participants were assigned to groups at random. No significant...
Source: Speech Communication - February 2, 2019 Category: Speech-Language Pathology Source Type: research

OPENGLOT – An open environment for the evaluation of glottal inverse filtering
Publication date: Available online 31 January 2019. Source: Speech Communication. Author(s): Paavo Alku, Tiina Murtola, Jarmo Malinen, Juha Kuortti, Brad Story, Manu Airaksinen, Mika Salmi, Erkki Vilkman, Ahmed Geneid. Abstract: Glottal inverse filtering (GIF) refers to technology for estimating the source of voiced speech, the glottal flow, from speech signals. When a new GIF algorithm is proposed, its accuracy needs to be evaluated. However, the evaluation of GIF is problematic because the ground truth, the real glottal volume velocity signal generated by the vocal folds, cannot be recorded non-invasively from natural speech. This ...
Source: Speech Communication - February 1, 2019 Category: Speech-Language Pathology Source Type: research
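The principle of inverse filtering can be shown with a toy example in which the vocal-tract filter is *known*: passing the speech through the inverse of the all-pole filter recovers the source exactly. Real GIF must estimate the filter from the signal alone, which is precisely the hard part that synthetic environments like OPENGLOT (where the true source is known) help evaluate. The one-pole coefficients below are an arbitrary assumption for the demo.

```python
# Toy inverse filtering with a KNOWN all-pole "vocal tract" filter.
# a = [1.0, -0.9] means: s[n] = e[n] + 0.9 * s[n-1]  (synthesis)
#                        e[n] = s[n] - 0.9 * s[n-1]  (inverse filtering)
A = [1.0, -0.9]

def synthesize(source, a):
    """All-pole (IIR) filtering of the source through the vocal tract model."""
    s = []
    for n, e in enumerate(source):
        val = e - sum(a[k] * s[n - k] for k in range(1, len(a)) if n - k >= 0)
        s.append(val)
    return s

def inverse_filter(signal, a):
    """FIR inverse of the all-pole filter: recovers the source exactly."""
    return [sum(a[k] * signal[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(signal))]

source = [1.0, 0.0, 0.0, 0.5, 0.0]   # toy glottal excitation pulses
speech = synthesize(source, A)
recovered = inverse_filter(speech, A)
```

When the filter coefficients are only estimated (e.g. by linear prediction), the recovered source deviates from the truth, and that deviation is what GIF evaluation has to measure.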

End-to-End Acoustic Modeling using Convolutional Neural Networks for HMM-based Automatic Speech Recognition
Publication date: Available online 30 January 2019. Source: Speech Communication. Author(s): Dimitri Palaz, Mathew Magimai-Doss, Ronan Collobert. Abstract: In a hidden Markov model (HMM) based automatic speech recognition (ASR) system, modeling the statistical relationship between the acoustic speech signal and the HMM states that represent linguistically motivated subword units such as phonemes is a crucial step. This is typically achieved by first extracting acoustic features from the speech signal based on prior knowledge such as speech perception and/or speech production knowledge, and then training a classifier such as artifi...
Source: Speech Communication - January 31, 2019 Category: Speech-Language Pathology Source Type: research

Editorial Board
Publication date: January 2019. Source: Speech Communication, Volume 106.
Source: Speech Communication - January 17, 2019 Category: Speech-Language Pathology Source Type: research

Analysing patterns of right brain-hemisphere activity prior to speech articulation for identification of system-directed speech
In this study, we explore human brain activity prior to speech articulation, alone and in combination with prosodic features, to create models for off-talk prediction. The proposed EEG-based models are a step towards improving response time in detecting system-directed speech compared with audio-based methods of detection, opening new possibilities for integrating brain-computer interface techniques into interactive speech systems.
Source: Speech Communication - January 11, 2019 Category: Speech-Language Pathology Source Type: research

Speech Recognition using Cepstral Articulatory Features
Publication date: Available online 10 January 2019. Source: Speech Communication. Author(s): Shamima Najnin, Bonny Banerjee. Abstract: Though speech recognition has been widely investigated in the past decades, the role of articulation in recognition has received scant attention. Recognition accuracy increases when recognizers are trained with acoustic features in conjunction with articulatory ones. Traditionally, acoustic features are represented by mel-frequency cepstral coefficients (MFCCs) while articulatory features are represented by the locations or trajectories of the articulators. We propose the articulatory cepstral coef...
Source: Speech Communication - January 11, 2019 Category: Speech-Language Pathology Source Type: research
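The generic cepstral pipeline behind MFCCs (magnitude spectrum, then log, then discrete cosine transform) can be sketched compactly; the paper's idea is to apply the same pipeline to articulator trajectories instead of the acoustic signal. This stdlib-only sketch omits the mel filterbank and windowing, so it computes plain cepstral coefficients under those simplifying assumptions.

```python
# Bare-bones cepstral coefficients: |DFT| -> log -> DCT-II.
# Simplified: no mel filterbank, no pre-emphasis, no windowing.
import cmath
import math

def dft_magnitude(frame):
    """Magnitude of the first half of the DFT (naive O(N^2) version)."""
    N = len(frame)
    return [abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / N)
                    for n in range(N)))
            for k in range(N // 2 + 1)]

def dct(xs, n_coeffs):
    """First n_coeffs of the (unnormalized) DCT-II."""
    N = len(xs)
    return [sum(x * math.cos(math.pi * c * (i + 0.5) / N)
                for i, x in enumerate(xs))
            for c in range(n_coeffs)]

def cepstral_coefficients(frame, n_coeffs=5):
    log_spec = [math.log(s + 1e-10) for s in dft_magnitude(frame)]
    return dct(log_spec, n_coeffs)

# One frame of a pure tone (5 cycles over 64 samples).
frame = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
ccs = cepstral_coefficients(frame)
```

For articulatory cepstral coefficients one would feed articulator position trajectories, rather than an acoustic frame, through the same transform.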

Phoneme Boundary Detection from Speech: A Rule Based Approach
Publication date: Available online 9 January 2019. Source: Speech Communication. Author(s): Pravin Bhaskar Ramteke, Shashidhar G. Koolagudi. Abstract: In this paper, a novel approach is proposed for the automatic segmentation of a speech signal into phonemes. In a well-spoken word, phonemes can be characterized by the changes observed in the speech waveform. To obtain phoneme boundaries, the signal-level properties of the speech waveform, i.e. the changes in the waveform during the transition from one phoneme to the next, are explored. The problem of phoneme-level segmentation is addressed in this work from two aspects: 1. Segmentation o...
Source: Speech Communication - January 10, 2019 Category: Speech-Language Pathology Source Type: research

Prosodic Encoding of Focus in Hijazi Arabic
Publication date: Available online 24 December 2018. Source: Speech Communication. Author(s): Muhammad Swaileh Alzaidi, Yi Xu, Anqi Xu. Abstract: This paper presents findings of the first systematic acoustic analysis of focus prosody in Hijazi Arabic (HA), an under-researched Arabic dialect. A question-answer paradigm was used to elicit information and contrastive focus at different sentence locations in comparison with their neutral focus counterparts. Systematic acoustic analyses were performed to compare all the focus conditions, in terms of both continuous F0 trajectories and specific acoustic measurements. Results show that f...
Source: Speech Communication - December 25, 2018 Category: Speech-Language Pathology Source Type: research

Age estimation in foreign-accented speech by non-native speakers of English
Publication date: Available online 20 December 2018. Source: Speech Communication. Author(s): Dan Jiao, Vicky Watson, Sidney Gig-Jan Wong, Ksenia Gnevsheva, Jessie S. Nixon. Abstract: Listeners are able to estimate speakers’ ages only very approximately, with a mean estimation error of around ten years. Interestingly, accuracy varies considerably depending on a number of social characteristics of both speaker and listener, including age, gender and native language or language variety. The present study considers the effects of four factors on age perception. It investigates whether there is a main effect of speakers’ native language (Arab...
Source: Speech Communication - December 20, 2018 Category: Speech-Language Pathology Source Type: research

Coding and decoding of messages in human speech communication: implications for machine recognition of speech
Publication date: Available online 15 December 2018. Source: Speech Communication. Author(s): Hynek Hermansky. Abstract: This paper postulates that the linguistic message in speech is coded redundantly in both the time and the frequency domains. Such redundant coding of the message in the signal evolved over millennia of human evolution, so that relevant spectral and temporal properties of human hearing can be used to extract the message in the presence of noise. This view of human speech suggests a particular architecture for an automatic speech recognition (ASR) system in which longer temporal segments of spectrally-smoothed temporal traj...
Source: Speech Communication - December 15, 2018 Category: Speech-Language Pathology Source Type: research

Efficient Two-stage Processing for Joint Sequence Model-based Thai Grapheme-to-Phoneme Conversion
Publication date: Available online 12 December 2018. Source: Speech Communication. Author(s): Anocha Rugchatjaroen, Sittipong Saychum, Sarawoot Kongyoung, Patcharika Chootrakool, Sawit Kasuriya, Chai Wutiwiwatchai. Abstract: Thai grapheme-to-phoneme conversion (G2P) is a challenging task, as many previous studies have found. This paper introduces a novel two-stage processing for Thai G2P. The first stage uses Conditional Random Fields (CRF) to segment input text into pseudo-syllable (PS) units, the smallest units with unambiguous pronunciation. This first CRF simultaneously segments input text into PS units and p...
Source: Speech Communication - December 13, 2018 Category: Speech-Language Pathology Source Type: research
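The shape of such a two-stage G2P pipeline can be sketched in miniature: stage 1 segments the input into pronunciation units, stage 2 maps each unit to phonemes. Here the CRF segmenter and joint-sequence model are replaced by greedy longest-match against toy tables; the unit inventory and pronunciations below are invented for the example (not Thai data).

```python
# Two-stage G2P pipeline sketch with hypothetical toy tables.
SYLLABLES = {"ka", "na", "ri", "to"}                       # stage-1 unit inventory
PRONUNCIATIONS = {"ka": "k a", "na": "n a", "ri": "r i", "to": "t o"}

def segment(text):
    """Stage 1: greedy longest-match segmentation into known units
    (stands in for the CRF segmenter of the real system)."""
    units, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in SYLLABLES:
                units.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"cannot segment at position {i}")
    return units

def g2p(text):
    """Stage 2: map each segmented unit to its phoneme string
    (stands in for the joint-sequence model)."""
    return " ".join(PRONUNCIATIONS[u] for u in segment(text))

print(g2p("kanari"))   # -> "k a n a r i"
```

Splitting the problem this way keeps stage 2 simple because each unit's pronunciation is unambiguous once segmentation is fixed, which is exactly the motivation for segmenting into pseudo-syllables first.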