Towards Automatic Assessment of Spontaneous Spoken English
Publication date: Available online 5 September 2018
Source: Speech Communication
Author(s): Y. Wang, M.J.F. Gales, K.M. Knill, K. Kyriakopoulos, A. Malinin, R.C. van Dalen, M. Rashid
Abstract: With increasing global demand for learning English as a second language, there has been considerable interest in methods of automatic assessment of spoken language proficiency for use in interactive electronic learning tools as well as for grading candidates for formal qualifications. This paper presents an automatic system to address the assessment of spontaneous spoken language. Prompts or questions requiring spontaneous speech response...
Source: Speech Communication - September 6, 2018 Category: Speech-Language Pathology Source Type: research

Cross-lingual Adaptation of a CTC-based multilingual Acoustic Model
Publication date: Available online 4 September 2018
Source: Speech Communication
Author(s): Sibo Tong, Philip N. Garner, Hervé Bourlard
Abstract: Multilingual models for Automatic Speech Recognition (ASR) are attractive as they have been shown to benefit from more training data, and better lend themselves to adaptation to under-resourced languages. However, initialisation from monolingual context-dependent models leads to an explosion of context-dependent states. Connectionist Temporal Classification (CTC) is a potential solution to this as it performs well with monophone labels. We investigate multilingual CTC training in the ...
Source: Speech Communication - September 5, 2018 Category: Speech-Language Pathology Source Type: research

Re-ranking Spoken Term Detection with Acoustic Exemplars of Keywords
Publication date: Available online 5 September 2018
Source: Speech Communication
Author(s): Van Tung Pham, Haihua Xu, Xiong Xiao, Nancy F. Chen, Eng Siong Chng, Haizhou Li
Abstract: Spoken term detection (STD) systems rank hypothesized detections by scores, which indicate how confident a hypothesized detection is a true instance of the keyword. Many STD systems rely on automatic speech recognition (ASR) to transcribe the speech content into the lattice representation. In such STD systems, the detection scores are usually estimated as the posterior probabilities of the keyword in the decoding lattices. Such scores may be inaccur...
Source: Speech Communication - September 5, 2018 Category: Speech-Language Pathology Source Type: research

Case Study of Brazilian Portuguese Laterals using a Novel Articulatory-Acoustic Methodology with 3D/4D Ultrasound
Publication date: Available online 23 August 2018
Source: Speech Communication
Author(s): Sherman Charles, Steven M. Lulich
Abstract: The focus of this case study is the articulation and the acoustics of laterals in Brazilian Portuguese. The study probes 1) the status of velarization in coronal laterals, 2) the articulation of palatal laterals in contrast with the palatal glide and coronal laterals with secondary palatalization, and 3) whether Brazilian Portuguese has 1, 2, or 3 lateral phonemes. As a case study, the findings cannot be generalized. Nevertheless, they generate predictions and demonstrate the capacity of the meth...
Source: Speech Communication - August 24, 2018 Category: Speech-Language Pathology Source Type: research

Comparison of spectral tilt measures for sentence prominence in speech—Effects of dimensionality and adverse noise conditions
Publication date: October 2018
Source: Speech Communication, Volume 103
Author(s): Sofoklis Kakouros, Okko Räsänen, Paavo Alku
Abstract: Linguistic prominence in speech is known to correlate with the acoustic measures of energy, F0, and duration. In contrast, the role of spectral tilt in the realization of prominence has remained more inconsistent between previous empirical investigations. This may be partially due to the lack of a standard method for quantifying spectral tilt or due to difficulties in estimating the acoustical source of spectral tilt, the glottal flow, from continuous speech. These issues have rendered inter...
Source: Speech Communication - August 24, 2018 Category: Speech-Language Pathology Source Type: research

Editorial Board
Publication date: September 2018
Source: Speech Communication, Volume 102
Source: Speech Communication - August 21, 2018 Category: Speech-Language Pathology Source Type: research

Development of a Thai Phonetically Balanced Monosyllabic Word Recognition Test: Derivation of Phoneme Distribution, Word List Construction, and Response Evaluations
Publication date: Available online 11 August 2018
Source: Speech Communication
Author(s): Charturong Tantibundhit, Chutamanee Onsuwan, Adirek Munthuli, Ploypailin Sirimujalin, Thanaporn Anansiripinyo, Sutanya Phuechpanpaisal, Nida Wright, Krit Kosawat
Abstract: This paper proposes a test tool for Thai word recognition, the Thammasat University Phonetically Balanced Word List 2014 (TU PB’14), standardized on several major criteria: phonemic balance, familiarity, reliability, list equivalency, and homogeneity. Phoneme distributions from the largest written Thai corpus (InterBEST) were obtained and used to construct five phoneti...
Source: Speech Communication - August 12, 2018 Category: Speech-Language Pathology Source Type: research

Comparison of spectral tilt measures for sentence prominence in speech – effects of dimensionality and adverse noise conditions
Publication date: Available online 8 August 2018
Source: Speech Communication
Author(s): Sofoklis Kakouros, Okko Räsänen, Paavo Alku
Abstract: Linguistic prominence in speech is known to correlate with the acoustic measures of energy, F0, and duration. In contrast, the role of spectral tilt in the realization of prominence has remained more inconsistent between previous empirical investigations. This may be partially due to the lack of a standard method for quantifying spectral tilt or due to difficulties in estimating the acoustical source of spectral tilt, the glottal flow, from continuous speech. These issues have rendered...
Source: Speech Communication - August 8, 2018 Category: Speech-Language Pathology Source Type: research

Sequence Discriminative Training for Deep Learning based Acoustic Keyword Spotting
Publication date: Available online 8 August 2018
Source: Speech Communication
Author(s): Zhehuai Chen, Yanmin Qian, Kai Yu
Abstract: Speech recognition is a sequence prediction problem. Besides employing various deep learning approaches for frame-level classification, sequence-level discriminative training has been proved to be indispensable to achieve the state-of-the-art performance in large vocabulary continuous speech recognition (LVCSR). However, keyword spotting (KWS), as one of the most common speech recognition tasks, almost only benefits from frame-level deep learning due to the difficulty of getting competing sequence...
Source: Speech Communication - August 8, 2018 Category: Speech-Language Pathology Source Type: research

Phonetic Subspace Features for Improved Query by Example Spoken Term Detection
Publication date: Available online 8 August 2018
Source: Speech Communication
Author(s): Dhananjay Ram, Afsaneh Asaei, Hervé Bourlard
Abstract: This paper addresses the problem of detecting speech utterances from a large audio archive using a simple spoken query, hence referring to this problem as “Query by Example Spoken Term Detection” (QbE-STD). This still open pattern matching problem has been addressed in different contexts, often based on variants of the Dynamic Time Warping (DTW) algorithm. In the work reported here, we exploit Deep Neural Networks (DNN) and the so inferred phone posteriors to better model the phone...
Source: Speech Communication - August 8, 2018 Category: Speech-Language Pathology Source Type: research

Compressive Speech Enhancement in the Modulation Domain
Publication date: Available online 7 August 2018
Source: Speech Communication
Author(s): Siow Yong Low
Abstract: Compressive speech enhancement (CSE) has gained popularity in recent years as it bypasses the need for noise estimation. Parallel to that, modulation domain has been widely studied in speech applications as it offers a more compact representation and is closely associated with speech intelligibility enhancement. Motivated by the development in modulation domain and CSE, this paper seeks to explore the suitability of modulation domain based sparse reconstruction for use in CSE. The main idea is to study if the increas...
Source: Speech Communication - August 8, 2018 Category: Speech-Language Pathology Source Type: research

Speech enhancement in spectral envelope and details subspaces
In this study, we address this challenge through a combination strategy of spectral modulation decoupling and low-rank and sparsity oriented decomposition. Specifically, supervised low-rank and sparse decompositions with energy thresholding are developed in the spectral envelope subspace. In the spectral details subspace, an unsupervised robust principal component analysis is utilized to extract the fine structure. The validation results show that, compared with five speech enhancement algorithms, including MMSE-SPP, NMF-RPCA, RPCA, LARC and BNMF, the proposed algorithm achieves satisfactory performance on improving both p...
Source: Speech Communication - August 4, 2018 Category: Speech-Language Pathology Source Type: research

Fusion of Bottleneck, Spectral and Modulation Spectral Features for Improved Speaker Verification of Neutral and Whispered Speech
Publication date: Available online 27 July 2018
Source: Speech Communication
Author(s): Milton Sarria-Paja, Tiago H. Falk
Abstract: Speech based biometrics is becoming a preferred method of identity management amongst users and companies. Current state-of-the-art speaker verification (SV) systems, however, are known to be strongly dependent on the condition of the speech material provided as input, and can be affected by unexpected variability presented during testing, such as with environmental noise or changes in vocal effort. In this paper, SV using whispered speech is explored, as whispered speech is known to be a natural s...
Source: Speech Communication - July 28, 2018 Category: Speech-Language Pathology Source Type: research

Glottal Inverse Filtering by Combining a Constrained LP and An HMM-Based Generative Model of Glottal Flow Derivative
Publication date: Available online 20 July 2018
Source: Speech Communication
Author(s): Akira Sasou
Abstract: Glottal flow is expected to convey useful information that can be effectively used in several speech applications, such as speech synthesis, expressive speech processing, speaker recognition, and voice-based biomedical engineering. Glottal inverse filtering (GIF) estimates the glottal flow that cannot be directly measured from a speech signal without any prior knowledge. Thus far, although many GIF methods have been proposed, several studies have concluded that conventional GIFs tend to degrade in estimation accuracy, e...
Source: Speech Communication - July 21, 2018 Category: Speech-Language Pathology Source Type: research

Discrimination of L2 Greek vowel contrasts: Evidence from learners with Arabic L1 background
Publication date: Available online 20 July 2018
Source: Speech Communication
Author(s): Georgios P. Georgiou
Abstract: The present study investigates the assimilation of the Cypriot Greek (CGR) vowels to the phonological categories of Egyptian Arabic (EA) as well as the discrimination of 2 stressed and 2 unstressed Cypriot Greek vowel contrasts by native speakers of Egyptian Arabic. It also intends to test the discriminability of the assimilation types according to the predictions of the Perceptual Assimilation Model (PAM). 15 adult female immigrants, who have lived permanently in Cyprus for 4-5 years and were taught Greek in formal ...
Source: Speech Communication - July 21, 2018 Category: Speech-Language Pathology Source Type: research