Speaker discrimination: Citation tones vs. coarticulated tones
Publication date: February 2020. Source: Speech Communication, Volume 117. Author(s): Ricky KW Chan. Abstract: The task of forensic voice comparison (FVC) often involves comparing the voice in an offender recording with that in a suspect recording, with the aim of assisting the investigating authority or the court in determining the identity of the speaker. One of the main goals of FVC research is to identify speech variables that are useful for differentiating speakers. While French and Stevens (2013) stated that connected speech processes (CSPs) vary across speakers and thus CSPs may be included in the ‘toolbox’ for foren...

Cosine Metric Learning Based Speaker Verification
Publication date: Available online 20 February 2020. Source: Speech Communication. Author(s): Zhongxin Bai, Xiao-Lei Zhang, Jingdong Chen. Abstract: The performance of speaker verification depends on the overlap region of the decision scores of true and imposter trials. Motivated by the fact that the overlap region can be reduced by maximizing the between-class distance while minimizing the within-class variance of the trials, we present in this paper two cosine metric learning (CML) back-end algorithms. The first one, named m-CML, aims to enlarge the between-class distance with a regularization term to control the within-class va...
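A minimal sketch of cosine scoring for verification trials, together with an illustrative CML-style objective that rewards a large gap between target and impostor score means while penalizing score variance. The transform A, the weight lam, and the exact loss form are assumptions for illustration, not the paper's m-CML formulation.

# Hedged sketch: cosine scoring plus a CML-style objective (not the authors' exact loss).
import numpy as np

def cosine_score(x, y):
    """Cosine similarity between two speaker embeddings."""
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def cml_style_objective(A, target_pairs, impostor_pairs, lam=0.1):
    """Score pairs after a linear transform A; return a loss that favours a
    large between-class gap and small within-class score variance."""
    t = np.array([cosine_score(A @ x, A @ y) for x, y in target_pairs])
    i = np.array([cosine_score(A @ x, A @ y) for x, y in impostor_pairs])
    between = t.mean() - i.mean()   # gap to be maximized
    within = t.var() + i.var()      # spread to be kept small
    return -between + lam * within

# Toy usage with random 64-dimensional embeddings.
rng = np.random.default_rng(0)
A = np.eye(64)
targets = [(rng.normal(size=64), rng.normal(size=64)) for _ in range(10)]
impostors = [(rng.normal(size=64), rng.normal(size=64)) for _ in range(10)]
print(cml_style_objective(A, targets, impostors))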

Wh-question or wh-declarative? Prosody makes the difference
Publication date: Available online 13 February 2020. Source: Speech Communication. Author(s): Yang Yang, Stella Gryllia, Lisa Lai-Shen Cheng. Abstract: Mandarin wh-words can have question or non-question (e.g., existential, universal quantificational) interpretations. Their interpretations in a sentence are usually not ambiguous, as the distinct interpretations need to be licensed by particular items/contexts. The starting point of our study concerns a case which allows the wh-words to remain ambiguous in a sentence: wh-words such as shénme appearing with diǎnr. After empirically confirming that such sentences are indeed ambiguo...

Improving Generative Adversarial Networks for Speech Enhancement through Regularization of Latent Representations
Publication date: Available online 6 February 2020. Source: Speech Communication. Author(s): Fan Yang, Ziteng Wang, Junfeng Li, Risheng Xia, Yonghong Yan. Abstract: Speech enhancement aims to improve the quality and intelligibility of speech signals, which is a challenging task in adverse environments. The speech enhancement generative adversarial network (SEGAN), which adopted a generative adversarial network (GAN) for speech enhancement, achieved promising results. In this paper, a new network architecture and loss function based on SEGAN are proposed for speech enhancement. Different from most network structures applied in this field,...
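A minimal sketch of a SEGAN-style generator loss with an added penalty on the latent (bottleneck) representation. The weights l1_weight and latent_weight, and the use of a plain L2 penalty on the latent code, are illustrative assumptions rather than the loss proposed in the paper.

# Hedged sketch: adversarial + L1 reconstruction + latent regularization terms.
import numpy as np

def generator_loss(d_fake_scores, enhanced, clean, latent,
                   l1_weight=100.0, latent_weight=1.0):
    """Least-squares adversarial term, L1 reconstruction term, and an
    assumed L2 regularizer on the latent representation."""
    adv = np.mean((d_fake_scores - 1.0) ** 2)   # LSGAN-style generator term
    l1 = np.mean(np.abs(enhanced - clean))      # waveform reconstruction
    latent_reg = np.mean(latent ** 2)           # keeps the latent code compact
    return adv + l1_weight * l1 + latent_weight * latent_reg

# Toy usage with random arrays standing in for network outputs.
rng = np.random.default_rng(1)
print(generator_loss(rng.normal(size=8),
                     rng.normal(size=(8, 16384)),
                     rng.normal(size=(8, 16384)),
                     rng.normal(size=(8, 1024))))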

Subspace Gaussian mixture based language modeling for large vocabulary continuous speech recognition
Publication date: February 2020. Source: Speech Communication, Volume 117. Author(s): Ri Hyon Sun, Ri Jong Chol. Abstract: This paper focuses on an adaptable continuous-space language modeling approach that combines the longer-context information of a recurrent neural network (RNN) with the adaptation ability of the subspace Gaussian mixture model (SGMM), which has been widely used in acoustic modeling for automatic speech recognition (ASR). In large vocabulary continuous speech recognition (LVCSR), it is a challenging problem to construct language models that can capture the longer context information of words and ensure generalization and adaptation...
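A generic sketch of combining two language-model probability streams by linear interpolation, one common way to pair a longer-context model (e.g., an RNN LM) with an adaptable component. This is not the paper's SGMM-based construction; the weight alpha and the per-word probabilities are assumptions.

# Hedged sketch: word-level interpolation of two language-model estimates.
def interpolate_lm(p_rnn, p_adapted, alpha=0.7):
    """Linear interpolation of two probability estimates for the same word."""
    return alpha * p_rnn + (1.0 - alpha) * p_adapted

# Toy usage: per-word probabilities from the two models for one sentence.
p_rnn = [0.12, 0.034, 0.21]
p_adapted = [0.08, 0.05, 0.18]
print([interpolate_lm(a, b) for a, b in zip(p_rnn, p_adapted)])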

Effect of articulatory and acoustic features on the intelligibility of speech in noise: an articulatory synthesis study
This study used an analysis-by-synthesis strategy to explore the contributions of several of these features. To this end, an articulatory speech synthesizer was used to synthesize the ten German digit words “Null” to “Neun”, for all 16 combinations of four binary features, i.e., modal vs. pressed phonation, normal vs. increased F1 and F2 formant frequencies, normal vs. increased f0 mean and range, and normal vs. increased duration of vowels. Subjects were asked to try to recognize the synthesized words in the presence of strong pink noise and babble noise. Compared to “plain” speech, the word recognition r...
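A small sketch enumerating the 16 stimulus conditions implied by the 2×2×2×2 design described above. The condition labels are taken from the abstract; representing them as tuples is purely for illustration.

# Hedged sketch: enumerate the 16 combinations of the four binary features.
from itertools import product

features = {
    "phonation": ("modal", "pressed"),
    "F1_F2": ("normal", "increased"),
    "f0_mean_range": ("normal", "increased"),
    "vowel_duration": ("normal", "increased"),
}
conditions = list(product(*features.values()))
print(len(conditions))   # 16
print(conditions[0])     # ('modal', 'normal', 'normal', 'normal')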

Editorial Board
Publication date: January 2020. Source: Speech Communication, Volume 116.

Is segmental foreign accent perceived categorically?
Publication date: Available online 15 January 2020. Source: Speech Communication. Author(s): Rubén Pérez-Ramón, Martin Cooke, María Luisa García Lecumberri. Abstract: The second language learning process involves the acquisition of sounds that differ to varying degrees from the sounds of a learner’s native language. Learners’ productions are strongly influenced by their native language, particularly for sounds which are similar but non-identical in the two languages. However, foreign accent is typically investigated at the level of utterances, and as a consequence the segmental basis of foreign accent and its role in communicat...

Positioning Oneself in Different Roles: Structural and Lexical Measures of Power Relations Between Speakers in Map Task Corpus
Publication date: Available online 10 January 2020. Source: Speech Communication. Author(s): Vered Silber-Varod, Sarit Malayev, Anat Lerner. Abstract: This paper focuses on the process whereby speakers position themselves in jointly produced conversations. The expected degree of dominancy (degree of power realization) in the dialogues is derived from the independent variable of the role of a participant – a leader or a follower – in a Map Task setting. We examine the participants’ dominancy as reflected by a set of structural and lexical features. We then observe how the features are realized in four sex pairings: a female-lea...

Automatic assessment of English proficiency for Japanese learners without reference sentences based on deep neural network acoustic models
Publication date: Available online 23 December 2019. Source: Speech Communication. Author(s): Jiang Fu, Yuya Chiba, Takashi Nose, Akinori Ito. Abstract: Speech-based computer-assisted language learning (CALL) systems should recognize the utterances of the learner with high accuracy and evaluate the language proficiency of the specific speaker with appropriate methods. In this paper, we discuss the automatic assessment of second-language (L2) proficiency for non-native speakers. Many existing works address pronunciation evaluation by applying the goodness of pronunciation (GOP) method. This paper introduces an automatic proficiency ev...
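A minimal sketch of a posterior-based goodness-of-pronunciation (GOP) score: the average log-posterior of the canonical phone over its aligned frames, normalized by the best-scoring phone per frame. This follows the widely used GOP formulation in spirit; it is not the paper's exact assessment system, and the alignment is assumed to be given.

# Hedged sketch: frame-posterior GOP score for one phone segment.
import numpy as np

def gop_score(frame_posteriors, canonical_phone):
    """frame_posteriors: (num_frames, num_phones) DNN output probabilities
    for the segment aligned to `canonical_phone` (a phone index)."""
    eps = 1e-10
    target = np.log(frame_posteriors[:, canonical_phone] + eps)
    best = np.log(frame_posteriors.max(axis=1) + eps)
    return float(np.mean(target - best))   # 0 when the canonical phone wins every frame

# Toy usage: 5 frames, 40 phones, canonical phone index 7.
rng = np.random.default_rng(2)
post = rng.dirichlet(np.ones(40), size=5)
print(gop_score(post, canonical_phone=7))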

Speech Emotion Recognition: Emotional Models, Databases, Features, Preprocessing Methods, Supporting Modalities, and Classifiers
Publication date: Available online 13 December 2019. Source: Speech Communication. Author(s): Mehmet Berkehan Akçay, Kaya Oğuz. Abstract: Speech is the most natural way for humans to express themselves. It is only natural, then, to extend this communication medium to computer applications. We define speech emotion recognition (SER) systems as a collection of methodologies that process and classify speech signals to detect the embedded emotions. SER is not a new field; it has been around for over two decades and has regained attention thanks to recent advancements. These novel studies make use of the advances in all fields of...
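A minimal sketch of the kind of pipeline the survey covers: frame-level spectral features pooled into an utterance vector and fed to a standard classifier. The library choices (librosa, scikit-learn), the feature set, and the toy data are illustrative assumptions, not a recommended SER system.

# Hedged sketch: MFCC statistics + SVM as a bare-bones SER pipeline.
import numpy as np
import librosa
from sklearn.svm import SVC

def utterance_features(y, sr=16000, n_mfcc=13):
    """Mean and std of MFCCs over time -> a fixed-length utterance vector."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Toy training data: random signals standing in for two emotion classes.
rng = np.random.default_rng(3)
X = np.stack([utterance_features(rng.normal(size=16000)) for _ in range(20)])
y = np.array([0] * 10 + [1] * 10)
clf = SVC().fit(X, y)
print(clf.predict(X[:2]))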

Audio-visual speech comprehension in noise with real and virtual speakers
Publication date: January 2020. Source: Speech Communication, Volume 116. Author(s): Jens Nirme, Birgitta Sahlén, Viveka Lyberg Åhlander, Jonas Brännström, Magnus Haake. Abstract: This paper presents a study in which a 3D motion-capture animated ‘virtual speaker’ is compared to a video of a real speaker with regard to how it facilitates children's speech comprehension of narratives in background multitalker babble noise. As secondary measures, children self-assess the listening and attentional effort demanded by the task, and associate words describing positive or negative social traits with the speaker. The results show tha...

Robust f0 extraction from monophonic signals using adaptive sub-band filtering
Publication date: Available online 29 November 2019. Source: Speech Communication. Author(s): Pradeep Rengaswamy, M Kiran Reddy, Krothapalli Sreenivasa Rao, Pallab Dasgupta. Abstract: Fundamental frequency (f0) extraction plays an important role in the processing of monophonic signals such as speech and song. It is essential in various real-time applications such as emotion recognition and speech/singing voice discrimination. Several f0 extraction methods have been proposed over the years, but no single algorithm works well for both speech and song. In this paper, we propose a novel approach that can accurately estimate f0 from sp...
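A minimal sketch of frame-wise f0 estimation by autocorrelation, included only to make the task concrete; it is not the adaptive sub-band method proposed in the paper. The frame length, search range, and the absence of voicing detection are simplifying assumptions.

# Hedged sketch: single-frame f0 estimate from the autocorrelation peak.
import numpy as np

def autocorr_f0(frame, sr, fmin=60.0, fmax=500.0):
    """Return an f0 estimate (Hz) for one frame via the autocorrelation peak."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag

# Toy usage: a 200 Hz sine wave sampled at 16 kHz.
sr = 16000
t = np.arange(2048) / sr
print(round(autocorr_f0(np.sin(2 * np.pi * 200 * t), sr), 1))   # ~200.0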

Editorial Board
Publication date: December 2019. Source: Speech Communication, Volume 115.