Deep neural network based i-vector mapping for speaker verification using short utterances

Publication date: Available online 19 October 2018Source: Speech CommunicationAuthor(s): Jinxi Guo, Ning Xu, Kailun Qian, Yang Shi, Kaiyuan Xu, Yingnian Wu, Abeer AlwanAbstractText-independent speaker recognition using short utterances is a highly challenging task due to the large variation and content mismatch between short utterances. I-vector and probabilistic linear discriminant analysis (PLDA) based systems have become the standard in speaker verification applications, but they are less effective with short utterances. In this paper, we first compare two state-of-the-art universal background model (UBM) training methods for i-vector modeling using full-length and short utterance evaluation tasks. The two methods are Gaussian mixture model (GMM) based (denoted I-vector_GMM) and deep neural network (DNN) based (denoted as I-vector_DNN) methods. The results indicate that the I-vector_DNN system outperforms the I-vector_GMM system under various durations (from full length to 5 s). However, the performances of both systems degrade significantly as the duration of the utterances decreases. To address this issue, we propose two novel nonlinear mapping methods which train DNN models to map the i-vectors extracted from short utterances to their corresponding long-utterance i-vectors. The mapped i-vector can restore missing information and reduce the variance of the original short-utterance i-vectors. The proposed methods both model the joint representation of short and long utter...
Source: Speech Communication - Category: Speech-Language Pathology Source Type: research