Phonetic Subspace Features for Improved Query by Example Spoken Term Detection

Publication date: Available online 8 August 2018
Source: Speech Communication
Author(s): Dhananjay Ram, Afsaneh Asaei, Hervé Bourlard

Abstract

This paper addresses the problem of detecting speech utterances in a large audio archive using a simple spoken query, a problem referred to as “Query by Example Spoken Term Detection” (QbE-STD). This still-open pattern matching problem has been addressed in different contexts, often with variants of the Dynamic Time Warping (DTW) algorithm. In the work reported here, we exploit Deep Neural Networks (DNN) and the phone posteriors they infer to better model the phonetic subspaces and, consequently, improve QbE-STD performance. Those phone posteriors have been shown to properly model the union of the underlying low-dimensional phonetic subspaces. Exploiting this property, we investigate two methods, one relying on sparse modeling and the other on linguistic knowledge of sub-phonetic components. Sparse modeling characterizes the phonetic subspaces through a dictionary for sparse coding. Projecting the phone posteriors onto the corresponding subspaces, by reconstructing them from their sparse representation, enhances those phone posteriors. On the other hand, linguistic-knowledge-driven sub-phonetic structures are identified using phonological posteriors, which consist of the probabilities of phone attributes estimated by DNNs, resulting in a new set of feature vectors. These phonological posteriors provide complementary info...
Source: Speech Communication - Category: Speech-Language Pathology - Source Type: research
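
The abstract notes that QbE-STD is often handled with variants of DTW applied to frame-level features such as phone posteriors. As a rough illustration only, the sketch below shows a generic subsequence DTW in Python with NumPy. It is not the authors' system: the function names (`local_distance`, `subsequence_dtw`), the negative-log inner-product distance, the step pattern, and the length normalization are all assumptions chosen for the example.

```python
import numpy as np


def local_distance(query, utterance, eps=1e-10):
    """Frame-pair distances, shape (n_query_frames, n_utterance_frames).

    Assumption: each row is a phone posterior vector, and the distance is
    -log of the inner product, clipped at eps to avoid log(0).
    """
    return -np.log(np.clip(query @ utterance.T, eps, None))


def subsequence_dtw(query, utterance):
    """Align the whole query against any region of the utterance.

    Returns (score, end_frame): the accumulated distance of the best
    warping path divided by the query length, and the utterance frame
    where that path ends. Lower scores indicate a better match.
    """
    dist = local_distance(query, utterance)
    n, m = dist.shape
    acc = np.full((n, m), np.inf)
    acc[0, :] = dist[0, :]  # the query may start at any utterance frame
    for i in range(1, n):
        acc[i, 0] = acc[i - 1, 0] + dist[i, 0]
        for j in range(1, m):
            # Allowed steps: vertical, horizontal, diagonal.
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = dist[i, j] + best_prev
    end_frame = int(np.argmin(acc[-1]))  # the query may end at any frame
    return acc[-1, end_frame] / n, end_frame


if __name__ == "__main__":
    # Toy one-hot "posteriors" over 4 phone classes (made-up data).
    phones = np.eye(4)
    query = phones[[0, 1, 2]]
    utterance = phones[[3, 3, 0, 1, 2, 3]]
    print(subsequence_dtw(query, utterance))
```

In a detection setting, the returned score would typically be compared against a threshold to decide whether the query occurs in the utterance; how that threshold is chosen is outside this sketch.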