Speech Enhancement Using a Risk Estimation Approach

Publication date: Available online 6 November 2019
Source: Speech Communication
Author(s): Jishnu Sadasivan, Chandra Sekhar Seelamantula, Nagarjuna Reddy Muraka

Abstract

The goal in speech enhancement is to obtain an estimate of the clean speech from the noisy signal by minimizing a chosen distortion measure (risk). Often, this results in an estimate that depends on the unknown clean signal or its statistics. Since access to such priors is limited or impractical, one has to rely on an estimate of the clean signal statistics. In this paper, we develop a risk estimation framework for speech enhancement, in which one optimizes an unbiased estimate of the risk instead of the actual risk. The estimated risk is expressed solely as a function of the noisy observations and the noise statistics. Hence, the corresponding denoiser does not require the clean speech prior. We consider several speech-specific, perceptually relevant distortion measures and develop the corresponding unbiased estimates. Minimizing the risk estimates gives rise to denoisers that are nonlinear functions of the a posteriori SNR. Listening tests show that, within the risk estimation framework, the Itakura-Saito and weighted hyperbolic cosine distortions are superior to the other measures. Comparisons in terms of perceptual evaluation of speech quality (PESQ), segmental SNR (SSNR), source-to-distortion ratio (SDR), and short-time objective intelligibility (STOI) also indicate superior performance for these two disto...
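To make the idea of optimizing an unbiased risk estimate concrete, the following is a minimal sketch for the simplest case only: squared-error distortion, a single real coefficient y = x + n with Gaussian noise of known variance, and a scalar gain a applied to y. This setup, the function names, and the floor at zero are assumptions chosen for illustration; they are not taken from the paper, which works with perceptually motivated distortion measures. Under these assumptions the risk estimate depends only on y and the noise variance, and its minimizer is a function of the a posteriori SNR y^2 / sigma2.

```python
import random


def sure_mse(a, y, sigma2):
    """Unbiased estimate of E[(a*y - x)^2] when y = x + n, n ~ N(0, sigma2).

    Uses only the noisy observation y and the noise variance sigma2;
    the clean value x never appears.
    """
    return (a - 1.0) ** 2 * y * y + 2.0 * sigma2 * a - sigma2


def sure_gain(y, sigma2):
    """Gain that minimizes sure_mse over a: 1 - 1/xi with xi = y^2 / sigma2.

    The floor at 0 is an added assumption to keep the gain nonnegative.
    """
    xi = y * y / sigma2  # a posteriori SNR
    if xi <= 0.0:
        return 0.0
    return max(0.0, 1.0 - 1.0 / xi)


def check_unbiased(x=2.0, a=0.7, sigma2=0.5, trials=100_000, seed=0):
    """Monte Carlo comparison of the true risk and the averaged estimate."""
    rng = random.Random(seed)
    sd = sigma2 ** 0.5
    true_risk = 0.0
    est_risk = 0.0
    for _ in range(trials):
        y = x + rng.gauss(0.0, sd)
        true_risk += (a * y - x) ** 2       # needs the clean x
        est_risk += sure_mse(a, y, sigma2)  # does not need x
    return true_risk / trials, est_risk / trials


if __name__ == "__main__":
    print(check_unbiased())
    print(sure_gain(2.0, 1.0))
```

In this toy setting the two averages returned by check_unbiased should agree closely, which is what allows the estimate to stand in for the true risk when choosing the gain.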