Automatic Speech Emotion Recognition using an Optimal Combination of Features based on EMD-TKEO

Publication date: Available online 19 September 2019
Source: Speech Communication
Author(s): Leila Kerkeni, Youssef Serrestou, Kosai Raoof, Mohamed Mbarki, Mohamed Ali Mahjoub, Catherine Cleder

Abstract

In this paper, we propose a global approach for a speech emotion recognition (SER) system using empirical mode decomposition (EMD). Its use is motivated by the fact that EMD combined with the Teager-Kaiser Energy Operator (TKEO) gives an efficient time-frequency analysis of non-stationary signals. In this method, each signal is decomposed using EMD into oscillating components called intrinsic mode functions (IMFs). TKEO is used to estimate the time-varying amplitude envelope and instantaneous frequency of a signal that is assumed to be an amplitude modulation-frequency modulation (AM-FM) signal. A subset of the IMFs is selected and used to extract features from the speech signal in order to recognize different emotions. The main contribution of our work is to extract novel features, named modulation spectral (MS) features and modulation frequency features (MFF), based on the AM-FM modulation model, and to combine them with cepstral features. The combination of all features is expected to improve the performance of the emotion recognition system. Furthermore, we examine the effect of feature selection on SER system performance. For the classification task, Support Vector Machine (SVM) and Recurrent Neural Networks (RNN) are used to distinguish seven basic emotions. Two databases- the Berlin...
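
To make the TKEO step mentioned in the abstract more concrete, here is a minimal Python sketch of the standard discrete Teager-Kaiser operator, psi[n] = x[n]^2 - x[n-1]*x[n+1]. It is not the authors' implementation: the function name `tkeo`, the test tone, and its amplitude and frequency are illustrative assumptions. The EMD decomposition and the separation of energy into envelope and instantaneous frequency are left out.

```python
import math


def tkeo(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns len(x) - 2 values, because the first and last samples
    lack one of the two neighbours the operator needs.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Illustrative test tone; amplitude and frequency are arbitrary choices,
# not values taken from the paper.
amplitude = 2.0
omega = 0.3  # digital frequency in radians per sample
tone = [amplitude * math.cos(omega * n) for n in range(64)]

psi = tkeo(tone)

# For a pure cosine the operator is constant: (amplitude * sin(omega))^2.
expected = (amplitude * math.sin(omega)) ** 2
```

For a pure tone the output depends on both amplitude and frequency, which is the property that lets TKEO-based methods track the envelope and instantaneous frequency of an AM-FM component such as an IMF.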
Source: Speech Communication - Category: Speech-Language Pathology - Source Type: research