Voice conversion for emotional speech: Rule-based synthesis with degree of emotion controllable in dimensional space

Publication date: Available online 19 July 2018
Source: Speech Communication
Author(s): Yawen Xue, Yasuhiro Hamada, Masato Akagi

Abstract
This paper proposes a rule-based voice conversion system for emotion that converts neutral speech to emotional speech, using a dimensional space (arousal and valence) to control the degree of emotion on a continuous scale. We propose an inverse three-layered model with acoustic features as output at the top layer, semantic primitives at the middle layer, and emotion dimensions as input at the bottom layer; adaptive fuzzy inference systems act as connectors that extract the non-linear rules among the three layers. The rules are applied by modifying the acoustic features of neutral speech to create different types of emotional speech. The prosody-related acoustic features, F0 and the power envelope, are parameterized using the Fujisaki model and a target prediction model, respectively. Perceptual evaluation results show that the degree of emotion is perceived well in the dimensional space of valence and arousal.
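The bottom-to-top flow of the inverse three-layered model can be sketched in code. In the paper the connectors between layers are trained adaptive fuzzy inference systems; the sketch below substitutes placeholder linear rules purely for illustration, and every primitive name, feature name, and weight here is a hypothetical stand-in rather than a value from the paper.

```python
import numpy as np

# Hypothetical stand-ins for the paper's trained adaptive fuzzy inference
# systems; the primitive/feature names and weights below are illustrative only.
PRIMITIVES = ["bright", "dark", "strong", "weak", "fast", "slow"]
FEATURES = ["f0_mean_ratio", "f0_range_ratio", "power_ratio", "duration_ratio"]

def dimensions_to_primitives(valence, arousal):
    # Bottom -> middle layer: map a (valence, arousal) point to
    # semantic-primitive intensities (placeholder linear rules).
    W = np.array([[ 0.6,  0.4],
                  [-0.6, -0.2],
                  [ 0.1,  0.8],
                  [-0.1, -0.7],
                  [ 0.2,  0.6],
                  [-0.2, -0.5]])
    return dict(zip(PRIMITIVES, W @ np.array([valence, arousal])))

def primitives_to_feature_ratios(prim):
    # Middle -> top layer: map primitive intensities to multiplicative
    # modification ratios applied to the neutral utterance's features.
    p = np.array([prim[k] for k in PRIMITIVES])
    V = np.array([[ 0.10, -0.05,  0.15, -0.10,  0.05, -0.05],
                  [ 0.12, -0.08,  0.10, -0.05,  0.02, -0.02],
                  [ 0.05, -0.02,  0.20, -0.15,  0.03, -0.03],
                  [-0.02,  0.02, -0.05,  0.05, -0.10,  0.10]])
    return dict(zip(FEATURES, 1.0 + V @ p))

# A point with positive valence and high arousal (roughly "joy"):
# moving this point continuously changes the degree of modification.
ratios = primitives_to_feature_ratios(
    dimensions_to_primitives(valence=0.7, arousal=0.8))
print(ratios)
```

Because the emotion-dimension input is continuous, sliding the (valence, arousal) point smoothly scales the modification ratios, which is what makes the degree of emotion controllable rather than categorical.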
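The Fujisaki model named in the abstract represents a log-F0 contour as a baseline plus superposed phrase and accent components. The following is a minimal sketch of that standard formulation; the command times, magnitudes, and time constants are illustrative assumptions, not values extracted from the paper's analysis of neutral speech.

```python
import numpy as np

def fujisaki_f0(t, fb, phrase_cmds, accent_cmds, alpha=2.0, beta=20.0, gamma=0.9):
    """Synthesize an F0 contour from Fujisaki-model commands.

    t           : array of time instants (s)
    fb          : baseline F0 (Hz)
    phrase_cmds : list of (T0, Ap) phrase-command onsets and magnitudes
    accent_cmds : list of (T1, T2, Aa) accent-command on/offsets and amplitudes
    alpha, beta : time constants of the phrase/accent control mechanisms
    gamma       : ceiling of the accent component
    (all numeric values here are illustrative assumptions)
    """
    def Gp(x):  # phrase control: impulse response of a second-order system
        return np.where(x >= 0, alpha**2 * x * np.exp(-alpha * x), 0.0)

    def Ga(x):  # accent control: saturated step response
        return np.where(x >= 0,
                        np.minimum(1.0 - (1.0 + beta * x) * np.exp(-beta * x), gamma),
                        0.0)

    log_f0 = np.log(fb) * np.ones_like(t)
    for T0, Ap in phrase_cmds:
        log_f0 += Ap * Gp(t - T0)
    for T1, T2, Aa in accent_cmds:
        log_f0 += Aa * (Ga(t - T1) - Ga(t - T2))
    return np.exp(log_f0)

# Example: one phrase command and one accent command over a 2-second utterance.
t = np.linspace(0.0, 2.0, 200)
f0 = fujisaki_f0(t, fb=120.0,
                 phrase_cmds=[(0.0, 0.5)],
                 accent_cmds=[(0.3, 0.8, 0.4)])
```

Parameterizing F0 this way lets the conversion rules operate on a handful of command magnitudes and timings instead of the raw contour, which is what makes rule-based modification of neutral prosody tractable.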