Efficient Two-stage Processing for Joint Sequence Model-based Thai Grapheme-to-Phoneme Conversion

Publication date: Available online 12 December 2018
Source: Speech Communication
Author(s): Anocha Rugchatjaroen, Sittipong Saychum, Sarawoot Kongyoung, Patcharika Chootrakool, Sawit Kasuriya, Chai Wutiwiwatchai

Abstract: Thai grapheme-to-phoneme (G2P) conversion is a challenging task, as many previous studies have found. This paper introduces a novel two-stage process for Thai G2P. The first stage uses Conditional Random Fields (CRF) to segment input text into pseudo-syllable (PS) units, the smallest units whose pronunciation is unambiguous. This first CRF simultaneously segments the input text into PS units and predicts the function of each character. The outputs of the first stage are used to align graphemes and phonemes efficiently, forming graphone joint sequences as input for the next stage. The second stage uses another CRF to model the graphone joint sequences. The character function predicted by the first stage serves as the cue for explicitly resolving some critical Thai G2P difficulties, such as hidden syllables, which often appear in loan words, and complicated character ordering. The evaluation uses a large pronunciation dictionary that covers over 70% of Thai word usage. Experimental results show word error rates (WER) of 6.55% and 8.43% at the first and second prediction stages, while the overall G2P system achieves a WER of 9.94%. This is an absolute improvement of as much as 14.49% over a baseline model using Context Free Grammar (CFG) syllabification and sylla...
Source: Speech Communication - Category: Speech-Language Pathology Source Type: research
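
The WER figures in the abstract can be related to each other with a little arithmetic. The sketch below is illustrative only and is not taken from the paper. It assumes that a word counts as an error whenever its predicted phoneme string differs from the reference, and that the quoted 14.49% absolute improvement is measured against the 9.94% overall WER. The function name `word_error_rate` and the toy pronunciations are hypothetical.

```python
def word_error_rate(predicted, reference):
    """Fraction of words whose predicted pronunciation differs from the reference.

    Assumption: a word is wrong if its phoneme string differs in any way.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("reference must not be empty")
    errors = sum(p != r for p, r in zip(predicted, reference))
    return errors / len(reference)


# Toy, made-up phoneme strings (not real Thai pronunciations).
reference = ["k a", "m aa", "t i", "s u"]
predicted = ["k a", "m a", "t i", "s u"]
toy_wer = word_error_rate(predicted, reference)  # one of four words is wrong

# Figures quoted in the abstract, in percent.
overall_wer = 9.94
absolute_improvement = 14.49

# Under the stated assumption, the baseline WER implied by the abstract.
implied_baseline_wer = overall_wer + absolute_improvement

# Relative error reduction with respect to that implied baseline.
relative_reduction = absolute_improvement / implied_baseline_wer

print(f"toy WER: {toy_wer:.2%}")
print(f"implied baseline WER: {implied_baseline_wer:.2f}%")
print(f"relative reduction: {relative_reduction:.1%}")
```

Because the abstract says "as much as", the implied baseline of about 24.43% WER would correspond to the most favourable comparison only, and the full paper may report other baselines.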