Template based techniques for automatic segmentation of TTS unit database
Published on Mar 1, 2016 in ICASSP (International Conference on Acoustics, Speech, and Signal Processing)
· DOI :10.1109/ICASSP.2016.7472750
We address the problem of automatic segmentation of the unit database in unit-selection based TTS and propose template based forced alignment segmentation in the one-pass dynamic programming (DP) framework with several variants: i) multi-template representation derived by modified K-means (MKM) algorithm, ii) context-independent and context-dependent templates for reduced multi-template representation, iii) segmental K-means algorithm with MKM modeling of phone classes, as a template-based equivalent of the conventional embedded re-estimation procedure for HMM based modeling and segmentation, that is typical for deriving unit-databases for TTS (e.g. EHMM in Festival). We first benchmark the performance of the proposed segmentation framework on TIMIT database for phonetic segmentation given the availability of phonetic labeling ground truth in TIMIT. We then apply the proposed template based segmentation algorithms for syllabic Indian language TTS, and benchmark the proposed segmentation using objective measures based on spectral distortions (SD) obtained on time-aligned speech utterances and compare it with other recent segmentation approaches, namely the group-delay (GD) based semiautomatic method, Hybrid method, EHMM, HMM and SKM-HMM and show that the proposed template based approaches offer comparable and better spectral distortions, validating their ability to provide accurate high-resolution segmentation of the unit-database.