How modeling entrance loss and flow separation in a two-mass model affects the oscillation and synthesis quality
Abstract

Self-oscillating bar-mass models of the vocal folds are frequently used as the voice source in articulatory speech synthesis. For these models, a number of ways to handle the entrance loss and the flow separation in the glottis have been proposed. However, the effects of these modeling choices on vocal fold oscillation, on glottal flow, and on the quality of synthesized speech have rarely been examined. In this study, a modified two-mass model of the vocal folds was used to simulate phonation for 12 modeling options: three ways to model the entrance loss combined with four ways to model flow separation. For each condition, the following characteristics of the glottal oscillation and flow were determined: the phonation threshold pressure, the frequency range of self-sustained oscillation, the oscillation amplitude for different glottal rest openings, the spectral slope of the flow derivative, the maximum flow declination rate (MFDR), the open quotient (OQ), and the difference between the levels of the first and second harmonics of the flow derivative (H1-H2). In addition, the effect of the modeling options on the perceived naturalness of the synthetic voice was evaluated. The choice of entrance-loss and flow-separation model had no effect on the phonation threshold pressure or on the frequency range of oscillation, and only a minor effect on MFDR, OQ, and H1-H2. However, the flow separation model had a strong effect on the oscillation amplitude, on the spectral slope, and on the naturalness of the voice. The voice was perceived as most natural when the flow was assumed to always separate at the minimum glottal diameter.
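The abstract names three of the flow measures (MFDR, OQ, and H1-H2 of the flow derivative) without saying how they are computed. The sketch below shows one common way to obtain them from a sampled, periodic glottal flow. It is not the authors' procedure. A Rosenberg-style pulse train stands in for the two-mass model's output, and several choices are assumptions of this sketch: the circular forward-difference derivative, the 0.1 % threshold used to decide when the glottis is "open", and the 100 Hz / 20 kHz test settings. All function names are hypothetical. The spectral slope measure is not covered.

```python
import math


def rosenberg_pulse_train(fs, samples_per_period, n_periods,
                          tp_frac=0.40, tn_frac=0.16, amp=1.0):
    """Hypothetical test signal: a Rosenberg-style glottal flow pulse train.

    tp_frac and tn_frac are the opening and closing phase durations as
    fractions of the period; their sum is the nominal open quotient.
    """
    tp = tp_frac * samples_per_period / fs   # opening phase duration (s)
    tn = tn_frac * samples_per_period / fs   # closing phase duration (s)
    flow = []
    for n in range(samples_per_period * n_periods):
        t = (n % samples_per_period) / fs    # time within the current cycle (s)
        if t <= tp:
            flow.append(0.5 * amp * (1.0 - math.cos(math.pi * t / tp)))
        elif t <= tp + tn:
            flow.append(amp * math.cos(math.pi * (t - tp) / (2.0 * tn)))
        else:
            flow.append(0.0)                 # closed phase: no flow
    return flow


def flow_derivative(flow, fs):
    """Circular forward difference dU/dt (the signal is assumed periodic)."""
    n = len(flow)
    return [(flow[(i + 1) % n] - flow[i]) * fs for i in range(n)]


def mfdr(flow, fs):
    """Maximum flow declination rate: magnitude of the most negative dU/dt."""
    return -min(flow_derivative(flow, fs))


def open_quotient(flow, samples_per_period, rel_threshold=1e-3):
    """Fraction of the first cycle in which the flow exceeds a small threshold
    above its minimum (the threshold value is an assumption of this sketch)."""
    cycle = flow[:samples_per_period]
    lo, hi = min(cycle), max(cycle)
    threshold = lo + rel_threshold * (hi - lo)
    return sum(1 for u in cycle if u > threshold) / samples_per_period


def harmonic_amplitude(signal, samples_per_period, harmonic):
    """|DFT| at the bin of the given harmonic; the window must span whole periods."""
    n_total = len(signal)
    if n_total % samples_per_period != 0:
        raise ValueError("signal length must be a whole number of periods")
    k = harmonic * (n_total // samples_per_period)  # DFT bin of this harmonic
    re = sum(x * math.cos(2 * math.pi * k * i / n_total) for i, x in enumerate(signal))
    im = -sum(x * math.sin(2 * math.pi * k * i / n_total) for i, x in enumerate(signal))
    return math.hypot(re, im)


def h1_h2_db(signal, samples_per_period):
    """Level difference (dB) between the first and second harmonics."""
    a1 = harmonic_amplitude(signal, samples_per_period, 1)
    a2 = harmonic_amplitude(signal, samples_per_period, 2)
    return 20.0 * math.log10(a1 / a2)


if __name__ == "__main__":
    fs, spp = 20000, 200  # 20 kHz sampling, 200 samples per period -> f0 = 100 Hz
    flow = rosenberg_pulse_train(fs, spp, n_periods=4)
    dflow = flow_derivative(flow, fs)
    print("MFDR  :", round(mfdr(flow, fs), 1))
    print("OQ    :", round(open_quotient(flow, spp), 3))
    print("H1-H2 :", round(h1_h2_db(dflow, spp), 2), "dB (flow derivative)")
```

Because differentiation scales the h-th harmonic by roughly h times the fundamental angular frequency, H1-H2 measured on the flow derivative comes out about 6 dB lower than H1-H2 measured on the flow itself. This is why it matters which signal a reported H1-H2 refers to. Analyzing a whole number of periods places each harmonic exactly on a DFT bin, so no window function is needed in this idealized case.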