How modeling entrance loss and flow separation in a two-mass model affects the oscillation and synthesis quality

Published on Jul 1, 2019 in Speech Communication 1.661
· DOI: 10.1016/j.specom.2019.04.009
Peter Birkholz (TUD: Dresden University of Technology), estimated H-index: 12
Daniel Pape (McMaster University), estimated H-index: 8
Abstract: Self-oscillating bar-mass models of the vocal folds are frequently used as the voice source in articulatory speech synthesis. For these models, a number of ways to handle the entrance loss and the flow separation in the glottis have been proposed. However, the effect of different modeling choices on vocal fold oscillation and glottal flow, and on the quality of synthesized speech, has rarely been examined. In this study, a modified two-mass model of the vocal folds was used to simulate phonation for 12 modeling options: three ways to model the entrance loss combined with four ways to model flow separation. For each condition, the following characteristics of the glottal oscillation and flow were determined: the phonation threshold pressure, the frequency range of self-sustained oscillation, the oscillation amplitude for different glottal rest openings, the spectral slope of the flow derivative, the maximum flow declination rate (MFDR), the open quotient (OQ), and the difference between the levels of the first and second harmonics of the flow derivative (H1-H2). In addition, the effect of the modeling options on the perceived naturalness of the synthetic voice was evaluated. There was no effect of the different ways to model entrance loss and flow separation on the phonation threshold pressure and on frequency range, and only a minor effect on MFDR, OQ, and H1-H2. However, there was a strong effect of the flow separation model on the oscillation amplitude, the spectral slope, and on the naturalness of the voice. The voice was perceived as most natural when the flow was assumed to always separate at the minimum glottal diameter.
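Three of the flow measures named in the abstract (MFDR, OQ, and H1-H2) can be illustrated with a minimal Python sketch. This is not the authors' analysis code. The function name `glottal_measures`, the amplitude threshold used to decide when the glottis counts as open, and the half-wave rectified sine used as a stand-in for the glottal flow are all assumptions made for illustration.

```python
import numpy as np

def glottal_measures(flow, fs, f0, open_fraction=0.01):
    """Estimate MFDR, OQ and H1-H2 (dB) from a periodic glottal flow signal.

    Hypothetical helper: `open_fraction` (share of peak flow above which the
    glottis is treated as open) is an illustrative assumption.
    """
    # Flow derivative by finite differences (flow units per second).
    dflow = np.gradient(flow, 1.0 / fs)

    # MFDR: magnitude of the steepest decline of the flow.
    mfdr = -dflow.min()

    # OQ: fraction of time during which the flow exceeds the threshold.
    oq = float(np.mean(flow > open_fraction * flow.max()))

    # Harmonic levels of the flow derivative from a Hann-windowed spectrum.
    window = np.hanning(len(dflow))
    spectrum = np.abs(np.fft.rfft(dflow * window))
    freqs = np.fft.rfftfreq(len(dflow), 1.0 / fs)

    def harmonic_level_db(f):
        # Peak level within a quarter of f0 on either side of the harmonic.
        band = (freqs > f - f0 / 4) & (freqs < f + f0 / 4)
        return 20.0 * np.log10(spectrum[band].max())

    h1_h2 = harmonic_level_db(f0) - harmonic_level_db(2 * f0)
    return mfdr, oq, h1_h2

# Stand-in signal (assumption): one second of a half-wave rectified sine.
fs, f0 = 40000, 100.0
t = np.arange(fs) / fs
flow = np.maximum(0.0, np.sin(2 * np.pi * f0 * t))

mfdr, oq, h1_h2 = glottal_measures(flow, fs, f0)
print(f"MFDR = {mfdr:.1f}, OQ = {oq:.3f}, H1-H2 = {h1_h2:.2f} dB")
```

For this idealized pulse the glottis is open for about half of each period, so OQ comes out close to 0.5. Flow simulated with a two-mass model would be analyzed the same way, but the open-phase threshold and the analysis window would need to be chosen to suit that signal.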
#1 Peter Birkholz (TUD: Dresden University of Technology), H-Index: 12
#2 Lucia Martin (RWTH Aachen University), H-Index: 2
Last. Christiane Neuschaefer-Rube (RWTH Aachen University), H-Index: 10
Secondary prosodic features contribute to paralinguistic information in speech. Concatenative speech synthesis has difficulty controlling many prosodic features. Here, articulatory synthesis is used for rule-based control of prosodic features. Vocal tract length, articulatory precision and nasality are controlled effectively. Vocal emotions, as well as different speaking styles and speaker traits, are characterized by a complex interplay of multiple prosodic features. Natural sounding speech synth...
5 Citations
#1 Benjamin Elie (University of Lorraine), H-Index: 5
#2 Yves Laprie (University of Lorraine), H-Index: 10
The paper presents extensions of the single-matrix formulation (Mokhtari et al., 2008, Speech Comm. 50(3), 179-190) that enable self-oscillation models of vocal folds, including glottal chink, to be connected to the vocal tract. They also integrate the case of a local division of the main air path into two lateral channels, as it may occur during the production of lateral consonants. Provided extensions are detailed by a reformulation of the acoustic conditions at the glottis, and at the upstre...
8 Citations
Pressure distributions were obtained for 5°, 10°, and 20° convergent angles with a static physical model (M5) of the glottis. Measurements were made for minimal glottal diameters from d = 0.005–0.32 cm with a range of transglottal pressures of interest for phonation. Entrance loss coefficients were calculated at the glottal entrance for each minimal diameter and transglottal pressure to measure how far the flows in this region deviate from Bernoulli flow. Exit coefficients were also calculated t...
7 Citations
#1 Byron D. Erath (Clarkson University), H-Index: 12
#2 Matías Zañartu (Valpo: Valparaiso University), H-Index: 13
Last. Sean D. Peterson (UW: University of Waterloo), H-Index: 16
Voiced speech is a highly complex process involving coupled interactions between the vocal fold structure, aerodynamics, and acoustic field. Reduced-order lumped-element models of the vocal fold structure, coupled with various aerodynamic and acoustic models, have proven useful in a wide array of speech investigations. These simplified models of speech, in which the vocal folds are approximated as arrays of lumped masses connected to one another via springs and dampers to simulate the viscoelast...
30 Citations
#1 Yi Xu (UCL: University College London), H-Index: 33
#2 Albert Lee (UCL: University College London), H-Index: 7
Last. Peter Birkholz (RWTH Aachen University), H-Index: 12
Voice, as a secondary sexual characteristic, is known to affect the perceived attractiveness of human individuals. But the underlying mechanism of vocal attractiveness has remained unclear. Here, we presented human listeners with acoustically altered natural sentences and fully synthetic sentences with systematically manipulated pitch, formants and voice quality based on a principle of body size projection reported for animal calls and emotional human vocal expressions. The results show that mal...
36 Citations
#1 Peter Birkholz (RWTH Aachen University), H-Index: 12
A central challenge for articulatory speech synthesis is the simulation of realistic articulatory movements, which is critical for the generation of highly natural and intelligible speech. This includes modeling coarticulation, i.e., the context-dependent variation of the articulatory and acoustic realization of phonemes, especially of consonants. Here we propose a method to simulate the context-sensitive articulation of consonants in consonant-vowel syllables. To achieve this, the vocal tract t...
39 Citations
#1 Rajat Mittal, H-Index: 50
#2 Byron D. Erath, H-Index: 12
Last. Michael W. Plesniak, H-Index: 23
This article presents a review of the fluid dynamics, flow-structure interactions, and acoustics associated with human phonation and speech. Our voice is produced through the process of phonation in the larynx, and an improved understanding of the underlying physics of this process is essential to advancing the treatment of voice disorders. Insights into the physics of phonation and speech can also contribute to improved vocal training and the development of new speech compression and synthesis ...
60 Citations
In an important paper on the physics of small amplitude oscillations, Titze showed that the essence of the vertical phase difference, which allows energy to be transferred from the flowing air to the motion of the vocal folds, could be captured in a surface wave model, and he derived a formula for the phonation threshold pressure with an explicit dependence on the geometrical and biomechanical properties of the vocal folds. The formula inspired a series of experiments [e.g., R. Chan and I. Titze...
8 Citations
#1 Peter Birkholz (RWTH Aachen University), H-Index: 12
#2 Bernd J. Kröger (RWTH Aachen University), H-Index: 15
Last. Christiane Neuschaefer-Rube (RWTH Aachen University), H-Index: 10
We present a novel quantitative model for the generation of articulatory trajectories based on the concept of sequential target approximation. The model was applied for the detailed reproduction of movements in repeated consonant-vowel syllables measured by electromagnetic articulography (EMA). The trajectories for the constrictor (lower lip, tongue tip, or tongue dorsum) and the jaw were reproduced. Thereby, we tested the following hypotheses about invariant properties of articulatory commands:...
38 Citations
Cited By
#1 Thuan Van Ngo (Japan Advanced Institute of Science and Technology)
Last. Peter Birkholz (TUD: Dresden University of Technology), H-Index: 12
Abstract: In noisy conditions, speakers involuntarily change their manner of speaking to enhance the intelligibility of their voices. The increased intelligibility of this so-called Lombard speech is enabled by the change of multiple articulatory and acoustic features. While the major features of Lombard speech are well known from previous studies, little is known about their relative contributions to the intelligibility of speech in noise. This study used an analysis-by-synthesis strategy to exp...