Effect of articulatory and acoustic features on the intelligibility of speech in noise: An articulatory synthesis study

Published on Feb 1, 2020 in Speech Communication (1.661)
DOI: 10.1016/J.SPECOM.2020.01.004
Thuan Van Ngo (Japan Advanced Institute of Science and Technology), Peter Birkholz (TUD: Dresden University of Technology, estimated H-index: 12)
Abstract In noisy conditions, speakers involuntarily change their manner of speaking to enhance the intelligibility of their voices. The increased intelligibility of this so-called Lombard speech is enabled by the change of multiple articulatory and acoustic features. While the major features of Lombard speech are well known from previous studies, little is known about their relative contributions to the intelligibility of speech in noise. This study used an analysis-by-synthesis strategy to explore the contributions of multiple of these features. To this end, an articulatory speech synthesizer was used to synthesize the ten German digit words “Null” to “Neun”, for all 16 combinations of four binary features, i.e., modal vs. pressed phonation, normal vs. increased F1 and F2 formant frequencies, normal vs. increased f0 mean and range, and normal vs. increased duration of vowels. Subjects were asked to try to recognize the synthesized words in the presence of strong pink noise and babble noise. Compared to “plain” speech, the word recognition rate was most improved by pressed phonation, followed by an increased f0 mean and f0 range, and increased formant frequencies. Increased duration of vowels slightly reduced the recognition rate for pink noise but had no effect for babble noise.
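The stimulus set described in the abstract is a full factorial design: four binary features give 2^4 = 16 combinations, each applied to the ten digit words. The following is only a minimal sketch of how such a design could be enumerated. The feature keys, function names, and data layout are illustrative assumptions, not names from the paper or from the synthesizer, and the sketch says nothing about how many times each stimulus was presented or how the noise conditions were assigned.

```python
from itertools import product

# The ten German digit words "Null" to "Neun" named in the abstract.
DIGITS = ["Null", "Eins", "Zwei", "Drei", "Vier",
          "Fünf", "Sechs", "Sieben", "Acht", "Neun"]

# Four binary features from the abstract. The first level of each pair is
# the "plain" setting. The keys are hypothetical labels chosen for this sketch.
FEATURES = {
    "phonation": ("modal", "pressed"),
    "formants_f1_f2": ("normal", "increased"),
    "f0_mean_and_range": ("normal", "increased"),
    "vowel_duration": ("normal", "increased"),
}


def feature_combinations():
    """Return all 2**4 = 16 feature combinations as dictionaries."""
    names = list(FEATURES)
    return [dict(zip(names, levels)) for levels in product(*FEATURES.values())]


def stimuli():
    """Return one entry per (feature combination, digit word) pair."""
    return [{"word": word, **combo}
            for combo in feature_combinations()
            for word in DIGITS]


if __name__ == "__main__":
    combos = feature_combinations()
    items = stimuli()
    print(len(combos))  # number of feature combinations
    print(len(items))   # number of synthesized words in the design
    print(combos[0])    # the all-"plain" combination comes first
```

Because each feature is binary and fully crossed with the others, every feature level appears in exactly half of the combinations, which is what allows the contribution of each feature to be compared against "plain" speech.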
  • References (33)
  • Citations (0)
Sep 15, 2019 in INTERSPEECH (Conference of the International Speech Communication Association)
#1 Peter Birkholz (TUD: Dresden University of Technology), H-Index: 12
#2 Susanne Drechsel, H-Index: 1
#3 Simon Stone (TUD: Dresden University of Technology), H-Index: 2
#1 Peter Birkholz (TUD: Dresden University of Technology), H-Index: 12
#2 Daniel Pape (McMaster University), H-Index: 8
Abstract Self-oscillating bar-mass models of the vocal folds are frequently used as the voice source in articulatory speech synthesis. For these models, a number of ways to handle the entrance loss and the flow separation in the glottis have been proposed. However, the effect of different modeling choices on vocal fold oscillation and glottal flow, and on the quality of synthesized speech has been rarely examined. In this study, a modified two-mass model of the vocal folds was used to simulate p...
#1 Martin Cooke (Ikerbasque), H-Index: 33
#2 Vincent Aubanel (CNRS: Centre national de la recherche scientifique), H-Index: 2
#3 Maria Luisa Garcia Lecumberri (UPV/EHU: University of the Basque Country), H-Index: 7
Abstract Modifying clean speech prior to output in noisy conditions can lead to substantial intelligibility gains. Most algorithms operate by redistributing energy across the signal, leaving the timing of the underlying speech sounds intact. Other techniques do alter the timing of speech relative to the masker. Both classes of approach – spectral and temporal – lead to a reduction in energetic masking. The current study examines how their combination affects intelligibility. Arguments can be mad...
#1 Maëva Garnier (CNRS: Centre national de la recherche scientifique), H-Index: 11
#2 Lucie Ménard (UQAM: Université du Québec à Montréal), H-Index: 17
#3 Boris Alexandre (CNRS: Centre national de la recherche scientifique), H-Index: 1
This study investigates the hypothesis that speakers make active use of the visual modality in production to improve their speech intelligibility in noisy conditions. Six native speakers of Canadian French produced speech in quiet conditions and in 85 dB of babble noise, in three situations: interacting face-to-face with the experimenter (AV), using the auditory modality only (AO), or reading aloud (NI, no interaction). The audio signal was recorded with the three-dimensional movements of their ...
#1 Thuan Van Ngo (Japan Advanced Institute of Science and Technology)
#2 Rieko Kubo (Japan Advanced Institute of Science and Technology), H-Index: 7
Last: Masato Akagi (Japan Advanced Institute of Science and Technology), H-Index: 15
(4 authors in total)
Algorithmic modifications to the durational structure of speech designed to avoid intervals of intense masking lead to increases in intelligibility, but the basis for such gains is not clear. The current study addressed the possibility that the reduced information load produced by speech rate slowing might explain some or all of the benefits of durational modifications. The study also investigated the influence of masker stationarity on the effectiveness of durational changes. Listeners identifi...
#1 Peter Birkholz (TUD: Dresden University of Technology), H-Index: 12
#2 Lucia Martin (RWTH Aachen University), H-Index: 2
Last: Christiane Neuschaefer-Rube (RWTH Aachen University), H-Index: 10
(5 authors in total)
Secondary prosodic features contribute to paralinguistic information in speech. Concatenative speech synthesis has difficulties to control many prosodic features. Here, articulatory synthesis is used for rule-based control of prosodic features. Vocal tract length, articulatory precision and nasality are controlled effectively. Vocal emotions, as well as different speaking styles and speaker traits, are characterized by a complex interplay of multiple prosodic features. Natural sounding speech synth...
#1 Juraj Simko (UH: University of Helsinki), H-Index: 5
#2 Štefan Beňuš, H-Index: 16
#3 Martti Vainio (UH: University of Helsinki), H-Index: 21
Over the last century, researchers have collected a considerable amount of data reflecting the properties of Lombard speech, i.e., speech in a noisy environment. The documented phenomena predominately report effects on the speech signal produced in ambient noise. In comparison, relatively little is known about the underlying articulatory patterns of Lombard speech, in particular for lingual articulation. Here the authors present an analysis of articulatory recordings of speech material in babble...
#1 Tuomo Raitio (Aalto University), H-Index: 16
#2 Antti Suni (UH: University of Helsinki), H-Index: 15
Last: Paavo Alku (Aalto University), H-Index: 48
(4 authors in total)
This paper studies the synthesis of speech over a wide vocal effort continuum and its perception in the presence of noise. Three types of speech are recorded and studied along the continuum: breathy, normal, and Lombard speech. Corresponding synthetic voices are created by training and adapting the statistical parametric speech synthesis system GlottHMM. Natural and synthetic speech along the continuum is assessed in listening tests that evaluate the intelligibility, quality, and suitability of...
Speech produced in the presence of noise (Lombard speech) is typically more intelligible than speech produced in quiet (plain speech) when presented at the same signal-to-noise ratio, but the factors responsible for the Lombard intelligibility benefit remain poorly understood. Previous studies have demonstrated a clear effect of spectral differences between the two speech styles and a lack of effect of fundamental frequency differences. The current study investigates a possible role for duration...