EURASIP Journal on Audio, Speech, and Music Processing
382 papers (page 1 of 39)
#1 Luis M. T. Jesus (RMIT University), H-Index: 10
Last: Maria da Conceição Costa (University of Aveiro), H-Index: 1
(2 authors)
Experimental data combining complementary measures based on the oral airflow signal are presented in this paper, exploring the view that European Portuguese voiced stops are produced in a similar fashion to those of Germanic languages. Four Portuguese speakers were recorded producing a corpus of nine isolated words with /b, d, ɡ/ in word-initial, word-medial, and word-final position, and the same nine words embedded in 39 different sentences. Slope of the stop release (SLP), voice onset time (VOT), release and stop ...
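Voice onset time, one of the measures above, is straightforward to compute once the release burst and the onset of voicing have been annotated. A minimal sketch (the event times below are hypothetical, not taken from the paper's corpus):

```python
def voice_onset_time(burst_time_s, voicing_onset_s):
    """VOT: time from the stop release burst to the onset of voicing.
    Negative VOT (prevoicing) is expected for voiced stops such as
    /b, d, g/; positive VOT for voiceless stops."""
    return voicing_onset_s - burst_time_s

# Hypothetical annotated times for a word-initial /b/:
# voicing starts 60 ms before the release burst (prevoicing).
vot = voice_onset_time(burst_time_s=0.210, voicing_onset_s=0.150)
print(f"VOT = {vot * 1000:.0f} ms")  # prints "VOT = -60 ms"
```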
#1 Jichen Yang (National University of Singapore)
#2 Longting Xu (Donghua University)
Last: Yunyun Ji (Nantong University)
(4 authors)
To improve on hand-crafted features for playback speech detection, two discriminative features, constant-Q variance-based octave coefficients and constant-Q mean-based octave coefficients, are proposed in this work. They rely on our finding that both the variance-based and the mean-based modified log magnitude spectrum can enhance the discriminative power between genuine speech and playback speech. Then constant-Q variance-ba...
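A rough sketch of the underlying statistics, assuming a precomputed constant-Q magnitude spectrogram (the published features involve further steps, such as a transform per octave to obtain the coefficients; this only illustrates the mean/variance pooling idea):

```python
import numpy as np

def octave_stats(cqt_mag, bins_per_octave=12, eps=1e-10):
    """Mean- and variance-based log-magnitude statistics pooled per
    octave from a constant-Q magnitude spectrogram of shape
    [n_bins, n_frames]. A simplification, not the exact published
    feature pipeline."""
    log_mag = np.log(cqt_mag + eps)     # log magnitude spectrum
    mean_spec = log_mag.mean(axis=1)    # mean over time, per bin
    var_spec = log_mag.var(axis=1)      # variance over time, per bin
    n_oct = mean_spec.shape[0] // bins_per_octave
    n_use = n_oct * bins_per_octave
    # pool each statistic within each octave
    mean_oct = mean_spec[:n_use].reshape(n_oct, bins_per_octave).mean(axis=1)
    var_oct = var_spec[:n_use].reshape(n_oct, bins_per_octave).mean(axis=1)
    return mean_oct, var_oct

rng = np.random.default_rng(0)
fake_cqt = np.abs(rng.standard_normal((84, 100)))  # 7 octaves x 12 bins
m, v = octave_stats(fake_cqt)
print(m.shape, v.shape)  # (7,) (7,)
```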
#1 Peter Mølgaard Sørensen (Technical University of Denmark)
#2 Bastian Epp (Technical University of Denmark), H-Index: 5
Last: Tobias May (Technical University of Denmark), H-Index: 11
(3 authors)
A keyword spotting algorithm implemented on an embedded system using a depthwise separable convolutional neural network classifier is reported. The proposed system was derived from a high-complexity system with the goal of reducing complexity and increasing efficiency. To meet the requirements set by hardware resource constraints, a limited hyper-parameter grid search was performed, which showed that network complexity could be drastically reduced with little effect on classification acc...
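The complexity advantage of depthwise separable convolutions can be seen from a simple parameter count (the layer sizes below are illustrative, not the paper's actual architecture):

```python
def conv_params(k, c_in, c_out):
    """Parameters of a standard k x k convolution (no bias)."""
    return k * k * c_in * c_out

def ds_conv_params(k, c_in, c_out):
    """Depthwise separable conv: one k x k depthwise filter per input
    channel, followed by a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out

# Illustrative layer size:
std = conv_params(3, 64, 64)      # 36864
sep = ds_conv_params(3, 64, 64)   # 4672
print(f"standard: {std}, separable: {sep}, ratio: {std / sep:.1f}x")
```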
#1 Yuval Dorfan (Bar-Ilan University), H-Index: 7
#2 Ofer Schwartz (CEVA Logistics), H-Index: 8
Last: Sharon Gannot (Bar-Ilan University), H-Index: 36
(3 authors)
Ad hoc acoustic networks comprising multiple nodes, each consisting of several microphones, are addressed. Owing to the ad hoc nature of the node constellation, the microphone positions are unknown, so typical tasks such as localization, tracking, and beamforming cannot be applied directly. To tackle the challenging joint task of multiple-speaker localization and array calibration, we propose a novel variant of the expectation-maximization (EM) algorithm. The coordinates of multiple arrays re...
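The paper's algorithm itself is involved; as a reminder of the generic E-step/M-step structure that any EM variant builds on, here is a textbook two-component 1-D Gaussian-mixture EM (unrelated to the acoustic model, shown only for the skeleton):

```python
import numpy as np

def em_gmm_1d(x, n_iter=50):
    """Generic EM on a two-component 1-D Gaussian mixture: alternate
    posterior responsibilities (E-step) with parameter re-estimation
    (M-step). Illustrative only; not the paper's algorithm."""
    mu = np.array([x.min(), x.max()])
    var = np.array([x.var(), x.var()])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample
        lik = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) \
              / np.sqrt(2 * np.pi * var)
        r = lik / lik.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances
        nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        pi = nk / len(x)
    return mu, var, pi

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-3, 1, 500), rng.normal(3, 1, 500)])
mu, var, pi = em_gmm_1d(x)
print(np.sort(mu))  # close to [-3, 3]
```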
#1 Junfeng Hou (University of Science and Technology of China), H-Index: 1
Last: Li-Rong Dai (University of Science and Technology of China), H-Index: 24
(4 authors)
Attention-based encoder-decoder models have recently shown competitive performance for automatic speech recognition (ASR) compared to conventional ASR systems. However, how to employ attention models for online speech recognition still needs to be explored. Unlike conventional attention models, in which the soft alignment is obtained in a pass over the entire input sequence, attention models for online recognition must learn an online alignment that attends to part of the input sequence monotonically w...
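One common way to obtain such a monotonic online alignment at inference time is a hard, left-to-right scan over "choose" probabilities (a generic sketch of hard monotonic attention, not necessarily the authors' exact mechanism):

```python
import numpy as np

def hard_monotonic_alignment(p_choose):
    """Hard monotonic attention at inference time: for each output
    step, scan the input left to right starting from the previously
    attended frame and stop at the first frame whose 'choose'
    probability exceeds 0.5. p_choose: [out_steps, in_steps]."""
    t = 0  # never resets, so the alignment is monotonic
    alignment = []
    for i in range(p_choose.shape[0]):
        while t < p_choose.shape[1] - 1 and p_choose[i, t] <= 0.5:
            t += 1
        alignment.append(t)
    return alignment

p = np.array([[0.1, 0.9, 0.2, 0.1],
              [0.0, 0.3, 0.8, 0.2],
              [0.0, 0.0, 0.4, 0.9]])
print(hard_monotonic_alignment(p))  # [1, 2, 3]
```

Because the scan position `t` only moves forward, each output step attends at or after the frame chosen for the previous step, which is what allows decoding to begin before the full input has been seen.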
#1 Jing Wang (Beijing Institute of Technology), H-Index: 15
#2 Jin Wang (Beijing Institute of Technology), H-Index: 1
Last: Jingming Kuang (Beijing Institute of Technology), H-Index: 15
(5 authors)
Binaural sound source localization is an important and widely used perceptually based method, and many researchers have applied it to machine learning studies based on the head-related transfer function (HRTF). Because the HRTF is closely related to human physiological structure, HRTFs vary between individuals. Related machine learning studies to date tend to focus on binaural localization in reverberant or noisy environments, or in conditions with multiple simultaneously active sound sour...
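Two classic cues underlying HRTF-based binaural localization, the interaural time difference (ITD) and the interaural level difference (ILD), can be sketched on synthetic signals (a simplification of the HRTF features the paper discusses):

```python
import numpy as np

def binaural_cues(left, right, fs):
    """ITD from the lag of the cross-correlation peak (positive lag
    means the left channel is delayed, i.e., the source is on the
    right), and ILD as the energy ratio in dB."""
    corr = np.correlate(left, right, mode="full")
    lag = int(np.argmax(corr)) - (len(right) - 1)
    itd = lag / fs
    ild = 10 * np.log10(np.sum(left ** 2) / np.sum(right ** 2))
    return itd, ild

# Synthetic example: the left channel is a copy of the right,
# delayed by 8 samples and with twice the amplitude.
rng = np.random.default_rng(0)
fs = 16000
right = rng.standard_normal(1024)
left = 2.0 * np.roll(right, 8)
itd, ild = binaural_cues(left, right, fs)
print(f"ITD = {itd * 1e6:.0f} us, ILD = {ild:.1f} dB")
```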
#1 Mohammed Sidi Yakoub (Université de Moncton), H-Index: 3
#2 Sid-Ahmed Selouani (Université de Moncton), H-Index: 11
(4 authors)
In this paper, we use empirical mode decomposition with Hurst-based mode selection (EMDH), along with a deep learning architecture using a convolutional neural network (CNN), to improve the recognition of dysarthric speech. The EMDH speech enhancement technique is used as a preprocessing step to improve the quality of dysarthric speech. Then, Mel-frequency cepstral coefficients are extracted from the speech processed by EMDH and used as input features to a CNN-based recognizer. The effectivenes...
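The mode-selection idea can be sketched as follows: given intrinsic mode functions (IMFs, assumed here to be precomputed by an EMD routine), estimate each mode's Hurst exponent and keep only the persistent modes. Both the crude rescaled-range estimator and the 0.5 threshold below are illustrative assumptions, not necessarily the paper's exact criterion:

```python
import numpy as np

def hurst_rs(x):
    """Crude rescaled-range (R/S) estimate of the Hurst exponent:
    slope of log(R/S) versus log(window size)."""
    n = len(x)
    sizes = [n // 2, n // 4, n // 8, n // 16]
    rs = []
    for s in sizes:
        vals = []
        for i in range(0, n - s + 1, s):
            seg = x[i:i + s]
            dev = np.cumsum(seg - seg.mean())  # cumulative deviations
            sd = seg.std()
            if sd > 0:
                vals.append((dev.max() - dev.min()) / sd)
        rs.append(np.mean(vals))
    h, _ = np.polyfit(np.log(sizes), np.log(rs), 1)
    return h

def select_modes(imfs, threshold=0.5):
    """Keep the IMFs whose Hurst exponent exceeds a (hypothetical)
    threshold and sum them to reconstruct an enhanced signal."""
    kept = [m for m in imfs if hurst_rs(m) > threshold]
    return np.sum(kept, axis=0) if kept else np.zeros_like(imfs[0])

rng = np.random.default_rng(0)
noise = rng.standard_normal(4096)             # noise-like mode, H near 0.5
trend = np.cumsum(rng.standard_normal(4096))  # persistent mode, H near 1
print(f"H(noise) = {hurst_rs(noise):.2f}, H(trend) = {hurst_rs(trend):.2f}")
enhanced = select_modes([noise, trend])
print(enhanced.shape)
```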
#1 Gal Itzhak (Technion – Israel Institute of Technology), H-Index: 3
#2 Jacob Benesty (Université du Québec), H-Index: 56
Last: Israel Cohen (Technion – Israel Institute of Technology), H-Index: 38
(3 authors)
In this paper, we introduce a quadratic approach for single-channel noise reduction. The desired signal magnitude is estimated by applying a linear filter to a modified version of the observations’ vector. The modified version is constructed from a Kronecker product of the observations’ vector with its complex conjugate. The estimated signal magnitude is multiplied by a complex exponential whose phase is obtained using a conventional linear filtering approach. We focus on the linear and quadrati...
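The key construction, a linear filter applied to the Kronecker-expanded observation vector being quadratic in the observations, can be sketched as follows (the uniform filter below is an arbitrary placeholder, not one of the paper's derived optimal filters):

```python
import numpy as np

rng = np.random.default_rng(0)
L = 4                                           # illustrative frame length
y = rng.standard_normal(L) + 1j * rng.standard_normal(L)  # observations

# Kronecker expansion: the L^2-dimensional vector y (x) conj(y)
# stacks all pairwise products y_i * conj(y_j), so a *linear* filter
# on it yields an estimate that is *quadratic* in the observations.
y_quad = np.kron(y, np.conj(y))

# Placeholder linear filter (uniform weights). With this choice the
# quadratic estimate reduces to |sum(y)|^2 / L^2, which is real and
# nonnegative, as a magnitude-type estimate should be.
h = np.ones(L * L) / (L * L)
est = h.conj() @ y_quad
print(y_quad.shape, np.isclose(est.imag, 0.0), est.real >= 0)
```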
#1 Pablo Gimeno (University of Zaragoza), H-Index: 1
#2 Ignacio Viñals (University of Zaragoza), H-Index: 2
Last: Eduardo Lleida (University of Zaragoza), H-Index: 19
(5 authors)
This paper presents a new approach, based on recurrent neural networks (RNNs), to the multiclass audio segmentation task, whose goal is to classify an audio signal as speech, music, noise, or a combination of these. The proposed system uses bidirectional long short-term memory (BLSTM) networks to model temporal dependencies in the signal. The RNN is complemented by a resegmentation module, gaining long-term stability by means of the tied-state concept in hidden Markov models. We exp...
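The stabilizing effect of such a resegmentation pass can be sketched with Viterbi decoding over the network's frame-wise class log-posteriors under a strong self-transition prior, which suppresses rapid label flicker much as duration modelling via tied HMM states does (a simplified stand-in for the paper's module):

```python
import numpy as np

def viterbi_smooth(log_post, stay_logp=np.log(0.95)):
    """Viterbi decoding of frame labels with a sticky transition
    matrix. log_post: [n_frames, n_classes] class log-posteriors."""
    n_frames, n_classes = log_post.shape
    switch_logp = np.log((1 - np.exp(stay_logp)) / (n_classes - 1))
    trans = np.full((n_classes, n_classes), switch_logp)
    np.fill_diagonal(trans, stay_logp)
    delta = log_post[0].copy()
    back = np.zeros((n_frames, n_classes), dtype=int)
    for t in range(1, n_frames):
        scores = delta[:, None] + trans   # scores[i, j]: from i to j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_post[t]
    path = [int(delta.argmax())]
    for t in range(n_frames - 1, 0, -1):  # backtrack the best path
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Noisy frame posteriors: mostly class 0, one spurious class-1 frame.
post = np.log(np.array([[0.9, 0.1]] * 5 + [[0.4, 0.6]] + [[0.9, 0.1]] * 5))
print(viterbi_smooth(post))  # the single outlier frame is smoothed away
```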
#1 Loris Nanni (University of Padua), H-Index: 41
#2 Yandre M. G. Costa, H-Index: 9
Last: Carlos N. Silla (Pontifícia Universidade Católica do Paraná), H-Index: 13
(6 authors)
In this work, we present an ensemble for automated audio classification that fuses different types of features extracted from audio files. These features are evaluated, compared, and fused with the goal of producing better classification accuracy than other state-of-the-art approaches without ad hoc parameter optimization. We present an ensemble of classifiers that performs competitively on different types of animal audio datasets using the same set of classifiers and parameter settings. To prod...
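A minimal sketch of one standard way to fuse heterogeneous classifiers at the score level, the sum rule (the paper's fusion scheme may differ; the posterior matrices below are hypothetical):

```python
import numpy as np

def fuse_sum_rule(score_matrices):
    """Late fusion by the sum rule: average the per-classifier
    posterior matrices (each of shape [n_samples, n_classes]) and
    take the argmax class per sample."""
    fused = np.mean(score_matrices, axis=0)
    return fused.argmax(axis=1)

# Two hypothetical classifiers disagree on sample 1; fusion resolves it.
clf_a = np.array([[0.8, 0.2], [0.4, 0.6]])
clf_b = np.array([[0.7, 0.3], [0.9, 0.1]])
print(fuse_sum_rule([clf_a, clf_b]))  # [0 0]
```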
Top fields of study: Pattern recognition, Voice activity detection, Hidden Markov model, Speech recognition, Computer science, Speech processing