Eurasip Journal on Audio, Speech, and Music Processing
345 papers (page 1 of 35)
Linhui Sun, Sheng Fu, Fu Wang (NUPT: Nanjing University of Posts and Telecommunications)
The overall recognition rate decreases as emotional confusion increases in multi-class speech emotion recognition. To solve this problem, we propose a speech emotion recognition method based on a decision-tree support vector machine (SVM) model with Fisher feature selection. At the feature selection stage, the Fisher criterion is used to filter out the feature parameters with higher discriminative ability. At the emotion classification stage, an algorithm is proposed to determine the structu...
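The Fisher criterion mentioned above ranks each feature by the ratio of between-class to within-class variance. A minimal sketch of such a filter, assuming NumPy feature matrices and integer class labels (function names are illustrative, not from the paper):

```python
import numpy as np

def fisher_scores(X, y):
    """Fisher criterion per feature: between-class variance divided by
    within-class variance. Higher score = more discriminative feature."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        nc = len(Xc)
        mc = Xc.mean(axis=0)
        between += nc * (mc - overall_mean) ** 2
        within += nc * Xc.var(axis=0)
    return between / (within + 1e-12)

def select_top_k(X, y, k):
    """Indices of the k highest-scoring features, in ascending order."""
    idx = np.argsort(fisher_scores(X, y))[::-1][:k]
    return np.sort(idx)
```

Features whose class-conditional means are well separated relative to their spread receive high scores and survive the filter; the rest are dropped before SVM training.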
Mohit Shah, Ming Tu, …, Andreas Spanias (ASU: Arizona State University); 5 authors
Speech emotion recognition methods combining articulatory information with acoustic features have been previously shown to improve recognition performance. Collection of articulatory data on a large scale may not be feasible in many scenarios, thus restricting the scope and applicability of such methods. In this paper, a discriminative learning method for emotion recognition using both articulatory and acoustic information is proposed. A traditional l1-regularized logistic regression cost functi...
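As context for the cost function named here, a plain l1-regularized logistic regression can be fit with proximal gradient descent (ISTA), where the l1 penalty becomes a soft-thresholding step. This is a generic sketch of the standard baseline objective, not the paper's discriminative modification:

```python
import numpy as np

def l1_logreg(X, y, lam=0.05, lr=0.1, iters=3000):
    """Fit l1-regularized logistic regression by proximal gradient.
    X: (n, d) features, y: (n,) labels in {0, 1}.
    Returns a (typically sparse) weight vector."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))      # predicted probabilities
        grad = X.T @ (p - y) / n                # gradient of log-loss
        w = w - lr * grad
        # soft-thresholding = proximal step for the l1 penalty
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
    return w
```

The soft-thresholding step drives uninformative weights exactly to zero, which is why l1 regularization doubles as feature selection in setups like the one described.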
Phonetic information is one of the most essential components of a speech signal, playing an important role in many speech processing tasks. However, it is difficult to integrate phonetic information into speaker verification systems, since it occurs primarily at the frame level while speaker characteristics typically reside at the segment level. In deep neural network-based speaker verification, existing methods only apply phonetic information to the frame-wise trained speaker embeddings. To imp...
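The frame-level vs. segment-level mismatch noted above is commonly bridged by pooling frame features into a single segment vector, e.g. mean-and-standard-deviation (statistics) pooling. A minimal illustration of that generic operation (not the method proposed in the paper):

```python
import numpy as np

def stats_pool(frames):
    """Aggregate frame-level features, shape (T, D), into one
    segment-level vector of shape (2*D,) by concatenating the
    per-dimension mean and standard deviation over time."""
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])
```

After pooling, per-frame phonetic features live at the same (segment) rate as the speaker embedding and can be concatenated or fused with it.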
Kacper Pawel Radzikowski, Robert Nowak (Warsaw University of Technology), …, Osamu Yoshie (Waseda University); 4 authors
Current automatic speech recognition (ASR) systems achieve over 90–95% accuracy, depending on the methodology applied and datasets used. However, the level of accuracy decreases significantly when the same ASR system is used by a non-native speaker of the language to be recognized. At the same time, the volume of labeled datasets of non-native speech samples is extremely limited both in size and in the number of existing languages. This problem makes it difficult to train or build sufficiently a...
Diego de Benito-Gorrón, Alicia Lozano-Diez, …, Joaquin Gonzalez-Rodriguez (UAM: Autonomous University of Madrid); 4 authors
Audio signals represent a wide diversity of acoustic events, from background environmental noise to spoken communication. Machine learning models such as neural networks have already been proposed for audio signal modeling, where recurrent structures can take advantage of temporal dependencies. This work aims to study the implementation of several neural network-based systems for speech and music event detection over a collection of 77,937 10-second audio segments (216 h), selected from the Goog...
Javier Tejedor, Doroteo Torre Toledano, …, Antonio Moreno Jiménez (UAM: Autonomous University of Madrid); 7 authors
The huge amount of information stored in audio and video repositories makes search on speech (SoS) a priority area nowadays. Within SoS, Query-by-Example Spoken Term Detection (QbE STD) aims to retrieve data from a speech repository given a spoken query. Research on this area is continuously fostered with the organization of QbE STD evaluations. This paper presents a multi-domain internationally open evaluation for QbE STD in Spanish. The evaluation aims at retrieving the speech files that conta...
Singing voice analysis has been a topic of research supporting several applications in the domain of music information retrieval systems. One such major area is singer identification (SID). There has been an enormous increase in the production of movies and songs in the Bollywood industry over the last five decades. Surveying this extensive dataset of singers, the paper presents a singer identification system for Indian playback singers. Four acoustic features, namely formants, harmonic spectral envelope, vibrato...
Following the encoding and decoding mechanism of binaural cue coding (BCC), in this paper the speech and the noise are treated as the left-channel and right-channel signals of the BCC framework, respectively. The speech signal is then estimated from noisy speech when the inter-channel level difference (ICLD) and inter-channel correlation (ICC) between speech and noise are given. In this paper, exact inter-channel cues and pre-enhanced inter-channel cues are used for speech rest...
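For reference, the two binaural cues named here have standard definitions; BCC computes them per subband, but a full-band sketch shows the idea (helper names are illustrative):

```python
import numpy as np

def icld(left, right, eps=1e-12):
    """Inter-channel level difference in dB: energy ratio of the
    left-channel signal to the right-channel signal."""
    return 10.0 * np.log10((np.sum(left ** 2) + eps) /
                           (np.sum(right ** 2) + eps))

def icc(left, right, eps=1e-12):
    """Inter-channel correlation: normalized cross-correlation at lag 0,
    in [-1, 1]; 1 means the channels are fully coherent."""
    num = np.sum(left * right)
    den = np.sqrt(np.sum(left ** 2) * np.sum(right ** 2)) + eps
    return num / den
```

With speech on one channel and noise on the other, these cues encode the per-band signal-to-noise relationship that the enhancement stage described above exploits.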
Liang He, Xianhong Chen (THU: Tsinghua University), …, Michael T. Johnson (UK: University of Kentucky); 6 authors
In this paper, we apply a latent class model (LCM) to the task of speaker diarization. LCM is similar to Patrick Kenny’s variational Bayes (VB) method in that it uses soft information and avoids premature hard decisions in its iterations. In contrast to the VB method, which is based on a generative model, LCM provides a framework allowing both generative and discriminative models. The discriminative property is realized through the use of i-vector (Ivec), probabilistic linear discriminative anal...
Zamir Ben-Hur (BGU: Ben-Gurion University of the Negev), David L. Alon (Facebook), …, Ravish Mehra (Facebook); 4 authors
In response to renewed interest in virtual and augmented reality, the need for high-quality spatial audio systems has emerged. The reproduction of immersive and realistic virtual sound requires high resolution individualized head-related transfer function (HRTF) sets. In order to acquire an individualized HRTF, a large number of spatial measurements are needed. However, such a measurement process requires expensive and specialized equipment, which motivates the use of sparsely measured HRTFs. Pr...
Top fields of study: Pattern recognition, Hidden Markov model, Speech recognition, Computer science, Speech processing