Speech Communication
Papers: 2,746
Rajib Sharma (Technion – Israel Institute of Technology), …, Jacob Benesty (3 authors)
Abstract: This work presents a Kronecker-product-based methodology for frequency-domain beamforming of large sensor arrays for far-field broadband speech signals. The principal idea is to split a given uniform linear array (ULA) into two smaller virtual ULAs (VULAs) using the Kronecker product. The linear system of the original ULA is thereby bifurcated into two smaller linear systems of the VULAs. Consequently, traditional adaptive beamformers such as the minimum-variance-distortionless-response...
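The Kronecker splitting the abstract describes can be illustrated on the ULA steering vector: for an array of M = M1 · M2 sensors, the length-M steering vector factorizes exactly into the Kronecker product of two smaller VULA steering vectors. A minimal sketch (array sizes and phase increment are illustrative assumptions, not the paper's values):

```python
import numpy as np

# Split a ULA of M = M1 * M2 sensors into two virtual ULAs (VULAs).
M1, M2 = 4, 5
M = M1 * M2

# Assumed inter-element phase increment phi = omega * tau (delay per sensor).
phi = 0.3

# Full ULA steering vector: a[m] = exp(-1j * m * phi), m = 0..M-1.
a = np.exp(-1j * phi * np.arange(M))

# Coarse VULA (M1 sensors, spacing M2 elements) and fine VULA (M2 sensors).
a1 = np.exp(-1j * phi * M2 * np.arange(M1))
a2 = np.exp(-1j * phi * np.arange(M2))

# The Kronecker product recovers the original steering vector exactly,
# since kron(a1, a2)[k*M2 + l] = exp(-1j * (k*M2 + l) * phi).
assert np.allclose(a, np.kron(a1, a2))
```

This exact factorization is what lets the original linear system be bifurcated into two smaller systems, one per VULA.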
Sahar Ghannay, Yannick Estève, Nathalie Camelin
Abstract: This paper presents a study of continuous word representations applied to the automatic detection of speech recognition errors. A neural network architecture is proposed that is well suited to handling continuous word representations, such as word embeddings. We explore the use of several types of word representations: simple and combined linguistic embeddings, and acoustic ones associated with prosodic features extracted from the audio signal. To compensate for certain phenomena highlighted by the...
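The feature setup described above — a linguistic embedding concatenated with acoustic/prosodic features, scored by a binary error/correct classifier — can be sketched as follows. All dimensions, feature names, and weights here are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-word features: a word2vec-style linguistic embedding
# plus a few prosodic descriptors (e.g. duration, F0 mean, energy).
word_embedding = rng.normal(size=100)
prosodic_feats = rng.normal(size=5)
x = np.concatenate([word_embedding, prosodic_feats])

# A minimal logistic error detector with random weights, for shape only;
# the paper uses a neural network architecture in place of this.
w, b = rng.normal(size=x.size), 0.0
p_error = 1.0 / (1.0 + np.exp(-(w @ x + b)))   # P(word was misrecognized)
assert 0.0 < p_error < 1.0
```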
Abstract: Deep learning has become one of the most widely accepted paradigms in machine learning. It focuses on the use of hierarchical data models and builds on the notion that learning high-level data representations requires a better understanding of intermediate-level representations. Restricted Boltzmann machines and deep belief networks are two main types of deep learning algorithms commonly used in a wide array of classification and pattern recognition tasks. Examp...
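The restricted Boltzmann machines mentioned above are typically trained with contrastive divergence. A minimal sketch of one CD-1 weight update for a binary RBM (dimensions and learning rate are illustrative; biases omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)
n_vis, n_hid, lr = 6, 4, 0.1
W = 0.01 * rng.normal(size=(n_vis, n_hid))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

v0 = rng.integers(0, 2, size=n_vis).astype(float)   # one binary training vector

# Positive phase: hidden activation probabilities given the data.
h0 = sigmoid(v0 @ W)
# Negative phase: one Gibbs step (sample hidden, reconstruct visible).
h_sample = (rng.random(n_hid) < h0).astype(float)
v1 = sigmoid(W @ h_sample)
h1 = sigmoid(v1 @ W)

# CD-1 update: data correlation minus reconstruction correlation.
W += lr * (np.outer(v0, h0) - np.outer(v1, h1))
assert W.shape == (n_vis, n_hid)
```

Stacking such RBMs layer by layer is what yields a deep belief network.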
Abstract: Speech emotion recognition plays an increasingly important role in affective computing and remains a challenging task due to its complexity. In this study, we developed a framework integrating three distinctive classifiers: a deep neural network (DNN), a convolutional neural network (CNN), and a recurrent neural network (RNN). The framework was used for categorical recognition of four discrete emotions (i.e., angry, happy, neutral, and sad). Frame-level low-level descriptors (LLDs), segme...
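The frame-level low-level descriptors (LLDs) mentioned above are short-time features computed over sliding windows. A minimal sketch with two common LLDs, short-time energy and zero-crossing rate, over 25 ms frames with a 10 ms hop (typical values; the paper's exact LLD set may differ):

```python
import numpy as np

sr = 16000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 220 * t)          # 1 s synthetic "utterance"

frame_len, hop = int(0.025 * sr), int(0.010 * sr)
starts = range(0, len(signal) - frame_len + 1, hop)

# Short-time energy per frame.
energy = np.array([np.sum(signal[s:s + frame_len] ** 2) for s in starts])
# Zero-crossing rate: fraction of adjacent samples with a sign change.
zcr = np.array([np.mean(np.abs(np.diff(np.sign(signal[s:s + frame_len]))) > 0)
                for s in starts])

assert energy.shape == zcr.shape
```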
Pejman Mowlaee, Johannes Stahl (Graz University of Technology)
Abstract: In this paper, we investigate single-channel speech enhancement algorithms that operate in the short-time Fourier transform (STFT) domain and take inter-frequency dependencies into account. As a result of allowing for inter-frequency dependencies, the minimum mean square error optimal estimates of the STFT expansion coefficients are, in general, functions of complex-valued covariance matrices. These covariance matrices are not known a priori and have to be estimated from the obse...
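When inter-frequency dependencies are retained and speech and noise are modeled as jointly Gaussian across K bins, the MMSE estimate takes the multivariate Wiener form s_hat = Σ_s (Σ_s + Σ_n)⁻¹ y, which makes the role of the complex covariance matrices explicit. A sketch with synthetic covariances (in practice, as the abstract notes, they must be estimated from the observations):

```python
import numpy as np

rng = np.random.default_rng(0)
K = 4  # number of jointly-estimated frequency bins (illustrative)

def random_cov(k):
    # Build a Hermitian positive-definite matrix for illustration.
    A = rng.normal(size=(k, k)) + 1j * rng.normal(size=(k, k))
    return A @ A.conj().T + k * np.eye(k)

Sigma_s, Sigma_n = random_cov(K), random_cov(K)
y = rng.normal(size=K) + 1j * rng.normal(size=K)   # noisy STFT coefficients

# Multivariate Wiener estimate of the clean STFT coefficients.
s_hat = Sigma_s @ np.linalg.solve(Sigma_s + Sigma_n, y)
assert s_hat.shape == (K,)
```

With diagonal covariances this collapses to the classic per-bin Wiener gain, which is exactly the independence assumption the paper relaxes.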
Catherine Lai (University of Edinburgh), Mireia Farrús, Johanna D. Moore (University of Edinburgh)
Abstract: Spoken documents, such as podcasts or lectures, are a growing presence in everyday life. Being able to automatically identify their discourse structure is an important step toward understanding what a spoken document is about. Moreover, finer-grained units, such as paragraphs, are highly desirable for presenting and analyzing spoken content. However, little work has been done on discourse-based speech segmentation below the level of broad topics. In order to examine how discourse transition...
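As a point of reference for the segmentation task described above, a classic lexical-cohesion baseline (TextTiling-style, not this paper's method) places a boundary where adjacent sentence vectors are least similar. A minimal sketch on toy sentences:

```python
import numpy as np

sentences = [
    "the array processes speech signals",
    "speech signals reach the array",
    "cooking pasta needs salted water",
    "boil the water then add pasta",
]
vocab = sorted({w for s in sentences for w in s.split()})

def bow(s):
    # Unit-normalized bag-of-words vector over the toy vocabulary.
    v = np.array([s.split().count(w) for w in vocab], dtype=float)
    return v / (np.linalg.norm(v) or 1.0)

vecs = [bow(s) for s in sentences]
# Cosine similarity between each pair of adjacent sentences.
sims = [float(vecs[i] @ vecs[i + 1]) for i in range(len(vecs) - 1)]
# Hypothesize a boundary before the sentence with the lowest cohesion.
boundary = int(np.argmin(sims)) + 1
```

Here the similarity dip falls between the speech sentences and the cooking sentences, i.e. at the topic shift.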
Mohammad Azharuddin Laskar (National Institute of Technology, Silchar), Rabul Hussain Laskar (National Institute of Technology, Silchar)
Abstract: This paper builds on a multi-task deep neural network (DNN), which provides an utterance-level feature representation called the j-vector, to implement a text-dependent speaker verification (TDSV) system. This technique exploits the speaker idiosyncrasies associated with individual pass-phrases. However, speaker information is known to be characteristic of more specific speech units, so important speaker identity traits might get averaged out if it is considered as ...
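The verification step on top of an utterance-level representation such as the j-vector is commonly a cosine-similarity comparison against an enrollment vector, thresholded into an accept/reject decision. A sketch with synthetic vectors (the embedding dimension and threshold are assumptions, not the paper's values):

```python
import numpy as np

rng = np.random.default_rng(0)
enroll = rng.normal(size=64)                  # enrolled speaker's vector
test = enroll + 0.1 * rng.normal(size=64)     # same-speaker-like test vector

# Cosine similarity between enrollment and test representations.
score = float(enroll @ test /
              (np.linalg.norm(enroll) * np.linalg.norm(test)))
accept = score > 0.5                          # assumed decision threshold
assert accept
```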
Pranay Dighe, Afsaneh Asaei, Hervé Bourlard
Abstract: We propose an information-theoretic framework for the quantitative assessment of acoustic models used in hidden Markov model (HMM) based automatic speech recognition (ASR). The HMM backend expects that (i) the acoustic model yields accurate state-conditional emission probabilities for the observations at each time step, and (ii) the conditional probability distribution of the data given the underlying hidden state is independent of any other state in the sequence. The latter property is als...
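One simple quantity in the spirit of such a framework is the per-frame entropy of the acoustic model's state posterior distribution: sharper (lower-entropy) posteriors indicate a more confident emission model. A sketch with synthetic posteriors (not output from a real HMM/ASR system, and not necessarily the paper's chosen measure):

```python
import numpy as np

rng = np.random.default_rng(0)
logits = rng.normal(size=(10, 5))                 # 10 frames, 5 HMM states

# Softmax over states to get per-frame posterior distributions.
post = np.exp(logits)
post /= post.sum(axis=1, keepdims=True)

# Per-frame Shannon entropy in nats; bounded by log(num_states).
entropy = -np.sum(post * np.log(post), axis=1)
assert np.all(entropy >= 0) and np.all(entropy <= np.log(5) + 1e-9)
```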
Top fields of study
Speech synthesis
Pattern recognition
Natural language processing
Speech recognition
Computer science
Speech processing