Empirically Estimable Classification Bounds Based on a Nonparametric Divergence Measure

Published on Feb 1, 2016 in IEEE Transactions on Signal Processing (impact factor 5.23)
· DOI: 10.1109/TSP.2015.2477805
Visar Berisha (ASU: Arizona State University), Estimated H-index: 12
Alan Wisler (ASU: Arizona State University), Estimated H-index: 4
+ 1 Author: Andreas Spanias (ASU: Arizona State University), Estimated H-index: 28
Information divergence functions play a critical role in statistics and information theory. In this paper, we show that a nonparametric f-divergence measure can be used to provide improved bounds on the minimum binary classification probability of error, both when the training and test data are drawn from the same distribution and when there is some mismatch between the training and test distributions. We confirm these theoretical results by designing feature selection algorithms using the criteria from these bounds and by evaluating the algorithms on a series of pathological speech classification tasks.
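The divergence behind these bounds, the Henze–Penrose divergence, can be estimated directly from data via the Friedman–Rafsky test statistic: build a minimal spanning tree (MST) over the pooled samples from both classes and count the edges that join points from different classes. The sketch below is a minimal pure-Python illustration of that idea (function names are illustrative, not from the paper):

```python
import math

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns MST edges."""
    n = len(points)
    best = {j: (math.dist(points[0], points[j]), 0) for j in range(1, n)}
    edges = []
    while best:
        j = min(best, key=lambda k: best[k][0])   # cheapest vertex to attach
        _, parent = best.pop(j)
        edges.append((parent, j))
        for k in best:                            # relax remaining vertices
            dk = math.dist(points[j], points[k])
            if dk < best[k][0]:
                best[k] = (dk, j)
    return edges

def hp_divergence(x, y):
    """Friedman-Rafsky estimate of the Henze-Penrose divergence, in [0, 1].

    x, y: lists of feature vectors from the two classes. R is the number of
    MST edges joining an x-point to a y-point; as the sample sizes m, n grow,
    1 - R*(m+n)/(2*m*n) converges to the divergence (0 when the class
    distributions coincide, 1 when they are fully separated).
    """
    m, n = len(x), len(y)
    labels = [0] * m + [1] * n
    r = sum(labels[a] != labels[b] for a, b in mst_edges(x + y))
    return max(0.0, 1.0 - r * (m + n) / (2.0 * m * n))
```

Two well-separated clusters share exactly one MST edge, so the estimate approaches 1 as the samples grow; heavily interleaved samples drive the cross-edge count up and the estimate toward 0.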
  • References (42)
  • Citations (31)
#1 Visar Berisha (ASU: Arizona State University), H-Index: 12
#2 Alfred O. Hero, H-Index: 59
25 Citations
#1 Visar Berisha (ASU: Arizona State University), H-Index: 12
#2 Douglas Cochran (ASU: Arizona State University), H-Index: 14
Existing statistical learning methods perform well when evaluated on training and test data drawn from the same distribution. In practice, however, these distributions are not always the same. In this paper we derive an estimable upper bound on the test error rate that depends on a new probability distance measure between training and test distributions. Furthermore, we identify a non-parametric estimator for this distance measure that can be estimated directly from data. We show how this new pr...
1 Citation
Jan 1, 2014 in NeurIPS (Neural Information Processing Systems)
#1 Kevin R. Moon (UM: University of Michigan), H-Index: 10
#2 Alfred O. Hero (UM: University of Michigan), H-Index: 59
The problem of f-divergence estimation is important in the fields of machine learning, information theory, and statistics. While several nonparametric divergence estimators exist, relatively few have known convergence properties. In particular, even for those estimators whose MSE convergence rates are known, the asymptotic distributions are unknown. We establish the asymptotic normality of a recently proposed ensemble estimator of f-divergence between two distributions from a finite number of sa...
24 Citations
May 1, 2014 in ICASSP (International Conference on Acoustics, Speech, and Signal Processing)
#1 Visar Berisha (ASU: Arizona State University), H-Index: 12
#2 Julie M. Liss (ASU: Arizona State University), H-Index: 26
Last. Andreas Spanias (ASU: Arizona State University), H-Index: 28
view all 5 authors...
The current state of the art in judging pathological speech intelligibility is subjective assessment performed by trained speech-language pathologists (SLPs). These tests, however, are inconsistent, costly, and oftentimes suffer from poor intra- and inter-judge reliability. As such, consistent, reliable, and perceptually relevant objective evaluations of pathological speech are critical. Here, we propose a data-driven approach to this problem. We propose new cost functions for examining data from a series...
14 Citations
#1 Kaitlin L. Lansford (ASU: Arizona State University), H-Index: 8
#2 Julie M. Liss (ASU: Arizona State University), H-Index: 26
Purpose The purpose of this study was to determine the extent to which vowel metrics are capable of distinguishing healthy from dysarthric speech and among different forms of dysarthria. Method A v...
48 Citations
#1 Kumar Sricharan (UM: University of Michigan), H-Index: 12
#2 Raviv Raich (OSU: Oregon State University), H-Index: 23
Last. Alfred O. Hero (UM: University of Michigan), H-Index: 59
view all 3 authors...
This paper introduces a class of k-nearest neighbor (k-NN) estimators called bipartite plug-in (BPI) estimators for estimating integrals of nonlinear functions of a probability density, such as Shannon entropy and Renyi entropy. The density is assumed to be smooth, have bounded support, and be uniformly bounded from below on this set. Unlike previous k-NN estimators of nonlinear density functionals, the proposed estimator uses data-splitting and boundary correction to achieve lower mean square e...
22 Citations
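The entry above concerns k-NN estimators of nonlinear density functionals such as Shannon entropy. For flavor, here is the classical Kozachenko–Leonenko k-NN entropy estimator in one dimension; this is only the basic plug-in idea, not the bipartite (BPI) estimator with data-splitting and boundary correction described in that paper:

```python
import math

def digamma(n):
    """Digamma at a positive integer: psi(n) = -gamma + H_{n-1}."""
    gamma = 0.5772156649015329  # Euler-Mascheroni constant
    return -gamma + sum(1.0 / j for j in range(1, n))

def knn_entropy(samples, k=3):
    """Kozachenko-Leonenko k-NN estimate of differential entropy (1-D).

    H_hat = psi(N) - psi(k) + log(2) + (1/N) * sum_i log(eps_i),
    where eps_i is the distance from sample i to its k-th nearest
    neighbour and log(2) is the log-length of the 1-D unit ball [-1, 1].
    """
    n = len(samples)
    log_eps = 0.0
    for i, xi in enumerate(samples):
        dists = sorted(abs(xi - xj) for j, xj in enumerate(samples) if j != i)
        log_eps += math.log(dists[k - 1])   # k-th nearest-neighbour distance
    return digamma(n) - digamma(k) + math.log(2.0) + log_eps / n
```

A quick sanity check on the estimator: scaling every sample by a factor a shifts the differential entropy by exactly log(a), and the estimate inherits that property because every neighbour distance is scaled by a.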
#1 XuanLong Nguyen (UM: University of Michigan), H-Index: 19
#2 Martin J. Wainwright (University of California, Berkeley), H-Index: 71
Last. Michael I. Jordan (University of California, Berkeley), H-Index: 128
view all 3 authors...
We develop and analyze M-estimation methods for divergence functionals and the likelihood ratios of two probability distributions. Our method is based on a nonasymptotic variational characterization of f -divergences, which allows the problem of estimating divergences to be tackled via convex empirical risk optimization. The resulting estimators are simple to implement, requiring only the solution of standard convex programs. We present an analysis of consistency and convergence for these estima...
267 Citations
#1 Shai Ben-David (UW: University of Waterloo), H-Index: 35
#2 John Blitzer (University of California, Berkeley), H-Index: 22
Last. Jennifer Wortman Vaughan (Harvard University), H-Index: 26
view all 6 authors...
Discriminative learning methods for classification perform well when training and test data are drawn from the same distribution. Often, however, we have plentiful labeled training data from a source domain but wish to learn a classifier which performs well on a target domain with a different distribution and little or no labeled training data. In this work we investigate two questions. First, under what conditions can a classifier trained from source data be expected to perform well on target d...
618 Citations
This paper gives an overview of automatic speaker recognition technology, with an emphasis on text-independent recognition. Speaker recognition has been studied actively for several decades. We give an overview of both the classical and the state-of-the-art methods. We start with the fundamentals of automatic speaker recognition, concerning feature extraction and speaker modeling. We elaborate advanced computational techniques to address robustness and session variability. The recent progress fr...
875 Citations
Jun 18, 2009 in UAI (Uncertainty in Artificial Intelligence)
#1 Yishay Mansour (TAU: Tel Aviv University), H-Index: 62
#2 Mehryar Mohri (CIMS: Courant Institute of Mathematical Sciences), H-Index: 52
Last. Afshin Rostamizadeh (NYU: New York University), H-Index: 23
view all 3 authors...
This paper presents a novel theoretical study of the general problem of multiple source adaptation using the notion of Renyi divergence. Our results build on our previous work [12], but significantly broaden the scope of that work in several directions. We extend previous multiple source loss guarantees based on distribution weighted combinations to arbitrary target distributions P, not necessarily mixtures of the source distributions, analyze both known and unknown target distribution cases, an...
43 Citations
Cited By (31)
This thesis contributes to the mathematical foundation of domain adaptation as emerging field in machine learning. In contrast to classical statistical learning, the framework of domain adaptation takes into account deviations between probability distributions in the training and application setting. Domain adaptation applies for a wider range of applications as future samples often follow a distribution that differs from the ones of the training samples. A decisive point is the generality of th...
Entity resolution (ER) refers to the problem of matching records in one or more relations that refer to the same real-world entity. While supervised machine learning (ML) approaches achieve the state-of-the-art results, they require a large amount of labeled examples that are expensive to obtain and often times infeasible. We investigate an important problem that vexes practitioners: is it possible to design an effective algorithm for ER that requires Zero labeled examples, yet can achieve perfo...
#1 Weizhi Li, H-Index: 1
#2 Gautam Dasarathy, H-Index: 10
Last. Visar Berisha, H-Index: 12
view all 3 authors...
Regularization is an effective way to promote the generalization performance of machine learning models. In this paper, we focus on label smoothing, a form of output distribution regularization that prevents overfitting of a neural network by softening the ground-truth labels in the training data in an attempt to penalize overconfident outputs. Existing approaches typically use cross-validation to impose this smoothing, which is uniform across all training data. In this paper, we show that such ...
1 Citations
Machine learning techniques will contribute to making Internet of Things (IoT) symmetric applications among the most significant sources of new data in the future. In this context, network systems are endowed with the capacity to access varieties of experimental symmetric data across a plethora of network devices, study the data, obtain knowledge, and make informed decisions based on the dataset at its disposal. This study is limited to supervised and unsupervised machine lear...
3 Citations
#2 Alfred O. Hero (UM: University of Michigan), H-Index: 59
This paper proposes a geometric estimator of dependency between a pair of multivariate random variables. The proposed estimator of dependency is based on a randomly permuted geometric graph (the minimal spanning tree) over the two multivariate samples. This estimator converges to a quantity that we call the geometric mutual information (GMI), which is equivalent to the Henze–Penrose divergence between the joint distribution of the multivariate samples and the product of the marginals. The GMI h...
3 Citations
#1 Donald E. Waagen, H-Index: 4
#2 Katie Rainey, H-Index: 5
view all 10 authors...
Deep learning architectures have demonstrated state-of-the-art performance for object classification and have become ubiquitous in commercial products. These methods are often applied without understanding (a) the difficulty of a classification task given the input data, and (b) how a specific deep learning architecture transforms that data. To answer (a) and (b), we illustrate the utility of a multivariate nonparametric estimator of class separation, the Henze-Penrose (HP) statistic, in the ori...
May 1, 2019 in ICASSP (International Conference on Acoustics, Speech, and Signal Processing)
#1 Salimeh Yasaei Sekeh (UM: University of Michigan), H-Index: 5
#2 Alfred O. Hero (UM: University of Michigan), H-Index: 59
Feature selection and reducing the dimensionality of data is an essential step in data analysis. In this work we propose a new criterion for feature selection that is formulated as conditional information between features given the labeled variable. Instead of using the standard mutual information measure based on Kullback-Leibler divergence, we use our proposed criterion to filter out redundant features for the purpose of multiclass classification. This approach results in an efficient and fast...
#1 Yuanhua Fu (University of Electronic Science and Technology of China)
#2 Zhiming He (University of Electronic Science and Technology of China)
Cooperative spectrum sensing (CSS) is crucial for dynamic spectrum access in cognitive radio networks. This paper considers a CSS scheme by using a multilevel quantizer in each sensing node (SN) to quantize the local energy detector’s observation. A log-likelihood ratio test detector by using quantized data received from each SN is proposed to determine the presence or absence of the primary user signal. The Bhattacharyya distance (BD) of the cooperative sensing system is derived. Then, a quanti...
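For quantized observations like those described above, the Bhattacharyya distance reduces to a simple expression over the two discrete output distributions. A minimal sketch (function name and variables are illustrative, not from that paper):

```python
import math

def bhattacharyya_distance(p, q):
    """BD = -ln(BC), where BC = sum_i sqrt(p_i * q_i).

    p, q: probability mass functions over the quantizer's output levels
    under the two hypotheses (e.g. primary-user signal absent / present).
    BD = 0 iff p == q; a larger BD means the hypotheses are easier to
    distinguish, which is why quantizer design can target maximizing it.
    """
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return -math.log(bc)
```

Identical distributions give a distance of zero, and the distance grows as the two output distributions diverge, so it serves as a natural objective for choosing quantizer thresholds.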
#2 David Sontag, H-Index: 33
Last. Rajesh Ranganath, H-Index: 23
view all 3 authors...
Learning domain-invariant representations has become a popular approach to unsupervised domain adaptation and is often justified by invoking a particular suite of theoretical results. We argue that there are two significant flaws in such arguments. First, the results in question hold only for a fixed representation and do not account for information lost in non-invertible transformations. Second, domain invariance is often a far too strict requirement and does not always lead to consistent estim...
5 Citations
#1 Stephanie A. Borrie (USU: Utah State University), H-Index: 9
#2 Tyson S. Barrett (USU: Utah State University), H-Index: 3
Last. Visar Berisha (ASU: Arizona State University), H-Index: 12
view all 4 authors...
Purpose Conversational entrainment, the phenomenon whereby communication partners synchronize their behavior, is considered essential for productive and fulfilling conversation. Lack of entrainment...
1 Citation