arXiv: Machine Learning

Papers

Papers 7356

1 page of 736 pages (7,356 results)

Newest

While the accuracy of modern deep learning models has significantly improved in recent years, the ability of these models to generate uncertainty estimates has not progressed to the same degree. Uncertainty methods are designed to provide an estimate of class probabilities when predicting class assignment. While there are a number of proposed methods for estimating uncertainty, they all suffer from a lack of calibration: predicted probabilities can be off from empirical ones by a few percent or ...

In high dimensional regression, feature clustering by their effects on outcomes is often as important as feature selection. For that purpose, clustered Lasso and octagonal shrinkage and clustering algorithm for regression (OSCAR) are used to make feature groups automatically by pairwise L_1norm and pairwise L_\inftynorm, respectively. This paper proposes efficient path algorithms for clustered Lasso and OSCAR to construct solution paths with respect to their regularization parameters. Desp...

We introduce the Generalized Energy Based Model (GEBM) for generative modelling. These models combine two trained components: a base distribution (generally an implicit model), which can learn the support of data with low intrinsic dimension in a high dimensional space; and an energy function, to refine the probability mass on the learned support. Both the energy function and base jointly constitute the final model, unlike GANs, which retain only the base distribution (the "generator"). GEBMs ar...

Hessian based measures of flatness, such as the trace, Frobenius and spectral norms, have been argued, used and shown to relate to generalisation. In this paper we demonstrate that for feed forward neural networks under the cross entropy loss, we would expect low loss solutions with large weights to have small Hessian based measures of flatness. This implies that solutions obtained using L2regularisation should in principle be sharper than those without, despite generalising better. We show t...

We consider the theory of regression on a manifold using reproducing kernel Hilbert space methods. Manifold models arise in a wide variety of modern machine learning problems, and our goal is to help understand the effectiveness of various implicit and explicit dimensionality-reduction methods that exploit manifold structure. Our first key contribution is to establish a novel nonasymptotic version of the Weyl law from differential geometry. From this we are able to show that certain spaces of sm...

We introduce a simple (one line of code) modification to the Generative Adversarial Network (GAN) training algorithm that materially improves results with no increase in computational cost: When updating the generator parameters, we simply zero out the gradient contributions from the elements of the batch that the critic scores as `least realistic'. Through experiments on many different GAN variants, we show that this `top-k update' procedure is a generally applicable improvement. In order to un...

Optimal Transport (OT) is being widely used in various fields such as machine learning and computer vision, as it is a powerful tool for measuring the similarity between probability distributions and histograms. In previous studies, OT has been defined as the minimum cost to transport probability mass from one probability distribution to another. In this study, we propose a new framework in which OT is considered as a maximum a posteriori (MAP) solution of a probabilistic generative model. With ...

We study the effect of mini-batching on the loss landscape of deep neural networks using spiked, field-dependent random matrix theory. We show that the magnitude of the extremal values of the batch Hessian are larger than those of the empirical Hessian. Our framework yields an analytical expression for the maximal SGD learning rate as a function of batch size, informing practical optimisation schemes. We use this framework to demonstrate that accepted and empirically-proven schemes for adapting ...

An unbiased low-variance gradient estimator, termed GO gradient, was proposed recently for expectation-based objectives \mathbb{E}_{q_{\boldsymbol{\gamma}}(\boldsymbol{y})} [f(\boldsymbol{y})] where the random variable (RV) \boldsymbol{y}may be drawn from a stochastic computation graph with continuous (non-reparameterizable) internal nodes and continuous/discrete leaves. Upgrading the GO gradient, we present for $\mathbb{E}_{q_{\boldsymbol{\boldsymbol{\gamma}}}(\boldsymbol{y})} [f(\boldsym...

Thanks to the reparameterization trick, deep latent Gaussian models have shown tremendous success recently in learning latent representations. The ability to couple them however with nonparamet-ric priors such as the Dirichlet Process (DP) hasn't seen similar success due to its non parameteriz-able nature. In this paper, we present an alternative treatment of the variational posterior of the Dirichlet Process Deep Latent Gaussian Mixture Model (DP-DLGMM), where we show that the prior cluster par...

12345678910