Controlled Random Search Improves Sample Mining and Hyper-Parameter Optimization.

Published on Jan 1, 2018 in arXiv: Learning
Gowtham Muniraju (estimated H-index: 3), Bhavya Kailkhura (estimated H-index: 12), Peer-Timo Bremer (estimated H-index: 26), + 1 author
A common challenge in machine learning and related fields is the need to efficiently explore high-dimensional parameter spaces using small numbers of samples. Typical examples are hyper-parameter optimization in deep learning and sample mining in predictive modeling tasks. All such problems trade off exploration, which samples the space without knowledge of the target function, against exploitation, where information from previous evaluations is used in an adaptive feedback loop. Much of the recent focus has been on exploitation, while exploration is done with simple designs such as Latin hypercube or even uniform random sampling. In this paper, we introduce optimal space-filling sample designs for effective exploration of high-dimensional spaces. Specifically, we propose a new parameterized family of sample designs called space-filling spectral designs, and introduce a framework to choose optimal designs for a given sample size and dimension. Furthermore, we present an efficient algorithm to synthesize a given spectral design. Finally, we evaluate the performance of spectral designs in both data-space and model-space applications. The data-space exploration is targeted at recovering complex regression functions in high-dimensional spaces. The model-space exploration focuses on selecting hyper-parameters for a given neural network architecture. Our empirical studies demonstrate that the proposed approach consistently outperforms state-of-the-art techniques, particularly with smaller design sizes.
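The abstract contrasts the proposed spectral designs with simple exploration baselines such as uniform random and Latin hypercube sampling. As a point of reference, here is a minimal sketch of those two baselines only (not the paper's spectral designs), assuming NumPy; the function names are illustrative, not from the paper.

```python
import numpy as np

def uniform_design(n, d, rng):
    """Uniform random samples in the unit hypercube [0, 1]^d."""
    return rng.random((n, d))

def latin_hypercube_design(n, d, rng):
    """Basic Latin hypercube: every dimension is split into n equal
    strata, and each stratum contains exactly one sample."""
    # One random point inside each stratum, then shuffle the strata
    # independently per dimension.
    samples = (rng.random((n, d)) + np.arange(n)[:, None]) / n
    for j in range(d):
        rng.shuffle(samples[:, j])
    return samples

rng = np.random.default_rng(0)
X = latin_hypercube_design(8, 2, rng)
# Each column of X has exactly one sample in each stratum [i/8, (i+1)/8).
```

The Latin hypercube guarantees one-dimensional stratification at no extra cost over uniform sampling, which is why it is the usual default for small exploration budgets.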
  • References (0)
  • Citations (0)
📖 Papers frequently viewed together
Smitha Milli (University of California, Berkeley; H-index: 4), Pieter Abbeel (University of California, Berkeley; H-index: 71), Igor Mordatch (H-index: 23)
Teachers intentionally pick the most informative examples to show their students. However, if the teacher and student are neural networks, the examples that the teacher network learns to give, although effective at teaching the student, are typically uninterpretable. We show that training the student and teacher iteratively, rather than jointly, can produce interpretable teaching strategies. We evaluate interpretability by (1) measuring the similarity of the teacher's emergent strategies to intu...
4 Citations
Deep embeddings answer one simple question: How similar are two images? Learning these embeddings is the bedrock of verification, zero-shot learning, and visual search. The most prominent approaches optimize a deep convolutional network with a suitable loss function, such as contrastive loss or triplet loss. While a rich line of work focuses solely on the loss functions, we show in this paper that selecting training examples plays an equally important role. We propose distance weighted sampling,...
79 Citations
Olivier Bousquet (H-index: 38), Sylvain Gelly (H-index: 20), Damien Vincent (H-index: 7), + 2 authors
The selection of hyper-parameters is critical in Deep Learning. Because of the long training time of complex models and the availability of compute resources in the cloud, "one-shot" optimization schemes - where the sets of hyper-parameters are selected in advance (e.g. on a grid or in a random manner) and the training is executed in parallel - are commonly used. It is known that grid search is sub-optimal, especially when only a few critical parameters matter, and it has been suggested to use random search i...
7 Citations
Nov 11, 2016 in SIGGRAPH (International Conference on Computer Graphics and Interactive Techniques)
Bhavya Kailkhura (LLNL: Lawrence Livermore National Laboratory; H-index: 12), Jayaraman J. Thiagarajan (LLNL: Lawrence Livermore National Laboratory; H-index: 14), Pramod K. Varshney (SU: Syracuse University; H-index: 65), + 1 author
A common solution to reducing visible aliasing artifacts in image reconstruction is to employ sampling patterns with a blue noise power spectrum. These sampling patterns can prevent discernible artifacts by replacing them with incoherent noise. Here, we propose a new family of blue noise distributions, Stair blue noise, which is mathematically tractable and enables parameter optimization to obtain the optimal sampling distribution. Furthermore, for a given sample budget, the proposed blue noise ...
8 Citations
Mar 1, 2016 in ICASSP (International Conference on Acoustics, Speech, and Signal Processing)
Bhavya Kailkhura (SU: Syracuse University; H-index: 12), Jayaraman J. Thiagarajan (LLNL: Lawrence Livermore National Laboratory; H-index: 14), Pramod K. Varshney (SU: Syracuse University; H-index: 65), + 1 author
In this paper, we study the problem of generating uniform random point samples on a domain of d-dimensional space based on a minimum distance criterion between point samples (Poisson-disk sampling or PDS). First, we formally define PDS via the pair correlation function (PCF) to quantitatively evaluate properties of the sampling process. Surprisingly, none of the existing PDS techniques satisfy both the uniformity and minimum distance criteria simultaneously. These approaches typically create an ap...
3 Citations
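The pair correlation function used in the entry above can be estimated from any point set with a simple distance histogram. A rough sketch, assuming NumPy and ignoring boundary corrections; `pair_correlation` is a hypothetical helper, not code from the paper.

```python
import numpy as np

def pair_correlation(points, r_max, n_bins):
    """Histogram estimate of the pair correlation function g(r) for
    points in the unit square; boundary effects are ignored for brevity."""
    n = len(points)
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(-1))[np.triu_indices(n, k=1)]
    edges = np.linspace(0.0, r_max, n_bins + 1)
    counts, _ = np.histogram(dists, bins=edges)
    # Normalize by the expected pair count per annulus for a homogeneous
    # (uncorrelated) point process of the same intensity.
    r_lo, r_hi = edges[:-1], edges[1:]
    expected = 0.5 * n * (n - 1) * np.pi * (r_hi ** 2 - r_lo ** 2)
    return 0.5 * (r_lo + r_hi), counts / expected

rng = np.random.default_rng(0)
pts = rng.random((2000, 2))
r, g = pair_correlation(pts, r_max=0.1, n_bins=10)
# For uniform random points, g(r) stays near 1 at every radius.
```

A Poisson-disk pattern would instead show g(r) = 0 below the minimum distance, which is exactly the property the PCF-based definition captures.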
We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/o...
16.6k Citations
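The adaptive moment estimates described in the Adam abstract can be written out in a few lines. A hedged sketch of a single update step, assuming NumPy; the defaults follow the commonly cited values (β1 = 0.9, β2 = 0.999, ε = 1e-8), and the toy loop is illustrative only.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient (m)
    and its elementwise square (v), bias-corrected for zero init."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)          # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)          # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy use: minimize f(x) = x^2 (gradient 2x) starting from x = 5.
theta, m, v = np.array([5.0]), np.zeros(1), np.zeros(1)
for t in range(1, 5001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.05)
```

The division by the root of the second moment is what makes the method invariant to diagonal rescaling of the gradients, as the abstract notes.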
Daniel Heck (University of Konstanz; H-index: 2), Thomas Schlömer (University of Konstanz; H-index: 7), Oliver Deussen (University of Konstanz; H-index: 38)
In this article we revisit the problem of blue noise sampling with a strong focus on the spectral properties of the sampling patterns. Starting from the observation that oscillations in the power spectrum of a sampling pattern can cause aliasing artifacts in the resulting images, we synthesize two new types of blue noise patterns: step blue noise with a power spectrum in the form of a step function and single-peak blue noise with a wide zero-region and no oscillations except for a single peak. W...
40 Citations
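The spectral analysis this entry relies on can be reproduced for any 2-D point set by averaging the periodogram over several directions per frequency magnitude. A rough, illustrative sketch assuming NumPy (this is a diagnostic, not the authors' synthesis method); for uniform random points the radially averaged power is flat near 1, whereas blue noise suppresses low frequencies.

```python
import numpy as np

def radial_power_spectrum(points, freqs, n_dirs=16):
    """Radially averaged periodogram of a 2-D point set: average
    |F(f)|^2 / N over n_dirs directions per frequency magnitude."""
    n = len(points)
    angles = np.linspace(0.0, np.pi, n_dirs, endpoint=False)
    dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # (n_dirs, 2)
    proj = points @ dirs.T                                     # (n, n_dirs)
    power = []
    for f in freqs:
        F = np.exp(-2j * np.pi * f * proj).sum(axis=0)
        power.append((np.abs(F) ** 2).mean() / n)
    return np.array(power)

rng = np.random.default_rng(1)
pts = rng.random((1024, 2))
p = radial_power_spectrum(pts, freqs=np.arange(5, 21))
# White (uniform) noise: flat spectrum near 1; blue noise would dip at low f.
```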
Mohamed S. Ebeida (SNL: Sandia National Laboratories; H-index: 11), Scott A. Mitchell (SNL: Sandia National Laboratories; H-index: 17), John D. Owens (UC Davis: University of California, Davis; H-index: 44), + 2 authors
We provide a simple algorithm and data structures for d-dimensional unbiased maximal Poisson-disk sampling. We use an order of magnitude less memory and time than the alternatives. Our results become more favorable as the dimension increases. This allows us to produce bigger samplings. Domains may be non-convex with holes. The generated point cloud is maximal up to round-off error. The serial algorithm is provably bias-free. For an output sampling of size n in fixed dimension d, we use a linear ...
58 Citations
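For contrast with the efficient algorithm described above, the classical baseline it improves on is naive dart throwing: propose uniform candidates and accept one only if it keeps the minimum distance to all previously accepted samples. A simple sketch assuming NumPy; unlike the paper's method, this is neither maximal nor memory-efficient, and it slows down badly in high dimensions.

```python
import numpy as np

def dart_throwing_pds(r_min, d=2, n_trials=2000, rng=None):
    """Naive dart throwing: accept a uniform candidate in [0, 1]^d only
    if it lies at least r_min from every previously accepted sample."""
    if rng is None:
        rng = np.random.default_rng()
    points = []
    for _ in range(n_trials):
        c = rng.random(d)
        if all(np.linalg.norm(c - p) >= r_min for p in points):
            points.append(c)
    return np.array(points)

samples = dart_throwing_pds(0.1, rng=np.random.default_rng(0))
# Every accepted pair is separated by at least 0.1 by construction.
```

The cost of each trial grows with the number of accepted points, which is precisely the inefficiency that grid-based methods like the one above remove.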
James Bergstra (UdeM: Université de Montréal; H-index: 21), Yoshua Bengio (UdeM: Université de Montréal; H-index: 122)
Grid search and manual search are the most widely used strategies for hyper-parameter optimization. This paper shows empirically and theoretically that randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid. Empirical evidence comes from a comparison with a large previous study that used grid search and manual search to configure neural networks and deep belief networks. Compared with neural networks configured by a pure grid search, we find that random ...
1,875 Citations
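The random search advocated above is short to implement: draw all configurations independently up front (so evaluations can run in parallel), then keep the best. A minimal sketch with a toy objective standing in for validation loss; all names and the search space here are illustrative, not from the paper.

```python
import random

def random_search(objective, space, n_trials, seed=0):
    """One-shot random search: draw every configuration up front (so the
    evaluations could run in parallel) and keep the lowest score."""
    rng = random.Random(seed)
    configs = [{name: draw(rng) for name, draw in space.items()}
               for _ in range(n_trials)]
    scores = [objective(c) for c in configs]
    best = min(range(n_trials), key=scores.__getitem__)
    return configs[best], scores[best]

# Toy stand-in for a validation loss over two hyper-parameters.
space = {
    "log_lr": lambda r: r.uniform(-5, 0),      # log10 learning rate
    "dropout": lambda r: r.uniform(0.0, 0.7),
}
loss = lambda c: (c["log_lr"] + 3) ** 2 + (c["dropout"] - 0.2) ** 2
best_cfg, best_loss = random_search(loss, space, n_trials=200)
```

Unlike a grid, every trial probes a fresh value of every hyper-parameter, which is why random search wins when only a few parameters actually matter.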