Hyperparameter optimization

Hyperparameter optimization or model selection is the problem of choosing a set of hyperparameters for a learning algorithm, usually with the goal of optimizing a measure of the algorithm's performance on an independent data set. A search consists of: an estimator, a parameter space, a method for searching or sampling candidates, a cross-validation scheme, and a score function.
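As a concrete illustration, here is a minimal sketch of such a search using scikit-learn's `GridSearchCV`; the estimator, parameter grid, and scoring choices are illustrative assumptions, not part of the notes above.

```python
# Minimal hyperparameter search sketch: estimator + parameter space +
# search method (exhaustive grid) + cross-validation scheme + score function.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

search = GridSearchCV(
    estimator=SVC(),                        # the learning algorithm
    param_grid={"C": [0.1, 1, 10],          # the hyperparameter space
                "gamma": [0.01, 0.1, 1]},
    scoring="accuracy",                     # the performance measure
    cv=5,                                   # the cross-validation scheme
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```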

Bayesian optimization

The learning algorithm's generalization performance is modeled as a sample from a Gaussian process (GP), i.e. every point in some continuous input space is associated with a normally distributed random variable (the performance). We pick the hyperparameters for the next experiment by optimizing the expected improvement (EI) over the current best result, or the GP upper confidence bound (UCB).
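A minimal sketch of these two acquisition functions, assuming the GP posterior mean `mu` and standard deviation `sigma` at the candidate points are already available; since we minimize, `f_best` is the lowest value observed so far, and the "UCB" criterion takes the form of a lower confidence bound.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best):
    # EI for minimization: E[max(f_best - f(x), 0)] under the GP posterior.
    sigma = np.maximum(sigma, 1e-12)        # guard against division by zero
    z = (f_best - mu) / sigma               # standardized improvement
    return (f_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def gp_lcb(mu, sigma, kappa=2.0):
    # Confidence-bound criterion for minimization: prefer low mean,
    # high uncertainty; kappa trades off exploration vs. exploitation.
    return mu - kappa * sigma
```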

Bayesian optimization aims to minimize a function $f(x)$ over a bounded set $\mathcal{X}$ (a subset of $\mathbb{R}^D$). It does so by building a probabilistic model of $f(x)$, which it uses to decide where in $\mathcal{X}$ to evaluate the function next. The model uses all previously sampled values of $f(x)$, not just a local gradient. This lets us evaluate $f(x)$ fewer times, at the cost of some computation between evaluations, which makes it a good fit for machine learning, where each evaluation of $f(x)$ is slow.
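A minimal sketch of the full loop over a 1-D bounded set, reusing the `expected_improvement` helper sketched above; the toy objective, bounds, kernel, and budget are all illustrative assumptions standing in for a slow black-box function.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def f(x):                                    # toy stand-in for a slow objective
    return np.sin(3 * x) + 0.1 * x**2

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(3, 1))          # a few initial random evaluations
y = f(X).ravel()

for _ in range(20):
    # Refit the GP on all samples gathered so far, not just the latest one.
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5),
                                  alpha=1e-6, normalize_y=True)
    gp.fit(X, y)
    cand = np.linspace(-2, 2, 500).reshape(-1, 1)
    mu, sigma = gp.predict(cand, return_std=True)
    ei = expected_improvement(mu, sigma, y.min())
    x_next = cand[np.argmax(ei)]             # evaluate where EI is highest
    X = np.vstack([X, x_next])
    y = np.append(y, f(x_next).item())

print("best x:", X[np.argmin(y)], "best f:", y.min())
```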

The Gaussian process is a prior distribution on functions of the form $f : \mathcal{X} \rightarrow \mathbb{R}$.
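To see the prior as a distribution over functions, one can draw a few random functions from an unfitted GP; the RBF kernel and length scale here are assumptions for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

xs = np.linspace(0, 5, 100).reshape(-1, 1)
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0))
# Each column is one function f drawn from the GP prior, evaluated at xs.
samples = gp.sample_y(xs, n_samples=3, random_state=0)
print(samples.shape)                          # (100, 3)
```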