Hyperparameter optimization, or model selection, is the problem of choosing a set of hyperparameters for a learning algorithm, usually with the goal of optimizing a measure of the algorithm's performance on an independent data set. A search consists of:
A score function (see model selection measures)
The learning algorithm's generalization performance is modeled as a sample from a Gaussian process (GP), i.e. every point in some continuous input space is associated with a normally distributed random variable (the performance). We pick the hyperparameters for the next experiment by optimizing the expected improvement (EI) over the current best result, or the GP upper confidence bound (UCB).
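The two acquisition rules above can be sketched from the GP posterior mean and standard deviation at a candidate point. This is a minimal illustration, not any particular library's API; `mu`, `sigma`, and `best` are assumed to come from a fitted GP and the observations so far, and since we are minimizing, the "UCB" rule is written as a lower confidence bound.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best):
    """EI for minimization: the expected amount by which a candidate
    improves on the best value observed so far."""
    imp = best - mu                       # improvement over current best
    z = imp / np.maximum(sigma, 1e-12)    # standardized improvement
    ei = imp * norm.cdf(z) + sigma * norm.pdf(z)
    return np.where(sigma > 0, ei, 0.0)   # no uncertainty -> no expected gain

def lower_confidence_bound(mu, sigma, kappa=2.0):
    """The UCB rule flipped for minimization: smaller values are more
    promising (low predicted mean and/or high uncertainty)."""
    return mu - kappa * sigma

# Hypothetical posterior at three candidate hyperparameter settings.
mu = np.array([0.5, 0.2, 0.8])
sigma = np.array([0.1, 0.3, 0.05])
print(expected_improvement(mu, sigma, best=0.4))
print(lower_confidence_bound(mu, sigma))
```

The next experiment would use the candidate maximizing EI (or minimizing the lower confidence bound); `kappa` trades off exploitation against exploration.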
Bayesian optimization aims to minimize a function $f(x)$ over a bounded set $\mathcal{X}$ (a subset of $\mathbb{R}^D$). It does so by building a probabilistic model of $f(x)$, which it uses to decide where in $\mathcal{X}$ to evaluate the function next. Because it uses all previously sampled values of $f(x)$, not just a local gradient, it needs fewer evaluations of $f(x)$, at the cost of some computation between runs; this makes it well suited to machine learning, where each evaluation of $f(x)$ is slow.
The Gaussian process is a prior distribution on functions of the form $f : \mathcal{X} \rightarrow \mathbb{R}$.
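Such a prior can be made concrete by sampling from it: on a finite grid standing in for $\mathcal{X}$, function values are jointly Gaussian with a covariance given by a kernel. A minimal sketch, assuming the common squared-exponential (RBF) kernel with unit variance and a hypothetical length scale:

```python
import numpy as np

def rbf_kernel(xa, xb, length_scale=0.3):
    """Squared-exponential covariance: nearby inputs get highly
    correlated function values, which makes the samples smooth."""
    d2 = (xa[:, None] - xb[None, :]) ** 2
    return np.exp(-0.5 * d2 / length_scale**2)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)        # finite grid standing in for X
K = rbf_kernel(x, x)                  # prior covariance on the grid
jitter = 1e-8 * np.eye(len(x))        # numerical stabilizer for sampling
samples = rng.multivariate_normal(np.zeros(len(x)), K + jitter, size=3)
print(samples.shape)                  # three draws from the prior over functions
```

Each row of `samples` is one plausible function under the prior; conditioning this prior on observed $(x, f(x))$ pairs yields the posterior used by the acquisition functions.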