Model selection

All models are wrong; some are useful. George E. P. Box

Model selection is the problem of choosing the best model among a set of competing models. Several dimensions can help us define what "best" means, such as goodness of fit, parsimony, and future predictive ability.

Usually we have to trade these dimensions off against each other, and different approaches emphasize different ones: cross-validation, for instance, focuses on future predictive ability. A typical case is balancing goodness of fit (how well the model describes the data) against parsimony (which guards against over-fitting). This can be interpreted as the trade-off between the bias introduced by a model that is too small and the variance that comes with a model that is too large. It’s worth mentioning that the true model is unbiased and carries only the variance that is strictly necessary.
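To make the trade-off concrete, here is a minimal sketch that uses K-fold cross-validation to choose the degree of a polynomial regression. The assumptions are: the candidate models are polynomials of degree 0 to 8, the data are synthetic (a hypothetical quadratic ground truth plus Gaussian noise), and the helper names (`kfold_cv_mse`, `train_mse`, `select_degree`) are illustrative rather than taken from any library. Training error measures goodness of fit alone and can only go down as the degree grows. Cross-validated error estimates predictive ability, so it also penalizes the extra variance of overly large models.

```python
import numpy as np

def kfold_cv_mse(x, y, degree, k=5, seed=0):
    """Average held-out MSE of a degree-`degree` polynomial fit over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)  # shuffled, near-equal folds
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coefs = np.polyfit(x[train], y[train], degree)  # least-squares fit on k-1 folds
        pred = np.polyval(coefs, x[test])               # predict the held-out fold
        errors.append(np.mean((y[test] - pred) ** 2))
    return float(np.mean(errors))

def train_mse(x, y, degree):
    """In-sample MSE: goodness of fit only, with no penalty for complexity."""
    coefs = np.polyfit(x, y, degree)
    return float(np.mean((y - np.polyval(coefs, x)) ** 2))

def select_degree(x, y, degrees, k=5):
    """Return the degree with the lowest cross-validated MSE, plus all scores."""
    scores = {d: kfold_cv_mse(x, y, d, k) for d in degrees}
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical data: quadratic ground truth plus unit-variance Gaussian noise.
rng = np.random.default_rng(42)
x = np.linspace(-3, 3, 60)
y = 1.0 - 2.0 * x + 0.5 * x**2 + rng.normal(scale=1.0, size=x.size)

degrees = range(0, 9)
best, scores = select_degree(x, y, degrees)
for d in degrees:
    print(f"degree {d}: train MSE {train_mse(x, y, d):.3f}, CV MSE {scores[d]:.3f}")
print("selected degree:", best)
```

Models that are too small (degrees 0 and 1) show high error on both columns, which is bias. Past the true degree, training MSE keeps shrinking while CV MSE stops improving or rises, which is variance. Cross-validation is only one possible criterion; information criteria such as AIC or BIC strike the same balance with an explicit complexity penalty instead of held-out data.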