Underfitting is a term in machine learning and statistical inference for a model that is either too simple or too restricted to fit the data well. An underfit model will typically show poor performance on both training and test data until either the underlying model or the hyperparameters (or Bayesian Priors) are changed.
Examples of Underfitting
Underfitting will occur if a Bayesian prior is too “certain” about what the model or answer is beforehand. To think about it another way, the modeler has already made up their mind before seeing the data.
The most extreme underfitter has a Bayesian prior that places 100% of the probability on a single answer, which means that all of the data is discarded and no learning takes place.
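As a toy sketch of that extreme case (the coin-flip data and numbers here are made up for illustration), consider Bayesian updating over a few candidate coin biases. With a point-mass prior, the posterior is identical to the prior no matter what the flips say:

```python
import numpy as np

# Candidate values for a coin's probability of heads.
hypotheses = np.array([0.1, 0.3, 0.5, 0.7, 0.9])

# A maximally "certain" prior: 100% of the probability on a single answer.
prior = np.array([0.0, 0.0, 1.0, 0.0, 0.0])

# Observed data: 50 flips, 45 of them heads.
heads, tails = 45, 5

# Bayesian update: posterior is proportional to prior * likelihood.
likelihood = hypotheses**heads * (1 - hypotheses)**tails
posterior = prior * likelihood
posterior /= posterior.sum()

print(posterior)  # still [0, 0, 1, 0, 0] -- the data were discarded, no learning happened
```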
In a linear regression, for example, a strong prior or large regularization term will prevent us from finding the line that best fits the data, and instead we end up with a flattened version of that line. In another case, the prior could be reasonable, but the data points themselves lie along a parabola, which makes any line a very poor model.
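Here is a minimal numpy sketch of the first case, with made-up data and an illustrative penalty value: ridge regression (equivalent to a zero-centered Gaussian prior on the slope) with an overwhelming regularization term flattens the fitted line even though the data clearly trend upward.

```python
import numpy as np

rng = np.random.default_rng(0)

# Data that clearly trend upward: y ≈ 3x + noise.
x = np.linspace(0, 10, 50)
y = 3 * x + rng.normal(0, 1, size=x.shape)

def ridge_slope(x, y, lam):
    """Closed-form ridge estimate of the slope (no intercept, for simplicity)."""
    return (x @ y) / (x @ x + lam)

print(ridge_slope(x, y, lam=0.0))   # ~3.0: ordinary least squares recovers the trend
print(ridge_slope(x, y, lam=1e6))   # ~0.0: an overwhelming prior flattens the line -> underfit
```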
Underfitting can also occur if the model is not optimized properly. In this case, we have a good Bayesian posterior distribution, but we have not done a good job of finding the likely (or dense) regions of that distribution. This could happen when a poor algorithm is selected for searching the posterior space, or when it is not allowed to run long enough.
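A hedged sketch of the "not run long enough" case, again with made-up data and arbitrary iteration counts: plain gradient descent on a linear fit, stopped after a handful of iterations versus many. The model class is fine; the search simply hasn't reached the good part of parameter space yet.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 100)
y = 3 * x + 2 + rng.normal(0, 1, size=x.shape)

def fit(n_iters, lr=0.001):
    """Gradient descent on mean squared error for y = w*x + b."""
    w, b = 0.0, 0.0
    for _ in range(n_iters):
        pred = w * x + b
        grad_w = 2 * np.mean((pred - y) * x)
        grad_b = 2 * np.mean(pred - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

print(fit(n_iters=5))       # far from (3, 2): stopped too early -> underfit
print(fit(n_iters=20000))   # close to (3, 2): given enough time, the fit is good
```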
Underfitting Behavior in Humans
A person who underfits has already made up their mind and is unable to be swayed by new information. They also have trouble integrating new ideas. A chronic underfitter will have trouble adapting to new situations.
On the plus side, underfitters do not experience the same problems as overfitters. They are unlikely to blindly follow fads and are difficult to manipulate.
Examples of underfitting can be found in Episode 16 of The Local Maximum.
Symptoms of Underfitting
In machine learning, you typically have a training set that the algorithm is allowed to see and a test set that the algorithm is not allowed to see. Sometimes a validation set is created (within the training set) to see how a partially trained model will do. That allows the algorithm to adjust hyperparameters (usually regularization terms or priors) to optimize the outcome and prevent overfitting.
If a hyperparameter (particularly a regularization term or prior) can be relaxed in a way that improves the validation set score, then the model under the original setting was likely underfit.
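Below is a hedged numpy sketch of that check, using made-up data and an illustrative pair of penalty values: a held-out validation slice of the training data scores much better once an overly strong ridge penalty is relaxed, which flags the original setting as underfit.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 10, 200)
y = 3 * x + rng.normal(0, 1, size=x.shape)

# Hold out a quarter of the training data as a validation set.
idx = rng.permutation(len(x))
train, val = idx[:150], idx[150:]

def ridge_slope(x, y, lam):
    """Closed-form ridge estimate of the slope (no intercept, for simplicity)."""
    return (x @ y) / (x @ x + lam)

def val_error(lam):
    w = ridge_slope(x[train], y[train], lam)
    return np.mean((w * x[val] - y[val]) ** 2)

print(val_error(1e6))  # large error: the regularization term is too strong (underfit)
print(val_error(1.0))  # roughly the noise level: relaxing the hyperparameter helps
```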
If there’s some fundamental simplification made in the model that makes it really bad at predicting new data, this is also underfitting.
In gradient descent, if the training and validation errors are both still decreasing with every iteration, this typically indicates that training is not complete, and thus if you stopped early the model would be underfit.
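A hedged sketch of what that looks like in practice (synthetic data, arbitrary checkpoint spacing): track both errors at checkpoints during gradient descent; if both are still falling when the loop ends, the model is probably still underfit and would benefit from more iterations.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0, 10, 200)
y = 3 * x + 2 + rng.normal(0, 1, size=x.shape)

# Split into training and validation sets.
idx = rng.permutation(len(x))
x_tr, y_tr = x[idx[:150]], y[idx[:150]]
x_va, y_va = x[idx[150:]], y[idx[150:]]

w, b, lr = 0.0, 0.0, 0.001
for i in range(1, 501):
    pred = w * x_tr + b
    w -= lr * 2 * np.mean((pred - y_tr) * x_tr)
    b -= lr * 2 * np.mean(pred - y_tr)
    if i % 100 == 0:
        train_loss = np.mean((w * x_tr + b - y_tr) ** 2)
        val_loss = np.mean((w * x_va + b - y_va) ** 2)
        # If both losses are still dropping at the final checkpoint,
        # training is not complete and stopping here would underfit.
        print(i, round(train_loss, 3), round(val_loss, 3))
```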
Why does underfitting happen?
Underfitting can happen when:
- There are not enough parameters or complexity to appropriately model the data
- The Bayesian Priors are too restrictive or certain (low entropy)
- The machine learning algorithm wasn’t given enough time to train.
There are a few ways to remedy this:
- Use a different Bayesian Prior
- Allow the algorithm to run longer
- Fundamentally rethink the model and add parameters or complexity (see the sketch after this list). Adding complexity to the model is not guaranteed to work, so it’s important to think about the problem and speculate which effects the current underfit model is not accounting for, and which is likely the largest limitation.
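As a toy sketch of that last remedy (made-up parabolic data, models fit with numpy's polyfit): a straight line underfits no matter how long it is trained, while adding one quadratic term, the effect the line was missing, brings the error down to the noise level.

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(-3, 3, 100)
y = x**2 + rng.normal(0, 0.3, size=x.shape)   # the data actually lie on a parabola

# Underfit model: a straight line has no way to capture the curvature.
line_coefs = np.polyfit(x, y, deg=1)
line_err = np.mean((np.polyval(line_coefs, x) - y) ** 2)

# Rethought model: adding a quadratic term (one extra parameter) fixes it.
quad_coefs = np.polyfit(x, y, deg=2)
quad_err = np.mean((np.polyval(quad_coefs, x) - y) ** 2)

print(line_err)  # large: the missing curvature is the model's biggest limitation
print(quad_err)  # ~0.09: close to the noise level (0.3**2) once the right effect is added
```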
In human underfitting, the best remedy is to go into a new environment and experience new things. This will help one think differently.