Overfitting is a term in machine learning and statistical inference for when a model is selected that fits the training data very closely but fails to generalize to new data.
Examples of Overfitting
One example is in curve fitting. Suppose that two variables have a simple linear relationship (with some error), and data is collected to measure that relationship. A scatter plot is created, and a linear regression can be used to estimate that linear relationship.
Now suppose instead that a high-degree polynomial is used - one with as many coefficients as there are points in the sample - as if there were no reason to suspect a simple linear relationship. The resulting curve will wind and twist to hit every single point exactly, but it will not be useful in predicting where future points fall.
In essence, the algorithm has memorized the data without actually finding a useful model for it.
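Here is a minimal sketch of this effect with NumPy (the sample size, noise level, and polynomial degree are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Ten points from a simple linear relationship, y = 2x + 1, with some error.
x = np.linspace(0, 1, 10)
y = 2 * x + 1 + rng.normal(scale=0.2, size=x.size)

# A degree-1 fit recovers something close to the true slope and intercept.
linear = np.polyfit(x, y, deg=1)

# A degree-9 fit (10 coefficients for 10 points) chases the noise exactly.
# NumPy may even warn that this fit is poorly conditioned - a hint that the
# model is too flexible for the data.
wiggly = np.polyfit(x, y, deg=9)

# Evaluate both fits on a point neither model has seen.
x_new = 1.2
print("linear prediction:", np.polyval(linear, x_new))    # near 2*1.2 + 1 = 3.4
print("degree-9 prediction:", np.polyval(wiggly, x_new))  # typically way off
```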
Overfitting Behavior in Humans
People can overfit as they learn about the world. A person who overfits will come to quick and dramatic conclusions from a small amount of data. A chronic overfitter can be easy to sell to or manipulate, and is likely to follow fads.
Some people say that younger people can also be susceptible to overfitting, because they haven’t seen and digested much data yet.
On the plus side, overfitters do not experience the same problems as underfitters. They can be flexible in changing their minds upon receiving new data. An overfitter without recency bias (someone who accumulates experience as time goes on) will improve.
Examples of overfitting and underfitting can be found in Episode 16 of The Local Maximum.
Symptoms of Overfitting
In machine learning, you typically have a training set that the algorithm is allowed to see and a test set that the algorithm is not allowed to see. If the algorithm performs significantly better on the training set than on the test set, that is a very good sign of overfitting.
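A minimal sketch of that check with scikit-learn (the toy dataset and the unconstrained tree are illustrative assumptions, not part of the original):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unconstrained decision tree is flexible enough to memorize the training set.
model = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

print("train R^2:", model.score(X_train, y_train))  # near 1.0: memorized
print("test R^2:", model.score(X_test, y_test))     # noticeably lower: overfit
```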
Sometimes a validation set is created (within the training set) to see how a partially trained model will do. That allows the algorithm to adjust hyperparameters (usually regularization terms or priors) to optimize the outcome and prevent overfitting. A gradient descent algorithm on a complicated model (like a neural net) can also be checked in real time against a validation set and stopped early to prevent overfitting. This strategy is not guaranteed to work, as these complex systems can have multiple local maximums.
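A minimal sketch of validation-based early stopping, using scikit-learn's MLPRegressor (the dataset, network size, and patience settings are illustrative assumptions):

```python
from sklearn.datasets import make_regression
from sklearn.neural_network import MLPRegressor

# Toy regression data standing in for a real training set.
X, y = make_regression(n_samples=500, n_features=20, noise=5.0, random_state=0)

# early_stopping=True carves a validation set out of the training data;
# training halts once the validation score stops improving for
# n_iter_no_change consecutive iterations.
model = MLPRegressor(hidden_layer_sizes=(50,),
                     early_stopping=True,
                     validation_fraction=0.1,
                     n_iter_no_change=10,
                     max_iter=1000,
                     random_state=0)
model.fit(X, y)
print("stopped after", model.n_iter_, "iterations")
```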
Why does overfitting happen?
The main reason overfitting happens is that there are too many parameters (or degrees of freedom) relative to the amount of available data.
In terms of Bayesian inference, overfitting occurs when the Bayesian prior (usually chosen in an attempt to be neutral) overestimates the probability of extreme results.
There are a few ways to remedy this:
- Use a different Bayesian Prior
- Gather more data
- Reduce the model complexity or the number of parameters.
In the latter case, this is equivalent to choosing a Bayesian prior that puts less weight on more complex models or more extreme parameters.
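As a concrete illustration (assuming a standard linear regression setup with Gaussian noise, which the original does not spell out), maximizing the posterior under a zero-mean Gaussian prior on the weights is the same as penalized least squares:

```latex
\hat{w} = \arg\max_{w} \big[ \log p(y \mid X, w) + \log p(w) \big]
        = \arg\min_{w} \; \lVert y - Xw \rVert^2 + \lambda \lVert w \rVert^2,
\qquad p(w) = \mathcal{N}(w \mid 0, \sigma_w^2 I), \quad \lambda = \sigma^2 / \sigma_w^2
```

A flatter, more "neutral" prior means a larger prior variance and a smaller penalty, i.e. less protection against extreme parameter values.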
In some models, like decision trees, there isn't a set number of parameters - but complexity can still be trimmed, for example by enforcing a minimum leaf size.
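A minimal sketch of that knob in scikit-learn (the threshold of 10 is an illustrative assumption):

```python
from sklearn.tree import DecisionTreeClassifier

# With min_samples_leaf=1 (the default), the tree may grow a leaf for every
# training point, effectively memorizing the data.
memorizer = DecisionTreeClassifier(min_samples_leaf=1)

# Requiring each leaf to cover at least 10 examples trims the tree's
# complexity: no single outlier can get its own leaf.
trimmed = DecisionTreeClassifier(min_samples_leaf=10)
```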
In human overfitting, the best remedy is experience! Once overfitting is discovered, a person can update their baseline Bayesian prior for the next big decision.