Overfitting is a consequence of being too sensitive to the data we have seen, thus not being able to draw conclusions about unseen data.
If we had copious data, drawn from a perfectly representative sample, completely mistake-free, and representing exactly what we’re trying to evaluate, then using the most complex model available would indeed be the best approach. But if we try to perfectly fit our model to the data when any of these factors fails to hold, we risk overfitting.
Chapter notes
- Incentive structures can induce overfitting in companies
- Cross-validation
- Use of multiple evaluation metrics can prevent overfitting
- Regularization
- Spending a shorter amount of time thinking about a problem can prevent overfitting
References
- Chapter 7 in Algorithms to live by