Explore Exploit
Date created: 2021-10-19
When should I make a decision based on the information I have right now, and when should I seek more information?
- Do I go to my favorite restaurant or explore a new one?
- Do I call my best friend or make a new acquaintance?
One incarnation of this problem is the multi-armed bandit: a casino full of slot machines ("one-armed bandits"), each with a different, unknown payout rate. When should you keep playing the current machine, and when should you switch to a new one?
A very simple solution is the Win-Stay, Lose-Shift heuristic: stay with a machine as long as you keep winning, and shift to another one when you lose.
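A minimal sketch of Win-Stay, Lose-Shift, assuming each machine pays out with a fixed probability and that "shift" means jumping to a different machine picked at random. The function name `win_stay_lose_shift` and its parameters are made up for illustration.

```python
import random


def win_stay_lose_shift(payout_probs, n_pulls, seed=0):
    """Play slot machines with the Win-Stay, Lose-Shift rule.

    payout_probs: assumed win probability of each machine (hypothetical input).
    Returns a list of (machine_index, won) pairs, one per pull.
    """
    rng = random.Random(seed)
    n_arms = len(payout_probs)
    arm = rng.randrange(n_arms)  # start on a random machine
    history = []
    for _ in range(n_pulls):
        won = rng.random() < payout_probs[arm]
        history.append((arm, won))
        if not won and n_arms > 1:
            # Lose: shift to a different machine, chosen at random.
            arm = rng.choice([a for a in range(n_arms) if a != arm])
        # Win: stay on the same machine (nothing to do).
    return history


if __name__ == "__main__":
    pulls = win_stay_lose_shift([0.3, 0.6, 0.5], n_pulls=20)
    print(pulls)
```

The rule only remembers the last outcome, so a single unlucky pull is enough to make it leave a good machine.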
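A more sophisticated approach is the Gittins index, developed by the British statistician John Gittins. It assigns each machine a single number that reflects both its record so far and the value of what further pulls might reveal, assuming future payoffs are discounted. You then always play the machine with the highest index.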
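Another strategy is regret minimisation: choose so as to keep the gap between what you actually earned and what the best option would have earned as small as possible.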
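One well-known family of regret-minimising rules is the Upper Confidence Bound approach. Below is a rough sketch of the UCB1 variant, assuming rewards of 0 or 1 and fixed payout probabilities. The function name `ucb1` and its parameters are illustrative, not taken from the reference.

```python
import math
import random


def ucb1(payout_probs, n_pulls, seed=0):
    """Play slot machines with the UCB1 rule.

    payout_probs: assumed win probability of each machine (hypothetical input).
    Returns (counts, totals): pulls and total reward per machine.
    """
    rng = random.Random(seed)
    n_arms = len(payout_probs)
    counts = [0] * n_arms    # how often each machine was played
    totals = [0.0] * n_arms  # total reward collected from each machine

    def upper_bound(a, t):
        # Average reward plus a bonus that shrinks the more a machine is tried.
        mean = totals[a] / counts[a]
        bonus = math.sqrt(2 * math.log(t) / counts[a])
        return mean + bonus

    for t in range(1, n_pulls + 1):
        if t <= n_arms:
            arm = t - 1  # try every machine once before comparing
        else:
            arm = max(range(n_arms), key=lambda a: upper_bound(a, t))
        reward = 1.0 if rng.random() < payout_probs[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts, totals


if __name__ == "__main__":
    counts, totals = ucb1([0.2, 0.5, 0.8], n_pulls=1000)
    print(counts, totals)
```

The bonus term is what gives rarely tried machines the benefit of the doubt: a machine with little data gets a generous upper bound and is explored, while a well-tested machine is judged mostly on its average.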
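The value of exploring versus exploiting shifts across the lifespan: exploring pays off most early on, when there is plenty of time left to use what you learn, and exploiting pays off more as the remaining time shrinks.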
References
- Chapter 2 of Algorithms to Live By (Brian Christian and Tom Griffiths)