Explore Exploit

Date created: 2021-10-19

When should I make a decision based on the information I have right now, and when should I seek more information?

  • Do I go to my favorite restaurant or explore a new one?
  • Do I call my best friend or make a new acquaintance?

One classic incarnation of this problem is the multi-armed bandit: a casino with a row of slot machines ("one-armed bandits"), each with an unknown payout rate. When should you keep playing the current machine, and when should you switch to a new one?

A very simple solution is the win-stay, lose-shift heuristic: stay on a machine as long as you keep winning, and shift to another one as soon as you lose.
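Win-stay, lose-shift can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name, the `payout_probs` list of win probabilities, and the round-robin choice of which machine to shift to are all assumptions made for the example.

```python
import random


def win_stay_lose_shift(payout_probs, n_rounds, seed=0):
    """Play win-stay, lose-shift on simulated slot machines.

    payout_probs: hypothetical win probability of each machine.
    Returns (choices, rewards), one entry per round.
    """
    rng = random.Random(seed)
    n_arms = len(payout_probs)
    arm = rng.randrange(n_arms)  # start on a random machine
    choices, rewards = [], []
    for _ in range(n_rounds):
        won = rng.random() < payout_probs[arm]
        choices.append(arm)
        rewards.append(1 if won else 0)
        if not won:
            # Lose: shift. Moving to the next machine in order is an
            # assumption; the heuristic does not say where to shift.
            arm = (arm + 1) % n_arms
        # Win: stay on the same machine.
    return choices, rewards


choices, rewards = win_stay_lose_shift([0.2, 0.5, 0.8], n_rounds=20)
print(choices)
print(sum(rewards), "wins out of", len(rewards))
```

The heuristic only remembers the last outcome, so a single unlucky pull is enough to make it abandon even the best machine.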

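A more sophisticated approach comes from the British statistician John Gittins, who developed the Gittins index: a single score for each machine that accounts for both its payouts so far and the value of what another play might teach you. You then play the machine with the highest index.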

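Another strategy is regret minimisation: choose so as to keep the gap small between the reward you actually collected and the reward you would have collected by always playing the best machine.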
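One well-known regret-minimising algorithm is UCB1 (Upper Confidence Bound), which is not named in this note and is used here only as an illustration. The sketch below assumes simulated machines with hypothetical win probabilities in `payout_probs`; the function name and the returned regret figure (expected regret, computed from the true probabilities) are choices made for the example.

```python
import math
import random


def ucb1(payout_probs, n_rounds, seed=0):
    """Run UCB1 on simulated slot machines.

    payout_probs: hypothetical win probability of each machine.
    Returns (counts, regret): plays per machine, and the expected
    reward lost compared with always playing the best machine.
    """
    rng = random.Random(seed)
    n_arms = len(payout_probs)
    counts = [0] * n_arms    # times each machine was played
    totals = [0.0] * n_arms  # total reward from each machine

    def score(a, t):
        # Observed mean plus a bonus that grows for rarely tried machines.
        return totals[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a])

    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            arm = t - 1  # try every machine once first
        else:
            arm = max(range(n_arms), key=lambda a: score(a, t))
        reward = 1.0 if rng.random() < payout_probs[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward

    best = max(payout_probs)
    regret = sum((best - p) * c for p, c in zip(payout_probs, counts))
    return counts, regret


counts, regret = ucb1([0.2, 0.5, 0.8], n_rounds=1000)
print(counts, regret)
```

The bonus term is what balances the two pulls: a machine gets played either because it looks good (exploit) or because it has been tried too rarely to rule out (explore).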

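The relative value of exploring versus exploiting shifts across the lifespan: exploring pays off most early on, when there is plenty of time left to use what you learn, and exploiting pays off most later.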