K-Armed Bandit
K one-armed bandits (slot machines), each with an arm to pull whose probability of reward is unknown.
Maximum likelihood Strategy
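What is the maximum confidence strategy, and where does it fail?
The maximum confidence strategy always picks the arm that has been pulled the most, because that is the arm whose outcome we are most confident about. It fails because it keeps pulling the same arm and never learns anything about the others, so it can lock onto a poor arm.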
What is the minimum confidence strategy, and how does it fail?
The minimum confidence strategy is pure exploration: it always chooses the arm that has been pulled the least. It fails because it never uses what it has learned to collect reward.
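The two confidence-based strategies above can be sketched in a few lines. This is only an illustration, assuming Bernoulli rewards and using pull counts as the measure of confidence; the function names, the tie-breaking rule (lowest index wins), and the reward probabilities are hypothetical choices, not part of the notes.

```python
import random

def max_confidence_arm(pull_counts):
    """Pick the arm pulled most often (ties go to the lowest index)."""
    return max(range(len(pull_counts)), key=lambda a: pull_counts[a])

def min_confidence_arm(pull_counts):
    """Pick the arm pulled least often (ties go to the lowest index)."""
    return min(range(len(pull_counts)), key=lambda a: pull_counts[a])

def simulate(choose_arm, reward_probs, steps, seed=0):
    """Run a strategy on Bernoulli arms; return (pull counts, total reward)."""
    rng = random.Random(seed)
    pull_counts = [0] * len(reward_probs)
    total_reward = 0
    for _ in range(steps):
        arm = choose_arm(pull_counts)
        # Assumed reward model: 1 with probability reward_probs[arm], else 0.
        if rng.random() < reward_probs[arm]:
            total_reward += 1
        pull_counts[arm] += 1
    return pull_counts, total_reward

# Hypothetical arms: the best arm is the last one.
probs = [0.1, 0.5, 0.9]
max_counts, _ = simulate(max_confidence_arm, probs, steps=30)
min_counts, _ = simulate(min_confidence_arm, probs, steps=30)
print(max_counts)  # every pull goes to the first arm chosen
print(min_counts)  # pulls are spread evenly, regardless of payoff
```

Under these assumptions, maximum confidence never leaves the first arm it tries, while minimum confidence cycles through all arms and ignores the rewards it observes.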
What are the metrics for bandits?
Bad metrics
Good metrics
What is the Rmax algorithm?
What is the general Rmax algorithm?
a
Hoeffding Bound
answer
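As a hedged illustration, the standard two-sided Hoeffding inequality for n i.i.d. samples bounded in [0, 1] says the sample mean is more than eps away from the true mean with probability at most 2 * exp(-2 * n * eps^2). Solving for eps at a chosen failure probability delta gives a confidence interval half-width. The sketch below assumes rewards in [0, 1]; the function name is hypothetical.

```python
import math

def hoeffding_half_width(n, delta):
    """Half-width eps such that P(|sample_mean - true_mean| >= eps) <= delta,
    for n i.i.d. samples bounded in [0, 1] (two-sided Hoeffding bound)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")
    # Solve 2 * exp(-2 * n * eps**2) = delta for eps.
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))

# The interval shrinks like 1 / sqrt(n): four times the pulls halves the width.
for n in (100, 400, 1600):
    print(n, round(hoeffding_half_width(n, delta=0.05), 4))
```

In a bandit setting this gives one way to decide how many pulls an arm needs before its estimated payoff can be treated as accurate.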
Simulation Lemma
Answer
Explore or Exploit Lemma
If all transitions are either accurately estimated or unknown, then the optimal policy for the estimated model is either near-optimal in the true MDP or reaches an unknown state quickly.