
Non-trivial two-armed partial-monitoring games are bandits

CoRR (2011)

Abstract

We consider online learning in partial-monitoring games against an oblivious adversary. We show that when the learner has exactly two actions and the game is nontrivial, the game is reducible to a bandit-like game, and thus the minimax regret is Θ(T^{1/2}).
