@article{BaFoPaRaSze14,
abstract = { In a partial monitoring game, the learner repeatedly chooses an action, the
environment responds with an outcome, and then the learner suffers a loss and
receives a feedback signal, both of which are fixed functions of the action and
the outcome. The goal of the learner is to minimize his regret, which is the
difference between his total cumulative loss and the total loss of the best
fixed action in hindsight.
In this paper we characterize the minimax regret of any
partial monitoring game with finitely many actions and
outcomes. It turns out that the minimax regret of any such game is either zero,
$\tilde{\Theta}(T^{1/2})$, $\Theta(T^{2/3})$, or $\Theta(T)$. We provide computationally efficient learning
algorithms that achieve the minimax regret within a logarithmic factor for any game. In addition to the bounds on the minimax regret, if we assume that the outcomes are generated in an i.i.d. fashion, we prove individual upper bounds on the expected regret.},
added-at = {2020-03-17T03:03:01.000+0100},
author = {Bart{\'o}k, G. and Foster, D. and P{\'a}l, D. and Rakhlin, A. and {Sz}epesv{\'a}ri, {Cs}.},
biburl = {https://www.bibsonomy.org/bibtex/291dc1728774f9c611cce306136227583/csaba},
date-added = {2014-05-16 22:39:50 -0700},
date-modified = {2014-12-06 19:49:48 +0000},
interhash = {82a11faba42a2f73d757a9ac733f78a3},
intrahash = {91dc1728774f9c611cce306136227583},
journal = {Mathematics of Operations Research},
keywords = {adversarial information, learning, monitoring online partial setting, stochastic theory},
pages = {967--997},
pdf = {papers/partial_monitoring-mor.pdf},
timestamp = {2020-03-17T03:03:01.000+0100},
title = {Partial monitoring -- classification, regret bounds, and algorithms},
volume = 39,
year = 2014
}