Article

Partial monitoring -- classification, regret bounds, and algorithms

G. Bartók, D. Foster, D. Pál, A. Rakhlin, and C. Szepesvári.
Mathematics of Operations Research (2014)

Abstract

In a partial monitoring game, the learner repeatedly chooses an action, the environment responds with an outcome, and then the learner suffers a loss and receives a feedback signal, both of which are fixed functions of the action and the outcome. The goal of the learner is to minimize his regret, which is the difference between his total cumulative loss and the total loss of the best fixed action in hindsight. In this paper we characterize the minimax regret of any partial monitoring game with finitely many actions and outcomes. It turns out that the minimax regret of any such game is either zero, $\widetilde{\Theta}(T^{1/2})$, $\Theta(T^{2/3})$, or $\Theta(T)$. We provide computationally efficient learning algorithms that achieve the minimax regret to within a logarithmic factor for any game. In addition to the bounds on the minimax regret, if we assume that the outcomes are generated in an i.i.d. fashion, we prove individual upper bounds on the expected regret.
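
The protocol and the regret defined in the abstract can be illustrated with a minimal Python sketch. All names here (`play_game`, `uniform_learner`, the 2x2 `loss` and `feedback` tables) are hypothetical and chosen only for illustration. The uniformly random learner is a placeholder and is not one of the algorithms from the paper.

```python
import random

def play_game(loss, feedback, choose_action, outcomes):
    """Run the partial monitoring protocol on a fixed outcome sequence.

    loss[a][o] and feedback[a][o] are the fixed loss and feedback
    functions of action a and outcome o. Returns (total_loss, regret).
    """
    n_actions = len(loss)
    history = []   # (action, feedback signal) pairs: all the learner ever sees
    total = 0.0
    for outcome in outcomes:
        action = choose_action(n_actions, history)
        total += loss[action][outcome]   # suffered, but not shown to the learner
        history.append((action, feedback[action][outcome]))
    # Loss of the best fixed action in hindsight.
    best_fixed = min(sum(loss[a][o] for o in outcomes) for a in range(n_actions))
    return total, total - best_fixed

def uniform_learner(n_actions, history):
    """Placeholder strategy (an assumption): ignore feedback, play at random."""
    return random.randrange(n_actions)

# Hypothetical 2-action, 2-outcome game: action 0 yields no information,
# action 1 reveals the outcome.
loss = [[0, 1], [1, 0]]
feedback = [[0, 0], [0, 1]]

random.seed(0)
outcomes = [random.randrange(2) for _ in range(1000)]
total, regret = play_game(loss, feedback, uniform_learner, outcomes)
print(f"total loss = {total}, regret = {regret}")
```

A learner that ignores its feedback, like the placeholder above, generally has regret growing linearly in the number of rounds T. The algorithms in the paper instead use the feedback signals to attain the minimax rates listed in the abstract.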

BibTeX key: BaFoPaRaSze14
entry type: article
year: 2014
journal: Mathematics of Operations Research
pages: 967--997
volume: 39
date-added: 2014-05-16 22:39:50 -0700
bdsk-url-1: http://dx.doi.org/10.1016/j.tcs.2012.10.008
pdf: papers/partial_monitoring-mor.pdf
date-modified: 2014-12-06 19:49:48 +0000
