Article in conference proceedings

Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits

B. Kveton, Z. Wen, A. Ashkan, and Cs. Szepesvári.
AISTATS, pages 535-543 (2015)

Abstract

A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to constraints, then observes the stochastic weights of these items and receives their sum as a payoff. In this paper, we close the problem of computationally and sample efficient learning in stochastic combinatorial semi-bandits. In particular, we analyze a UCB-like algorithm for solving the problem, which is known to be computationally efficient, and prove $O(K L (1 / \Delta) \log n)$ and $O(\sqrt{K L n \log n})$ upper bounds on its $n$-step regret, where $L$ is the number of ground items, $K$ is the maximum number of chosen items, and $\Delta$ is the gap between the expected returns of the optimal and best suboptimal solutions. The gap-dependent bound is tight up to a constant factor and the gap-free bound is tight up to a polylogarithmic factor.
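The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of a generic UCB-style learner for the simplest instance of the setting: the constraint is "choose exactly K of the L ground items", so the combinatorial optimization oracle reduces to picking the K largest indices. The Bernoulli item weights, the confidence-radius constant 1.5, the initialization rule, and the function name `comb_ucb_topk` are assumptions made for illustration; this is not claimed to be the paper's exact algorithm. It shows the semi-bandit loop (optimistic index per item, oracle call, per-item feedback) and tracks the pseudo-regret whose growth the bounds above describe.

```python
import math
import random


def comb_ucb_topk(means, K, n, seed=0):
    """Hypothetical UCB-style learner for a semi-bandit whose feasible
    solutions are all sets of K ground items (assumed constraint).
    Item weights are Bernoulli(means[e]) draws (assumed model).
    Returns the cumulative pseudo-regret after n steps."""
    rng = random.Random(seed)
    L = len(means)
    counts = [0] * L      # number of times each item's weight was observed
    totals = [0.0] * L    # sum of observed weights per item
    # Expected return of the optimal solution: the K items with largest means.
    best = sum(sorted(means, reverse=True)[:K])
    regret = 0.0
    for t in range(1, n + 1):
        if min(counts) == 0:
            # Initialization (assumed): favour items never observed so far.
            chosen = sorted(range(L), key=lambda e: counts[e])[:K]
        else:
            # Optimistic index: empirical mean plus a confidence radius
            # sqrt(1.5 log t / T(e)); the constant 1.5 is an assumption.
            ucb = [totals[e] / counts[e] + math.sqrt(1.5 * math.log(t) / counts[e])
                   for e in range(L)]
            # Oracle for the top-K constraint: K items with largest indices.
            chosen = sorted(range(L), key=lambda e: ucb[e], reverse=True)[:K]
        # Semi-bandit feedback: the weight of every chosen item is observed.
        for e in chosen:
            w = 1.0 if rng.random() < means[e] else 0.0
            counts[e] += 1
            totals[e] += w
        # Pseudo-regret: expected payoff gap to the optimal solution.
        regret += best - sum(means[e] for e in chosen)
    return regret


if __name__ == "__main__":
    means = [0.9, 0.8, 0.5, 0.4, 0.3, 0.2]
    print(comb_ucb_topk(means, K=2, n=5000))
```

Because each item's statistics are updated from its own observed weight rather than only from the summed payoff, the learner maintains L estimates instead of one per feasible set, which is what makes regret bounds that scale with $K$ and $L$ (rather than with the number of solutions) plausible.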

BibTeX key: KWASz15
Entry type: inproceedings
Booktitle: AISTATS
Year: 2015
Pages: 535--543
pdf: papers/AISTAT15-CombBand.pdf
date-modified: 2015-08-02 01:02:44 +0000
date-added: 2015-01-27 06:39:07 +0000