Inproceedings,

Combinatorial Cascading Bandits

B. Kveton, Z. Wen, A. Ashkan, and C. Szepesvári.
NIPS, pages 1450--1458. (September 2015)

Abstract

We consider learning to maximize reward in combinatorial cascading bandits, a new learning setting that unifies cascading and combinatorial bandits. The unification of these frameworks presents unique challenges in the analysis but allows for modeling a rich set of partial monitoring problems, such as learning to route in a communication network to minimize the probability of losing routed packets and recommending diverse items. We propose CombCascade, a computationally-efficient UCB-like algorithm for solving our problem; and derive gap-dependent and gap-free upper bounds on its regret. Our analysis builds on recent results in stochastic combinatorial semi-bandits but also addresses two novel challenges of our learning setting, a non-linear objective and partial observability. We evaluate CombCascade on two real-world problems and demonstrate that it performs well even when our modeling assumptions are violated. We also demonstrate that our setting requires new learning algorithms.

BibTeX key: KveWeAshSze15
entry type: inproceedings
booktitle: NIPS
year: 2015
month: September
pages: 1450--1458
pdf: papers/NIPS15-CombCascadeBandit.pdf
date-modified: 2016-08-01 03:14:33 +0000
date-added: 2015-12-02 01:22:43 +0000
