A. Badanidiyuru, J. Langford, and A. Slivkins. Resourceful Contextual Bandits. 2014. arXiv:1402.6779. Comment: This is the full version of a paper in COLT 2014. Version history: (v2) added some details to one of the proofs; (v3) a big revision following comments from COLT reviewers (but no new results).
Abstract
We study contextual bandits with ancillary constraints on resources, which
are common in real-world applications such as choosing ads or dynamic pricing
of items. We design the first algorithm for solving these problems that
improves over a trivial reduction to the non-contextual case. We consider very
general settings for both contextual bandits (arbitrary policy sets, e.g. Dudik
et al. (UAI'11)) and bandits with resource constraints (bandits with knapsacks,
Badanidiyuru et al. (FOCS'13)), and prove a regret guarantee with near-optimal
statistical properties.
@misc{badanidiyuru2014resourceful,
abstract = {We study contextual bandits with ancillary constraints on resources, which
are common in real-world applications such as choosing ads or dynamic pricing
of items. We design the first algorithm for solving these problems that
improves over a trivial reduction to the non-contextual case. We consider very
general settings for both contextual bandits (arbitrary policy sets, e.g. Dudik
et al. (UAI'11)) and bandits with resource constraints (bandits with knapsacks,
Badanidiyuru et al. (FOCS'13)), and prove a regret guarantee with near-optimal
statistical properties.},
author = {Badanidiyuru, Ashwinkumar and Langford, John and Slivkins, Aleksandrs},
keywords = {MAB-},
  note = {Full version of a paper in COLT 2014. arXiv:1402.6779. Version history: (v2) added some details to one of the proofs; (v3) a big revision following comments from COLT reviewers (but no new results)},
title = {Resourceful Contextual Bandits},
url = {http://arxiv.org/abs/1402.6779},
year = 2014
}