Abstract
We study the problem of regret minimization in partially observable linear
quadratic control systems when the model dynamics are unknown a priori. We
propose ExpCommit, an explore-then-commit algorithm that learns the model
Markov parameters and then follows the principle of optimism in the face of
uncertainty to design a controller. We propose a novel way to decompose the
regret and provide an end-to-end sublinear regret upper bound for partially
observable linear quadratic control. Finally, we provide stability guarantees
and establish a regret upper bound of $\mathcal{O}(T^{2/3})$ for
ExpCommit, where $T$ is the time horizon of the problem.