Abstract
Safe reinforcement learning is a promising path toward applying reinforcement
learning algorithms to real-world problems, where suboptimal behaviors may lead
to actual negative consequences. In this work, we focus on the setting where
unsafe states can be avoided by planning ahead a short time into the future. In
this setting, a model-based agent with a sufficiently accurate model can avoid
unsafe states. We devise a model-based algorithm that heavily penalizes unsafe
trajectories, and derive guarantees that our algorithm can avoid unsafe states
under certain assumptions. Experiments demonstrate that our algorithm can
achieve competitive rewards with fewer safety violations in several continuous
control tasks.
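
To make the core idea concrete, below is a minimal sketch, not the authors' implementation, of short-horizon model-based planning that heavily penalizes imagined unsafe trajectories, in the spirit of the abstract. The names dynamics_model, reward_fn, and is_unsafe are hypothetical stand-ins for a learned dynamics model and task-specific functions.

import numpy as np

def plan_action(state, dynamics_model, reward_fn, is_unsafe,
                horizon=5, n_candidates=256, action_dim=2,
                unsafe_penalty=1e3, rng=None):
    """Random-shooting planner sketch: sample candidate action sequences,
    roll them out through the (assumed sufficiently accurate) model, and
    return the first action of the sequence with the highest penalized
    return. All callables here are hypothetical placeholders."""
    rng = np.random.default_rng() if rng is None else rng
    # Candidate action sequences: shape (n_candidates, horizon, action_dim).
    actions = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon, action_dim))
    returns = np.zeros(n_candidates)
    for i in range(n_candidates):
        s = state
        for t in range(horizon):
            s = dynamics_model(s, actions[i, t])  # imagined next state
            if is_unsafe(s):
                # A penalty much larger than any achievable short-horizon
                # return makes every trajectory that reaches an unsafe
                # state within the horizon strictly worse than a safe one.
                returns[i] -= unsafe_penalty
                break
            returns[i] += reward_fn(s, actions[i, t])
    return actions[np.argmax(returns), 0]  # execute only the first action

The key design choice this sketch illustrates is that the penalty must dominate any reward attainable within the planning horizon; under that condition, an agent with an accurate model never prefers a trajectory that it can foresee entering an unsafe state, matching the abstract's setting where unsafe states are avoidable by planning a short time ahead.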