Abstract
One approach to dealing with the statistical inefficiency of neural networks is
to rely on auxiliary losses that help build useful representations. However,
it is not always trivial to know whether an auxiliary task will be helpful for
the main task, or when it could start to hurt. We propose using the cosine
similarity between the gradients of the two tasks as an adaptive weight that
detects when an auxiliary loss is helpful to the main loss. We show that our
approach is guaranteed to converge to critical points of the main task, and we
demonstrate the practical usefulness of the proposed algorithm in a few
domains: multi-task supervised learning on subsets of ImageNet, reinforcement
learning on gridworld, and reinforcement learning on Atari games.
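To make the weighting scheme concrete, below is a minimal sketch (not the authors' code) of one training step that weights an auxiliary loss by the cosine similarity between the two tasks' gradients. The toy model, data, and loss functions are placeholders, and clipping the weight at zero (so a conflicting auxiliary gradient is ignored rather than negated) is our assumption about the rule.

    import torch

    model = torch.nn.Linear(10, 1)  # toy shared model (placeholder)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    def flat_grad(loss, params):
        # Gradient of `loss` w.r.t. `params`, flattened into one vector;
        # the graph is retained so we can backpropagate again afterwards.
        grads = torch.autograd.grad(loss, params, retain_graph=True)
        return torch.cat([g.reshape(-1) for g in grads])

    x = torch.randn(32, 10)
    y_main, y_aux = torch.randn(32, 1), torch.randn(32, 1)

    params = list(model.parameters())
    pred = model(x)
    loss_main = ((pred - y_main) ** 2).mean()
    loss_aux = ((pred - y_aux) ** 2).mean()

    g_main = flat_grad(loss_main, params)
    g_aux = flat_grad(loss_aux, params)

    # Adaptive weight: cosine similarity between the task gradients,
    # clipped at zero so the auxiliary task is dropped whenever its
    # gradient points against the main task's gradient.
    cos = torch.nn.functional.cosine_similarity(g_main, g_aux, dim=0)
    w = torch.clamp(cos, min=0.0)

    opt.zero_grad()
    (loss_main + w.detach() * loss_aux).backward()
    opt.step()

Because the weight is recomputed at every step, the auxiliary signal is used only while its gradient direction agrees with the main task's, which is what underlies the convergence guarantee stated above.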