Machine learning pipelines often rely on optimization procedures to make
discrete decisions (e.g., sorting, picking closest neighbors, finding shortest
paths or optimal matchings). Although these discrete decisions are easily
computed in a forward manner, they cannot be used to modify model parameters
using first-order optimization techniques because they break the
back-propagation of computational graphs. In order to expand the scope of
learning problems that can be solved in an end-to-end fashion, we propose a
systematic method to transform a block that outputs an optimal discrete
decision into a differentiable operation. Our approach relies on stochastic
perturbations of the parameters of the underlying optimization problem, and can be used readily within existing
solvers without the need for ad hoc regularization or smoothing. These
perturbed optimizers yield solutions that are differentiable and never locally
constant. The amount of smoothness can be tuned via the chosen noise amplitude,
whose impact we analyze. The derivatives of these perturbed solvers can be
evaluated efficiently. We also show how this framework can be connected to a
family of losses developed in structured prediction, and describe how these can
be used in unsupervised and supervised learning, with theoretical guarantees.
We demonstrate the performance of our approach on several machine learning
tasks in experiments on synthetic and real data.
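
As a concrete illustration of the mechanism described above, the sketch below shows a Monte Carlo perturbed argmax in plain NumPy: the discrete solver (here a one-hot argmax over a score vector theta) is averaged over Gaussian perturbations of its input, which approximates a smoothed, never locally constant output, and its Jacobian is estimated with the Gaussian identity J = E[y*(theta + eps Z) Z^T] / eps. This is only a minimal sketch under the assumption of Gaussian noise and a toy argmax solver; the function names (one_hot_argmax, perturbed_argmax) and the choices of eps and sample count are illustrative, not the authors' implementation.

import numpy as np

def one_hot_argmax(theta):
    # Discrete solver: returns the vertex (one-hot vector) maximizing <y, theta>.
    y = np.zeros_like(theta)
    y[np.argmax(theta)] = 1.0
    return y

def perturbed_argmax(theta, eps=0.5, n_samples=5000, seed=0):
    # Monte Carlo estimate of y_eps(theta) = E[ argmax_y <y, theta + eps * Z> ]
    # and of its Jacobian, using the identity (valid for Gaussian Z)
    #   J_theta y_eps(theta) = (1 / eps) * E[ y*(theta + eps * Z) Z^T ].
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n_samples, theta.shape[0]))
    Y = np.stack([one_hot_argmax(theta + eps * z) for z in Z])    # (n_samples, d)
    y_eps = Y.mean(axis=0)                                        # smoothed, differentiable output
    jac = (Y[:, :, None] * Z[:, None, :]).mean(axis=0) / eps      # (d, d) Jacobian estimate
    return y_eps, jac

theta = np.array([1.0, 2.0, 1.5])
y_eps, jac = perturbed_argmax(theta)
print(y_eps)  # a soft relaxation of argmax(theta); larger eps gives a smoother output
print(jac)    # nonzero everywhere, so gradients can flow through the discrete block

For supervised learning, the same Monte Carlo output is what appears in the gradient of the perturbed structured-prediction (Fenchel-Young type) losses mentioned in the abstract, which reduces to y_eps(theta) - y for a target structure y, so the estimator above is essentially all that is needed for end-to-end training.
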
@article{berthet2020learning,
  author  = {Berthet, Quentin and Blondel, Mathieu and Teboul, Olivier and Cuturi, Marco and Vert, Jean-Philippe and Bach, Francis},
  title   = {Learning with Differentiable Perturbed Optimizers},
  journal = {arXiv preprint arXiv:2002.08676},
  year    = {2020},
  url     = {http://arxiv.org/abs/2002.08676}
}