Abstract
Recent work on adversarial attacks has shown that the Projected Gradient Descent
(PGD) adversary is a universal first-order adversary, and that classifiers
adversarially trained with PGD are robust against a wide range of first-order
attacks. It is worth noting that the original objective of an attack/defense
model relies on a data distribution $p(\mathbf{x})$, typically in the form of
risk maximization/minimization, e.g.,
$\max/\min \, \mathbb{E}_{p(\mathbf{x})} L(\mathbf{x})$, where
$p(\mathbf{x})$ is some unknown data distribution and $L(\cdot)$ is a loss
function.
function. However, since PGD generates attack samples independently for each
data sample based on $L(\cdot)$, the procedure does not necessarily
lead to good generalization in terms of risk optimization. In this paper, we
achieve the goal by proposing distributionally adversarial attack (DAA), a
framework to solve an optimal adversarial-data distribution, a perturbed
distribution that satisfies the $L_ınfty$ constraint but deviates from the
original data distribution to increase the generalization risk maximally.
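As a rough sketch in our own notation (not necessarily the paper's exact
formulation), this distribution-level objective can be written as
\[
  \max_{q \in \mathcal{B}_\infty(p, \epsilon)} \; \mathbb{E}_{q(\mathbf{x})} L(\mathbf{x}),
\]
where $\mathcal{B}_\infty(p, \epsilon)$ is shorthand for the set of
distributions obtained by perturbing samples from $p$ within an $L_\infty$
ball of radius $\epsilon$.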
Algorithmically, DAA performs optimization on the space of potential data
distributions, which introduces direct dependencies among all data points when
generating adversarial samples. DAA is evaluated by attacking state-of-the-art
defense models, including the adversarially trained models provided by MIT
MadryLab. Notably, DAA ranks first on MadryLab's white-box
leaderboards, reducing the accuracy of their secret MNIST model to $88.79\%$
(with $l_\infty$ perturbations of $\epsilon = 0.3$) and the accuracy of their
secret CIFAR model to $44.71\%$ (with $l_\infty$ perturbations of $\epsilon =
8.0$). Code for the experiments is released at
https://github.com/tianzheng4/Distributionally-Adversarial-Attack.
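For contrast with DAA's distribution-level coupling, the following is a
minimal per-sample PGD sketch in PyTorch (our illustration, not the released
DAA code); the hyperparameter names `epsilon`, `step_size`, and `num_steps`
are illustrative assumptions rather than values taken from the paper. Each
example is perturbed independently, which is exactly the per-sample behavior
DAA replaces with an optimization over the data distribution.

```python
# Minimal per-sample PGD sketch under an L-infinity constraint (illustrative only).
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, epsilon=0.3, step_size=0.01, num_steps=40):
    """Maximize the per-sample loss, projecting each iterate back into the
    L-infinity ball of radius epsilon around the clean input x."""
    x_adv = x.clone().detach()
    # Random start inside the epsilon-ball, as in standard PGD.
    x_adv = x_adv + torch.empty_like(x_adv).uniform_(-epsilon, epsilon)
    x_adv = torch.clamp(x_adv, 0.0, 1.0)
    for _ in range(num_steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascend the per-sample loss; note there is no coupling across examples here.
        x_adv = x_adv.detach() + step_size * grad.sign()
        # Project onto the L-infinity constraint set and the valid pixel range.
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon)
        x_adv = torch.clamp(x_adv, 0.0, 1.0)
    return x_adv.detach()
```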