Abstract

Deep neural networks can now perform many tasks that were once thought to be feasible only for humans. Unfortunately, while reaching impressive performance under standard settings, such networks are known to be susceptible to adversarial attacks -- slight but carefully constructed perturbations of the inputs that drastically decrease a network's performance and reduce its trustworthiness. Here we propose to improve network robustness to input perturbations via an adversarial training procedure that we call Adversarial Feature Desensitization (AFD). We augment normal supervised training with an adversarial game between the embedding network and an additional adversarial decoder that is trained to discriminate between clean and perturbed inputs from their high-level embeddings. Our theoretical and empirical evidence supports the effectiveness of this approach in learning robust features on the MNIST, CIFAR10, and CIFAR100 datasets -- substantially improving the state of the art in robust classification against previously observed adversarial attacks. More importantly, we demonstrate that AFD generalizes better than previous methods, as the learned features maintain their robustness across a wide range of perturbations, including perturbations not seen during training. These results indicate that reducing feature sensitivity through adversarial training is a promising approach for ameliorating the problem of adversarial attacks in deep neural networks.
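The following is a minimal PyTorch-style sketch of the adversarial game the abstract describes: a feature extractor and classifier are trained on the supervised task while a discriminator tries to tell clean from perturbed inputs by their embeddings, and the extractor is trained to fool it. The module names, the choice of a PGD attack to craft perturbed inputs, and the loss weighting `lam` are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn.functional as F

def pgd_attack(encoder, classifier, x, y, eps=8/255, alpha=2/255, steps=10):
    """Craft L-inf-bounded adversarial examples (the attack choice is an assumption)."""
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(classifier(encoder(x_adv)), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()
        x_adv = x + (x_adv - x).clamp(-eps, eps)   # project back into the eps-ball
        x_adv = x_adv.clamp(0, 1)                  # keep valid pixel range
    return x_adv.detach()

def afd_step(encoder, classifier, discriminator,
             opt_task, opt_disc, x, y, lam=1.0):
    """One AFD-style training step: supervised loss plus the feature-level game."""
    x_adv = pgd_attack(encoder, classifier, x, y)
    clean_lbl = torch.zeros(len(x), dtype=torch.long, device=x.device)
    adv_lbl = torch.ones(len(x), dtype=torch.long, device=x.device)

    # 1) Train the discriminator to tell clean from perturbed embeddings apart.
    z_clean, z_adv = encoder(x).detach(), encoder(x_adv).detach()
    d_loss = F.cross_entropy(discriminator(torch.cat([z_clean, z_adv])),
                             torch.cat([clean_lbl, adv_lbl]))
    opt_disc.zero_grad(); d_loss.backward(); opt_disc.step()

    # 2) Train encoder + classifier: classify both clean and perturbed inputs
    #    correctly while making perturbed embeddings look "clean" to the
    #    discriminator (the game that desensitizes the features).
    z_clean, z_adv = encoder(x), encoder(x_adv)
    task_loss = F.cross_entropy(classifier(z_clean), y) \
              + F.cross_entropy(classifier(z_adv), y)
    fool_loss = F.cross_entropy(discriminator(z_adv), clean_lbl)
    loss = task_loss + lam * fool_loss
    opt_task.zero_grad(); loss.backward(); opt_task.step()
    return loss.item()
```

In this sketch `opt_task` is assumed to hold only the encoder and classifier parameters and `opt_disc` only the discriminator's, so the two players are updated in alternation as in a standard adversarial (GAN-style) setup.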

Description

[2006.04621] Adversarial Feature Desensitization
