An Empirical Study of Training Self-Supervised Vision Transformers
X. Chen, S. Xie, and K. He. (2021). arXiv:2104.02057. Comment: Camera-ready, ICCV 2021, Oral. Code: https://github.com/facebookresearch/moco-v3.
Abstract
This paper does not describe a novel method. Instead, it studies a
straightforward, incremental, yet must-know baseline given the recent progress
in computer vision: self-supervised learning for Vision Transformers (ViT).
While the training recipes for standard convolutional networks are by now
highly mature and robust, the recipes for ViT have yet to be established,
especially in self-supervised scenarios, where training becomes more
challenging. In this
work, we go back to basics and investigate the effects of several fundamental
components for training self-supervised ViT. We observe that instability is a
major issue that degrades accuracy, yet it can be masked by apparently good
results. We reveal that these results are in fact partial failures, and that
they can be improved when training is made more stable. We benchmark ViT
results in MoCo
v3 and several other self-supervised frameworks, with ablations in various
aspects. We discuss the currently positive evidence as well as challenges and
open questions. We hope that this work will provide useful data points and
experience for future research.
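
For context, MoCo v3 (the framework benchmarked in the paper) trains two
encoders on two augmented crops of each image: a base encoder that receives
gradients and a momentum encoder updated as an exponential moving average of
it, with a symmetrized InfoNCE loss computed over the batch and no memory
queue. The sketch below is a minimal, illustrative rendition of that recipe
in PyTorch, not the authors' implementation (see the linked repository for
the reference code); the tiny stand-in backbone, batch shapes, and
hyperparameter values are placeholders.

import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def contrastive_loss(q, k, tau=0.2):
    # InfoNCE over the batch: the positive for each query is the
    # same-index key; every other key in the batch is a negative.
    q = F.normalize(q, dim=1)
    k = F.normalize(k, dim=1)
    logits = q @ k.t() / tau                      # [N, N] similarities
    labels = torch.arange(q.size(0), device=q.device)
    # The 2*tau scaling follows the released MoCo v3 code.
    return F.cross_entropy(logits, labels) * (2 * tau)

# Stand-in backbone; the paper uses a ViT plus projection/prediction MLPs.
base_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
momentum_encoder = copy.deepcopy(base_encoder)
for p in momentum_encoder.parameters():
    p.requires_grad = False       # updated by EMA, not by gradients

def training_step(x1, x2, m=0.99):
    # Queries come from the base encoder, keys from the momentum encoder.
    q1, q2 = base_encoder(x1), base_encoder(x2)
    with torch.no_grad():
        k1, k2 = momentum_encoder(x1), momentum_encoder(x2)
    # Symmetrized loss: each crop serves once as query, once as key.
    loss = contrastive_loss(q1, k2) + contrastive_loss(q2, k1)
    # Exponential-moving-average update of the momentum encoder.
    with torch.no_grad():
        for pb, pm in zip(base_encoder.parameters(),
                          momentum_encoder.parameters()):
            pm.mul_(m).add_(pb.detach(), alpha=1 - m)
    return loss

x1, x2 = torch.randn(8, 3, 32, 32), torch.randn(8, 3, 32, 32)  # fake crops
print(training_step(x1, x2).item())

The stabilization the paper reports for ViT is notably simple: freeze the
patch-projection layer at its random initialization (i.e. set
requires_grad = False on that layer alone), so the rest of the network trains
on top of a fixed random patch embedding.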
Description
[2104.02057] An Empirical Study of Training Self-Supervised Vision Transformers
%0 Generic
%1 chen2021empirical
%A Chen, Xinlei
%A Xie, Saining
%A He, Kaiming
%D 2021
%K cs.CV
%T An Empirical Study of Training Self-Supervised Vision Transformers
%U http://arxiv.org/abs/2104.02057
@misc{chen2021empirical,
added-at = {2021-08-22T09:58:05.000+0200},
author = {Chen, Xinlei and Xie, Saining and He, Kaiming},
biburl = {https://www.bibsonomy.org/bibtex/29260ec554cae78b7a43e435af5d2cea7/aerover},
description = {[2104.02057] An Empirical Study of Training Self-Supervised Vision Transformers},
interhash = {3d455b6046e89525ea61bc14e48bc739},
intrahash = {9260ec554cae78b7a43e435af5d2cea7},
keywords = {cs.CV},
note = {arXiv:2104.02057. Camera-ready, ICCV 2021, Oral. Code: https://github.com/facebookresearch/moco-v3},
timestamp = {2021-08-22T09:58:05.000+0200},
title = {An Empirical Study of Training Self-Supervised Vision Transformers},
url = {http://arxiv.org/abs/2104.02057},
year = 2021
}