An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby. (2020). arXiv:2010.11929. Comment: Fine-tuning code and pre-trained models are available at https://github.com/google-research/vision_transformer. ICLR camera-ready version with 2 small modifications: 1) Added a discussion of CLS vs GAP classifier in the appendix, 2) Fixed an error in exaFLOPs computation in Figure 5 and Table 6 (relative performance of models is basically not affected).