An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

(2020). arXiv:2010.11929. Comment: Fine-tuning code and pre-trained models are available at https://github.com/google-research/vision_transformer. ICLR camera-ready version with two small modifications: (1) added a discussion of the CLS vs. GAP classifier in the appendix; (2) fixed an error in the exaFLOPs computation in Figure 5 and Table 6 (the relative performance of the models is essentially unaffected).

Other publications by persons with the same name

Conditional Object-Centric Learning from Video. CoRR (2021)
Conditional Object-Centric Learning from Video. ICLR, OpenReview.net (2022)
WFST Enabled Solutions to ASR Problems: Beyond HMM Decoding. IEEE Trans. Speech Audio Process., 20(2): 551-564 (2012)
A Linguistic Evaluation of Rule-Based, Phrase-Based, and Neural MT Engines. Prague Bull. Math. Linguistics (2017)
Video OWL-ViT: Temporally-consistent open-world localization in video. ICCV, pp. 13756-13765. IEEE (2023)
Optimization Algorithms and Applications for Speech and Language Processing. IEEE Trans. Speech Audio Process., 21(11): 2231-2243 (2013)
Investigations on an EM-Style Optimization Algorithm for Discriminative Training of HMMs. IEEE ACM Trans. Audio Speech Lang. Process., 21(12): 2616-2626 (2013)
Object-Centric Learning with Slot Attention. NeurIPS (2020)
ViViT: A Video Vision Transformer. ICCV, pp. 6816-6826. IEEE (2021)
Equivalence of Generative and Log-Linear Models. IEEE Trans. Speech Audio Process., 19(5): 1138-1148 (2011)