group :: ma_ss22_ts | BibSonomy


publications (3)

SiT: Self-supervised vIsion Transformer
S. Atito, M. Awais, and J. Kittler. (2021). arXiv:2104.03602.
a year ago by @annakrause
vit
selfsupervised
transformer
training
todo:read
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly and 2 other author(s). (2020). arXiv:2010.11929. Comment: Fine-tuning code and pre-trained models are available at https://github.com/google-research/vision_transformer. ICLR camera-ready version with 2 small modifications: 1) Added a discussion of CLS vs GAP classifier in the appendix, 2) Fixed an error in exaFLOPs computation in Figure 5 and Table 6 (relative performance of models is basically not affected).
2 years ago by @annakrause
vit
representationlearning
idea:big_data_geo_2
visionTransformer
An image is worth 16x16 words: Transformers for image recognition at scale
A. Dosovitskiy. (2020)
2 years ago by @marjaw
vit
ts_ss22_ml
final
thema:cnn_and_attention_methods_for_audio_classification
