Author of the publication

Deep Transformers without Shortcuts: Modifying Self-attention for Faithful Signal Propagation.

ICLR, OpenReview.net, (2023)


Other publications of authors with the same name

Characterizing signal propagation to close the performance gap in unnormalized ResNets. ICLR, OpenReview.net, (2021)

SMASH: One-Shot Model Architecture Search through HyperNetworks. ICLR (Poster), OpenReview.net, (2018)

FreezeOut: Accelerate Training by Progressively Freezing Layers. CoRR, (2017)

Neural Photo Editing with Introspective Adversarial Networks. CoRR, (2016)

Drawing Multiple Augmentation Samples Per Image During Training Efficiently Decreases Test Error. CoRR, (2021)

Deep Transformers without Shortcuts: Modifying Self-attention for Faithful Signal Propagation. ICLR, OpenReview.net, (2023)

Perceiver IO: A General Architecture for Structured Inputs & Outputs. CoRR, (2021)

ConvNets Match Vision Transformers at Scale. CoRR, (2023)

BYOL works even without batch statistics. CoRR, (2020)

Skilful precipitation nowcasting using deep generative models of radar. Nat., 597 (7878): 672-677 (2021)