PowerNorm: Rethinking Batch Normalization in Transformers.

, , , , and . ICML, volume 119 of Proceedings of Machine Learning Research, page 8741-8751. PMLR, (2020)

Other publications of authors with the same name

Unified Acceleration Method for Packing and Covering Problems via Diameter Reduction., , and . ICALP, volume 55 of LIPIcs, page 50:1-50:13. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, (2016)

A Local Perspective on Community Structure in Multilayer Networks., , , and . CoRR, (2015)

Mapping the Similarities of Spectra: Global and Locally-biased Approaches to SDSS Galaxy Data., , and . CoRR, (2016)

Newton-Type Methods for Non-Convex Optimization Under Inexact Hessian Information., , and . CoRR, (2017)

Predicting trends in the quality of state-of-the-art neural networks without access to training or testing data., , and . CoRR, (2020)

HAWQ: Hessian AWare Quantization of Neural Networks With Mixed-Precision., , , , and . ICCV, page 293-302. IEEE, (2019)

On the Computational Inefficiency of Large Batch Sizes for Stochastic Gradient Descent., , , , , , , and . CoRR, (2018)

Learning differentiable solvers for systems with hard constraints., , and . CoRR, (2022)

Post-mortem on a deep learning contest: a Simpson's paradox and the complementary roles of scale metrics versus shape metrics., and . CoRR, (2021)

Full Stack Optimization of Transformer Inference: a Survey., , , , , , , , , and 2 other author(s). CoRR, (2023)