Author of the publication

PowerNorm: Rethinking Batch Normalization in Transformers.

ICML, volume 119 of Proceedings of Machine Learning Research, pages 8741-8751. PMLR, (2020)


Other publications of authors with the same name

Inefficiency of K-FAC for Large Batch Size Training. AAAI, pages 5053-5060. AAAI Press, (2020)

FFT, FMM, or Multigrid? A comparative Study of State-Of-the-Art Poisson Solvers for Uniform and Nonuniform Grids in the Unit Cube. SIAM J. Sci. Comput., (2016)

Adaptive Self-supervision Algorithms for Physics-informed Neural Networks. CoRR, (2022)

A framework for scalable biophysics-based image analysis. SC, page 19. ACM, (2017)

Hessian-based Analysis of Large Batch Training and Robustness to Adversaries. NeurIPS, pages 4954-4964. (2018)

A Fast Post-Training Pruning Framework for Transformers. NeurIPS, (2022)

I-BERT: Integer-only BERT Quantization. ICML, volume 139 of Proceedings of Machine Learning Research, pages 5506-5518. PMLR, (2021)

LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement. CoRR, (2024)

Integrated Model, Batch, and Domain Parallelism in Training Neural Networks. SPAA, pages 77-86. ACM, (2018)

PyHessian: Neural Networks Through the Lens of the Hessian. CoRR, (2019)