Author of the publication

Please choose a person to associate with this publication.

To distinguish between persons with the same name, the academic degree and the title of an important publication are displayed. You can also use the button next to the name to display some of the publications already assigned to that person.

 

Other publications of authors with the same name

Landmark Attention: Random-Access Infinite Context Length for Transformers. CoRR, (2023)
Masked Training of Neural Networks with Partial Gradients. AISTATS, volume 151 of Proceedings of Machine Learning Research, page 5876-5890. PMLR, (2022)
Learning Translation Quality Evaluation on Low Resource Languages from Large Language Models. CoRR, (2023)
Simultaneous Training of Partially Masked Neural Networks. CoRR, (2021)
CoTFormer: More Tokens With Attention Make Up For Less Depth. CoRR, (2023)
Critical Parameters for Scalable Distributed Learning with Large Batches and Asynchronous Updates. AISTATS, volume 130 of Proceedings of Machine Learning Research, page 4042-4050. PMLR, (2021)
MEDITRON-70B: Scaling Medical Pretraining for Large Language Models. (2023)
The splay-list: a distribution-adaptive concurrent skip-list. Distributed Comput., 36 (3): 395-418 (September 2023)
Special Properties of Gradient Descent with Large Learning Rates. ICML, volume 202 of Proceedings of Machine Learning Research, page 25082-25104. PMLR, (2023)
QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs. CoRR, (2024)