Author of the publication

AdaShift: Decorrelation and Convergence of Adaptive Learning Rate Methods.

, , , , , and . ICLR (Poster), OpenReview.net, (2019)


Other publications of authors with the same name

AdaShift: Decorrelation and Convergence of Adaptive Learning Rate Methods., , , , , and . ICLR (Poster), OpenReview.net, (2019)

Robust Reinforcement Learning from Corrupted Human Feedback., , , , , and . CoRR, (2024)

A Biased Graph Neural Network Sampler with Near-Optimal Regret., , , and . NeurIPS, page 8833-8844. (2021)

Efficient Long-Range Transformers: You Need to Attend More, but Not Necessarily at Every Layer., , , , and . EMNLP (Findings), page 2775-2786. Association for Computational Linguistics, (2023)

AdaShift: Decorrelation and Convergence of Adaptive Learning Rate Methods., , , , , and . CoRR, (2018)

Less is More: Task-aware Layer-wise Distillation for Language Model Compression., , , , , and . ICML, volume 202 of Proceedings of Machine Learning Research, page 20852-20867. PMLR, (2023)

GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM., , , , , , and . CoRR, (2024)

MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation., , , , , and . NAACL-HLT, page 1610-1623. Association for Computational Linguistics, (2022)

PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance., , , , , , and . ICML, volume 162 of Proceedings of Machine Learning Research, page 26809-26823. PMLR, (2022)

Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning., , , , , , and . ICLR, OpenReview.net, (2023)