Author of the publication

Please choose a person to relate this publication to

To distinguish between persons with the same name, the academic degree and the title of an important publication are displayed. You can also use the button next to a name to display some of the publications already assigned to that person.

 

Other publications of authors with the same name

AdaShift: Decorrelation and Convergence of Adaptive Learning Rate Methods. ICLR (Poster), OpenReview.net, (2019)

Robust Reinforcement Learning from Corrupted Human Feedback. CoRR, (2024)

Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs. ICLR, OpenReview.net, (2024)

A Biased Graph Neural Network Sampler with Near-Optimal Regret. NeurIPS, pages 8833-8844. (2021)

Efficient Long-Range Transformers: You Need to Attend More, but Not Necessarily at Every Layer. EMNLP (Findings), pages 2775-2786. Association for Computational Linguistics, (2023)

AdaShift: Decorrelation and Convergence of Adaptive Learning Rate Methods. CoRR, (2018)

Less is More: Task-aware Layer-wise Distillation for Language Model Compression. ICML, volume 202 of Proceedings of Machine Learning Research, pages 20852-20867. PMLR, (2023)

Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs. CoRR, (2023)

PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance. ICML, volume 162 of Proceedings of Machine Learning Research, pages 26809-26823. PMLR, (2022)

LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation. ICML, volume 202 of Proceedings of Machine Learning Research, pages 20336-20350. PMLR, (2023)