Author of the publication

AdaShift: Decorrelation and Convergence of Adaptive Learning Rate Methods.

, , , , , and . ICLR (Poster), OpenReview.net, (2019)


Other publications of authors with the same name

AdaShift: Decorrelation and Convergence of Adaptive Learning Rate Methods., , , , , and . ICLR (Poster), OpenReview.net, (2019)

Robust Reinforcement Learning from Corrupted Human Feedback., , , , , and . CoRR, (2024)

A Biased Graph Neural Network Sampler with Near-Optimal Regret., , , and . NeurIPS, page 8833-8844. (2021)

Efficient Long-Range Transformers: You Need to Attend More, but Not Necessarily at Every Layer., , , , and . EMNLP (Findings), page 2775-2786. Association for Computational Linguistics, (2023)

AdaShift: Decorrelation and Convergence of Adaptive Learning Rate Methods., , , , , and . CoRR, (2018)

Less is More: Task-aware Layer-wise Distillation for Language Model Compression., , , , , and . ICML, volume 202 of Proceedings of Machine Learning Research, page 20852-20867. PMLR, (2023)

GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM., , , , , , and . CoRR, (2024)

MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation., , , , , and . NAACL-HLT, page 1610-1623. Association for Computational Linguistics, (2022)

PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance., , , , , , and . ICML, volume 162 of Proceedings of Machine Learning Research, page 26809-26823. PMLR, (2022)

Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning., , , , , , and . ICLR, OpenReview.net, (2023)