Author of the publication

AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration.

Ji Lin, Jiaming Tang, Haotian Tang, Shang Yang, Wei-Ming Chen, Wei-Chen Wang, Guangxuan Xiao, Xingyu Dang, Chuang Gan, and Song Han. MLSys, mlsys.org, (2024)


Other publications of authors with the same name

Offsite-Tuning: Transfer Learning without Full Model. CoRR, (2023)

SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models. ICML, volume 202 of Proceedings of Machine Learning Research, page 38087-38099. PMLR, (2023)

Efficient Streaming Language Models with Attention Sinks. (2024)

ReFresh: Reducing Memory Access from Exploiting Stable Historical Embeddings for Graph Neural Network Training. CoRR, (2023)

AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration. MLSys, mlsys.org, (2024)

BitDelta: Your Fine-Tune May Only Be Worth One Bit. (2024)

Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-level Backdoor Attacks. Mach. Intell. Res., 20 (2): 180-193 (April 2023)

Sparse and Local Networks for Hypergraph Reasoning. LoG, volume 198 of Proceedings of Machine Learning Research, page 34. PMLR, (2022)

Retrieval Head Mechanistically Explains Long-Context Factuality. CoRR, (2024)

QUEST: Query-Aware Sparsity for Efficient Long-Context LLM Inference. ICML, OpenReview.net, (2024)