Author of the publication

AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration.

Ji Lin, Jiaming Tang, Haotian Tang, Shang Yang, Wei-Ming Chen, Wei-Chen Wang, Guangxuan Xiao, Xingyu Dang, Chuang Gan, and Song Han. MLSys, mlsys.org, (2024)


Other publications of authors with the same name

Offsite-Tuning: Transfer Learning without Full Model. CoRR, (2023)

SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models. ICML, volume 202 of Proceedings of Machine Learning Research, page 38087-38099. PMLR, (2023)

Efficient Streaming Language Models with Attention Sinks. (2024)

ReFresh: Reducing Memory Access from Exploiting Stable Historical Embeddings for Graph Neural Network Training. CoRR, (2023)

AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration. MLSys, mlsys.org, (2024)

BitDelta: Your Fine-Tune May Only Be Worth One Bit. (2024)

Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-level Backdoor Attacks. Mach. Intell. Res., 20 (2): 180-193 (April 2023)

Sparse and Local Networks for Hypergraph Reasoning. LoG, volume 198 of Proceedings of Machine Learning Research, page 34. PMLR, (2022)

Retrieval Head Mechanistically Explains Long-Context Factuality. CoRR, (2024)

QUEST: Query-Aware Sparsity for Efficient Long-Context LLM Inference. ICML, OpenReview.net, (2024)