Author of the publication

Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models.

CoRR, (2023)

Other publications of authors with the same name

No Token Left Behind: Reliable KV Cache Compression via Importance-Aware Mixed Precision Quantization. CoRR, (2024)

A Low-Power Neural Graphics System for Instant 3D Modeling and Real-Time Rendering on Mobile AR/VR Devices. COOL CHIPS, page 1-3. IEEE, (2024)

Automatic Network Adaptation for Ultra-Low Uniform-Precision Quantization. CoRR, (2022)

A 92 fps and 2.56 mJ/Frame Computing-In-Memory-Based Human Pose Estimation Accelerator With Resource-Efficient Macro for Mobile Devices. IEEE Trans. Circuits Syst. II Express Briefs, 71 (6): 2921-2925 (June 2024)

Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models. CoRR, (2023)

A 709.3 TOPS/W Event-Driven Smart Vision SoC with High-Linearity and Reconfigurable MRAM PIM. VLSI Technology and Circuits, page 1-2. IEEE, (2023)

A 28.6 mJ/iter Stable Diffusion Processor for Text-to-Image Generation with Patch Similarity-based Sparsity Augmentation and Text-based Mixed-Precision. CoRR, (2024)

20.7 NeuGPU: A 18.5mJ/Iter Neural-Graphics Processing Unit for Instant-Modeling and Real-Time Rendering with Segmented-Hashing Architecture. ISSCC, page 372-374. IEEE, (2024)