Author of the publication

Sparse Progressive Distillation: Resolving Overfitting under Pretrain-and-Finetune Paradigm.

ACL (1), pages 190-200. Association for Computational Linguistics, 2022.

Other publications of authors with the same name

A length adaptive algorithm-hardware co-design of transformer on FPGA through sparse attention and dynamic pipelining. DAC, pages 1135-1140. ACM, 2022.

AutoReP: Automatic ReLU Replacement for Fast Private Network Inference. ICCV, pages 5155-5165. IEEE, 2023.

Accelerating Transformer-based Deep Learning Models on FPGAs using Column Balanced Block Pruning. ISQED, pages 142-148. IEEE, 2021.

Accel-GCN: High-Performance GPU Accelerator Design for Graph Convolution Networks. ICCAD, pages 1-9. IEEE, 2023.

Towards Sparsification of Graph Neural Networks. ICCD, pages 272-279. IEEE, 2022.

E.T.: re-thinking self-attention for transformer models on GPUs. SC, page 25. ACM, 2021.

Dynamic Sparse Training via Balancing the Exploration-Exploitation Trade-off. DAC, pages 1-6. IEEE, 2023.

HMC-TRAN: A Tensor-core Inspired Hierarchical Model Compression for Transformer-based DNNs on GPU. ACM Great Lakes Symposium on VLSI, pages 169-174. ACM, 2021.

CoDG-ReRAM: An Algorithm-Hardware Co-design to Accelerate Semi-Structured GNNs on ReRAM. ICCD, pages 280-289. IEEE, 2022.

Towards Lossless Head Pruning through Automatic Peer Distillation for Language Models. IJCAI, pages 5113-5121. ijcai.org, 2023.