Author of the publication

HeteGen: Heterogeneous Parallel Inference for Large Language Models on Resource-Constrained Devices.

, , , , , and . CoRR, (2024)

Other publications of authors with the same name

DSP: Dynamic Sequence Parallelism for Multi-Dimensional Transformers., , , , , and . CoRR, (2024)
tcFFT: Accelerating Half-Precision FFT through Tensor Cores., , and . CoRR, (2021)
HMS-Net: Hierarchical Multi-Scale Sparsity-Invariant Network for Sparse Depth Completion., , , , , and . IEEE Trans. Image Process., (2020)
FTL: A Universal Framework for Training Low-Bit DNNs via Feature Transfer., , , , , , and . ECCV (25), volume 12370 of Lecture Notes in Computer Science, page 700-716. Springer, (2020)
FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours., , , , , , and . CoRR, (2022)
CUBE - Towards an Optimal Scaling of Cosmological N-body Simulations., , , , , and . CCGRID, page 685-690. IEEE, (2020)
FastFold: Optimizing AlphaFold Training and Inference on GPU Clusters., , , , , , , , and . PPoPP, page 417-430. ACM, (2024)
HeteGen: Heterogeneous Parallel Inference for Large Language Models on Resource-Constrained Devices., , , , , and . CoRR, (2024)
Liger: Interleaving Intra- and Inter-Operator Parallelism for Distributed Large Model Inference., , , , , , and . PPoPP, page 42-54. ACM, (2024)
ATP: Adaptive Tensor Parallelism for Foundation Models., , , and . CoRR, (2023)