Author of the publication

Optimizing massively parallel sparse matrix computing on ARM many-core processor.

Parallel Comput. (September 2023)


Other publications of authors with the same name

Optimizing massively parallel sparse matrix computing on ARM many-core processor. Parallel Comput. (September 2023)

SAIH: A Scalable Evaluation Methodology for Understanding AI Performance Trend on HPC Systems. CoRR (2022)

MixRec: Orchestrating Concurrent Recommendation Model Training on CPU-GPU platform. ICCD, pages 366-374. IEEE (2023)

Liger: Interleaving Intra- and Inter-Operator Parallelism for Distributed Large Model Inference. PPoPP, pages 42-54. ACM (2024)

A mechanism for scheduling multi robot intelligent warehouse system face with dynamic demand. J. Intell. Manuf., 31 (2): 469-480 (2020)

Full-Stack Optimizing Transformer Inference on ARM Many-Core CPU. IEEE Trans. Parallel Distributed Syst., 34 (7): 2221-2235 (July 2023)

Optimizing small channel 3D convolution on GPU with tensor core. Parallel Comput. (2022)

Characterizing and Optimizing Transformer Inference on ARM Many-core Processor. ICPP, pages 20:1-20:11. ACM (2022)

Hierarchical Model Parallelism for Optimizing Inference on Many-core Processor via Decoupled 3D-CNN Structure. ACM Trans. Archit. Code Optim., 20 (3): 42:1-42:21 (September 2023)

Improving Computation and Memory Efficiency for Real-world Transformer Inference on GPUs. ACM Trans. Archit. Code Optim., 20 (4): 46:1-46:22 (December 2023)