Author of the publication

Auto-Tuning Strategies for Parallelizing Sparse Matrix-Vector (SpMV) Multiplication on Multi- and Many-Core Processors.

, , and . IPDPS Workshops, page 713-722. IEEE Computer Society, (2017)

Other publications of authors with the same name

Work Stealing in a Shared Virtual-Memory Heterogeneous Environment: A Case Study with Betweenness Centrality., , and . Conf. Computing Frontiers, page 164-173. ACM, (2017)

Challenges of Programming a System with Heterogeneous Memories and Heterogeneous Processors: A Programmer's View., , and . MEMSYS, page 99-103. ACM, (2016)

Software-Defined Design Space Exploration for an Efficient DNN Accelerator Architecture., , , , and . IEEE Trans. Computers, 70 (1): 45-56 (2021)

DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales., , , , , , , , , and 9 other author(s). CoRR, (2023)

Graph Coloring on the GPU and Some Techniques to Improve Load Imbalance., , , and . IPDPS Workshops, page 610-617. IEEE Computer Society, (2015)

Accelerating Compute-Intensive Applications with GPUs and FPGAs., , , , and . SASP, page 101-107. IEEE Computer Society, (2008)

AWB-GCN: A Graph Convolutional Network Accelerator with Runtime Workload Rebalancing., , , , , , , , , and 1 other author(s). MICRO, page 922-936. IEEE, (2020)

Software Assisted Hardware Cache Coherence for Heterogeneous Processors., , , and . MEMSYS, page 279-288. ACM, (2016)

Synchronization Using Remote-Scope Promotion., , , , , and . ASPLOS, page 73-86. ACM, (2015)

Gravel: fine-grain GPU-initiated network messages., , , , , and . SC, page 23. ACM, (2017)