Author of the publication

Lossy all-to-all exchange for accelerating parallel 3-D FFTs on hybrid architectures with GPUs.

CLUSTER, page 152-160. IEEE, (2022)

Other publications of authors with the same name

Optimizing GPU Kernels for Irregular Batch Workloads: A Case Study for Cholesky Factorization. HPEC, page 1-7. IEEE, (2018)
Towards Achieving Performance Portability Using Directives for Accelerators. WACCPD@SC, page 13-24. IEEE Computer Society, (2016)
Performance, Design, and Autotuning of Batched GEMM for GPUs. ISC, volume 9697 of Lecture Notes in Computer Science, page 21-38. Springer, (2016)
Efficient implementation of quantum materials simulations on distributed CPU-GPU systems. SC, page 10:1-10:12. ACM, (2015)
Tridiagonalization of a Symmetric Dense Matrix on a GPU Cluster. IPDPS Workshops, page 1070-1079. IEEE, (2013)
The Impact of Multicore on Math Software. PARA, volume 4699 of Lecture Notes in Computer Science, page 1-10. Springer, (2006)
Autotuning GEMM Kernels for the Fermi GPU. IEEE Trans. Parallel Distributed Syst., 23 (11): 2045-2057 (2012)
Scalability Issues in FFT Computation. PaCT, volume 12942 of Lecture Notes in Computer Science, page 279-287. Springer, (2021)
Factorization and Inversion of a Million Matrices using GPUs: Challenges and Countermeasures. ICCS, volume 108 of Procedia Computer Science, page 606-615. Elsevier, (2017)
Explicit and Averaging A Posteriori Error Estimates for Adaptive Finite Volume Methods. SIAM J. Numer. Anal., 42 (6): 2496-2521 (2005)