Author of the publication

Task-Based Polar Decomposition Using SLATE on Massively Parallel Systems with Hardware Accelerators.

, , , , and . SC Workshops, page 1680-1687. ACM, (2023)

Other publications of authors with the same name

Translational process: Mathematical software perspective., , , and . J. Comput. Sci., (2021)

A Set of Batched Basic Linear Algebra Subprograms and LAPACK Routines., , , , , , , , , and 1 other author(s). ACM Trans. Math. Softw., 47 (3): 21:1-21:23 (2021)

Linear algebra software for large-scale accelerated multicore computing., , , , , , , , , and . Acta Numer., (2016)

Bringing High Performance Computing to Big Data Algorithms., , , , , , and . Handbook of Big Data Technologies, Springer, (2017)

A survey of recent developments in parallel implementations of Gaussian elimination., , , , , , and . Concurr. Comput. Pract. Exp., 27 (5): 1292-1309 (2015)

High-performance hybrid CPU and GPU parallel algorithm for digital volume correlation., , and . Int. J. High Perform. Comput. Appl., 29 (1): 92-106 (2015)

Accelerating the SVD two stage bidiagonal reduction and divide and conquer using GPUs., , and . Parallel Comput., (2018)

Parallel Programming Models for Dense Linear Algebra on Heterogeneous Systems., , , , , , , , , and . Supercomput. Front. Innov., 2 (4): 67-86 (2015)

High performance digital volume correlation. University of Illinois Urbana-Champaign, USA, (2011)

Autotuning Batch Cholesky Factorization in CUDA with Interleaved Layout of Matrices., , , , and . IPDPS Workshops, page 1408-1417. IEEE Computer Society, (2017)