Mixed-Precision Orthogonalization Scheme and Adaptive Step Size for Improving the Stability and Performance of CA-GMRES on GPUs.

VECPAR, volume 8969 of Lecture Notes in Computer Science, pages 17-30. Springer, (2014)

Other publications of authors with the same name

Sampling algorithms to update truncated SVD., , and . IEEE BigData, page 817-826. IEEE Computer Society, (2017)Mixed-precision block gram Schmidt orthogonalization., , , , and . ScalA@SC, page 2:1-2:8. ACM, (2015)Increasing Accuracy of Iterative Refinement in Limited Floating-Point Arithmetic on Half-Precision Accelerators., , and . HPEC, page 1-6. IEEE, (2019)Optimizing Krylov Subspace Solvers on Graphics Processing Units., , , , , and . IPDPS Workshops, page 941-949. IEEE Computer Society, (2014)Virtual Systolic Array for QR Decomposition., , , , and . IPDPS, page 251-260. IEEE Computer Society, (2013)PLASMA: Parallel Linear Algebra Software for Multicore Using OpenMP., , , , , , , , , and 5 other author(s). ACM Trans. Math. Softw., 45 (2): 16:1-16:35 (2019)Solving dense symmetric indefinite systems using GPUs., , , , and . Concurr. Comput. Pract. Exp., (2017)Performance of random sampling for computing low-rank approximations of a dense matrix on GPUs., , , , , and . SC, page 60:1-60:11. ACM, (2015)Design and Implementation of a Large Scale Tree-Based QR Decomposition Using a 3D Virtual Systolic Array and a Lightweight Runtime., , , and . IPDPS Workshops, page 1495-1504. IEEE Computer Society, (2014)Performance and portability with OpenCL for throughput-oriented HPC workloads across accelerators, coprocessors, and multicore processors., , , , , , and . ScalA@SC, page 61-68. IEEE Computer Society, (2014)