Author of the publication

Exploiting the capabilities of modern GPUs for dense matrix computations.

, , , , , and . Concurr. Comput. Pract. Exp., 21 (18): 2457-2477 (2009)

Other publications of authors with the same name

A power measurement environment for PCIe accelerators., , , , and . Comput. Sci. Res. Dev., 30 (2): 115-124 (2015)

Revisiting Conventional Task Schedulers to Exploit Asymmetry in ARM big.LITTLE Architectures for Dense Linear Algebra., , , and . CoRR, (2015)

Power-aware Dense Linear Algebra Implementations on Multi-core and Many-core Processors., , , , , , and . MARC Symposium, page 103-106. KIT Scientific Publishing, Karlsruhe, (2011)

Optimized Fundamental Signal Processing Operations For Energy Minimization on Heterogeneous Mobile Devices., , , , and . IEEE Trans. Circuits Syst. I Regul. Pap., 65-I (5): 1614-1627 (2018)

Low precision matrix multiplication for efficient deep learning in NVIDIA Carmel processors., , , , and . J. Supercomput., 77 (10): 11257-11269 (2021)

A Proposal to Extend the OpenMP Tasking Model for Heterogeneous Architectures., , , , , , , , , and 2 other author(s). IWOMP, volume 5568 of Lecture Notes in Computer Science, page 154-167. Springer, (2009)

Runtime Scheduling of the LU Factorization: Performance and Energy., , , , and . EE-LSDS, volume 8046 of Lecture Notes in Computer Science, page 153-167. Springer, (2013)

Automatic generation of ARM NEON micro-kernels for matrix multiplication., , , , , , and . J. Supercomput., 80 (10): 13873-13899 (July 2024)

Scalable Hybrid Loop- and Task-Parallel Matrix Inversion for Multicore Processors., , , and . IPDPS Workshops, page 679-687. IEEE, (2021)

Reduction to Condensed Forms for Symmetric Eigenvalue Problems on Multi-core Architectures., , , and . PPAM (1), volume 6067 of Lecture Notes in Computer Science, page 387-395. Springer, (2009)