
Anatomy of High-Performance Many-Threaded Matrix Multiplication.

IPDPS, pages 1049-1059. IEEE Computer Society, (2014)


Other publications of authors with the same name

libflame. Encyclopedia of Parallel Computing, Springer, (2011)

Programming matrix algorithms-by-blocks for thread-level parallelism. ACM Trans. Math. Softw., 36 (3): 14:1-14:26 (2009)

Implementing High-Performance Complex Matrix Multiplication via the 1M Method. SIAM J. Sci. Comput., 42 (5): C221-C244 (2020)

Towards a Unified Implementation of GEMM in BLIS. ICS, pages 111-121. ACM, (2023)

SuperMatrix: a multithreaded runtime scheduling system for algorithms-by-blocks. PPoPP, pages 123-132. ACM, (2008)

Restructuring the Tridiagonal and Bidiagonal QR Algorithms for Performance. ACM Trans. Math. Softw., 40 (3): 18:1-18:34 (2014)

Extracting SMP parallelism for dense linear algebra algorithms from high-level specifications. PPoPP, pages 153-163. ACM, (2005)

Design of scalable dense linear algebra libraries for multithreaded architectures: the LU factorization. IPDPS, pages 1-8. IEEE, (2008)

Satisfying your dependencies with SuperMatrix. CLUSTER, pages 91-99. IEEE Computer Society, (2007)

The FLAME approach: From dense linear algebra algorithms to high-performance multi-accelerator implementations. J. Parallel Distributed Comput., 72 (9): 1134-1143 (2012)