Author of the publication

Empirical performance model-driven data layout optimization and library call selection for tensor contraction expressions.

J. Parallel Distributed Comput., 72 (3): 338-352 (2012)

Other publications of authors with the same name

A Unified Compiler Algorithm for Optimizing Locality, Parallelism and Communication in Out-of-core Computations. IOPADS, page 79-92. ACM, (1997)

Optimization by neural networks. ICNN, page 325-332. IEEE, (1988)

Exploiting shared scratch pad memory space in embedded multiprocessor systems. DAC, page 219-224. ACM, (2002)

Reducing code size through address register assignment. ACM Trans. Embed. Comput. Syst., 5 (1): 225-258 (2006)

Memory-Constrained Communication Minimization for a Class of Array Computations. LCPC, volume 2481 of Lecture Notes in Computer Science, page 1-15. Springer, (2002)

Locality Optimization Algorithms for Compilation of Out-of-Core Codes. J. Inf. Sci. Eng., 14 (1): 107-138 (1998)

Empirical performance model-driven data layout optimization and library call selection for tensor contraction expressions. J. Parallel Distributed Comput., 72 (3): 338-352 (2012)

Efficient synthesis of out-of-core algorithms using a nonlinear optimization solver. J. Parallel Distributed Comput., 66 (5): 659-673 (2006)

Static and Dynamic Locality Optimizations Using Integer Linear Programming. IEEE Trans. Parallel Distributed Syst., 12 (9): 922-941 (2001)

Compiler Algorithms for Optimizing Locality and Parallelism on Shared and Distributed-Memory Machines. J. Parallel Distributed Comput., 60 (8): 924-965 (2000)