The ATLAS (Automatically Tuned Linear Algebra Software) project is an ongoing research effort focusing on applying empirical techniques in order to provide portable performance. At present, it provides C and Fortran77 interfaces to a portably efficient BLAS implementation, as well as a few routines from LAPACK.
JCublas is providing Java bindings for the NVIDIA CUDA BLAS implementation, thus making the parallel processing power of modern graphics hardware available for Java programs.
Our approach is fundamentally different. It starts by observing that for current generation architectures, much of the overhead comes from Translation Look-aside Buffer (TLB) table misses. While the importance of caches is also taken into consideration, i