Modern graphics processing units (GPUs) contain hundreds of arithmetic units and can be harnessed to provide tremendous acceleration for many numerically intensive scientific applications. The key to effective utilization of GPUs for scientific computing