CUDA lets you work with familiar programming concepts while developing software that can run on a GP This is the first of a series of articles to introduce you to the power of CUDA -- through working code -- and to the thought process to help you map applications onto multi-threaded hardware (such as GPUs) to get big performance increases. Of course, not all problems can be mapped efficiently onto multi-threaded hardware, so part of my thought process will be to distinguish what will and what won't work, plus provide a common-sense idea of what might work "well-enough". "CUDA programming" and "GPGPU programming" are not the same (although CUDA runs on GPUs). CUDA permits working with familiar programming concepts while developing software that can run on a GPU. It also avoids the performance overhead of graphics layer APIs by compiling your software directly to the hardware (GPU assembly language, for instance), thereby providing great performance.
JStar is a new declarative programming language that supports automatic parallelization for manycore and cluster computers. This page describes our current research on JStar and gives suggestions for PhD and Masters thesis topics.
Intel® Threading Building Blocks (TBB) offers a rich and complete approach to expressing parallelism in a C++ program. It is a library that helps you take advantage of multi-core processor performance without having to be a threading expert. Threading Building Blocks is not just a threads-replacement library. It represents a higher-level, task-based parallelism that abstracts platform details and threading mechanisms for scalability and performance.
O. Tripp, G. Yorsh, J. Field, and M. Sagiv. Proceedings of the 2011 ACM International Conference on Object Oriented Programming Systems Languages and Applications, page 207--224. ACM, (2011)