@ytyoun

The cache complexity of multithreaded cache oblivious algorithms

, and . Proceedings of the eighteenth annual ACM symposium on Parallelism in algorithms and architectures, page 271--280. New York, NY, USA, ACM, (2006)
DOI: 10.1145/1148109.1148157

Abstract

We present a technique for analyzing the number of cache misses incurred by multithreaded cache oblivious algorithms on an idealized parallel machine in which each processor has a private cache. We specialize this technique to computations executed by the Cilk work-stealing scheduler on a machine with dag-consistent shared memory. We show that a multithreaded cache oblivious matrix multiplication incurs <i>O</i>(<i>n</i><sup>3</sup>/&#8730;<i>Z</i> + (<i>Pn</i>)<sup>1/3</sup><i>n</i><sup>2</sup>) cache misses when executed by the Cilk scheduler on a machine with <i>P</i> processors, each with a cache of size <i>Z</i>, with high probability. This bound is tighter than previously published bounds. We also present a new multithreaded cache oblivious algorithm for 1D stencil computations, which incurs <i>O</i>(<i>n</i><sup>2</sup>/<i>Z</i>+<i>n</i>+&#8730;<i>Pn</i><sup>3</sup>+<sup>&#949;</sup>) cache misses with high probability.

Links and resources

Tags

community

  • @dblp
  • @ytyoun
@ytyoun's tags highlighted