Abstract
Much effort goes into low-level optimization of regular computations, from instruction scheduling and cache-aware design to intensive use of SIMD instructions. Irregular applications, especially pointer-intensive ones, are by contrast often optimized only at the algorithm or compilation level, since little hardware support and few dedicated instructions exist for this kind of code. In this paper, we investigate a low-level optimization, based on self-modifying code, of associative arrays, which are used intensively in complex applications such as dynamic compilers. We propose to encode Red-Black trees, widely used to implement associative arrays, as specialized binary code rather than data, in order to accelerate tree traversal by taking advantage of the underlying hardware: program cache, processor fetch and decode. We show a 45% gain on an ARM Cortex-A9 processor and show that most of the data-cache pressure is transferred to the program cache, motivating future work on dedicated hardware.
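As a rough illustration of encoding a tree "as code rather than data", the C sketch below contrasts a conventional pointer-chasing lookup with a hand-specialized lookup for the same three-node tree. It is not the authors' implementation: the abstract describes binary code generated and patched at run time, whereas this sketch is static C, and every name, key, and value in it (`struct node`, `lookup_data`, `lookup_code`, the keys 10/20/30) is a hypothetical choice made for the example. Insertion, deletion, and rebalancing are left out.

```c
#include <stddef.h>

/* Conventional encoding: the tree is data, walked by a generic loop.
 * Each step loads a node from memory through the data cache. */
struct node {
    int key;
    int value;
    const struct node *left;
    const struct node *right;
};

/* Hypothetical three-node tree: 20 at the root, 10 and 30 as leaves. */
static const struct node n1 = {10, 100, NULL, NULL};
static const struct node n3 = {30, 300, NULL, NULL};
static const struct node n2 = {20, 200, &n1, &n3};

/* Returns 1 and stores the value in *value if key is found, 0 otherwise. */
int lookup_data(const struct node *root, int key, int *value)
{
    while (root != NULL) {
        if (key == root->key) {
            *value = root->value;
            return 1;
        }
        root = (key < root->key) ? root->left : root->right;
    }
    return 0;
}

/* Specialized encoding of the same tree: keys and values become
 * constants in the instruction stream, and child pointers become
 * branches, so the traversal itself reads no tree nodes from memory. */
int lookup_code(int key, int *value)
{
    if (key == 20) { *value = 200; return 1; }   /* root */
    if (key < 20) {
        if (key == 10) { *value = 100; return 1; }   /* left child */
        return 0;
    }
    if (key == 30) { *value = 300; return 1; }   /* right child */
    return 0;
}
```

Both functions return the same result for any key; the difference is where the tree lives. In a run-time system, each update to the tree would require regenerating or patching the specialized code, which is the part this static sketch does not attempt.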