Inproceedings,

Overcoming single-thread performance hurdles in the core fusion reconfigurable multicore architecture

Proceedings of the 26th ACM International Conference on Supercomputing, pages 101–110. New York, NY, USA, ACM, (2012)
DOI: 10.1145/2304576.2304592

Abstract

Though the prime target of multicore architectures is parallel and multithreaded workloads (which favor maximum core count), executing sequential code fast remains critical (which benefits from maximum core size). This poses a difficult design trade-off. Core Fusion is a recently proposed reconfigurable multicore architecture that attempts to circumvent this compromise by "fusing" groups of fundamentally independent cores into larger, more aggressive processors dynamically as needed. In this way, it accommodates highly parallel, partially parallel, multiprogrammed, and sequential codes with ease.

However, the sequential performance of the original fused configuration falls quite short of an area-equivalent, monolithic, out-of-order processor. This paper effectively eliminates the fusion deficit for sequential codes by attacking two major sources of inefficiency: collective commit and instruction steering. We demonstrate in detail that these modifications allow Core Fusion to essentially match the performance of an area-equivalent monolithic out-of-order processor. The implication is that the inclusion of wide-issue cores in future multicore designs may be unnecessary.
