Abstract
FPGA-based heterogeneous architectures provide programmers with the ability
to customize their hardware accelerators for flexible acceleration of many
workloads. Nonetheless, such advantages come at the cost of sacrificing
programmability. FPGA vendors and researchers attempt to improve the
programmability through high-level synthesis (HLS) technologies that can
directly generate hardware circuits from high-level language descriptions.
However, reading through recent publications on FPGA designs using HLS, one
often gets the impression that FPGA programming is still hard in that it leaves
programmers to explore a very large design space with many possible
combinations of HLS optimization strategies.
In this paper, we make two important observations and contributions. First, we
demonstrate a rather surprising result: FPGA programming can be made easy by
following a simple best-effort guideline of five refinement steps using HLS. We
show that for a broad class of accelerator benchmarks from MachSuite, the
proposed best-effort guideline improves the FPGA accelerator performance by
42-29,030x. Compared to the baseline CPU performance, the FPGA accelerator
performance is improved from an average 292.5x slowdown to an average 34.4x
speedup. Moreover, we show that the refinement steps in the best-effort
guideline, consisting of explicit data caching, customized pipelining,
processing element duplication, computation/communication overlapping and
scratchpad reorganization, correspond well to the best practice guidelines for
multicore CPU programming. Although our best-effort guideline may not always
lead to the optimal solution, it substantially simplifies the FPGA programming
effort, and will greatly support the wide adoption of FPGA-based acceleration
by the software programming community.
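
As an illustration of how several of the refinement steps named above might look in HLS-style C++, here is a minimal sketch. It is not code from the paper or from MachSuite: the kernel (`saxpy_kernel`), the constants `TILE` and `PE`, and the choice of a scale-and-add loop are all hypothetical. The `#pragma HLS` lines follow the directive style used by Xilinx HLS tools and are ignored (possibly with a warning) by an ordinary C++ compiler, so the function also runs as plain software. Computation/communication overlapping (for example, double buffering) is left out to keep the sketch short.

```cpp
#include <cstddef>

constexpr std::size_t TILE = 64;  // hypothetical on-chip buffer size (elements)
constexpr std::size_t PE = 4;     // hypothetical number of duplicated lanes

// Computes out[i] = alpha * a[i] + b[i], one tile at a time.
void saxpy_kernel(const float* a, const float* b, float* out,
                  std::size_t n, float alpha) {
    // Local buffers stand in for on-chip scratchpad memory.
    float buf_a[TILE], buf_b[TILE], buf_out[TILE];
    // Scratchpad reorganization: split each buffer into 4 banks so that
    // the PE lanes below can access their elements in the same cycle.
#pragma HLS array_partition variable=buf_a cyclic factor=4
#pragma HLS array_partition variable=buf_b cyclic factor=4
#pragma HLS array_partition variable=buf_out cyclic factor=4

    for (std::size_t base = 0; base < n; base += TILE) {
        const std::size_t len = (n - base < TILE) ? (n - base) : TILE;

        // Explicit data caching: copy a tile from off-chip memory into
        // the local buffers before computing on it.
        for (std::size_t i = 0; i < len; ++i) {
#pragma HLS pipeline II=1
            buf_a[i] = a[base + i];
            buf_b[i] = b[base + i];
        }

        // Customized pipelining: start one group of PE elements per cycle.
        for (std::size_t i = 0; i < len; i += PE) {
#pragma HLS pipeline II=1
            // Processing element duplication: fully unroll the lane loop.
            for (std::size_t p = 0; p < PE; ++p) {
#pragma HLS unroll
                if (i + p < len) {
                    buf_out[i + p] = alpha * buf_a[i + p] + buf_b[i + p];
                }
            }
        }

        // Write the finished tile back to off-chip memory.
        for (std::size_t i = 0; i < len; ++i) {
#pragma HLS pipeline II=1
            out[base + i] = buf_out[i];
        }
    }
}
```

The structure mirrors common multicore CPU practice (tiling for cache locality, then vectorizing the inner loop), which is the correspondence the abstract points to. Whether a given tool achieves the requested initiation interval depends on the target device and tool version.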
Description
[1807.01340] Best-Effort FPGA Programming: A Few Steps Can Go a Long Way