R. Karrenberg and S. Hack. Improving Performance of OpenCL on CPUs. Compiler Construction, pages 1--20, Springer Berlin Heidelberg, 2012.
Abstract
Data-parallel languages like OpenCL and CUDA are an important
means to exploit the computational power of today's computing
devices. In this paper, we deal with two aspects of implementing
such languages on CPUs: First, we present a static analysis and
an accompanying optimization to exclude code regions from
control-flow to data-flow conversion, which is the commonly used
technique to leverage vector instruction sets. Second, we
present a novel technique to implement barrier synchronization.
We evaluate our techniques in a custom OpenCL CPU driver which
is compared to itself in different configurations and to
proprietary implementations by AMD and Intel. We achieve an
average speedup factor of 1.21 compared to naïve
vectorization and additional factors of 1.15--2.09 for suited
kernels due to the optimizations enabled by our analysis. Our
best configuration achieves an average speedup factor of 2.5
against the Intel driver.
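For readers unfamiliar with the term, control-flow to data-flow conversion (also known as if-conversion or predication) replaces per-work-item branches with straight-line code whose results are selected by a mask. The sketch below is not taken from the paper; it is a minimal illustration, assuming SSE intrinsics and hypothetical kernel names, of how a scalar branch becomes a mask-and-blend sequence when four work-items are processed per vector.

/*
 * Hedged illustration only (not code from the paper): control-flow to
 * data-flow conversion as commonly used when vectorizing data-parallel
 * kernels for CPUs.  The scalar kernel branches per work-item; the SSE
 * variant computes both sides for four work-items at once and blends
 * the results with a mask, so no branch remains.  All names here are
 * hypothetical.
 */
#include <xmmintrin.h>   /* SSE intrinsics */

/* Scalar form: one work-item, ordinary control flow. */
float kernel_scalar(float x) {
    if (x > 0.0f)
        return x * 2.0f;
    else
        return x - 1.0f;
}

/* Vectorized form: four work-items, branch replaced by mask + blend. */
__m128 kernel_vec(__m128 x) {
    __m128 mask  = _mm_cmpgt_ps(x, _mm_setzero_ps());     /* lanes where x > 0   */
    __m128 then_ = _mm_mul_ps(x, _mm_set1_ps(2.0f));      /* both branch sides   */
    __m128 else_ = _mm_sub_ps(x, _mm_set1_ps(1.0f));      /* are always computed */
    /* select then_ where the mask is set, else_ elsewhere */
    return _mm_or_ps(_mm_and_ps(mask, then_), _mm_andnot_ps(mask, else_));
}

The analysis described in the abstract identifies code regions that can be excluded from exactly this kind of conversion, so the cost of executing both branch sides and blending can be avoided where it is not needed.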
@incollection{Karrenberg2012-ca,
abstract = {Data-parallel languages like OpenCL and CUDA are an important
means to exploit the computational power of today's computing
devices. In this paper, we deal with two aspects of implementing
such languages on CPUs: First, we present a static analysis and
an accompanying optimization to exclude code regions from
control-flow to data-flow conversion, which is the commonly used
technique to leverage vector instruction sets. Second, we
present a novel technique to implement barrier synchronization.
We evaluate our techniques in a custom OpenCL CPU driver which
is compared to itself in different configurations and to
proprietary implementations by AMD and Intel. We achieve an
average speedup factor of 1.21 compared to na{\"{\i}}ve
vectorization and additional factors of 1.15--2.09 for suited
kernels due to the optimizations enabled by our analysis. Our
best configuration achieves an average speedup factor of 2.5
against the Intel driver.},
author = {Karrenberg, Ralf and Hack, Sebastian},
booktitle = {Compiler Construction},
keywords = {All Barrier_synchronisation CPU Continuations Control_flow_analysis Expose OpenCL Vectorization},
pages = {1--20},
publisher = {Springer Berlin Heidelberg},
series = {Lecture Notes in Computer Science},
title = {Improving Performance of {OpenCL} on {CPUs}},
year = 2012
}