Author of the publication

HiWayLib: A Software Framework for Enabling High Performance Communications for Heterogeneous Pipeline Computations.

, , , , , and . ASPLOS, page 153-166. ACM, (2019)

Other publications of authors with the same name

Embedded software generation from system level specification for multi-tasking embedded systems., , , and . ASP-DAC, page 145-150. ACM Press, (2005)

Real-time integrated face detection and recognition on embedded GPGPUs., , , and . ESTIMedia, page 98-107. IEEE, (2014)

BPNet: Branch-pruned conditional neural network for systematic time-accuracy tradeoff in DNN inference: work-in-progress., and . CODES+ISSS, page 2:1-2:2. ACM, (2019)

BPNet: Branch-pruned Conditional Neural Network for Systematic Time-accuracy Tradeoff., , and . DAC, page 1-6. IEEE, (2020)

Minimizing GPU Kernel Launch Overhead in Deep Learning Inference on Mobile GPUs., , and . HotMobile, page 57-63. ACM, (2021)

Versapipe: a versatile programming framework for pipelined computing on GPU., , , , , and . MICRO, page 587-599. ACM, (2017)

FASOP: Fast yet Accurate Automated Search for Optimal Parallelization of Transformers on Heterogeneous GPU Clusters., , , and . HPDC, page 253-266. ACM, (2024)

NNsim: fast performance estimation based on sampled simulation of GPGPU kernels for neural networks., , , and . DAC, page 176:1-176:6. ACM, (2018)

GOPipe: A Granularity-Oblivious Programming Framework for Pipelined Stencil Executions on GPU., , , , and . PACT, page 43-54. ACM, (2020)

Trace-driven HW/SW cosimulation using virtual synchronization technique., , and . DAC, page 345-348. ACM, (2005)