Author of the publication

Automated Systolic Array Architecture Synthesis for High Throughput CNN Inference on FPGAs.

, , , , , , , and . DAC, page 29:1-29:6. ACM, (2017)

Please choose a person to relate this publication to

To distinguish between persons with the same name, the academic degree and the title of an important publication are displayed. You can also use the button next to the name to display some publications already assigned to that person.

 

Other publications of authors with the same name

TensorIR: An Abstraction for Automatic Tensorized Program Optimization., , , , , , , , , and 1 other author(s). CoRR, (2022)

Decoupled Model Schedule for Deep Learning Training., , , , , and . CoRR, (2023)

AutoAccel: Automated Accelerator Generation and Optimization with Composable, Parallel and Pipeline Architecture, , , and . arXiv preprint arXiv:1809.07683, (2018)

From JVM to FPGA: Bridging Abstraction Hierarchy via Optimized Deep Pipelining., , and . HotCloud, USENIX Association, (2018)

MOCHA: Multinode Cost Optimization in Heterogeneous Clouds with Accelerators., , , , , , and . FPGA, page 273-279. ACM, (2021)

TensorIR: An Abstraction for Automatic Tensorized Program Optimization., , , , , , , , , and 1 other author(s). ASPLOS (2), page 804-817. ACM, (2023)

DietCode: Automatic Optimization for Dynamic Tensor Programs., , , , , , , , , and . MLSys, mlsys.org, (2022)

Bring Your Own Codegen to Deep Learning Compiler., , , , , , , , and . CoRR, (2021)

Efficient Memory Management for Large Language Model Serving with PagedAttention., , , , , , , , and . SOSP, page 611-626. ACM, (2023)

AutoDSE: Enabling Software Programmers Design Efficient FPGA Accelerators., , , and . FPGA, page 147. ACM, (2021)