Author of the publication

Union: A Unified HW-SW Co-Design Ecosystem in MLIR for Evaluating Tensor Operations on Spatial Accelerators.

PACT, page 30-44. IEEE, (2021)

Other publications of authors with the same name

ConfuciuX: Autonomous Hardware Resource Assignment for DNN Accelerators using Reinforcement Learning. MICRO, page 622-636. IEEE, (2020)

Scalable Distributed Training of Recommendation Models: An ASTRA-SIM + NS3 case-study with TCP/IP transport. Hot Interconnects, page 33-42. IEEE, (2020)

GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM. CoRR, (2024)

Algorithm-Hardware Co-Design of Distribution-Aware Logarithmic-Posit Encodings for Efficient DNN Inference. CoRR, (2024)

Training Recipe for N:M Structured Sparsity with Decaying Pruning Mask. CoRR, (2022)

Efficient Distributed Inference of Deep Neural Networks via Restructuring and Pruning. AAAI, page 6640-6648. AAAI Press, (2023)

STONNE: Enabling Cycle-Level Microarchitectural Simulation for DNN Inference Accelerators. IISWC, page 201-213. IEEE, (2021)

Flexagon: A Multi-dataflow Sparse-Sparse Matrix Multiplication Accelerator for Efficient DNN Processing. ASPLOS (3), page 252-265. ACM, (2023)

FLAT: An Optimized Dataflow for Mitigating Attention Bottlenecks. ASPLOS (2), page 295-310. ACM, (2023)

Optimizing the data placement and transformation for multi-bank CGRA computing system. DATE, page 1087-1092. IEEE, (2018)