Author of the publication

Accelerating Collective Communication in Data Parallel Training across Deep Learning Frameworks.

NSDI, pages 1027-1040. USENIX Association, (2022)

Other publications of authors with the same name

Accelerating Collective Communication in Data Parallel Training across Deep Learning Frameworks. NSDI, pages 1027-1040. USENIX Association, (2022)

A Novel Shard-Based Approach for Asynchronous Many-Task Models for In Situ Analysis. ISAV@SC, pages 27-31. ACM, (2017)

Harmonic CUDA: Asynchronous Programming on GPUs. PMAM@PPoPP, pages 39-49. ACM, (2023)

Regent: a high-productivity programming language for HPC with logical regions. SC, pages 81:1-81:12. ACM, (2015)

Structure Slicing: Extending Logical Regions with Fields. SC, pages 845-856. IEEE Computer Society, (2014)

Realm: performance portability through composable asynchrony. Stanford University, USA, (2016)

Legion: expressing locality and independence with logical regions. SC, page 66. IEEE/ACM, (2012)

Towards Asynchronous Many-Task in Situ Data Analysis Using Legion. IPDPS Workshops, pages 1033-1037. IEEE Computer Society, (2016)

Isometry: A Path-Based Distributed Data Transfer System. ICS, pages 295-306. ACM, (2018)

Singe: leveraging warp specialization for high performance on GPUs. PPoPP, pages 119-130. ACM, (2014)