Author of the publication

Low-Order Finite Element Solver with Small Matrix-Matrix Multiplication Accelerated by AI-Specific Hardware for Crustal Deformation Computation.

PASC, page 16:1-16:11. ACM, (2020)

Other publications of authors with the same name

Large-Scale Distributed Second-Order Optimization Using Kronecker-Factored Approximate Curvature for Deep Convolutional Neural Networks. CVPR, page 12359-12367. Computer Vision Foundation / IEEE, (2019)

GPU Implementation of a Sophisticated Implicit Low-Order Finite Element Solver with FP21-32-64 Computation Using OpenACC. WACCPD@SC, volume 12017 of Lecture Notes in Computer Science, page 3-24. Springer, (2019)

A Fast Scalable Implicit Solver with Concentrated Computation for Nonlinear Time-Evolution Problems on Low-Order Unstructured Finite Elements. IPDPS, page 620-629. IEEE Computer Society, (2018)

Performance Optimizations and Analysis of Distributed Deep Learning with Approximated Second-Order Optimization Method. ICPP Workshops, page 21:1-21:8. ACM, (2019)

Scalable and Practical Natural Gradient for Large-Scale Deep Learning. CoRR, (2020)

Interference-aware Incoming Message Detection for MPI Threaded Progression. CCGRID, page 184-185. IEEE Computer Society, (2013)

Speeding Up Kernel Scheduler by Reducing Cache Misses. USENIX ATC, FREENIX Track, page 275-285. USENIX, (2002)

Preliminary Performance Evaluation of Grace-Hopper GH200. CLUSTER Workshops, page 184-185. IEEE, (2024)

Parallel Top-K Algorithms on GPU: A Comprehensive Study and New Methods. SC, page 76:1-76:13. ACM, (2023)

CAGRA: Highly Parallel Graph Construction and Approximate Nearest Neighbor Search for GPUs. ICDE, page 4236-4247. IEEE, (2024)