Author of the publication

A Binary Translator to Accelerate Development of Deep Learning Processing Library for AArch64 CPU.

IEICE Trans. Electron., 105-C (6): 222-231 (2022)


Other publications of authors with the same name

A traffic-aware memory-cube network using bypassing. Microprocess. Microsystems, (April 2022)
Preliminary Performance Analysis of Distributed DNN Training with Relaxed Synchronization. IEICE Trans. Electron., 104-C (6): 257-260 (2021)
Performance balancing: software-based on-chip memory management for effective CMP executions. MEDEA@PACT, page 28-34. ACM, (2009)
Efficient Collision-Free MTTKRP Algorithm for Multi-core CPUs with Less Memory Usage. CCGRID, page 534-543. IEEE, (2022)
Performance Analysis of Multi-Containerized MD Simulations for Low-Level Resource Allocation. IPDPS Workshops, page 1014-1017. IEEE, (2022)
mpiQulacs: A Distributed Quantum Computer Simulator for A64FX-based Cluster Systems. CoRR, (2022)
The 16,384-node Parallelism of 3D-CNN Training on An Arm CPU based Supercomputer. HiPC, page 152-161. IEEE, (2021)
Low-Latency Low-Energy Memory-Cube Networks using Dual-Voltage Datapaths. PDP, page 143-147. IEEE, (2021)
mpiQulacs: A Scalable Distributed Quantum Computer Simulator for ARM-based Clusters. QCE, page 959-969. IEEE, (2023)
Introducing software pipelining for the A64FX processor into LLVM. HPC Asia Workshops, page 1-6. ACM, (2024)