Author of the publication

Lifelong Language Pretraining with Distribution-Specialized Experts.

ICML, volume 202 of Proceedings of Machine Learning Research, pages 5383-5395. PMLR, (2023)


Other publications of authors with the same name

Architectural and Implementation Tradeoffs in the Design of Multiple-Context Processors. Multithreaded Computer Architecture, volume 281 of The Kluwer International Series in Engineering and Computer Science, Kluwer / Springer, (1994)

GDP: Generalized Device Placement for Dataflow Graphs. CoRR, (2019)

A Transferable Approach for Partitioning Machine Learning Models on Multi-Chip-Modules. CoRR, (2021)

Chip Placement with Deep Reinforcement Learning. CoRR, (2020)

A domain-specific supercomputer for training deep neural networks. Commun. ACM, 63 (7): 67-78 (2020)

Brainformers: Trading Simplicity for Efficiency. ICML, volume 202 of Proceedings of Machine Learning Research, pages 42531-42542. PMLR, (2023)

Transferable Graph Optimizers for ML Compilers. NeurIPS, (2020)

System overview of the SGI Origin 200/2000 product line. COMPCON, pages 150-156. IEEE Computer Society, (1997)

The Design Process for Google's Training Chips: TPUv2 and TPUv3. IEEE Micro, 41 (2): 56-63 (2021)

The DASH Prototype: Logic Overhead and Performance. IEEE Trans. Parallel Distributed Syst., 4 (1): 41-61 (1993)