Train Big, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers.

ICML, volume 119 of Proceedings of Machine Learning Research, pages 5958-5968. PMLR, 2020.

Other publications of authors with the same name

- Voice localization using nearby wall reflections. MobiCom, pages 7:1-7:14. ACM, 2020.
- Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers. arXiv:2002.11794, 2020.
- Multitask Prompted Training Enables Zero-Shot Task Generalization. International Conference on Learning Representations, 2022.
- HallE-Switch: Rethinking and Controlling Object Existence Hallucinations in Large Vision Language Models for Detailed Caption. CoRR, 2023.
- Virtual stereo content rendering technology review for light-field display. Displays, January 2023.
- RAFT: Adapting Language Model to Domain Specific RAG. CoRR, 2024.
- MCEENet: Multi-Scale Context Enhancement and Edge-Assisted Network for Few-Shot Semantic Segmentation. Sensors, 23 (6): 2922, March 2023.
- Discovering Non-monotonic Autoregressive Orderings with Variational Inference. ICLR, OpenReview.net, 2021.
- What's Hidden in a One-layer Randomly Weighted Transformer? EMNLP (1), pages 2914-2921. Association for Computational Linguistics, 2021.
- Poisoning Language Models During Instruction Tuning. ICML, volume 202 of Proceedings of Machine Learning Research, pages 35413-35425. PMLR, 2023.