Author of the publication

Understanding the Difficulty of Training Transformers.

EMNLP (1), page 5747-5763. Association for Computational Linguistics, (2020)

Other publications of authors with the same name

A Persona-Based Neural Conversation Model. (August 2016)
Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer. NeurIPS, page 17084-17097. (2021)
Distribution-Based Pruning of Backoff Language Models. ACL, page 579-588. ACL, (2000)
Approximation Lasso Methods for Language Modeling. ACL, The Association for Computer Linguistics, (2006)
Deep Dyna-Q: Integrating Planning for Task-Completion Dialogue Policy Learning. ACL (1), page 2182-2192. Association for Computational Linguistics, (2018)
A Comparative Study of Parameter Estimation Methods for Statistical Natural Language Processing. ACL, The Association for Computational Linguistics, (2007)
PENS: A Machine-aided English Writing System for Chinese Users. ACL, page 529-536. ACL, (2000)
Towards Generating Long and Coherent Text with Multi-Level Latent Variable Models. ACL (1), page 2079-2089. Association for Computational Linguistics, (2019)
Multi-step Reasoning via Recurrent Dual Attention for Visual Dialog. ACL (1), page 6463-6474. Association for Computational Linguistics, (2019)
Single Character Chinese Named Entity Recognition. SIGHAN, page 125-132. (2003)