Stabilizing Transformer Training by Preventing Attention Entropy Collapse.

ICML, volume 202 of Proceedings of Machine Learning Research, pages 40770-40803. PMLR, (2023)

Other publications of authors with the same name

Detecting Hallucinated Content in Conditional Neural Sequence Generation. CoRR, (2020)
Multilingual Neural Machine Translation with Deep Encoder and Multiple Shallow Decoders. EACL, pages 1613-1624. Association for Computational Linguistics, (2021)
Dual-decoder Transformer for Joint Automatic Speech Recognition and Multilingual Speech Translation. COLING, pages 3520-3533. International Committee on Computational Linguistics, (2020)
Universal Neural Machine Translation for Extremely Low Resource Languages. NAACL-HLT, pages 344-354. Association for Computational Linguistics, (2018)
VizSeq: a visual analysis toolkit for text generation tasks. EMNLP/IJCNLP (3), pages 253-258. Association for Computational Linguistics, (2019)
Detection, Disambiguation, Re-ranking: Autoregressive Entity Linking as a Multi-Task Problem. ACL (Findings), pages 1972-1983. Association for Computational Linguistics, (2022)
Addressing Posterior Collapse with Mutual Information for Improved Variational Neural Machine Translation. ACL, pages 8512-8525. Association for Computational Linguistics, (2020)
Detecting Hallucinated Content in Conditional Neural Sequence Generation. ACL/IJCNLP (Findings), volume ACL/IJCNLP 2021 of Findings of ACL, pages 1393-1404. Association for Computational Linguistics, (2021)
Deep Learning Model to Estimate Air Pollution Using M-BP to Fill in Missing Proxy Urban Data. GLOBECOM, pages 1-6. IEEE, (2017)
Intelligent Time-Adaptive Transient Stability Assessment System. CoRR, (2016)