Author of the publication

Which transformer architecture fits my data? A vocabulary bottleneck in self-attention.

ICML, volume 139 of Proceedings of Machine Learning Research, pages 11170-11181. PMLR, (2021)

Other publications of authors with the same name

Deep Learning and Quantum Entanglement: Fundamental Connections with Implications to Network Design. ICLR (Poster), OpenReview.net, (2018)
Limits to Depth Efficiencies of Self-Attention. NeurIPS, (2020)
SenseBERT: Driving Some Sense into BERT. ACL, pages 4656-4667. Association for Computational Linguistics, (2020)
PMI-Masking: Principled masking of correlated spans. ICLR, OpenReview.net, (2021)
Sub-Task Decomposition Enables Learning in Sequence to Sequence Tasks. ICLR, OpenReview.net, (2023)
Benefits of Depth for Long-Term Memory of Recurrent Networks. ICLR (Workshop), OpenReview.net, (2018)
Rationality Report Cards: Assessing the Economic Rationality of Large Language Models. CoRR, (2024)
Bridging Many-Body Quantum Physics and Deep Learning via Tensor Networks. arXiv:1803.09780, (2018)
Parallel Context Windows for Large Language Models. ACL (1), pages 6383-6402. Association for Computational Linguistics, (2023)
STEER: Assessing the Economic Rationality of Large Language Models. ICML, OpenReview.net, (2024)