Author of the publication

Which transformer architecture fits my data? A vocabulary bottleneck in self-attention.

N. Wies, Y. Levine, D. Jannai, and A. Shashua. ICML, volume 139 of Proceedings of Machine Learning Research, page 11170-11181. PMLR, (2021)

Please choose a person to relate this publication to

To distinguish between persons with the same name, the academic degree and the title of an important publication are displayed. You can also use the button next to a name to display some of the publications already assigned to that person.

Yoav Nebat

Marc Levine

Martin Levine

David Levine

Glenn S Levine

Other publications of authors with the same name

Deep Learning and Quantum Entanglement: Fundamental Connections with Implications to Network Design. Y. Levine, D. Yakira, N. Cohen, and A. Shashua. ICLR (Poster), OpenReview.net, (2018)

Limits to Depth Efficiencies of Self-Attention. Y. Levine, N. Wies, O. Sharir, H. Bata, and A. Shashua. NeurIPS, (2020)

SenseBERT: Driving Some Sense into BERT. Y. Levine, B. Lenz, O. Dagan, O. Ram, D. Padnos, O. Sharir, S. Shalev-Shwartz, A. Shashua, and Y. Shoham. ACL, page 4656-4667. Association for Computational Linguistics, (2020)

PMI-Masking: Principled masking of correlated spans. Y. Levine, B. Lenz, O. Lieber, O. Abend, K. Leyton-Brown, M. Tennenholtz, and Y. Shoham. ICLR, OpenReview.net, (2021)

Sub-Task Decomposition Enables Learning in Sequence to Sequence Tasks. N. Wies, Y. Levine, and A. Shashua. ICLR, OpenReview.net, (2023)

Benefits of Depth for Long-Term Memory of Recurrent Networks. Y. Levine, O. Sharir, and A. Shashua. ICLR (Workshop), OpenReview.net, (2018)

Rationality Report Cards: Assessing the Economic Rationality of Large Language Models. N. Raman, T. Lundy, S. Amouyal, Y. Levine, K. Leyton-Brown, and M. Tennenholtz. CoRR, (2024)

Bridging Many-Body Quantum Physics and Deep Learning via Tensor Networks. Y. Levine, O. Sharir, N. Cohen, and A. Shashua. (2018). arXiv:1803.09780.

Parallel Context Windows for Large Language Models. N. Ratner, Y. Levine, Y. Belinkov, O. Ram, I. Magar, O. Abend, E. Karpas, A. Shashua, K. Leyton-Brown, and Y. Shoham. ACL (1), page 6383-6402. Association for Computational Linguistics, (2023)

STEER: Assessing the Economic Rationality of Large Language Models. N. Raman, T. Lundy, S. Amouyal, Y. Levine, K. Leyton-Brown, and M. Tennenholtz. ICML, OpenReview.net, (2024)

BibSonomy

Disambiguation of "Levine, Yoav"
