S. Arora, A. May, J. Zhang, and C. Ré. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 2650–2663. Online, Association for Computational Linguistics, (July 2020)
S. Merity. arXiv:1911.11423, (2019). Comment: Addition of citations and contextual results (no attention head, single attention head, attention per layer), removal of wordpiece WikiText-103 numbers due to normalization issues, fix of SHA attention figure Q arrow, other minor fixes.