Abstract
The leading approaches in language modeling are all obsessed with TV shows of
my youth - namely Transformers and Sesame Street. Transformers this,
Transformers that, and over here a bonfire's worth of GPU-TPU-neuromorphic
wafer-scale silicon. We opt for the lazy path of old and proven techniques with
a fancy crypto-inspired acronym: the Single Headed Attention RNN (SHA-RNN). The
author's lone goal is to show that the entire field might have evolved in a
different direction if we had instead been obsessed with a slightly different
acronym and a slightly different result. We take a previously strong language
model based only on boring LSTMs and get it to within a stone's throw of a
stone's throw of state-of-the-art byte-level language model results on enwik8.
This work has undergone no intensive hyperparameter optimization and lived
entirely on a commodity desktop machine that made the author's small studio
apartment far too warm in the midst of a San Franciscan summer. The final
results are achievable in plus or minus 24 hours on a single GPU as the author
is impatient. The attention mechanism is also readily extended to large
contexts with minimal computation. Take that, Sesame Street.
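
For readers who have not met the acronym before, below is a minimal, assumption-laden PyTorch sketch of what a single attention head with a cached memory looks like in isolation, so the attended context can grow across windows with little extra computation. It illustrates the general idea only and is not the paper's actual SHA-RNN block (which pairs an LSTM with its attention and feed-forward layers); every name and dimension here is invented for the example.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SingleHeadAttention(nn.Module):
        """One attention head with an optional memory of past inputs.

        Illustrative sketch only; causal masking and the surrounding
        LSTM/feed-forward machinery of the real SHA-RNN are omitted.
        """
        def __init__(self, d_model):
            super().__init__()
            self.query = nn.Linear(d_model, d_model)
            self.key = nn.Linear(d_model, d_model)
            self.value = nn.Linear(d_model, d_model)
            self.scale = d_model ** -0.5

        def forward(self, x, memory=None):
            # x: (batch, window, d_model). Concatenating the cached memory
            # lets the head attend over a context longer than the current window.
            mem = x if memory is None else torch.cat([memory, x], dim=1)
            q = self.query(x)
            k = self.key(mem)
            v = self.value(mem)
            scores = torch.matmul(q, k.transpose(-2, -1)) * self.scale
            attn = F.softmax(scores, dim=-1)
            return torch.matmul(attn, v), mem

    # Hypothetical usage: process two consecutive windows, carrying the memory.
    head = SingleHeadAttention(d_model=64)
    out1, mem = head(torch.randn(2, 16, 64))
    out2, mem = head(torch.randn(2, 16, 64), memory=mem)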