A. Bulatov, Y. Kuratov, and M. Burtsev. (2022). Recurrent Memory Transformer. 36th Conference on Neural Information Processing Systems (NeurIPS 2022). arXiv:2207.06881.
Abstract
Transformer-based models show their effectiveness across multiple domains and tasks. Self-attention allows information from all sequence elements to be combined into context-aware representations. However, global and local information has to be stored mostly in the same element-wise representations. Moreover, the length of an input sequence is limited by the quadratic computational complexity of self-attention.
In this work, we propose and study a memory-augmented segment-level recurrent Transformer (RMT). Memory allows the model to store and process local and global information and, with the help of recurrence, to pass information between segments of a long sequence.
We implement the memory mechanism with no changes to the Transformer model by adding special memory tokens to the input or output sequence. The model is then trained to control both memory operations and the processing of sequence representations.
Experimental results show that RMT performs on par with Transformer-XL on language modeling for smaller memory sizes and outperforms it on tasks that require processing longer sequences. We also show that adding memory tokens to Transformer-XL improves its performance. This makes the Recurrent Memory Transformer a promising architecture for applications that require learning long-term dependencies and general-purpose memory processing, such as algorithmic tasks and reasoning.
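To make the memory mechanism described above concrete, here is a minimal sketch of segment-level recurrence with memory tokens, not the authors' reference implementation. It assumes an encoder-style PyTorch backbone; the class name RecurrentMemorySketch, the hyperparameter values, and the placement of all memory tokens at the start of each segment are illustrative assumptions (the paper's decoder variant uses separate read and write memory positions).

import torch
import torch.nn as nn

class RecurrentMemorySketch(nn.Module):
    """Segment-level recurrence via memory tokens around an unmodified Transformer (sketch)."""
    def __init__(self, d_model=256, n_heads=4, n_layers=2, num_mem_tokens=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)  # plain Transformer, no architectural changes
        # learned initial memory embeddings (illustrative size), prepended to every segment
        self.mem = nn.Parameter(torch.randn(num_mem_tokens, d_model))

    def forward(self, segments):
        # segments: list of already-embedded chunks, each of shape [batch, seg_len, d_model]
        batch = segments[0].size(0)
        m = self.mem.size(0)
        memory = self.mem.unsqueeze(0).expand(batch, -1, -1)
        outputs = []
        for seg in segments:
            x = torch.cat([memory, seg], dim=1)  # memory tokens followed by segment tokens
            h = self.backbone(x)                 # ordinary self-attention over memory + segment
            memory = h[:, :m]                    # updated memory states carried to the next segment
            outputs.append(h[:, m:])             # per-token representations for this segment
        return torch.cat(outputs, dim=1), memory

Because the carried memory states are ordinary activations, gradients can flow across several segments during training (backpropagation through time), which is how the model learns to write useful information into memory and read it back later.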
@misc{bulatov2022recurrent,
author = {Bulatov, Aydar and Kuratov, Yuri and Burtsev, Mikhail S.},
keywords = {rnn transformers},
note = {arXiv:2207.06881. 36th Conference on Neural Information Processing Systems (NeurIPS 2022)},
title = {Recurrent Memory Transformer},
url = {http://arxiv.org/abs/2207.06881},
year = 2022
}