Abstract
This technical report presents the application of a recurrent memory mechanism to
extend the context length of BERT, one of the most effective Transformer-based
models in natural language processing. By leveraging the Recurrent Memory
Transformer architecture, we have successfully increased the model's effective
context length to an unprecedented two million tokens, while maintaining high
memory retrieval accuracy. Our method allows for the storage and processing of
both local and global information and enables information flow between segments
of the input sequence through the use of recurrence. Our experiments
demonstrate the effectiveness of our approach, which holds significant
potential to enhance long-term dependency handling in natural language
understanding and generation tasks, as well as enable large-scale context
processing for memory-intensive applications.
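To illustrate the idea of passing information between segments through recurrence, below is a minimal sketch of segment-level recurrence with learnable memory tokens. It is not the authors' implementation: the class name `RMTSketch`, the parameters `num_mem_tokens` and `segment_len`, and the choice of a plain `nn.TransformerEncoder` backbone are illustrative assumptions, and the actual Recurrent Memory Transformer arranges its memory tokens differently from this simplified version.

```python
# Minimal sketch of recurrent memory over segments (illustrative, not the paper's code).
import torch
import torch.nn as nn

class RMTSketch(nn.Module):
    def __init__(self, d_model=128, nhead=4, num_layers=2,
                 num_mem_tokens=10, segment_len=64, vocab_size=30522):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        # Learnable memory tokens prepended to every segment.
        self.mem_tokens = nn.Parameter(torch.randn(1, num_mem_tokens, d_model))
        self.segment_len = segment_len
        self.num_mem_tokens = num_mem_tokens

    def forward(self, input_ids):
        batch = input_ids.size(0)
        memory = self.mem_tokens.expand(batch, -1, -1)  # initial memory state
        outputs = []
        # Split the long input into fixed-size segments and process them sequentially.
        for segment in input_ids.split(self.segment_len, dim=1):
            hidden = self.embed(segment)
            # Prepend the current memory to the segment before encoding.
            encoded = self.encoder(torch.cat([memory, hidden], dim=1))
            # The updated memory slots carry information to the next segment.
            memory = encoded[:, :self.num_mem_tokens]
            outputs.append(encoded[:, self.num_mem_tokens:])
        return torch.cat(outputs, dim=1), memory

# Usage: a 2,048-token input processed as 32 segments of 64 tokens each,
# so attention cost stays per-segment while memory links the segments.
model = RMTSketch()
tokens = torch.randint(0, 30522, (2, 2048))
out, final_memory = model(tokens)
print(out.shape, final_memory.shape)  # (2, 2048, 128), (2, 10, 128)
```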