Abstract
Music relies heavily on repetition to build structure and meaning.
Self-reference occurs on multiple timescales, from motifs to phrases to the
reuse of entire sections of music, such as in pieces with ABA structure. The
Transformer (Vaswani et al., 2017), a sequence model based on self-attention,
has achieved compelling results in many generation tasks that require
maintaining long-range coherence. This suggests that self-attention might also
be well-suited to modeling music. In musical composition and performance,
however, relative timing is critically important. Existing approaches for
representing relative positional information in the Transformer modulate
attention based on pairwise distance (Shaw et al., 2018). This is impractical
for long sequences such as musical compositions, since the memory required to
store the intermediate relative information is quadratic in the sequence length. We
propose an algorithm that reduces their intermediate memory requirement to
linear in the sequence length. This enables us to demonstrate that a
Transformer with our modified relative attention mechanism can generate
minute-long compositions (thousands of steps, four times the length modeled in
Oore et al., 2018) with compelling structure, generate continuations that
coherently elaborate on a given motif, and in a seq2seq setup generate
accompaniments conditioned on melodies. We evaluate the Transformer with our
relative attention mechanism on two datasets, JSB Chorales and
Piano-e-Competition, and obtain state-of-the-art results on the latter.
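
The abstract does not spell out the algorithm, but the linear-memory reduction it describes corresponds to the "skewing" trick for relative attention. Below is a minimal NumPy sketch of that idea: instead of materializing an (L, L, D) tensor of per-pair relative embeddings as in Shaw et al. (2018), the relative logits are computed as a single (L, L) matrix product and then realigned by padding and reshaping. Function names, shapes, and the toy softmax are illustrative assumptions, not the paper's reference implementation.

    import numpy as np

    def skew(rel_logits):
        """Realign Q @ E^T so that entry (i, j) holds q_i . e_(j - i).

        rel_logits: (L, L) array whose column k corresponds to relative
        distance k - (L - 1), i.e. distances -(L-1), ..., 0. Only (L, L)
        memory is needed, rather than an (L, L, D) relative tensor.
        """
        L = rel_logits.shape[0]
        padded = np.pad(rel_logits, ((0, 0), (1, 0)))  # dummy column on the left -> (L, L+1)
        reshaped = padded.reshape(L + 1, L)            # reinterpret the buffer -> (L+1, L)
        return reshaped[1:, :]                         # drop the first row -> aligned (L, L)

    def relative_attention(Q, K, V, E):
        """Single-head causal self-attention with a relative-position term.

        Q, K, V: (L, D) queries, keys, values.
        E: (L, D) learned embeddings for relative distances -(L-1), ..., 0
           (hypothetical name; any embedding table of this shape works).
        """
        L, D = Q.shape
        logits = (Q @ K.T + skew(Q @ E.T)) / np.sqrt(D)
        mask = np.triu(np.ones((L, L), dtype=bool), k=1)  # causal: no attending ahead
        logits = np.where(mask, -1e9, logits)
        weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ V

A toy call such as relative_attention(*(np.random.randn(8, 4) for _ in range(4))) exercises the path; in the full model, E is trained jointly with the rest of the attention parameters, and the skew keeps the intermediate memory linear in sequence length per row of logits rather than quadratic in the embedding dimension.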