Tree Transformer: Integrating Tree Structures into Self-Attention
Y. Wang, H. Lee, and Y. Chen (2019). arXiv:1909.06639. Comment: accepted by EMNLP 2019.
Abstract
Pre-training a Transformer on large-scale raw text and fine-tuning it on the
desired task has achieved state-of-the-art results on diverse NLP tasks.
However, it is unclear what the learned attention captures. The attention
computed by the attention heads does not seem to match human intuitions about
hierarchical structures. This paper proposes Tree Transformer, which adds an
extra constraint to the attention heads of the bidirectional Transformer
encoder in order to encourage the attention heads to follow tree structures.
The tree structures can be automatically induced from raw text by our proposed
"Constituent Attention" module, which is implemented simply as self-attention
between adjacent words. With a training procedure identical to BERT's, the
experiments demonstrate the effectiveness of Tree Transformer in terms of
inducing tree structures, better language modeling, and learning more
explainable attention scores.
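The abstract only sketches how the constraint works, so below is a minimal NumPy sketch of the general idea as described there: neighbour-only self-attention scores each word against its left and right neighbour, the resulting link probabilities are multiplied along the span between two positions to form a "constituent prior", and that prior masks the standard scaled dot-product attention so that words in different constituents attend to each other only weakly. The geometric-mean combination of the two neighbour views, the reuse of the query/key projections for the neighbour attention, and all names and dimensions are simplifying assumptions, not the authors' reference implementation.

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def constituent_prior(h, w_q, w_k):
    """Neighbour-only self-attention: each word scores just its left and
    right neighbour; multiplying the link probabilities along the path
    between two positions gives the prior that they share a constituent."""
    n, d = h.shape
    scores = ((h @ w_q) @ (h @ w_k).T) / np.sqrt(d)
    link = np.zeros(n - 1)
    for i in range(n - 1):
        # word i chooses between its left and right neighbour ...
        right = softmax(np.array([scores[i, i - 1] if i > 0 else -np.inf,
                                  scores[i, i + 1]]))[1]
        # ... and word i+1 does the same; combine the two views
        left = softmax(np.array([scores[i + 1, i],
                                 scores[i + 1, i + 2] if i + 2 < n else -np.inf]))[0]
        link[i] = np.sqrt(right * left)  # geometric mean (an assumption here)
    prior = np.ones((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # far-apart words get a small prior: the product of many links
            prior[i, j] = prior[j, i] = np.prod(link[i:j])
    return prior

def tree_constrained_attention(h, w_q, w_k, w_v):
    """Standard scaled dot-product attention whose probabilities are
    masked element-wise by the constituent prior."""
    n, d = h.shape
    q, k, v = h @ w_q, h @ w_k, h @ w_v
    attn = softmax((q @ k.T) / np.sqrt(d), axis=-1)
    attn = attn * constituent_prior(h, w_q, w_k)  # constrain to constituents
    return attn @ v

# toy usage: 5 "words" with 8-dimensional hidden states
rng = np.random.default_rng(0)
h = rng.normal(size=(5, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(tree_constrained_attention(h, w_q, w_k, w_v).shape)  # (5, 8)

The paper additionally couples the prior across layers so that the induced constituents can only grow with depth, which is what yields a full parse tree; the sketch above omits that layer-wise step.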
Description
Tree Transformer: Integrating Tree Structures into Self-Attention
%0 Generic
%1 wang2019transformer
%A Wang, Yau-Shian
%A Lee, Hung-Yi
%A Chen, Yun-Nung
%D 2019
%K transformer tree
%T Tree Transformer: Integrating Tree Structures into Self-Attention
%U http://arxiv.org/abs/1909.06639
%X Pre-training a Transformer on large-scale raw text and fine-tuning it on the
desired task has achieved state-of-the-art results on diverse NLP tasks.
However, it is unclear what the learned attention captures. The attention
computed by the attention heads does not seem to match human intuitions about
hierarchical structures. This paper proposes Tree Transformer, which adds an
extra constraint to the attention heads of the bidirectional Transformer
encoder in order to encourage the attention heads to follow tree structures.
The tree structures can be automatically induced from raw text by our proposed
"Constituent Attention" module, which is implemented simply as self-attention
between adjacent words. With a training procedure identical to BERT's, the
experiments demonstrate the effectiveness of Tree Transformer in terms of
inducing tree structures, better language modeling, and learning more
explainable attention scores.
@misc{wang2019transformer,
abstract = {Pre-training a Transformer on large-scale raw text and fine-tuning it on the
desired task has achieved state-of-the-art results on diverse NLP tasks.
However, it is unclear what the learned attention captures. The attention
computed by the attention heads does not seem to match human intuitions about
hierarchical structures. This paper proposes Tree Transformer, which adds an
extra constraint to the attention heads of the bidirectional Transformer
encoder in order to encourage the attention heads to follow tree structures.
The tree structures can be automatically induced from raw text by our proposed
"Constituent Attention" module, which is implemented simply as self-attention
between adjacent words. With a training procedure identical to BERT's, the
experiments demonstrate the effectiveness of Tree Transformer in terms of
inducing tree structures, better language modeling, and learning more
explainable attention scores.},
added-at = {2023-01-30T00:21:54.000+0100},
author = {Wang, Yau-Shian and Lee, Hung-Yi and Chen, Yun-Nung},
biburl = {https://www.bibsonomy.org/bibtex/292f174a1f6b4dec71c4c1c2642fe11a0/pomali},
description = {Tree Transformer: Integrating Tree Structures into Self-Attention},
interhash = {348bd33963ae3e9b08cdf99a573563c5},
intrahash = {92f174a1f6b4dec71c4c1c2642fe11a0},
keywords = {transformer tree},
note = {arXiv:1909.06639. Comment: accepted by EMNLP 2019},
timestamp = {2023-01-30T00:21:54.000+0100},
title = {Tree Transformer: Integrating Tree Structures into Self-Attention},
url = {http://arxiv.org/abs/1909.06639},
year = 2019
}