Revisiting the Architectures like Pointer Networks to Efficiently Improve the Next Word Distribution, Summarization Factuality, and Beyond

H. Chang, Z. Yao, A. Gon, H. Yu, and A. McCallum. Findings of the Association for Computational Linguistics: ACL 2023, pages 12707–12730. Toronto, Canada, Association for Computational Linguistics (July 2023)
DOI: 10.18653/v1/2023.findings-acl.805

Abstract

Is the output softmax layer, which is adopted by most language models (LMs), always the best way to compute the next word probability? Given so many attention layers in a modern transformer-based LM, are the pointer networks redundant nowadays? In this study, we discover that the answers to both questions are no. This is because the softmax bottleneck sometimes prevents the LMs from predicting the desired distribution and the pointer networks can be used to break the bottleneck efficiently. Based on the finding, we propose several softmax alternatives by simplifying the pointer networks and accelerating the word-by-word rerankers. In GPT-2, our proposals are significantly better and more efficient than mixture of softmax, a state-of-the-art softmax alternative. In summarization experiments, without very significantly decreasing its training/testing speed, our best method based on T5-Small improves factCC score by 2 points in CNN/DM and XSUM dataset, and improves MAUVE scores by 30% in BookSum paragraph-level dataset.

Links and resources

BibTeX key: chang-etal-2023-revisiting-architectures
entry type: inproceedings
address: Toronto, Canada
booktitle: Findings of the Association for Computational Linguistics: ACL 2023
year: 2023
month: jul
pages: 12707--12730
publisher: Association for Computational Linguistics
DOI: 10.18653/v1/2023.findings-acl.805
url: https://aclanthology.org/2023.findings-acl.805

Cite this publication

@inproceedings{chang-etal-2023-revisiting-architectures,
  abstract = {Is the output softmax layer, which is adopted by most language models (LMs), always the best way to compute the next word probability? Given so many attention layers in a modern transformer-based LM, are the pointer networks redundant nowadays? In this study, we discover that the answers to both questions are no. This is because the softmax bottleneck sometimes prevents the LMs from predicting the desired distribution and the pointer networks can be used to break the bottleneck efficiently. Based on the finding, we propose several softmax alternatives by simplifying the pointer networks and accelerating the word-by-word rerankers. In GPT-2, our proposals are significantly better and more efficient than mixture of softmax, a state-of-the-art softmax alternative. In summarization experiments, without very significantly decreasing its training/testing speed, our best method based on T5-Small improves factCC score by 2 points in CNN/DM and XSUM dataset, and improves MAUVE scores by 30{\%} in BookSum paragraph-level dataset.},
  added-at = {2024-04-24T09:03:55.000+0200},
  address = {Toronto, Canada},
  author = {Chang, Haw-Shiuan and Yao, Zonghai and Gon, Alolika and Yu, Hong and McCallum, Andrew},
  biburl = {https://www.bibsonomy.org/bibtex/2646d4d300054935bfefaed9500bb52eb/tobias.koopmann},
  booktitle = {Findings of the Association for Computational Linguistics: ACL 2023},
  doi = {10.18653/v1/2023.findings-acl.805},
  editor = {Rogers, Anna and Boyd-Graber, Jordan and Okazaki, Naoaki},
  interhash = {2753121cd7756ce0429b1d810a79b8a5},
  intrahash = {646d4d300054935bfefaed9500bb52eb},
  keywords = {nlp pointer reading},
  month = jul,
  pages = {12707--12730},
  publisher = {Association for Computational Linguistics},
  timestamp = {2024-04-24T09:03:55.000+0200},
  title = {Revisiting the Architectures like Pointer Networks to Efficiently Improve the Next Word Distribution, Summarization Factuality, and Beyond},
  url = {https://aclanthology.org/2023.findings-acl.805},
  year = 2023
}

BibSonomy
