R. Wang, E. Durmus, N. Goodman, and T. Hashimoto (2022). Language modeling via stochastic processes. arXiv:2203.11370. ICLR 2022 (Oral). Code: https://github.com/rosewang2008/language_modeling_via_stochastic_processes
Abstract
Modern language models can generate high-quality short texts, but they often meander or become incoherent when generating longer texts. These issues arise from the next-token-only language modeling objective. To address them, we introduce Time Control (TC), a language model that implicitly plans via a latent stochastic process. TC learns a representation that maps the dynamics of how text changes within a document onto the dynamics of a stochastic process of interest. Using this representation, the language model generates text by first implicitly producing a document plan via the stochastic process, and then generating text that is consistent with this latent plan. Compared to domain-specific methods and fine-tuned GPT-2 across a variety of text domains, TC improves performance on text infilling and discourse coherence. In long text generation settings, TC better preserves text structure, both in ordering (up to +40%) and in text-length consistency (up to +17%). Human evaluators also prefer TC's output 28.6% more often than the baselines'.
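
In the paper, the stochastic process of interest is a Brownian bridge: sentence embeddings are trained with a contrastive objective to follow bridge dynamics, so a document plan can be sampled as a bridge between a start latent and an end latent and then decoded sentence by sentence. As a rough illustration of the planning step only, the Python/NumPy sketch below (not the authors' code; the function name, sigma, and the latent size are illustrative assumptions) sequentially samples a Brownian bridge pinned to given endpoints:

import numpy as np

def sample_brownian_bridge(z0, zT, num_steps, sigma=1.0, seed=None):
    """Sequentially sample a Brownian bridge pinned at z0 (t=0) and zT (t=T).

    Given the current point z_t and the endpoint zT, the next point is Gaussian
    with mean z_t + (zT - z_t) / (T - t) and variance sigma^2 * (T - t - 1) / (T - t):
    each step drifts toward zT, and the path collapses onto zT at the final step.
    """
    rng = np.random.default_rng(seed)
    z = np.asarray(z0, dtype=float)
    zT = np.asarray(zT, dtype=float)
    T = num_steps - 1
    path = [z]
    for t in range(T):
        remaining = T - t
        mean = z + (zT - z) / remaining
        std = sigma * np.sqrt((remaining - 1) / remaining)
        z = rng.normal(mean, std)
        path.append(z)
    return np.stack(path)  # shape: (num_steps, latent_dim)

# Example: an 8-point plan through a 16-dimensional latent space.
plan = sample_brownian_bridge(np.zeros(16), np.ones(16), num_steps=8, seed=0)
print(plan.shape)  # (8, 16)

At generation time, TC conditions its decoder (a fine-tuned GPT-2 in the paper) on each latent z_t in turn, so locally fluent continuations stay anchored to the global plan.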
@misc{wang2022language,
author = {Wang, Rose E. and Durmus, Esin and Goodman, Noah and Hashimoto, Tatsunori},
keywords = {deeplearning languagemodel nlp},
note = {arXiv:2203.11370. ICLR 2022 (Oral). Code: https://github.com/rosewang2008/language_modeling_via_stochastic_processes},
title = {Language modeling via stochastic processes},
url = {http://arxiv.org/abs/2203.11370},
year = 2022
}