Abstract
Recent progress in natural language generation has raised dual-use concerns.
While applications like summarization and translation are positive, the
underlying technology might also enable adversaries to generate neural fake
news: targeted propaganda that closely mimics the style of real news.
Modern computer security relies on careful threat modeling: identifying
potential threats and vulnerabilities from an adversary's point of view, and
exploring potential mitigations to these threats. Likewise, developing robust
defenses against neural fake news requires us first to carefully investigate
and characterize the risks of these models. We thus present a model for
controllable text generation called Grover. Given a headline like "Link Found
Between Vaccines and Autism," Grover can generate the rest of the article;
humans find these generations to be more trustworthy than human-written
disinformation.
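To make the setup concrete, the following is a minimal sketch of
headline-conditioned generation using the Hugging Face transformers library,
with GPT-2 standing in for Grover; Grover's actual checkpoints and its
conditioning on richer article metadata (domain, date, authors) are not
assumed here.

    # Sketch: generate an article body conditioned on a headline.
    # GPT-2 is a stand-in for Grover, not the paper's actual model.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    headline = "Link Found Between Vaccines and Autism"
    inputs = tokenizer(headline + "\n\n", return_tensors="pt")

    # Nucleus (top-p) sampling, the truncated decoding strategy the
    # abstract alludes to below.
    outputs = model.generate(
        **inputs,
        max_new_tokens=200,
        do_sample=True,
        top_p=0.95,
        pad_token_id=tokenizer.eos_token_id,
    )
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))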
Developing robust verification techniques against generators like Grover is
critical. We find that the best current discriminators can distinguish neural
fake news from real, human-written news with 73% accuracy, assuming access to a
moderate level of training data. Counterintuitively, the best defense against
Grover turns out to be Grover itself, with 92% accuracy, demonstrating the
importance of public release of strong generators. We investigate these results
further, showing that both exposure bias and the sampling strategies that
alleviate its effects leave artifacts that similar discriminators can pick up on.
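As a concrete illustration of one such artifact, here is a minimal numpy
sketch of top-p (nucleus) truncation; the toy distribution is invented, and
implementations differ slightly on whether the token that crosses the
threshold p is kept.

    import numpy as np

    def nucleus_filter(probs, p=0.95):
        # Keep the most probable tokens until their cumulative mass reaches p,
        # zero out the rest, and renormalize.
        order = np.argsort(probs)[::-1]   # token indices, most probable first
        cdf = np.cumsum(probs[order])
        keep = cdf <= p
        keep[0] = True                    # always keep the single best token
        filtered = np.zeros_like(probs)
        filtered[order[keep]] = probs[order[keep]]
        return filtered / filtered.sum()

    # Toy 5-token vocabulary: the tail tokens can never be sampled.
    probs = np.array([0.5, 0.25, 0.15, 0.07, 0.03])
    print(nucleus_filter(probs, p=0.9))   # ~ [0.56, 0.28, 0.17, 0.0, 0.0]

Because the tail is zeroed out exactly, text sampled this way systematically
lacks the low-probability tokens that human writing does contain, one
distributional fingerprint a discriminator can learn.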
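A discriminator of the kind whose accuracy is quoted above can likewise be
sketched as binary sequence classification; the encoder, example texts, and
hyperparameters below are illustrative placeholders rather than the paper's
setup, in which the strongest discriminator is Grover itself.

    # Sketch: fine-tune a pretrained encoder to separate human-written from
    # machine-generated news. "bert-base-uncased" is a placeholder choice.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased",
        num_labels=2,  # 0 = human-written, 1 = machine-generated
    )
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

    # Placeholder batch; a real setup iterates over a labeled corpus of
    # human-written and generated articles.
    texts = ["A human-written news article ...", "A machine-generated article ..."]
    labels = torch.tensor([0, 1])

    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    out = model(**batch, labels=labels)  # out.loss to train, out.logits to score
    out.loss.backward()
    optimizer.step()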
We conclude by discussing ethical issues regarding the technology, and plan to
release Grover publicly, helping pave the way for better detection of neural
fake news.
Description
Defending Against Neural Fake News