Abstract

The success of long short-term memory (LSTM) neural networks in language processing is typically attributed to their ability to capture long-distance statistical regularities. Linguistic regularities are often sensitive to syntactic structure; can such dependencies be captured by LSTMs, which do not have explicit structural representations? We begin addressing this question using number agreement in English subject-verb dependencies. We probe the architecture’s grammatical competence both using training objectives with an explicit grammatical target (number prediction, grammaticality judgments) and using language models. In the strongly supervised settings, the LSTM achieved very high overall accuracy (less than 1% errors), but errors increased when sequential and structural information conflicted. The frequency of such errors rose sharply in the language-modeling setting. We conclude that LSTMs can capture a non-trivial amount of grammatical structure given targeted supervision, but stronger architectures may be required to further reduce errors; furthermore, the language-modeling signal is insufficient for capturing syntax-sensitive dependencies, and should be supplemented with more direct supervision if such dependencies need to be captured.
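To make "sequential and structural information conflicted" concrete, here is a minimal sketch of the number-prediction framing, assuming hand-annotated noun positions and numbers. The `Example` structure, the helper functions, and the sentences are hypothetical illustrations, not the authors' data format or model. A purely sequential "last noun" heuristic is scored against the gold verb number, and results are grouped by the count of intervening nouns whose number differs from the subject's (often called agreement attractors), which are exactly the cases where a linear cue and the syntactic subject disagree.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Example:
    tokens: list[str]             # sentence prefix up to, but not including, the verb
    subject_index: int            # position of the subject's head noun
    noun_numbers: dict[int, str]  # hypothetical annotation: token index -> "sg" or "pl"
    verb_number: str              # gold label for the upcoming verb: "sg" or "pl"


def last_noun_heuristic(ex: Example) -> str:
    """Predict the verb's number from the most recent noun (a purely sequential cue)."""
    return ex.noun_numbers[max(ex.noun_numbers)]


def count_attractors(ex: Example) -> int:
    """Count nouns after the subject whose number differs from the subject's."""
    subject_number = ex.noun_numbers[ex.subject_index]
    return sum(
        1
        for i, number in ex.noun_numbers.items()
        if i > ex.subject_index and number != subject_number
    )


def error_rate_by_attractors(examples: list[Example]) -> dict[int, float]:
    """Error rate of the sequential heuristic, grouped by attractor count."""
    errors: dict[int, int] = defaultdict(int)
    totals: dict[int, int] = defaultdict(int)
    for ex in examples:
        k = count_attractors(ex)
        totals[k] += 1
        if last_noun_heuristic(ex) != ex.verb_number:
            errors[k] += 1
    return {k: errors[k] / totals[k] for k in sorted(totals)}


# Illustrative prefixes; the gold label is the number of the verb that follows.
examples = [
    Example(["The", "keys"], 1, {1: "pl"}, "pl"),
    Example(["The", "key", "to", "the", "cabinet"], 1, {1: "sg", 4: "sg"}, "sg"),
    Example(["The", "keys", "to", "the", "cabinet"], 1, {1: "pl", 4: "sg"}, "pl"),
    Example(["The", "key", "to", "the", "cabinets"], 1, {1: "sg", 4: "pl"}, "sg"),
]
print(error_rate_by_attractors(examples))  # {0: 0.0, 1: 1.0}
```

The heuristic is deliberately naive: it is correct whenever no attractor intervenes and wrong on every attractor case, which is why breaking results down by attractor count isolates errors caused by the conflict between linear order and syntactic structure.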
