Abstract

The word2vec model has previously been shown to be successful in creating numerical representations of words (word embeddings) that capture their semantic and syntactic meanings. This study examines model stability, i.e., how consistent these representations are given a specific corpus and set of model parameters. Specifically, the study considers the impact of embedding dimension size and word frequency on stability. Stability is measured by comparing the neighborhoods of words in the word vector space model. Our results demonstrate that the dimension size of word embeddings has a significant effect on the consistency of the model. In addition, the effect of the frequency of the target words on stability is identified. An approach to mitigate the effects of word frequency on stability is proposed.
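The abstract describes measuring stability by comparing word neighborhoods across models. A minimal sketch of that idea, assuming stability is quantified as the Jaccard overlap of each word's top-k nearest neighbors (by cosine similarity) between two embedding spaces; the function names and toy vectors below are illustrative, not from the paper:

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def top_k_neighbors(word, emb, k):
    """Return the k most cosine-similar words to `word` in embedding dict `emb`."""
    others = [w for w in emb if w != word]
    others.sort(key=lambda w: cosine(emb[word], emb[w]), reverse=True)
    return set(others[:k])

def neighborhood_overlap(word, emb1, emb2, k=2):
    """Jaccard overlap of `word`'s k-nearest neighborhoods in two embedding spaces.

    1.0 means the neighborhoods agree exactly; 0.0 means they are disjoint.
    """
    n1 = top_k_neighbors(word, emb1, k)
    n2 = top_k_neighbors(word, emb2, k)
    return len(n1 & n2) / len(n1 | n2)

# Toy 2-d embeddings standing in for two training runs over the same corpus.
run_a = {"king": (1.0, 0.1), "queen": (0.9, 0.2), "apple": (0.1, 1.0), "royal": (0.8, 0.3)}
run_b = {"king": (0.1, 1.0), "queen": (0.2, 0.9), "apple": (1.0, 0.1), "royal": (0.3, 0.8)}
print(neighborhood_overlap("king", run_a, run_b, k=2))
```

Comparing neighborhood sets rather than raw coordinates makes the measure invariant to rotations of the embedding space, which differ arbitrarily between training runs.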
