bookmark

BERT Vector Space shows issues with unknown words · Issue #164 · google-research/bert · GitHub


Description

I'm not sure what these vectors are, since BERT does not generate meaningful sentence vectors. It seems that this is doing average pooling over the word tokens to get a sentence vector, but we never suggested that this would generate meaningful sentence representations. And even if they are decent representations when fed into a DNN trained for a downstream task, that doesn't mean they will be meaningful in terms of cosine distance, since cosine distance treats the vector space as linear, with all dimensions weighted equally.
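To make the point about average pooling and cosine distance concrete, here is a minimal sketch in plain Python. The token vectors are made-up toy numbers, not real BERT outputs, and the helper names `mean_pool` and `cosine_similarity` are hypothetical, introduced only for illustration. It shows how one large-magnitude dimension can dominate the cosine score because no dimension is re-weighted.

```python
import math


def mean_pool(token_vectors):
    """Average a list of equal-length token vectors into one sentence vector."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / n for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; every dimension counts equally."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy token vectors (assumed values for illustration only).
# The first two dimensions point in opposite directions across the two
# sentences, while the third dimension is large and identical in both.
sentence_a = [[1.0, 0.0, 10.0], [0.0, 1.0, 10.0]]
sentence_b = [[-1.0, 0.0, 10.0], [0.0, -1.0, 10.0]]

vec_a = mean_pool(sentence_a)  # [0.5, 0.5, 10.0]
vec_b = mean_pool(sentence_b)  # [-0.5, -0.5, 10.0]

similarity = cosine_similarity(vec_a, vec_b)
print(f"cosine similarity: {similarity:.3f}")  # about 0.990
```

The two pooled vectors disagree on the first two dimensions, yet the score is close to 1 because the shared large third dimension dominates. A network trained on a downstream task could learn to ignore such a dimension; raw cosine similarity cannot.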

Users

  • @ghagerer
