Bookmark

BERT Vector Space shows issues with unknown words · Issue #164 · google-research/bert · GitHub


Description

I'm not sure what these vectors are, since BERT does not generate meaningful sentence vectors. It seems that this is doing average pooling over the word tokens to get a sentence vector, but we never suggested that this would generate meaningful sentence representations. And even if they are decent representations when fed into a DNN trained for a downstream task, that doesn't mean they will be meaningful in terms of cosine distance. (Cosine distance is a linear measure in which all dimensions are weighted equally.)
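To make the two operations mentioned above concrete, here is a minimal sketch of average pooling over token vectors followed by a cosine comparison. It uses plain Python and made-up 3-dimensional toy vectors, not real BERT outputs; the helper names `mean_pool` and `cosine_similarity` are hypothetical and not part of the BERT codebase.

```python
import math


def mean_pool(token_vectors):
    """Average a list of equal-length token vectors into one sentence vector."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; every dimension counts equally."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy token vectors, invented for illustration (assumption: NOT BERT outputs).
sentence_a = [[1.0, 0.0, 2.0], [3.0, 0.0, 0.0]]
sentence_b = [[2.0, 0.0, 1.0], [2.0, 0.0, 1.0]]

vec_a = mean_pool(sentence_a)
vec_b = mean_pool(sentence_b)

# The token vectors differ, yet both sentences pool to the same average,
# so the cosine similarity is (up to floating-point rounding) 1.
print(vec_a, vec_b)
print(cosine_similarity(vec_a, vec_b))
```

In this toy case two different sets of token vectors collapse to the same pooled vector, which is one way averaging can discard information. It says nothing about how real BERT embeddings behave; it only shows what the pooling and cosine steps compute.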

Users

  • @ghagerer