Sentence Embeddings w/o Fine-Tuning?

Comment from Chuan Jiang on BERT Document Classification Tutorial with Code:

Regarding your comment that embeddings only make sense once the model has been fine-tuned: to fine-tune, we need labeled training data from the start. Otherwise, how can I compare semantic embeddings for the following two texts (presumably similar) in an unsupervised way?

(1) What’s the population of US?

(2) How many US citizens are there in US?

Hi Chuan,

Fine-tuning is ideal, but you can definitely still extract sentence embeddings “unsupervised”.

A popular approach is to average together all of the word embeddings from the second-to-last layer of BERT, which gives you a single fixed-length vector for the whole sentence.

I show an example of this in the Notebook / blog post BERT Word Embeddings v2.ipynb.

Hope that helps!