I skimmed through the notebook, which shows how to do the above, and I have a couple of questions. Let us say that I want to fine-tune BERT on a dataset of apparel descriptions, for example: “Feel angelic in the Extratropical Dress. In a beautiful neutral taupe shade, this dress is the perfect shade to complement any and every skin tone.” Since I am using a domain-specific dataset, I would like to add certain words to the existing vocabulary, and the notebook demonstrates that well.
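For concreteness, here is a toy sketch of what I mean by extending the vocabulary. In real code I would use `tokenizer.add_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))` from Hugging Face transformers; the dictionary and helper below are just illustrative stand-ins for `tokenizer.vocab`:

```python
# Toy stand-in for tokenizer.vocab: word -> id.
base_vocab = {"[PAD]": 0, "[UNK]": 1, "dress": 2, "shade": 3}

def add_tokens(vocab, new_words):
    """Append words not already present, assigning the next free ids.
    Mimics the behaviour of tokenizer.add_tokens(): returns how many
    tokens were actually added."""
    added = 0
    for w in new_words:
        if w not in vocab:
            vocab[w] = len(vocab)
            added += 1
    return added

# "dress" is already in the vocab, so only two tokens are new.
n = add_tokens(base_vocab, ["taupe", "extratropical", "dress"])
print(n)                    # -> 2
print(base_vocab["taupe"])  # -> 4
```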
Now, consider a word that is already present in tokenizer.vocab, but whose meaning in my domain differs from its general usage.
In the final section of the notebook, @ChrisMcC, you showed a neat trick for customizing the embedding of a new word added to the vocab. Could you please explain how this technique could be used when I am adding, say, 1,000 new words to the vocab?
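To show what I am imagining, here is a rough NumPy sketch of batching the initialization for many new words at once. I am assuming a mean-of-subwords initialization (each new word's embedding starts as the average of the embeddings of the subwords the old tokenizer split it into), which may differ from the notebook's exact approach; all names here (`emb`, `new_word_subwords`) are illustrative, and in practice `emb` would come from `model.get_input_embeddings().weight`:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8
old_vocab_size = 10
emb = rng.normal(size=(old_vocab_size, dim))  # existing embedding matrix

# Hypothetical mapping: for each new word, the ids of the subwords the
# old tokenizer would break it into (e.g. "taupe" -> ["ta", "##up", "##e"]).
new_word_subwords = {
    "taupe": [2, 5, 7],
    "extratropical": [1, 3],
    # ... up to 1,000 entries in the real case
}

# One new row per word: the mean of its subword embeddings.
new_rows = np.stack([emb[ids].mean(axis=0)
                     for ids in new_word_subwords.values()])
emb = np.vstack([emb, new_rows])  # extended embedding matrix

print(emb.shape)  # -> (12, 8)
```

Is this roughly the right idea, or does the trick from the notebook not scale this way?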