Ah! That makes perfect sense now, thanks.
BERT is actually capable of taking in two independent pieces of text, because it was pre-trained on some two-sentence tasks. At a low level, you give BERT two pieces of text by concatenating their token sequences, with the special “[SEP]” token inserted between them. You also add a special “Segment A” embedding to all of the tokens in the first text and a “Segment B” embedding to all of the tokens in the second text.
In practice, whatever library you’re working with should let you feed in two independent pieces of text and will take care of the above steps for you, so make sure to use that functionality rather than simply concatenating your strings before passing them in. (It’d be interesting to try both approaches, though, and see whether it changes much!)
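Just as an illustration (assuming the Hugging Face “transformers” library and two made-up example sentences; your library of choice should have something similar), passing the two strings as separate arguments gets you the “[SEP]” insertion and the segment IDs for free:

```python
# Minimal sketch using Hugging Face transformers (this is an assumption --
# the original post doesn't name a specific library).
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

text_a = "The movie was surprisingly good."  # hypothetical first piece of text
text_b = "I would watch it again."           # hypothetical second piece of text

# Passing the two strings as separate arguments makes the tokenizer insert
# [CLS] / [SEP] and build the segment ("token_type") IDs for you.
encoded = tokenizer(text_a, text_b)

print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
# ['[CLS]', 'the', 'movie', ..., '[SEP]', 'i', 'would', ..., '[SEP]']

print(encoded["token_type_ids"])
# 0s for the Segment A tokens, 1s for the Segment B tokens
```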
So the support is there in the architecture, BUT it’s limited to two pieces of text. You could pre-train your own BERT model to handle more than just two pieces of text, but that can get expensive in GPU time…
I think I would experiment first with the two pieces of text to see whether treating them independently actually improves BERT’s performance. If it doesn’t, then that simplifies things and you can just go back to concatenating your text before inputting it.
Interesting to think about, thanks again for asking that!