Weekly Research Group 2021-04-22 - BigBird Architecture

This week, we wrapped up our exploration of BigBird's architecture and implementation. We covered Block Sparse Attention, the rationale behind BigBird's attention pattern, the ETC vs. ITC global-token configurations, and the specifics of the Hugging Face implementation of BigBird.

You can watch the recording on YouTube here: https://youtu.be/XCcaAQujhXY

You can view my “Research Journal” doc here: BigBird Research Journal - Google Docs

In the BigBird Q&A, we discussed when the "random" attention is applied, clarified what the global attention tokens actually do, and pinned down where the sparsity comes into play in BigBird (because it's not in the model's parameters!).
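To make that last point concrete, here's a small illustrative sketch (not the actual Hugging Face implementation) of a BigBird-style block-level attention mask. The weight matrices stay dense, just as in vanilla BERT; the sparsity lives entirely in which query blocks are allowed to attend to which key blocks. The function name and parameters here are my own for illustration.

```python
import random

def bigbird_attention_mask(n_blocks, window=1, n_global=1, n_random=1, seed=0):
    """Build a BigBird-style block-level attention mask.

    Returns an n_blocks x n_blocks boolean matrix where True means
    "this query block attends to that key block". The sparsity lives
    entirely in this mask -- the model's parameters are unchanged.
    (Illustrative sketch only.)
    """
    rng = random.Random(seed)
    mask = [[False] * n_blocks for _ in range(n_blocks)]
    for i in range(n_blocks):
        # Sliding window: each block attends to itself and its neighbors.
        for j in range(max(0, i - window), min(n_blocks, i + window + 1)):
            mask[i][j] = True
        # Global blocks: the first n_global blocks attend everywhere
        # and are attended to by every block.
        for g in range(n_global):
            mask[i][g] = True
            mask[g][i] = True
        # Random attention: each block also attends to a few random blocks.
        for j in rng.sample(range(n_blocks), n_random):
            mask[i][j] = True
    return mask

mask = bigbird_attention_mask(n_blocks=16)
nonzero = sum(sum(row) for row in mask)
print(nonzero, "of", 16 * 16, "block pairs attended")
```

Because each row has a roughly constant number of `True` entries (window + global + random), the cost grows linearly with sequence length instead of quadratically.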

Outside of BigBird, we talked a bit about applying the Transformer architecture to other data types, including images.

You can view our notes from the Q&A discussion in the following doc: Discussion Group Q&A 2021 Q2 - Google Docs

Next week, I'll share my experiences applying BigBird to a practical application.