Intro to BERT

I was first exposed to BERT in January of 2020, when I landed an undergraduate research position. BERT stands for Bidirectional Encoder Representations from Transformers, and it is a versatile NLP model that can be used for a variety of tasks. Before BERT, NLP tasks were typically solved independently of each other, each with its own specialized model. In that sense, BERT is the Swiss Army knife of models because of its wide range of applications: text prediction, text generation, question answering (chatbots), and much more. It was pretrained on roughly 3.3 billion words from English Wikipedia and BookCorpus, and there are smaller derivatives such as DistilBERT.

For example, BERT can do sentiment analysis. A while ago I created a sports betting model for a class, and one aspect I analyzed was sentiment in Discord server data. I experimented with both VADER from NLTK and BERT to produce sentiment scores. The key difference between the two is that VADER is designed explicitly for sentiment analysis of social media text, while BERT is a general-purpose model.
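To make that concrete, here is a minimal sketch of that kind of comparison, assuming nltk and transformers (with a PyTorch backend) are installed. The messages are made up stand-ins for Discord data, not from my actual project, and the default sentiment-analysis pipeline happens to load a distilled BERT variant:

```python
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from transformers import pipeline

nltk.download("vader_lexicon")  # one-time download of VADER's lexicon

# Hypothetical messages standing in for scraped Discord data.
messages = [
    "That parlay hit, let's gooo!",
    "Refs robbed us again, unbelievable.",
]

# VADER: a rule- and lexicon-based scorer tuned for social media text.
vader = SentimentIntensityAnalyzer()
for msg in messages:
    print(msg, vader.polarity_scores(msg))  # dict with neg/neu/pos/compound

# BERT-family: the default sentiment-analysis pipeline loads a DistilBERT
# model fine-tuned for binary sentiment classification.
bert_sentiment = pipeline("sentiment-analysis")
for msg in messages:
    print(msg, bert_sentiment(msg))  # e.g. [{'label': 'POSITIVE', 'score': ...}]
```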

BERT is great at MLM (masked language modeling, not multi-level marketing LMAO). Let's say I tell a friend, "Let's go! I just _____ a huge pot in poker." What do you think I said? BERT can look at the clues before and after the blank and figure out that I likely said "won." BERT can also predict whether one sentence follows another; it is pretrained on both MLM and next sentence prediction.
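Here is a minimal sketch of that poker example as an actual masked-word prediction, using Hugging Face's fill-mask pipeline with bert-base-uncased (BERT marks the blank with its special [MASK] token):

```python
from transformers import pipeline

# Load bert-base-uncased behind the fill-mask pipeline.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# The pipeline returns the top candidate tokens for the [MASK] position,
# each with a probability score.
for prediction in fill_mask("Let's go! I just [MASK] a huge pot in poker."):
    print(f"{prediction['token_str']!r}: {prediction['score']:.3f}")
# "won" should rank at or near the top of the predictions.
```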

BERT is built on the transformer architecture. Transformers use an attention mechanism to find relationships between words and phrases, which loosely mirrors how our own brains work. We do not recall the small talk we made with the barista a week ago or in the elevator on the way to work; it is not important, so our brain filters it out. Likewise, ML models need to figure out which parts of the input matter. Transformers learn attention weights that indicate which words are the most critical. The input is processed by an encoder, and when an output sequence needs to be generated, a decoder produces it. BERT, true to its name, uses only the encoder.
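As a rough illustration, here is a minimal sketch of pulling those attention weights out of a pretrained BERT encoder via Hugging Face's transformers; the sentence is just an example:

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tokenizer("I just won a huge pot in poker.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions holds one tensor per encoder layer, each shaped
# (batch, num_heads, seq_len, seq_len); every row is a softmax distribution
# saying how strongly one token attends to every other token.
print(len(outputs.attentions))       # 12 layers for bert-base
print(outputs.attentions[-1].shape)  # e.g. torch.Size([1, 12, 11, 11])
```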

BERT is a really powerful tool that can be used to great success in ML projects. We likely interact with NLP models (including BERT) every day without even realizing it. In future posts, I will talk about a project I did with BERT.
