At this point, many of us have heard about ChatGPT, Bard, GPT-4 and other “Large Language Models,” often abbreviated as LLMs. But what exactly are these magical, chatty machines? What makes them “Large”? And how is it possible for computers, which in reality only speak in 0s and 1s, to learn to articulate themselves like you and me?

Part 1: NLP and Word Vectorization

Well, let’s start from the very beginning. First, we have to understand how computers can make sense of words. If, for example, I say the word “fish” to a human like you, then the idea of a fish pops into your mind. But the concept of a fish is complex. You could be imagining a goldfish in a tank, a great white shark, or a delicious cut of salmon on a bed of rice.

Understanding the nuances of the idea of a “fish” isn’t tough for dynamic humans. But for a machine that only understands numbers, the task of understanding becomes a bit more difficult.

Luckily, a bunch of scientists, linguists, and programmers banded together to come up with a solution. That solution is two words long: Word Vectorization.

Without getting into the nitty-gritty of calculation, here’s the punchline: Because computers only understand numbers, we end up transforming the word “fish” into a very, very, very long list of meticulously calculated numbers. This list can contain thousands upon thousands of entries. 

And once we’ve transformed a concept like “fish” into a list of numbers, the computer can then treat it like a number. And, surprise surprise: Computers are good at computing with numbers. TLDR: This is how language data becomes computable.

Note that once a word is “vectorized,” the resulting list of numbers is called an “embedding.” (For additional explanations of AI jargon, check out this blog post.)
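To make this a little more concrete, here’s a toy sketch of how a computer can compare embeddings. Everything below is invented for illustration: these vectors have only four entries (real embeddings have hundreds or thousands), and the numbers are made up.

```python
import numpy as np

# Made-up 4-dimensional "embeddings" for three words.
# Real embeddings are far longer and are learned, not hand-written.
embeddings = {
    "fish":   np.array([0.9, 0.1, 0.4, 0.0]),
    "salmon": np.array([0.8, 0.2, 0.5, 0.1]),
    "car":    np.array([0.0, 0.9, 0.1, 0.8]),
}

def cosine_similarity(a, b):
    """Measure how 'close' two word vectors point (1.0 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# "fish" and "salmon" point in similar directions; "fish" and "car" don't.
print(cosine_similarity(embeddings["fish"], embeddings["salmon"]))  # high
print(cosine_similarity(embeddings["fish"], embeddings["car"]))     # low
```

Once words are numbers, questions like “is salmon more like fish than a car is?” become simple arithmetic, which is exactly the kind of work computers excel at.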

So, long story short, because of word vectorization, computers can understand any single word you throw at them. But what about whole phrases, sentences, and paragraphs? 

Let’s talk about it.

Part 2: From Word-Embeddings to Deep Learning and Language Models

There are many ways to train a computer to understand human language, but here are two of the most popular ones.

“Masked” Language Models

Let’s play a little game. Below are three sentences. Fill in the blanks to the best of your ability:

  1. He’s a lot ____ than I thought. On Zoom, he looks 5’7, but in real life he’s 6’4.

  2. My daughter and I went to the ___. She loved looking at the giraffes and the hippos the most.

  3. We have an ongoing debate about which breakfast food is better, ____ or waffles.

Alright, let’s see how you did. If you answered

  1. taller

  2. zoo

  3. pancakes

then great! You got a perfect score! However, you could’ve also said

  1. bigger 

  2. safari

  3. muffins

There are quite a few possible answers for each question. But as long as you fill in those blanks with something sensible, then congratulations, you think like an AI!

Or, rather, an AI thinks like you.

See, these fill-in-the-blank puzzles are one of the main ways we train computers to speak like humans. The intuition is this: If you have a strong grasp over how a language operates, then you should be able to fill in those blanks without any issue. 

However, if these fill-in-the-blank puzzles stump you, then you need to study (read: train) a bit more. That goes for both humans and computers.

The most famous example of a masked language model is Google’s BERT. But there’s more than one way to teach a computer to speak like a human!
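As a toy illustration of the fill-in-the-blank idea, here’s a sketch of a guesser that just counts which words it has seen in a given context. The tiny corpus and the function are invented for this example; real masked models like BERT use neural networks trained on billions of words, not raw counts.

```python
from collections import Counter

# A tiny, made-up "training corpus." Real models train on billions of words.
corpus = (
    "we went to the zoo . "
    "they went to the zoo . "
    "we went to the park . "
).split()

def fill_in_the_blank(left, right):
    """Guess the masked word from words seen between `left` and `right`."""
    candidates = Counter(
        corpus[i]
        for i in range(1, len(corpus) - 1)
        if corpus[i - 1] == left and corpus[i + 1] == right
    )
    return candidates.most_common(1)[0][0]

# "went to the ____ ." -> the word seen most often in that slot.
print(fill_in_the_blank("the", "."))  # "zoo" in this toy corpus
```

The principle is the same as the puzzles above: the model learns which words plausibly fill a blank by studying lots of examples.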

“Predictive” Language Models

It’s here where we delve into the viral world of ChatGPT. And GPT-4. And LaMDA. And much, much more.

See, instead of filling in the blank of an already-completed sentence, Predictive Language Models have to guess what word comes next, based only on the words that came before.

The intuition is this: When humans have an impromptu conversation, they’re constantly coming up with what words to say next. Sometimes, a human won’t know how the sentence they’re currently saying is going to end. That’s partially why we say words like “umm” or “uh” in the middle of our sentences. We’re trying to figure out what comes next.

We can therefore train computers to speak like humans by training them to think in the same way humans do mid-conversation. We ask, “Based on the words I’ve just said, what words would make the most sense to come next?”
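Here’s a deliberately simplified sketch of next-word prediction, using nothing but word-pair (“bigram”) counts over a made-up corpus. Real predictive models like GPT use deep neural networks over enormous datasets, but the question being answered is the same one.

```python
from collections import Counter, defaultdict

# Toy corpus, invented for illustration.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count which word follows each word (a "bigram" model).
next_word_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_word_counts[prev][nxt] += 1

def predict_next(word):
    """Return the word that most often followed `word` in the corpus."""
    return next_word_counts[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" follows "the" most often here
```

Chaining such predictions (predict a word, append it, predict again) is, at a very high level, how these models generate whole sentences one word at a time.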

Part 3: What makes a language model large?

Now, keen readers will notice that we’re not just talking about Language Models in this post. We’re talking about Large Language Models.

So that raises the question: what makes a language model large?

Well, the short answer is the number of parameters the model has.

If you need a refresher on what exactly parameters are, check out the video below:

But here’s the good news: We can understand the “large” part of the term “Large Language Models” in two distilled bullet points:

  • A parameter is just a number inside an AI that helps the machine calculate its outputs.

  • An AI model is considered “large” if it has at least 100 billion parameters.

And that’s it! There are small language models, medium language models, and large language models. The biggest ones, like LaMDA and GPT-3.5, contain at least 100 billion parameters. And that’s Large Language Models in a nutshell! They’re simply computer programs that can use 100-billion-number equations to predict which words belong where in a sentence.
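To get a feel for where parameter counts come from, here’s a hypothetical back-of-the-envelope sketch for a small fully connected network. The layer sizes below are made up, and real LLM architectures are far more intricate, but the counting principle (weights plus biases per layer) is the same.

```python
# Made-up layer sizes: input, two hidden layers, output.
layer_sizes = [1000, 512, 512, 1000]

# Each layer contributes a weight for every input-output pair,
# plus one bias per output.
total = 0
for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
    total += fan_in * fan_out  # weights
    total += fan_out           # biases

print(f"{total:,} parameters")  # about 1.3 million
```

Even this small toy network has over a million parameters; scale the layers up and stack many more of them, and you can see how a model reaches 100 billion.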

The result is an AI that can write haikus, work at law firms, and maybe even flirt with you.

With LLMs, the opportunities are only limited by your creativity! Have fun with them.
