Article·AI Engineering & Research·Jul 6, 2023
8 min read

The 6 Foundational Courses To Learn Large Language Models

Zian (Andy) Wang
Published Jul 6, 2023
Updated Jun 27, 2024

Without question, Large Language Models (LLMs) are revolutionizing the tech landscape. The launch of ChatGPT in November 2022 and GPT-4 in March 2023 catalyzed an industry-wide shift, sending tech behemoths scrambling to develop their own LLMs. Prominent examples include Google's Bard, Meta's LLaMA, and Anthropic's Claude, among others. Beyond the entertainment of engineering “jailbreak” prompts or laughing at ChatGPT's terrible chess skills, these models have implications far beyond mere amusement. In the short time since their release, they have shown the potential to replace some human jobs while creating new professions dedicated to training and optimizing LLMs.

Furthermore, as language models grow increasingly complex, some form of interpretability becomes crucial to ensure transparency and accountability and to understand how these models reach their decisions.

Large Language Models are a type of deep learning model, and deep learning is a subset of Machine Learning. To fully understand LLMs, a grasp of Machine Learning and some fundamental deep learning concepts is a must. As the saying goes, you can’t run before you learn how to walk.

Our approach begins with the rudimentary: courses covering the elementary mathematics underlying machine learning and the theoretical knowledge you should possess before venturing into contemporary deep learning. Next, we'll introduce courses covering the essentials of Natural Language Processing: how textual data is processed and tokenized into a format the model can work with, as well as the classic techniques that set the stage for modern deep learning.

Following this, we'll spotlight courses and resources focused on modern language models, starting with the architecture that underlies all modern LLMs: the Transformer. Penultimately, we'll delve into Transformer variants like the GPT and BERT families of models. Finally, we will conclude with courses elucidating the basics of present-day machine learning libraries and the practical applications made possible by the theories explored in the preceding courses.

Phase 0: Theory and Foundations

As with almost anything in Machine Learning, mathematical foundations are a must. However, for the purposes of learning and practicality, the mathematics required for most machine learning courses and papers is much less rigorous than many might think.

3Blue1Brown’s Series on Linear Algebra, Calculus, and Neural Networks

Taken together, the three series are undoubtedly the best resource for anyone interested in getting started with deep learning, regardless of prior experience. The creator of the YouTube channel, Grant Sanderson, does an extraordinary job not only of explaining the formulas and concepts, but also of using beautiful animations to help the audience intuitively grasp the ideas and logic behind every topic covered.

Rather than focusing purely on the nitty-gritty calculations, these videos emphasize the underlying intuitions that drive Machine Learning algorithms and models. The combined 32 videos cover virtually everything, aside from statistics, that's needed for beginners to enter the field of machine learning without requiring years of foundational studies. Below is a general overview of what the courses will cover:

The Essence of Linear Algebra

  1. Vectors and matrices

  2. Linear transformations and basic linear algebra operations

  3. Determinant

  4. Inverse matrices, column space, and null space

  5. Dot and cross products

  6. Eigenvectors and eigenvalues
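To make a few of these linear algebra ideas concrete, here is a minimal pure-Python sketch of our own (not taken from the series) that computes the determinant and eigenvalues of a 2x2 matrix. It assumes the matrix has real eigenvalues and solves the characteristic polynomial λ² - (trace)λ + det = 0; the helper names `det2` and `eigenvalues2` are hypothetical.

```python
import math

def det2(m):
    """Determinant of a 2x2 matrix [[a, b], [c, d]], i.e. ad - bc."""
    (a, b), (c, d) = m
    return a * d - b * c

def eigenvalues2(m):
    """Eigenvalues of a 2x2 matrix from its characteristic polynomial
    lambda^2 - trace * lambda + det = 0. Assumes the roots are real."""
    (a, _), (_, d) = m
    trace = a + d
    disc = trace * trace - 4 * det2(m)
    if disc < 0:
        raise ValueError("complex eigenvalues are not handled in this sketch")
    root = math.sqrt(disc)
    return ((trace + root) / 2, (trace - root) / 2)

A = [[2.0, 1.0],
     [1.0, 2.0]]
print(det2(A))          # 3.0
print(eigenvalues2(A))  # (3.0, 1.0)

# Check: A maps [1, 1] to 3 * [1, 1], so [1, 1] is an eigenvector with eigenvalue 3
v = [1.0, 1.0]
Av = [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]
print(Av)               # [3.0, 3.0]
```

Note that the product of the eigenvalues (3 × 1) equals the determinant, a general property of square matrices that is worth verifying on your own examples.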

The Essence of Calculus

  1. Derivatives, limits, and integrals

  2. Basic differentiation rules (power rule, chain rule, product rule, etc.)

  3. Implicit differentiation

  4. Epsilon-delta definitions

  5. L’Hôpital’s rule

  6. Taylor Series
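The Taylor series topic is easy to try out numerically. The sketch below is our own illustration rather than material from the videos: it sums the first few terms of the series e^x = Σ xᵏ/k! around 0 and compares the result with Python's `math.e`. The function name `taylor_exp` is hypothetical.

```python
import math

def taylor_exp(x, n_terms):
    """Approximate e^x by the first n_terms of its Taylor series around 0:
    sum over k = 0 .. n_terms-1 of x^k / k!."""
    total = 0.0
    term = 1.0  # the k = 0 term, x^0 / 0!
    for k in range(n_terms):
        total += term
        term *= x / (k + 1)  # turn x^k / k! into x^(k+1) / (k+1)!
    return total

for n in (2, 4, 8, 12):
    approx = taylor_exp(1.0, n)
    print(f"{n:2d} terms: {approx:.10f}  error = {abs(approx - math.e):.2e}")
```

The error shrinks rapidly as terms are added, which is the core intuition: near the expansion point, a smooth function can be approximated closely by a polynomial.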

Neural Networks

  1. Gradient Descent

  2. Backpropagation
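Gradient descent and backpropagation fit in a few lines once the model is small enough to differentiate by hand. The following minimal sketch is our own example, not code from the series: it fits a one-neuron linear model y ≈ w·x + b to toy data by minimizing mean squared error. The gradients are the chain-rule derivatives that backpropagation would compute automatically in a larger network; the learning rate and step count are illustrative assumptions.

```python
def train_linear(xs, ys, lr=0.05, steps=2000):
    """Fit y ~ w*x + b by gradient descent on mean squared error.
    Gradients are derived by hand with the chain rule (backprop for one neuron)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Forward pass: predictions and residuals
        preds = [w * x + b for x in xs]
        errs = [p - y for p, y in zip(preds, ys)]
        # Backward pass: dL/dw and dL/db for L = (1/n) * sum(err^2)
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errs, xs))
        grad_b = (2.0 / n) * sum(errs)
        # Gradient descent update: take a small step against the gradient
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]  # toy data generated from w = 2, b = 1
w, b = train_linear(xs, ys)
print(round(w, 3), round(b, 3))  # should approach 2.0 and 1.0
```

If the learning rate is set too high, the updates overshoot and diverge instead of converging, which is one of the behaviors the series illustrates visually.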

Phase 1: NLP Basics

With the basics of neural networks under your belt, the next step is to build on that knowledge, specifically by applying deep learning techniques to NLP.

Sequence Models from Coursera by Andrew Ng

This course is the fifth in the Deep Learning Specialization on Coursera, taught by Andrew Ng, a co-founder of Coursera and a pioneer in the field of artificial intelligence. It dives into the world of sequence models, the dominant type of deep learning model used for processing text. The course leans toward the practical side, with material aimed at those who want to implement deep learning NLP models in Python.

By the end of the course, you will be able to build and train Recurrent Neural Networks (RNNs) and commonly-used variants such as GRUs and LSTMs. You'll also gain experience with natural language processing and Word Embeddings, and will be able to use HuggingFace tokenizers and transformer models to solve different NLP tasks such as Named Entity Recognition (NER) and Question Answering. The course spans four weeks, with each week dedicated to a distinct topic:

  • Week 1: Recurrent Neural Networks

  • Week 2: Natural Language Processing & Word Embeddings

  • Week 3: Sequence Models & Attention Mechanism

  • Week 4: Transformer Network

This course will be instrumental in building a solid foundation for understanding modern techniques for processing textual data, as well as for working with the variety of models that form the backbone of modern LLMs.
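As a rough picture of what Week 1 builds up to, here is a minimal pure-Python sketch of a vanilla RNN forward pass, h_t = tanh(W_hh·h_{t-1} + W_xh·x_t + b). The weights and sizes are made-up toy values rather than learned ones, and real implementations typically use array libraries; GRUs and LSTMs replace this single tanh update with gated updates.

```python
import math

def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def rnn_step(h_prev, x_t, W_hh, W_xh, b):
    """One vanilla RNN step: h_t = tanh(W_hh @ h_prev + W_xh @ x_t + b)."""
    a = matvec(W_hh, h_prev)
    c = matvec(W_xh, x_t)
    return [math.tanh(ai + ci + bi) for ai, ci, bi in zip(a, c, b)]

def rnn_forward(xs, h0, W_hh, W_xh, b):
    """Apply the same cell (shared weights) across a sequence; return every hidden state."""
    hs, h = [], h0
    for x_t in xs:
        h = rnn_step(h, x_t, W_hh, W_xh, b)
        hs.append(h)
    return hs

# Hypothetical toy sizes: hidden size 2, input size 3, sequence length 4
W_hh = [[0.5, -0.1], [0.2, 0.3]]
W_xh = [[0.1, 0.0, -0.2], [0.0, 0.4, 0.1]]
b = [0.0, 0.1]
xs = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]  # e.g. one-hot token vectors
hs = rnn_forward(xs, [0.0, 0.0], W_hh, W_xh, b)
print(len(hs), hs[-1])
```

The key design point is weight sharing: the same `W_hh`, `W_xh`, and `b` are reused at every time step, so the hidden state carries information forward through the sequence.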

NLP–Natural Language Processing with Python by Jose Portilla 

This comprehensive Udemy course is an excellent primer on techniques for preparing and transforming text data before it is fed into LLMs or any other NLP models. Jose Portilla's course combines theory with a project-oriented, practical approach to handling real-world natural language scenarios in Python. While it does transition into applying deep learning methods to text data, the course's real strength lies in its thorough introduction to the fundamentals of NLP, which most resources overlook.

The course follows a seven-part structure:

  1. Python Text Basics

  2. NLP Basics

  3. Part of Speech Tagging and Named Entity Recognition

  4. Text Classification

  5. Semantics and Sentiment Analysis

  6. Topic Modeling

  7. Deep Learning for NLP
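To give a feel for the text-preparation steps in parts 1, 2, and 4, here is a minimal sketch of tokenization, vocabulary building, and a bag-of-words representation in plain Python. The regex-based tokenizer is a deliberately naive assumption of ours; tokenizers used in practice handle punctuation, contractions, and multilingual text far more carefully.

```python
import re
from collections import Counter

def tokenize(text):
    """Naive tokenizer: lowercase, then keep runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())

def build_vocab(docs):
    """Map each distinct token to an integer id, in order of first appearance."""
    vocab = {}
    for doc in docs:
        for tok in tokenize(doc):
            vocab.setdefault(tok, len(vocab))
    return vocab

def bag_of_words(text, vocab):
    """Count vector over the vocabulary; out-of-vocabulary tokens are dropped."""
    counts = Counter(tok for tok in tokenize(text) if tok in vocab)
    vec = [0] * len(vocab)
    for tok, n in counts.items():
        vec[vocab[tok]] = n
    return vec

docs = ["The movie was great!", "The plot was not great."]
vocab = build_vocab(docs)
print(vocab)  # {'the': 0, 'movie': 1, 'was': 2, 'great': 3, 'plot': 4, 'not': 5}
print(bag_of_words("Great, great movie", vocab))  # [0, 1, 0, 2, 0, 0]
```

Count vectors like these are the input to classic text classifiers; they discard word order entirely, which is one of the limitations that sequence models address.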

Phase 2: Deep Dive into Modern Language Models

The following courses delve into the intricate mechanisms underlying modern LLMs, offering a detailed exploration of theoretical aspects.

DS-GA 1008 - Deep Learning, NYU – Week 12

This open-access course, taught by Yann LeCun at NYU, provides a comprehensive overview of deep learning, from its rudimentary principles to its contemporary applications. Although its full content extends well beyond LLMs, it is an invaluable resource for anyone wishing to dive into the theoretical side of deep learning.

Week 12's content includes three lecture videos and detailed written lecture notes on Deep Learning for NLP. It takes an in-depth look at the Transformer, the foundational architecture of modern LLMs, and discusses various techniques, architecture variants, and their limitations. Additionally, the lectures summarize supervised and unsupervised learning for language models, both frequently employed techniques. Here are the three major topics covered:

  1. Deep Learning for NLP

  2. Decoding Language Models

  3. Attention and the Transformer
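The core operation behind the third topic is scaled dot-product attention, softmax(QKᵀ/√d_k)V, as defined in the original Transformer paper. Below is a minimal single-head sketch in plain Python with no masking, batching, or learned projection matrices; the toy Q, K, and V values are invented for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
    Q, K, V are lists of row vectors; K and V must have the same number of rows."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Output row = attention-weighted average of the value rows
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out

Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
print(attention(Q, K, V))  # about [[6.7, 3.3]]
```

Because the query points in the same direction as the first key, most of the weight goes to the first value row, yet the output still mixes in some of the second. The √d_k scaling keeps dot products from growing with dimension and saturating the softmax.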

Note that there is also an updated version of the course. It covers less of the LLM material, but its NLP-related lectures include several topics that are related to, though not directly applicable to, LLMs:

  1. Speech Recognition and Graph Transformer Network (Week 11)

  2. Low Resource Machine Translation (Week 12)

  3. Joint Embedding Methods (Week 15)

CS324 - Large Language Models, Stanford University

The CS324 LLM course provides more extensive and detailed coverage of topics than the Phase 1 courses. It delves into specific, often overlooked details such as encoding text that contains Unicode characters from languages other than English. The course website provides detailed lecture notes along with up-to-date further reading.
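To see why non-English text deserves special attention, compare how many UTF-8 bytes a short English and a short Japanese greeting occupy. This small illustration is our own, not course material; it matters because byte-level tokenizers (such as the byte-level BPE used by GPT-2) start from these bytes, so non-English text often ends up split into more tokens. The helper name `utf8_stats` is hypothetical.

```python
def utf8_stats(text):
    """Return (number of Unicode code points, number of UTF-8 bytes) for a string."""
    return len(text), len(text.encode("utf-8"))

for sample in ("hello", "こんにちは"):  # the second string is "hello" in Japanese
    chars, n_bytes = utf8_stats(sample)
    print(f"{sample!r}: {chars} characters -> {n_bytes} UTF-8 bytes")  # 5 -> 5, then 5 -> 15

# The three bytes that encode the first Japanese character
print(list("こ".encode("utf-8")))  # [227, 129, 147]
```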

Besides the practical aspects of modern LLM usage like legality, security, environmental impact, and potential harms, the course also investigates key topics essential to LLMs' efficient implementation, such as parallelism and scaling laws. The course is divided into the following segments:

  1. GPT-family Model Capabilities

  2. Potential Harm of LLMs

  3. Data Behind LLMs

  4. Security Behind LLMs

  5. Legality Behind LLMs

  6. Preprocessing & Modeling Details

  7. Training Details

  8. Parallelism & Scaling Laws

  9. Selective Architectures

  10. LLM Adaptations

  11. Environmental Impact of LLMs

COS 597G - Understanding Large Language Models, Princeton University

This course offers a comprehensive study of the specifics of state-of-the-art model architectures. It dives into topics like prompting, reasoning, and in-context learning, moving beyond the foundational theory provided in Phase 1. The focus shifts toward understanding how recent LLM base models like GPT, T5, and BERT modify the classic Transformer architecture.

Although no lecture videos are provided, the lecture slides are available, along with many recommended readings that go into much more depth on each lecture's topics.

The latter part of the course expands to lesser-known applications of LLMs other than conversational chatbots, with a significant focus on addressing issues like LLM bias, toxicity, and possible mitigations. Here's a snapshot of the course structure:

  1. Specific Architecture Overview (BERT, T5, GPT-3)

  2. Prompting, Reasoning, and the Knowledge of LLMs

  3. In-context Learning

  4. LLM Scaling

  5. LLM Privacy

  6. LLM Bias & Toxicity

  7. Sparse Models

  8. Retrieval-based LMs

  9. Multimodal LMs
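In-context learning is easiest to see in the prompt itself: the model is shown a few labeled demonstrations and asked to continue the pattern, with no weight updates. The helper below is a hypothetical sketch that only assembles such a few-shot prompt as a string; the "Review:"/"Sentiment:" format and the instruction text are our own assumptions, and sending the prompt to a model is left out.

```python
def build_few_shot_prompt(examples, query,
                          instruction="Classify the sentiment as positive or negative."):
    """Assemble a few-shot prompt: an instruction, labeled demonstrations,
    then the unlabeled query for the model to complete."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")  # the model is expected to fill in the label here
    return "\n".join(lines)

demos = [("A delightful, moving film.", "positive"),
         ("Two hours I will never get back.", "negative")]
print(build_few_shot_prompt(demos, "The acting was superb."))
```

Changing only the demonstrations (their number, order, or labels) can change the model's answer, which is one of the phenomena the in-context learning lectures examine.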

Conclusion

If, as it seems, we're entering an era dominated by Artificial Intelligence and Large Language Models (LLMs), it's crucial to keep abreast of the changes and developments in this field. This comprehensive guide to resources and courses on the foundational theory, implementation, and recent developments in the world of LLMs is an invaluable resource for anyone seeking to deepen their understanding or carve out a career in this rapidly evolving landscape.

From the mathematical foundations of Machine Learning to the intricacies of Natural Language Processing and the latest in transformer models, the resources listed here cater to every level of expertise. They allow beginners to take the plunge into the field, provide intermediates with a deeper understanding of the underlying principles, and offer experts up-to-date insights into the latest developments and research trends.
