Last updated on February 23, 2024 · 4 min read

# Hidden Markov Models (HMMs)

## Introduction to Hidden Markov Models (HMMs)

Hidden Markov Models (HMMs), which emerged in the 1960s, extend the concept of Markov chains to more complex scenarios. A Markov chain is a stochastic model of a system in which the probability of each future state depends only on the current state, not on the sequence of events that preceded it. This makes Markov chains well suited to modeling sequential data, where the goal is to understand how states evolve and how they influence the likelihood of events.

Consider the UK's unpredictable weather, where the state of the weather (be it "Cloudy ☁️", "Rainy ☔", or "Snowy ❄️") influences daily life, from how people dress to how they feel. For example, on a rainy day, there might be a 60% chance that the rain continues, a 30% chance that it turns cloudy, and a 10% chance of snow. These transition probabilities form the basis of a Markov chain, while the observable effects on people become important once the weather itself is treated as hidden.

A Markov chain is characterized by three properties:

• A finite number of possible states (outcomes, e.g., cloudy, rainy, and snowy)

• The Markov property (memorylessness)

• Constant transition probabilities over time.
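The weather chain above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: only the "Rainy" row of the transition matrix comes from the example in the text, the "Cloudy" and "Snowy" rows are assumed values, and the helper names `simulate` and `forecast` are hypothetical.

```python
import numpy as np

# States of the weather chain from the example above.
states = ["Cloudy", "Rainy", "Snowy"]

# Row i holds P(tomorrow = j | today = i). Only the "Rainy" row comes from
# the text; the "Cloudy" and "Snowy" rows are assumed for illustration.
transition = np.array([
    [0.5, 0.4, 0.1],   # from Cloudy (assumed)
    [0.3, 0.6, 0.1],   # from Rainy: 30% cloudy, 60% rainy, 10% snowy
    [0.4, 0.2, 0.4],   # from Snowy (assumed)
])

# Every row must be a probability distribution.
assert np.allclose(transition.sum(axis=1), 1.0)

def simulate(start, n_days, seed=0):
    """Sample a weather sequence; each step depends only on the current state."""
    rng = np.random.default_rng(seed)
    current = states.index(start)
    path = [start]
    for _ in range(n_days):
        # Memorylessness: the next state is drawn from the current state's row.
        current = rng.choice(len(states), p=transition[current])
        path.append(states[current])
    return path

def forecast(start, n_days):
    """Distribution over states after n_days, using the n-step transition matrix."""
    dist = np.zeros(len(states))
    dist[states.index(start)] = 1.0
    return dist @ np.linalg.matrix_power(transition, n_days)

print(simulate("Rainy", 7))
print(dict(zip(states, forecast("Rainy", 2).round(3))))
```

Because the transition probabilities are constant over time, the same matrix is reused at every step, which is why an n-day forecast reduces to a matrix power.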

However, real-world scenarios are often more complex because the states are not directly observable, which led to the development of Hidden Markov Models. These models account for unseen factors that influence observable outcomes, hence the term "hidden." This mirrors real-life situations where we can see the outcomes, but what caused them in the first place is a bit of a mystery. With HMMs, you are essentially reverse engineering a Markov chain to uncover what's driving the observed sequence.

In the following sections, we'll explore how HMMs work and how they build on the foundational concept of Markov chains to help answer two questions:

• What's driving the observed sequence?

• What is the most likely next action or state based on the past observations?

## How HMMs Work

HMMs are stochastic models: they describe sequences in terms of probabilities rather than certainties. Four ideas from probability theory underpin them:

• Independence Assumption: Each observed emission is assumed to be conditionally independent of the other emissions given the hidden state that produced it. This simplifies the model and allows for efficient computation.

• Chain Rule of Probability: The joint probability of a sequence of events is the product of conditional probabilities, with each event conditioned on the events before it. Combined with the Markov property and the independence assumption, the joint probability of an observed sequence and a sequence of hidden states reduces to a product of an initial state probability, transition probabilities, and emission probabilities, which is what the Forward Algorithm exploits.

• Law of Total Probability: The probability of an event A is the sum, over a set of mutually exclusive and exhaustive events B, of the probability of A given each B weighted by the probability of that B. The Forward Algorithm uses it to compute the probability of an observation sequence by summing over all possible hidden state sequences.

• Bayes' Theorem: Describes how to update the probability of an event given evidence related to it: P(A | B) = P(B | A) P(A) / P(B). The Baum-Welch Algorithm relies on this idea when it re-estimates model parameters from the probabilities of hidden states given the observed data.
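To see how these four ideas fit together, here is a minimal sketch of the Forward Algorithm on a toy weather HMM, checked against a brute-force sum over every hidden path. All numbers are assumptions chosen for illustration (only the "Rainy" transition row echoes the earlier example), and the clothing symbols and the names `pi`, `A`, `B`, `joint_probability`, `brute_force_likelihood`, and `forward` are hypothetical.

```python
import itertools
import numpy as np

states = ["Cloudy", "Rainy", "Snowy"]           # hidden weather states
symbols = ["t-shirt", "raincoat", "parka"]      # observed clothing (hypothetical)

pi = np.array([0.4, 0.4, 0.2])                  # initial state distribution (assumed)
A = np.array([[0.5, 0.4, 0.1],                  # transition P(next state | state)
              [0.3, 0.6, 0.1],
              [0.4, 0.2, 0.4]])
B = np.array([[0.60, 0.30, 0.10],               # emission P(symbol | state) (assumed)
              [0.10, 0.80, 0.10],
              [0.05, 0.15, 0.80]])

def joint_probability(path, obs):
    """Chain rule + independence: P(path, obs) as a product of simple factors."""
    p = pi[path[0]] * B[path[0], obs[0]]
    for t in range(1, len(obs)):
        p *= A[path[t - 1], path[t]] * B[path[t], obs[t]]
    return p

def brute_force_likelihood(obs):
    """Law of total probability: sum the joint over every hidden path."""
    paths = itertools.product(range(len(states)), repeat=len(obs))
    return sum(joint_probability(path, obs) for path in paths)

def forward(obs):
    """Forward Algorithm: alpha[i] = P(observations so far, current state = i)."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        # Sum over the previous state, then weight by the emission probability.
        alpha = (alpha @ A) * B[:, o]
    return alpha

obs = [symbols.index(s) for s in ["raincoat", "raincoat", "parka"]]
alpha = forward(obs)
likelihood = alpha.sum()                        # P(observations)
posterior = alpha / likelihood                  # Bayes: P(final state | observations)

# Both routes compute the same quantity; the forward pass is just cheaper.
assert np.isclose(likelihood, brute_force_likelihood(obs))
print(f"P(observations) = {likelihood:.5f}")
print(dict(zip(states, posterior.round(3))))
```

The brute-force version enumerates N^T hidden paths for N states and T observations, while the forward pass reuses partial sums and needs on the order of T·N² operations, which is why it is the practical choice.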

It's important to note that standard HMMs assume fixed transition and emission probabilities, so they are limited when modeling data whose underlying probabilities change over time.

### Formal Representation of HMMs

To fully grasp Hidden Markov Models, it's crucial to understand their key components:

• States: The states are the hidden variables of an HMM and represent the underlying causes of the observed outputs. They are not directly observable and are typically modeled as a discrete set. In speech recognition, for instance, states might correspond to phonemes. English is commonly described as having around 44 phonemes, so such an HMM could have 44 states.

• Emission probabilities: These probabilities reflect how likely it is to observe a specific output given a certain state. Represented as a matrix, each entry indicates the likelihood of observing an output in a state. For example, in speech recognition, the matrix would detail the probability of hearing a specific sound when a certain phoneme is spoken.
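In symbols, the two components above can be written as follows. The notation is an assumption borrowed from common textbook convention rather than from this article: N hidden states, M possible outputs, q_t for the hidden state at time t, and o_t for the output observed at time t.

$$
S = \{s_1, \dots, s_N\}, \qquad V = \{v_1, \dots, v_M\}
$$

$$
B = [\, b_j(k) \,], \qquad b_j(k) = P(o_t = v_k \mid q_t = s_j), \qquad \sum_{k=1}^{M} b_j(k) = 1 \ \text{for each } j
$$

Under this convention, the emission matrix B has one row per state and one column per output, and each row is a probability distribution over outputs. In the phoneme example, row j would list how likely each sound is when phoneme s_j is spoken.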
