
Detecting and Reducing Bias in Speech Recognition

By Chris Doty
Published Mar 9, 2022
Updated Jun 13, 2024

Bias in machine learning has been a hot topic in the news lately, and bias in automated speech recognition (ASR) is no exception. In the recent State of Voice Technology 2022 report, we saw that 92% of companies believe voice technology bias has an important impact on their customers, including in the domains of gender, race, age, and accent (both native and non-native). So how do you detect bias in your ASR systems? And what do you do if you find it? We've got answers below. But before we dive in, let's start with a definition of what "bias" is to make sure we're all on the same page.

Two Kinds of Bias

It's important to understand that when people talk about bias and machine learning, they might be talking about two different things. On the one hand, they might mean real-world bias related to things like race, gender, and age. But the term "bias" is also used in machine learning when a model is overpredicting or underpredicting the probability of a given category, even if that category is something we wouldn't describe as "bias" in the real world. For example, a model developed by a brewery to track when its vats are at risk of overheating might sound the alarm frequently, even when there is no real risk of overheating. That's a problem of machine learning bias, but not real-world bias.
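To make that concrete, here's a minimal sketch of how you might measure this kind of over- or underprediction. The overheating scenario and all of the numbers are purely hypothetical; the check simply compares the model's average predicted probability against the rate at which the event actually occurred.

```python
import numpy as np

# Hypothetical data: the model's predicted probability that a vat overheats,
# paired with whether it actually did (1 = overheated, 0 = did not).
predicted_probs = np.array([0.80, 0.65, 0.90, 0.75, 0.70, 0.85])
actual_outcomes = np.array([0, 0, 1, 0, 0, 1])

mean_predicted = predicted_probs.mean()  # what the model expects, on average
observed_rate = actual_outcomes.mean()   # what actually happened

print(f"Mean predicted probability: {mean_predicted:.2f}")
print(f"Observed overheating rate:  {observed_rate:.2f}")
# A large positive gap means the model is systematically overpredicting overheating.
print(f"Gap (predicted minus observed): {mean_predicted - observed_rate:+.2f}")
```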

In many cases, though, when people talk about "machine learning bias" generally, especially in the media, they're referring to the intersection of real-world and machine learning bias: a case where a model is overpredicting or underpredicting in terms of something like gender or race. With these definitions out of the way, let's turn to examining where bias comes from before moving on to how you might figure out if you've got bias in your ASR system, as well as what your next steps are if you've found it.

Where Does Bias Come From?

Machine learning bias often comes from data: biased data leads to a biased model. If you've taken a statistics course, you're familiar with sampling bias, which occurs when data that is supposed to represent the true range of possibilities is instead skewed in some way. If you use data like this to train a model, the model will be skewed in the same way as the data. This sampling bias can show up in your model as machine learning or real-world bias because many organizations that want to create models rely on their own past decisions. If your company has a history of making biased decisions, any model trained on that data will likewise be biased.

For example, if a company that trains a model to help make promotion decisions has a poor track record of promoting women, it's likely that the model will make the same kinds of biased decisions if it's trained on the company's past behavior. In some cases, however, a model might be biased toward a certain group of people for less nefarious reasons. For example, an initial ASR model might be biased simply because it was trained on speech that was easy for the creator to collect: their own, their family's, and their friends', all of whom likely speak similarly. Here, the issue isn't biased past decisions, per se, but simply a lack of data covering different possibilities.

How to Detect Bias in ASR Systems

So how do you know if you've got bias in your ASR system? Detecting it seems relatively easy on its face. For example, you might ask whether your transcripts for some speakers have significantly higher word error rates (WER) than for other speakers. But the answers to even a simple question like this can be difficult to interpret. Language is infinitely varied, and even though we speak of, for example, "American English," that label actually covers a wide range of linguistic variation, from Massachusetts to North Dakota to Alabama. And in other places around the world, the variation between dialects is sometimes even greater than what we see in the US.
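A reasonable starting point, sketched below, is to compute WER separately for each speaker group and compare the numbers. This assumes you already have reference transcripts and ASR output labeled by group; the `jiwer` package is one common way to compute WER in Python, and the group labels and sentences here are purely illustrative.

```python
from collections import defaultdict
import jiwer  # pip install jiwer

# Hypothetical evaluation data: (speaker group, reference transcript, ASR hypothesis)
samples = [
    ("group_a", "please transfer me to billing", "please transfer me to billing"),
    ("group_a", "i want to close my account", "i want to close my account"),
    ("group_b", "please transfer me to billing", "please transfer me to bill in"),
    ("group_b", "i want to close my account", "i want to close my count"),
]

# Bucket references and hypotheses by speaker group.
refs, hyps = defaultdict(list), defaultdict(list)
for group, ref, hyp in samples:
    refs[group].append(ref)
    hyps[group].append(hyp)

# Compute WER per group; large, consistent gaps are worth a closer look.
for group in sorted(refs):
    print(f"{group}: WER = {jiwer.wer(refs[group], hyps[group]):.2%}")
```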

When you're thinking about how well your ASR system is performing, you're always going to encounter edge cases that it might have trouble understanding. For example, your model might encounter a non-native accent that it's never heard before: say, a native speaker of Polish who moved to South Africa at 16 and learned English there. To figure out whether you've found bias in your ASR system, you need to be nuanced in how you look at the data. If you're finding only a few sparse cases of poor transcripts, it could be that you're simply encountering people with unusual accents, like the Polish speaker mentioned above. In most cases, trying to chase down these edge cases and figure out how to fix them isn't going to be worth the time or effort, and they aren't evidence of a systematic problem with your model.

But what if the bias that you've detected seems to be for an entire group of people? Maybe you notice that your call center in the Southern US has transcripts that are consistently of poorer quality than call centers in other parts of the country. Because this looks like a systematic problem, rather than a one-off problem, you'll probably want to take some kind of action to address this bias.
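Before committing to a fix, it's worth confirming that the gap is consistent rather than noise. Here's a rough sketch, using made-up per-call WER numbers and a nonparametric test from SciPy, of how you might compare one call center's WER distribution against the others.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Hypothetical per-call WERs (as fractions) from two groups of call centers.
southern_wer = np.array([0.22, 0.25, 0.19, 0.28, 0.24, 0.26, 0.21, 0.27])
other_wer = np.array([0.12, 0.15, 0.10, 0.14, 0.13, 0.11, 0.16, 0.12])

# A nonparametric test avoids assuming WER values are normally distributed.
stat, p_value = mannwhitneyu(southern_wer, other_wer, alternative="greater")

print(f"Median WER (Southern call center): {np.median(southern_wer):.2%}")
print(f"Median WER (other call centers):   {np.median(other_wer):.2%}")
print(f"p-value for 'Southern WER is higher': {p_value:.4f}")
```

A small p-value alongside a large median gap is a sign that the difference is systematic rather than a one-off, which is exactly the situation described above.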
