Sam Zegas: Welcome to Deepgram’s Voice of the Future Podcast, aka Our Favorite Nerds. At Deepgram, we are obsessed with voice, and this podcast is our exploration of the exciting, emerging world of voice technology.
I’m your host today, Sam Zegas, VP of Operations at Deepgram, and our guest is Scott Eller, Founder and CEO at Neuraswitch. Scott, thanks for being with us.
Scott Eller: Thanks for having me, Sam. Really appreciate it.
Sam Zegas: Awesome. Well, Scott, this is Our Favorite Nerds. So my first question is always, Why don’t you tell us a little bit about what kind of nerd you are?
Scott Eller: Absolutely. I mean, overall, I’m a technology nerd. I’ve been in technology sales my entire career, but it’s seeing the way technology can impact our lives and change them for the better. Actually, very specific to me, technology saved my life. You know, years ago, at the beginning of my career, I sold CPQ solutions, configure, price, quote.
And we helped accurately quote different CT machines and things like that from GE Healthcare. And about 8 years ago, I actually had a crazy, rare cancer that was essentially gonna limit my ability to speak, and technology was able to save that.
So that’s also kinda how I got into Neuraswitch. All speech related, all emotions related. So really, yeah, when I think about what kind of nerd I am, it’s the impact that technology can have on our lives and how it can improve them.
Sam Zegas: That’s a really powerful personal story to get involved with something related to speech. Glad you came through it all right.
Scott Eller: I really appreciate that. It was, it was a fun, fun battle.
Sam Zegas: Yeah, yeah, well, I’m sure that we’ll come back to that a little bit or it will become clear as you tell us more about the product, but why don’t you dive in and give us a little bit of an overview about what Neuraswitch does and the history of how you got started?
Scott Eller: Absolutely. So, Neuraswitch has been around since 2017, and I am one of the founders.
Initially, you know, I answered the call from an old colleague of mine who wanted to create this company that was all around speech. And speech was fascinating to me because I had just beaten this cancer where they literally had to remove half of my tongue, and I had to relearn how to speak.
And the solution that my friend, my old colleague, came to me about was: we’re gonna be able to provide better speech solutions and also real-time emotions. So that’s kind of what got me into this whole world of speech, and I figured, hey, if people aren’t gonna be able to understand me, at least they’ll understand my emotions going through it.
Kinda fast forward to where we are now: we’re still working with call centers. We’re still in speech technology. We’re relying on companies like Deepgram to provide very accurate transcriptions.
And what we do is ensure that all the sensitive data is removed from that. If you kinda think of the ordeal that I went through, I was constantly on the phone with doctors’ offices, things like that, giving out my Social Security number, my last name, my date of birth, all forms of sensitive information.
So, we were working with different call centers about a year ago, and one of our large clients was going through a lawsuit because some sensitive information got out. And we had the unique opportunity to help them.
And, really, we needed to identify a solution that could redact a Social Security number, but leave an account number intact because the account number was the same number of digits as a social.
And once we dug into that, we realized, oh my gosh, every single one of these conversations that our customers are transcribing is full of sensitive data, but it’s also full of information and data that they can leverage, that different departments can use to improve their interactions with their customers. So, yeah.
Sam Zegas: Yeah. That’s really fascinating. You know, basically, you specialize in doing precision redaction, which I think is a really interesting problem in speech analysis as a space. And the first thing that comes to mind for me is the joke image of a government-redacted document in which, like, the first line you can read, and then everything else is blacked out.
And those are actually humans trying to do that. And so you’d figure, like, okay, how could a machine do any better than that, especially when it doesn’t understand context the way that humans do? But I guess that’s really the core of what you’re doing: teaching machines how to understand the context of what to redact.
Scott Eller: Exactly. And you just explained how big the issue is and how difficult it is, which is why we had to transform our business into focusing solely on redaction. And we call it conversational redaction because you really have to look at the conversation as a whole, because as humans, we don’t necessarily always follow a script. Like, when I was constantly calling the doctor’s offices, eventually I didn’t wait for them to ask me, “Hey, what’s your date of birth? What’s your last name? What’s this?” I just said, “Last name’s Eller, date of birth is the 18th,” you know, and I just went right into it.
A lot of the solutions out there are looking for a specific script. They’re looking for “Can I have your birth date?” And then they key off of the birth date, and they’ll redact, you know, five words after that or 10 seconds after that.
But a lot of times too, you know, there’s background noise as well. There’s dogs barking. I think we’ve had a little of that in our call. But with that happening, people are going to say sensitive information over multiple utterances.
And at the end of the day, this solution has to be intelligent enough to realize that, hey, they may have said half of their Social Security number, the agent could have interjected, and then they may have said the second half two lines down.
So again, you really have to take into account every single scenario that could possibly happen to ensure that accuracy is really up, because all it takes is one piece of sensitive information to get through.
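[Editor’s note: the split-utterance scenario Scott describes can be sketched in a few lines. This is a hypothetical illustration, not Neuraswitch’s implementation: it accumulates the digit fragments a caller speaks across turns and flags every contributing utterance once they add up to a nine-digit Social Security number, even when the agent interjects mid-way.]

```python
import re

# Hypothetical sketch -- not Neuraswitch's implementation. It accumulates the
# digit fragments a caller speaks across consecutive turns and, once they add
# up to the nine digits of a Social Security number, flags every utterance
# that contributed, even if the agent interjected in the middle.
SSN_LENGTH = 9

def flag_split_ssn(utterances):
    """utterances: list of (speaker, text) pairs. Returns indices to redact."""
    pending = []   # (utterance_index, digit_count) fragments from the caller
    flagged = set()
    for i, (speaker, text) in enumerate(utterances):
        if speaker != "caller":
            continue   # an agent interjection does not reset the accumulator
        digits = re.findall(r"\d", text)
        if not digits:
            pending = []   # caller changed topic; start over
            continue
        pending.append((i, len(digits)))
        if sum(n for _, n in pending) == SSN_LENGTH:
            flagged.update(idx for idx, _ in pending)
            pending = []
    return sorted(flagged)

calls = [
    ("caller", "My social is 123 45"),
    ("agent",  "Sorry, could you repeat that?"),
    ("caller", "6789"),
]
print(flag_split_ssn(calls))  # → [0, 2]: both caller fragments get redacted
```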
Sam Zegas: Right. Huge risks to enterprises that don’t think about how to do redaction well. And you’ve explained how the difficult thing you’re trying to do with your solution is to figure out how to teach computers to get the context to be able to redact just the right things but leave other important pieces of information intact.
Tell me about how this has been done in the past. When we were talking earlier, you had mentioned something called ‘pause and resume.’ Tell me what that is and why that wasn’t sufficient.
Scott Eller: Yeah. Absolutely. So first, you mentioned one of them up front, too: just like a government document, it’s over-redaction, or over-redacting. That’s what a lot of companies have done historically, where they redact everything. And then you have no context.
So then your marketing department has no insights and visibility into what’s going on in your conversations. But yeah. ‘Pause and resume’ is another big one that’s probably been used for decades. And it can be either manual or automated. Automated is really where it’s gonna key off of certain phrases. So it may wait for the agent to say, “Can I have your credit card number?”
And then it’s going to pause the recording, and it will resume after either a certain period of time or after the actual agent resumes the recording. It can also be done manually, where the agent is just sitting there on the conversation, they ask for it, and they manually pause it.
There are so many different issues with that. I don’t even know where to begin, but I guess I’ll start with the pandemic: a lot of agents went home. They weren’t in a call center. They didn’t have people looking over their shoulders, seeing what they were doing. So, yeah, they could pause the conversation. Then they could write down your credit card number right next to them.
Sam Zegas: Mhmm.
Scott Eller: They could forget to pause it. They may not follow the script to a tee, and they’re saying things out of order so then the automated pause and resume isn’t picking up correctly.
So, really, what we found and what companies are starting to see now is, hey, we’ve been using this pause and resume for the last 20 years. It’s not compliant. There’s leakage all over the place. There’s sensitive information getting through, there’s fraud.
If you actually interview people at call centers, they’ll tell you that they have police come by pretty often and are actually arresting agents that are manually writing things down and then trying to sell that sensitive information in the black market.
Sam Zegas: Wow.
Scott Eller: So… yeah. It’s a big issue. I mean, you’ll see it just from press releases where companies will get fined a certain amount, typically $150 per record that gets out. But then they’ll have to create a press release. They’ll have to reach out to their customer base: “Hey, we had a hack. You may need to change your email address,” whatever that sensitive information was.
Sam Zegas: So there are a couple of different kinds of problems that come from having an inadequate redaction service. One of them is that it could be over-redacted, and then a lot of useful information is being removed. Another is that the redaction solution, something like pause and resume, leaves room for a lot of leakage, and then you can get people abusing systems like that. I guess companies that don’t tackle these problems are facing damage to trust and to their brand image, probably regulatory fines. There are probably a whole bunch of different landmines that people can run into here.
Scott Eller: Yeah. Absolutely. I mean, fines are just one part of it, and that’s pretty well defined. I mean, a company can typically calculate what they’ll be fined. It’s typically $150 per record that gets out.
A specific example was Robinhood, which got hacked a couple years ago. They had five million email addresses that got exposed.
Sam Zegas: Wow.
Scott Eller: And so if you multiply that by 150, you’re over 700 million dollars in fines, but they were able to settle for a mere, which is still insane, 70 million dollars.
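[Editor’s note: for what it’s worth, the arithmetic here checks out. The $150-per-record figure is the one Scott cites, not a statutory rate.]

```python
# Back-of-the-envelope check of the exposure Scott describes, using the
# ~$150-per-record figure he cites (an assumption, not a statutory rate).
records_exposed = 5_000_000
fine_per_record = 150                      # dollars per leaked record
max_exposure = records_exposed * fine_per_record
print(f"${max_exposure:,}")                # → $750,000,000
settlement = 70_000_000
print(f"settled for ${settlement:,}")      # under a tenth of the exposure
```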
Sam Zegas: Mhmm.
Scott Eller: Where some of that payout goes to the end people that had their data compromised, and then the rest of it’s divided up by state agencies and things like that. But it is a massive issue. At the other end of the spectrum, though, over-redacting is an issue too, because every single conversation between one of your agents and your customers is an opportunity to improve your internal processes.
It’s an opportunity to improve your customer communications and make sure that you’re doing all the right things. So, you know, even outside of the fines, if you’re over-redacting, you just don’t have that information at your fingertips to make all the right decisions.
Sam Zegas: Yeah. Something that we say often at Deepgram which really inspires us, inspires me personally, is the idea that speech, audio speech, is one of the world’s largest untapped unstructured data sources.
Every day, we’re generating globally billions and billions of hours of audio speech data, and there’s this struggle to find the appropriate ways to structure that data, which includes both transcribing it, but also marking different passages as sensitive in different sorts of ways and handling those things differently.
That is a really exciting frontier in speech technology, in which there’s a lot of value that can be added today if those technologies are just applied right, you know.
Scott Eller: Absolutely. And you hit the nail on the head with how much data is out there. It depends on the industry, too. Like, if you’re looking at financial services or health care, those are probably gonna be the more regulated industries out there.
But with financial services, like with a bank, you have to store seven years of your conversations, of your data, whether it’s the audio or the transcript. So a lot of these larger banks, I mean, they have storage units full of hard drives of all these archived conversations that no one’s ever touched, no one’s ever listened to.
But if someone ever gets those hard drives, or they throw it in the cloud and someone taps into it, all that sensitive data is just sitting there. And, Sam, it doesn’t have to just be a Social Security number anymore. Like, if you think about it, every time that you’re calling in, you’re answering certain security questions, like your mother’s maiden name, your favorite pet’s name, your elementary school. All of that information is really keys to your identity. If a hacker gets that information, they can call in and they can say, “Hey, my name is Scott Eller. My address is x, my mother’s maiden name is y,” and they can gain access to my accounts.
So it’s not only the sensitive information that you need to redact now, it’s all that PII, just the personal data as well: your email addresses, you know, things from your past that are password hints.
So that’s kinda where everything is trending now. It’s not just “hey, remove my credit card number and my Social Security number and we’re good.” It’s “you need to remove my address, my email address, my favorite pet snake,” all of those different components have to be protected.
Sam Zegas: All those different nodes in the web that together make up identity and security management. Yeah. That’s a difficult challenge. So I assume you’re using some sort of machine learning model that would allow the output to flag certain kinds of potentially sensitive data in a transcript as someone is talking. Is that right?
Scott Eller: Correct. Yeah. So we refer to them as different entities, and there are different entities that you can redact kind of out of the box within PII, and then PHI and, of course, PCI as well. But at the same time, the whole reason we pivoted to solely providing redaction solutions is because there are solutions out there that only redact based on models, and we found that that’s just not enough. We’ve tested all of them. We’ve tested them against what we do as well. And the reality is, every company that you talk to, if they’re going to implement a solution like this, they want 95% plus accuracy. Most of them are gonna say they want a hundred percent accuracy.
Sam Zegas: Mhmm.
Scott Eller: The problem is, no technology solution out there is. I love your guys’ transcription, but even you can’t state that you’re 100% accurate.
Sam Zegas: No, we don’t.
Scott Eller: There’s too many variables. There’s dogs barking. There’s the doorbell ringing. There’s phones ringing.
Sam Zegas: I would push that one step further even. We have human transcriptionists to label our audio data, to prepare that data for training, and even humans, even you and I, cannot get a transcript to 100% accuracy, because of the things that you said. It’s slang terms that maybe don’t have a standard spelling. It’s moments of crosstalk: if we interrupt each other, what do you actually write during that time stamp?
So even at a human scale, 97-98% accuracy is excellent. And then if you can get machine systems that work in the low to mid nineties, that is really top notch.
Scott Eller: Absolutely. Yeah. So what we do is we basically look at the entities. We try to rely on technology as much as possible, but then we also build additional rules and logic around it, because you’re gonna see certain things. As you listen to enough conversations, you see certain things, and then you kinda need to build out a dictionary for– like, last week, there was an example where I listened to a call, and it was “you can’t be lying to the client,” “you can’t be lying to an agent,” or something along those lines, and instead of “lying,” it transcribed “a lion.” And so–
Sam Zegas: That’s helpful.
Scott Eller: I know. But you have to be cognizant of those types of speech errors that happen, so you can ensure nothing is getting through. Obviously, that example doesn’t really matter. I mean, if they said “lying” or “a lion,” not a huge impact. It’s not sensitive. But those scenarios do arise where it is sensitive information, and the machine learning portion isn’t even recognizing that it’s sensitive.
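[Editor’s note: the “rules and logic on top of the model” idea, and the SSN-versus-account-number problem Scott raised earlier, can be illustrated with a toy context rule. The cue words and entity label below are assumptions for this sketch, not Neuraswitch’s actual rule set.]

```python
import re

# Toy illustration only -- not Neuraswitch's actual rule set. A pattern match
# finds every nine-digit number, and a context rule decides whether it is a
# Social Security number (redact) or an account number (keep) based on the
# words spoken just before it.
NINE_DIGITS = re.compile(r"\b\d{3}[- ]?\d{2}[- ]?\d{4}\b")
ACCOUNT_CUES = ("account", "acct", "member number")  # assumed cue words

def redact_ssn_only(transcript: str) -> str:
    def classify(match: re.Match) -> str:
        # Look at a short window of context preceding the number.
        context = transcript[max(0, match.start() - 40):match.start()].lower()
        if any(cue in context for cue in ACCOUNT_CUES):
            return match.group(0)          # account numbers stay intact
        return "[REDACTED-SSN]"            # default: err toward redacting
    return NINE_DIGITS.sub(classify, transcript)

text = "My social is 123-45-6789 and my account number is 987654321."
print(redact_ssn_only(text))  # the SSN is masked; the account number survives
```

A production system would layer this kind of rule over model-labeled entities rather than raw regexes, but the principle is the same: the surrounding conversation, not the token itself, decides what gets redacted.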
Sam Zegas: You know, you bring to mind another thing that’s on my mind a lot, and this is a little bit more in Deepgram’s domain than yours, although I’m sure it crosses your mind too, which is bias in speech recognition. The history of speech recognition software over the decades has been one-size-fits-all: standard newscaster dialect, you know, for English. Let’s just say it’s a very “neutral,” end scare quotes, standard English.
But there are many different accents and dialects in English. And if you train a model that then mistranscribes a whole demographic that maybe has a different way of pronouncing words, that is bias in the system. It can negatively impact one group of people just because the speech model was not designed in a way that was inclusive and equitable.
That’s something we think about a lot and, you know, it really affects this sort of solution too where the stakes are high.
Scott Eller: Absolutely. No. And we run into that all the time too. You know, one of the first deals that we had a few years ago, and this was before even redaction, it was, like, a Scottish accent.
Sam Zegas: Mhmm.
Scott Eller: And it was, like, Scottish English. It was so… I couldn’t even understand them just speaking. So making sure that we could transcribe it correctly was very, very interesting. But, yeah, you know, oftentimes the question I get is, “Hey… what type of Spanish do you guys work with?”
Sam Zegas: Mhmm.
Scott Eller: What do you mean? It’s Spanish. And then it’s like, “No, there’s Mexican Spanish, and then there’s Latin American Spanish,” there’s all sorts of different types of Spanish. And it’s exactly what you just keyed in on, which is that people speak differently in different areas, and you need to have a solution that’s flexible enough to be able to handle that.
Sam Zegas: Yeah. There’s so much to say there. I won’t go down that rabbit hole, but I don’t know–
Scott Eller: We’ll be here all day.
Sam Zegas: Yeah. Yeah. Well, great. So one of the themes that I hear you pulling out here is that in order for your solution to work, you really need to rely on as accurate a transcript as possible as an input, because that helps give you all of the context clues to identify what kinds of redactable information may be contained, and then it will take out just those things. So accuracy really seems like it’s the core tenet of what you–
Scott Eller: It absolutely is. I mean, accuracy drives everything that we do in this business. If we’re not accurate, then we’re not gonna keep clients. If the transcription engine that we’re working with, if you guys were providing a transcript that’s only 50% accurate, then we can work with that. And by the way, you guys would never do that.
But we can work with it; the professional services around actually ensuring that it’s accurate would end up being a training nightmare, though. So you at least wanna rely on the most accurate transcription possible. And then that limits the amount of work that we have to do to really achieve that end goal of 95% plus accuracy.
Sam Zegas: Yeah. Well, great. We’re really excited about the work you’re doing. You know, we, we’re constantly looking out into the market and tracking places where voice technology is creating user experiences that feel more humanized because machine learning is training computers to be able to recognize context in the way that a human brain can. So–
Scott Eller: Yeah.
Sam Zegas: Yeah, congrats on your work. It’s really cool.
Scott Eller: Thank you.
Sam Zegas: I’m glad to be part of it. So before we go here, at the end of every episode, I always take a minute to remind people just how far we’ve come with technology and how much we have to look forward to in the next decade. I’ve asked you to prepare an explanation of an older piece of technology that a younger person might not be as familiar with.
So you’ll explain, like you would to a 10-year-old, in this case, how long-distance calling used to work and how texting limits used to work?
Scott Eller: Yeah. So it’s a really fun one, and actually, I love this topic. It’s really cool that you guys are doing this. But just from examples from when I was a bit younger: I got my first cell phone, and imagine trying to tell a 10-year-old, “Hey, here’s your phone, text, do whatever you want. But just so you know, you only get 100 texts per month, or you’re gonna get grounded for a week or a month or whatever that is.”
Sam Zegas: Yeah.
Scott Eller: But it was just insane. Like, back in, you know, 2000, 2001, around there, you had certain plans where things were not unlimited. You had this many texts, and then when you ran out, it was 10 cents, or whatever it was, per text after that.
And I just remember getting calls from friends or other people’s parents that would say, “Hey, you gotta quit texting my kid,” because it didn’t matter if you sent it or you received it. You still had to pay for it.
Sam Zegas: Right.
Scott Eller: And yeah. So the texting one, I think my kids are gonna get a kick out of that later on in life. But another one that kinda impacted me a little bit previous to that, probably in the mid-to-late nineties, was long distance.
So long distance, it could be someone that’s 20 minutes away from you, just in the city over. Like, I’m in Reno, Nevada. Sparks is only 20 minutes away from me, but back in the day, a long-distance call was just calling over city lines.
And so, I mean, maybe a little too specific, but I went to a birthday party when I was younger, girl got my number, she lived two towns over and she kept on calling me and we talked a lot. Well, I got grounded the next week because her grandmother called my mom to yell about our high phone bill.
How was I to know that we were long-distance being 20 minutes away and that both of our parents were gonna get this massive bill just because we were having a normal conversation?
Sam Zegas: That’s so funny. Yeah. I can relate.
Scott Eller: Fast forward, everything is unlimited. So, you know, sky’s the limit. Call, text, do whatever you gotta do.
Sam Zegas: Yeah. I had an elderly neighbor who used to go down to casinos a lot. The casinos would give her these little, like, calling card gift cards that–
Scott Eller: Oh my god.
Sam Zegas: –that you could use for long distance. I loved that gift because I could call my friends from far away.
Scott Eller: Oh, that’s awesome. I totally forgot about those. That was, like, the gift: oh, you’re graduating from high school and going to college? Here are some phone cards.
Sam Zegas: It’s a calling card. That’s so funny. Well, awesome. It’s been really great to talk to you. Thanks for the time today.
Scott Eller: Yeah. Sam, absolutely. It’s been a pleasure, and I hope you have a wonderful day.
Sam Zegas: Likewise, so to all our listeners, thanks for tuning in. Come check us out for more info about Deepgram and about Neuraswitch. You can find Neuraswitch at neuraswitch.com. And of course, we are at deepgram.com and @DeepgramAI on all of our socials. So with that, we’re out, and we’ll catch you next time.