VOTF Episode 2: Transcription is calling, you should answer. Data liberation, now!

Sam Zegas

About this episode
You’ve got unstructured data for miles or maybe terabytes. But what insight can come from unstructured data sitting in a directory, unprocessed? The answer is bloody nothing, as you’ll hear from Pete Ellis, CPO at Red Box. Transcribing as much audio data as possible is the foundation for analysis that can be used to increase revenue for contact and support centers, which is why we need data liberation, now! Tune in and find out why you shouldn’t wait another minute to transcribe and analyze your audio.
To learn more about Red Box, visit https://www.redboxvoice.com/conversa/
Transcription is calling, you should answer. Data liberation, now!
Sam: Welcome to Deepgram’s Voice of the Future podcast. AKA, our favorite nerds. At Deepgram, we’re obsessed with voice, and this podcast is our exploration of the exciting emerging world of voice technology. I’m your host today, Sam Zegas, VP of Operations at Deepgram, and our guest is Pete Ellis, CPO at Red Box. Pete, welcome to the show.
Pete: Thanks, Sam. Good to be here.
Sam: Pete, this is, of course, our favorite nerds, and I generally start by asking people what kind of nerd they are.
Pete: Totally a nerd. I suppose I’m a technical nerd. Yeah, I’m getting quite old now, but I still get heavily involved. You know, I sit on the board of Red Box, but I still get involved in development: can we push voice around a millisecond quicker, can we deal with another hundred thousand calls, that kind of level. So I suppose I’m still a technical nerd, and I do get heavily involved in our R&D, even though I’ll be twice as old as our developers.
Sam: Tell me a little bit about your background then. How did you get into this line of work?
Pete: Sure. Yeah. For a good thirty-plus years now, I’ve really been in telephony and contact centers. I started off with a company in the nineties doing CTI, computer telephony integration. That’s a great grounding in understanding telephony, call control, how contact centers work, how telephony systems work. Ever since then, I’ve worked in companies around voice, voice architecture, and contact centers, whether in service providers, systems integrators, or vendors like Red Box, and probably my biggest experience was six years at Avaya. Those were some good years, when Avaya was really leading the world in terms of contact centers. And then I came to Red Box in 2016, really because I could see there was a gap: we were collecting all this valuable information, but we were only really using it for compliance and regulation and not doing anything more meaningful with the staggering amount of information we were capturing. Some customers are capturing over a million calls a day. That’s a staggering amount of content that’s just being monitored, not used. So there was an opportunity. It was really at the start of this AI revolution six years ago.
Sam: That’s a fascinating career arc to be in during a time of digital transformation, because of all the ways that digital transformation has changed a lot of different sectors. Going from analog telephony into the world of digital voice platforms must really have been an interesting change.
Pete: It is, because I suppose it was quite slow. I mean, I was there right from the very beginning, and at the height of tech as well. But these changes happen slower because, you know, the adoption and use of voice around the world is staggering. So a change from analog to digital to IP takes its time. I do think we are going through another shift, though, as we’re going to talk about today, in AI. That certainly is going through an inflection point, a definite shift over this last year to eighteen months, for various reasons, which I’m sure we’ll get to.
Sam: Yeah. Well, we’ll be talking a lot more about trends in the market and AI as we get deeper into the conversation. But first, since some of our listeners might not be familiar with Red Box, why don’t you tell us a bit about what you do there?
Pete: Sure. Yeah. So I’m responsible for our product strategy and also our partnership strategy. And in this new world, partnerships are really what it’s all about in terms of providing the most value to customers. Red Box is a long-standing company, going over thirty years now, really born out of compliance recording; we’re one of the top five companies in the world that provide compliance recording. And I suppose the breadth of what we provide is quite significant. We’re providing it across telephony systems and contact centers, which again we’ll probably talk about a lot today, but also the complex areas: trading floors, which are quite a complex, tough environment; emergency services, so command and control systems connected to radio networks; as well as mobile networks. So it’s really taking all of that and applying compliance policies around it so that our customers can meet their regulations, and that’s what we’ve been doing for about, as I say, the last thirty years. We had a change about four or five years ago, as we started to see the transcription of voice becoming more ubiquitous, with the quality where it needs to be and the speed where it needs to be. We could start to see that, as I said at the very start, we’re sitting on this collection of data, our customers’ data, that’s not really being fully utilized, without really understanding what value can be driven from it. So for the last four or five years, we’ve been positioning ourselves to be that sort of liberator of this data.
Pete: You know, that’s something we’ve never seen as ours. It’s our customers’ data, but how can we liberate this data at the highest quality, with as much metadata as possible, and how do we get that data to where the customer can derive value from it? There are quite a lot of complexities in doing that, and that’s what we’ve been morphing into. We still do compliance recording and we’re still acquiring in the market. But the need to be that gateway layer to this mass of data throughout organizations is quite a significant undertaking. It’s been shifting our organization over the last four or five years. Hence why partnerships are important. So, yeah, we’re the liberator of it, but then somebody has to do something with it.
Sam: That message really resonates with me: the framing of your product as a liberator of the data. Our experience at Deepgram is that voice is this huge untapped data source that’s constantly being generated and used, pun intended, fluently by people who prefer to speak to each other when they need to communicate. But machines have really been locked out of that. And so liberating the value and the content of that speech data so that it is available to machines and available to AI is a theme that comes up quite a lot here as well.
Pete: Yeah. Sam, what people don’t understand is this: people think that since voice moved to IP, it’s quite easy to extract. There are two challenges there. The actual IP side of it isn’t that difficult, and there are some semi-standards out there for capturing it. But the audio data is only one part of it. The next piece is the metadata: what’s that conversation about? And there are just no standards for that piece. That’s what makes this quite complex, especially as most customers have multiple different systems: collaboration systems like Zoom and Teams, as well as a contact center, as well as some telephony system, as well as some mobile systems. So, yeah, it’s not straightforward. Not at all.
Sam: Yeah. It’s really a frontier of research for us and for a lot of companies to figure out how to understand intent, understand topic, and identify, or I guess think about, how you would summarize a conversation that’s being had between people.
Pete: Yeah. That’s a good point. In the early days, certainly in terms of things like transcription, we were, let’s say, taking high-quality audio in stereo, because you need that to understand who’s speaking on both sides. A lot of the analysis was based on keyword searching, right? Since then, with emotion and the rest of that contextual stack, you can really get to understand what’s happening in that conversation in terms of intent and real understanding, so you can then start to accelerate your understanding of either your customer or your agents. We’ve seen that shift over the last four or five years. It’s been quite significant.
Sam: No, you’re absolutely right. And I would dig in there and say that one of the things we research a lot at Deepgram currently is the idea that keyword searching as an approach to understanding, which has gotten us this far at least, is really inadequate for the needs of businesses and the needs of people generally, because you can say the same keyword in two different intonations and it will mean something very different. A human knows how to understand that intrinsically, but a machine does not until you have trained an AI system to be able to disambiguate.
Pete: Yeah, absolutely. And there’s quite a big difference out there among those just doing the basics. To be fair, the basics are good as well, and it’s good to use the basics for the basics. But I think once we start getting good contextual analysis of the conversations, that’s where you really start to drive the value, and we can go into some of those use cases as we move along. But, yeah, really interesting to see. And it’s happened quickly. In real terms, going from TDM to IP took years, and some people are still on old telephony lines. But AI and machine learning really have accelerated, certainly over the last two years.
Pete: We did a recent survey that shows it’s accelerated even more than we’d felt. We’ve started seeing it drive really quickly. The survey was part of a campaign called Being Human: how can machines get to the point where they’re replacing some of those activities that humans are doing? And we saw, across over eight hundred customers, that on average thirty percent were using AI, or voice AI, today, which I thought was quite staggering, because it was nowhere near that four years ago.
Sam: I know that you serve a number of different sectors: health care, public safety, government, financial services, and others as well. What common themes do you see in the use cases, or what common needs do you see among them?
Pete: Sure. I think we’ve gone through quite a lot of use cases over the last four or five years. A lot of what we’ve done has been in compliance; all those use cases in the early years were around compliance and surveillance of traders and of agents. What we’ve seen more recently, in the last two years, is an explosion in the contact centers around knowing your customer and customer service: the whole context of how you deal with your customer and how you provide service to your customer, on one side.
Pete: And there’s quite a plethora of use cases and requirements there. On the other side is how you really get more out of your agents. Essentially, how do you make every agent your best agent, whether that’s through analysis or real-time coaching? That’s probably one of the biggest shifts we’ve seen more recently: the agent side, as opposed to just the customer side. Most people start on the customer side and then move down. But those are the two key pieces, because historically the customer side was done through CSAT and NPS customer surveys, and the agent side was done through quality management systems, or spreadsheets, and listening to phone calls. Neither of these two things is representative across the whole organization, and they can’t analyze every customer either. So they’re quite restricted and quite specific in areas, and quite manual and labor intensive. So it’s geared up to look at how we can automate, how we can do that better. Those are the two key themes. There are lots of neat pieces here and there, but those are the two blocks that we see moving forward.
Sam: Interesting. So it sounds like some of the value that you provide comes from asynchronous speech processing, but there are other things that you provide that really depend on a real-time use case. Is that right?
Pete: Yeah, real time. We started looking at real time probably a good three years ago. The problem was there was nobody there really to do anything with it. We could provide it and support it, but then you need some application to take that and do something with it, some kind of next best action. But more recently, we’ve seen agent assist exploding. And if you think about it, what are some of the biggest challenges in contact centers? It’s attrition of agents and training of agents. As I said before, you want every agent to be your best agent.
Pete: And the only real way of doing that is through training and coaching, whereas real-time analysis of conversations gives you, for every single agent in your contact center, a way of coaching and guiding agents in real time based on a set of metrics that you’ve seen over time to be the best way an agent can perform for a customer. It can also supplement training, because there’s that training time when new agents come on board. Once they get it completely right, there’s a staggering innovation, cost justification, and business improvement going forward, because we deal with contact centers in the thousands. The attrition and the training are a complete nightmare. They really are. In no way is every agent the best agent, nowhere near. And this gets you a step further, because your QM is good, and your workflows for monitoring and managing your agents are good, but you can only take a selection. Customers, as I say, are taking up to a million calls a day. It’s impossible. If you had a thousand supervisors looking at calls, it’s just impossible. We saw that in the early days on trading floors. They take a staggering number of calls, very short calls, and as part of regulation they have to monitor those calls. In the early days, that boundary moved from, I think, five percent to potentially towards twenty percent. It’s just impossible; you have to automate. That sort of forced it. But now there’s real-time agent assist. It’s very interesting, but the challenge is at every step of the way. For ourselves and yourselves it’s: can we have the least delay in that chain? When I talked at the very start about milliseconds, that’s really important to us. Can we shave off fifteen milliseconds as we send it to you at Deepgram, and then can Deepgram analyze that and transcribe it sub-second, in milliseconds, before it then goes to the analytics? It’s that whole chain. And if it goes over two hundred and fifty milliseconds, you’ll see that interruption on the desktop. So, yeah, real time is a very interesting area.
Sam: Yeah. We’ve seen a dramatic uptick in interest in real-time applications, in the real-time service that we’ve offered at Deepgram over the last couple of years. The two things that really stand out to me are, first, how critically important the response time is, and we put a lot of effort into making sure that we’re the fastest solution we can possibly be; and second, getting a highly accurate transcript the first time, including the ability to get custom branded terms or industry jargon correct in the live real-time transcription that’s being delivered. Those are very interesting challenges to us, and things that we know are really critical to making those systems worthwhile to customers.
Pete: Absolutely. Yeah. It’s the speed, it’s the accuracy, it’s the configuration, so you’re able to put in those custom terms, and that’s not easily available. And there’s the compute side. Don’t forget, there’s a cost to all of this as well. Whether it’s in the cloud or on premise, the cost factor is just as crucial to customers. So it’s about getting the compute footprint down, because now we’re all doing something that’s quite CPU intensive, whether you’re doing the ASR on your side, we’re doing the capturing on ours, or someone is doing the analytics somewhere in the cloud. The CPU intensity is staggering. So you’ve got to make those services as performant as they can be, and, yeah, the milliseconds count. I know in the real world you need a portion of a call to make some sense out of it.
Pete: But again, you’re still going to make each piece as performant as it can be. And we’re a small player, so we go up against much bigger players. So for us, we have to be the best. It’s a difficult world, because when you ask, do you have to send it that quickly and perform that quickly? Possibly not, but it improves things downstream for vendors like yourselves and for the analytics. So you look to be the best possible.
Sam: Let’s dig into that point a little bit deeper. What else is difficult about what you deliver? Obviously, you’re right at the cutting edge in order to compete in the industry that you’re in. You’ve talked about speed as one of the things that is difficult and that you need to deliver on. What are the other things that make it difficult to do what you do?
Pete: Sure. I think there are two or three things you need, certainly in our space, that flow downstream to the ASR and the analytics. If we haven’t done our job, it really makes everything downstream not look very good. Firstly, we have to capture it at high quality, so uncompressed. For that, you need flexibility in where you capture these audio conversations, because sometimes you’ll be beholden to the platform you’re capturing from, but you can mitigate that by capturing the traffic at different points in the network. That’s step one. Two, you have to be capturing in stereo. If we’re talking about audio, it has to be in stereo, because you have to guarantee who’s speaking at any one point in time. Unless you’re capturing in stereo, you can’t guarantee that. You can try, but you can’t. And again, that’s about flexibility in where you capture and where you can capture stereo. At some point, that call is in stereo, typically. And then there’s the metadata.
Pete: And this is the challenge, because all telephony systems, contact centers, and trading floors operate completely differently in terms of metadata. There is no standard. It’s all different, and being able to marry that metadata to the conversation is easy when you’ve got ten or a hundred calls. But when you’ve got a million calls, that’s really, really difficult. Plus, as I say, customers have a front office contact center, a back office telephony environment, Teams for their collaboration, and AT&T plan mobiles. How do you then track a customer that’s maybe gone from the front office to the back office, or gone through to a mobile teleworker or mobile field worker? How do you track all that, and all the associated metadata, when there’s this much load on the system? And then you’re doing that in real time, so how can you get that in real time to where the customer wants it to go, which is either to Deepgram to transcribe or to their own systems? Even post-call has the same problem; real time just adds another performance hit on top of that.
Sam: Another thing that I imagine is really difficult, working at the scale that you work at, is that when you’re talking about a call center that processes millions of calls, you’re dealing with almost that same number of different speech idiosyncrasies: people calling in with different accents and dialects. To be able to take that sort of complexity and distill it down to text that is accurate is actually quite a difficult thing.
Pete: Yeah. But that’s really where Deepgram comes into play; that’s not so much Red Box. And I think the challenge that we’ve seen over the last four or five years is that, if you don’t mind me saying, there’s a finite capability in those that have gone down the open source routes, in terms of how fast and how quickly you can do that to the level of accuracy that the market now needs. For me anyway, Deepgram is built from the ground up for this purpose, and it allows you to scale. As I say, contact centers are not really getting any smaller. For those that say voice is not really the main channel anymore: it used to be going down about a percent a year for many years, but since the COVID period, we’ve seen it go up. And because you’ve got more organizations online, contact centers are growing bigger and faster. So the amount of data we’re capturing and delivering is just staggering. Put it this way: each call takes, I don’t know, around twenty or thirty thousand packets of data on average. Put that against a million calls, and there are billions of packets a day that we’re working to capture, formulate, and send to Deepgram, for you then to run your models against it and get all that sent out in milliseconds. That’s the difference, I think, in this new world: the levels of volume that we’re seeing now. It’s far beyond what we used to see years ago. We operated a lot more in the mid-market space historically, and we’re also in the tier one enterprises now. But even so, we’ve still seen that shift, especially now that you can do a lot more in your contact center. And especially if you’re making every agent the best agent, sales performance, which is a big driver for AI at the moment, is getting quite a big resurgence.
Sam: Yeah. I can really see clearly that companies that don’t adopt the latest speech technology in their call center, I would expect them to see a hit to the performance of their agents as time goes on. There are probably a lot of ways that they will start feeling pain if they don’t adopt these innovations as they come out.
Pete: Yeah. It’s a difficult one, isn’t it?
Pete: Because there are lots of reasons why you would do it and there are lots of benefits, but you are making what was done previously a lot better and a lot more complete, as well as more consistent. Your consistency before was in your surveys, your CSAT scores, your feedback from the customers via whatever means you were using, or through your manual monitoring of your agents. But the inconsistency comes in how you’re communicating with that customer in that instance, or how each supervisor is reviewing those agents. There is inconsistency, whereas from a machine perspective, it’s always consistent. It has a set of rules. It’s built up a set of trends across data, across the contact center. And don’t forget, it’s not just data from your contact centers; it’s CRM data as well, which a supervisor doesn’t have to hand. A customer doesn’t have that. So it’s putting all this extra data together along with the individual conversation. You’re getting alignment to the trends and you’re getting consistency. And one very interesting thing we saw with a customer early on: we had an issue with sending this data out. Nobody’s perfect. And we noticed the effect it had on the customer’s day-to-day operations. One of our AI partners was analyzing it, and it affected the way they could move agents and workload and messages out to their customers, instantly. It affected them completely. We hadn’t realized the impact this was having for our customers. That’s what we saw when we failed to deliver some audio for a period of time; there was a link issue between ourselves and the cloud. But, yeah, it affected their ability to be reactive. If you think of manually run contact centers, can they really be reactive that well?
Pete: Because the contact centers could be spread across multiple locations, especially these days, when most of the workers are at home. So are they seeing the trends that are coming in? Something’s happening and they need to react? AI sees it.
Sam: Yeah. I actually want to go there next. You’ve mentioned a couple of times that there’s been a dramatic transformation in the market due to AI recently. What do you see there? And how does AI fit into your solution?
Pete: We’ve seen a massive ramp-up. I suppose we’ve seen it because, as well as still doing compliance recording, we’ve gone into being the fulfillment engine, this plumbing or gateway that gives everybody access. Even if they’ve got existing recording platforms in there, we still come in to provide access to high-quality audio and metadata. We’ve seen a massive ramp-up of that. We’ve seen it over the years, but I think what’s happening with a lot of the partners that are doing the analytics is they’re starting to get to a level of scale now that we didn’t have before. For two or three years, they were doing projects, and they were taking departments. And they’re starting to see that it’s had such an impact in those departments that it’s going wider. That’s the way they come to us. When it’s on a small scale, they’re able to get this information in batch or collated manually in some way, shape, or form, and it sort of works. But once you start to get into five hundred-plus and a thousand seats, you’ve got to be getting it automatically and a lot more consistently across the enterprise. So we’ve seen it because of that. Although these companies have been around for the last two or three years, they’ve started stepping up in how much they’re delivering now. They’re delivering millions of hours of analytics instead of thousands.
Pete: But I must admit, I was quite surprised to see the survey we did. Of the eight hundred customers, I didn’t expect thirty percent to be already deploying these technologies. So I suppose the danger for those that aren’t is being left behind to a certain degree, and incurring greater cost, because you’re still doing things manually and you’re still only touching part of your contact center. A lot of contact centers are cost centers, so the costs are quite important.
Sam: Anytime I’m thinking about AI, I’m also thinking about the problem of bias that gets introduced into AI systems based on the data that they’re exposed to through training. How do you think about bias and how do you manage it?
Pete: So that sits outside of our realm. We’re fulfilling the data. All we can do is ensure that we don’t miss a call, because we come from the compliance world, and for us that’s a really important thing. And that it’s the highest quality, so that there are no mistakes in the transcription, or it’s as good as it can be. And that it’s in stereo, so you haven’t got any issues in terms of who’s speaking. And also the extra metadata. So we can help, but we can’t deal with it any more than by providing everything as accurately as we can.
Sam: Yeah. So in the ecosystem that you are part of, it’s really a responsibility of the end recipient of your products to understand how bias may be present in the sort of analysis that they’re doing, I imagine.
Pete: Yeah. I suppose it’s also a learning curve for them, because they’ve all come up through quite basic analytics on the data to really getting far more contextual with it and taking more feeds of data around that, so they’re not getting biased results, they’re not getting incorrect results out of it. And they’re taking bigger data sets now.
Pete: As you know, bigger data sets really improve the accuracy of what you’re delivering. So I suppose that’s what we’re seeing, but it really has only been this last year, I think, in terms of the big data sets. Before that, they were quite small. So, yeah, I think bias is changing somewhat.
Sam: Good. Something else you mentioned earlier that I was interested in was a statistic that said that for a while, it seemed like the use of voice was decreasing over time by a percent a year or something like that. And yet it seems that trend may be reversing itself, or voice may be having some sort of a resurgence now. What do you see happening on that front?
Pete: Well, it definitely was. If you looked at the analyst reports, it seemed to be going down, which is not massive compared to the number of actual telephony systems that are out there. But certainly over the last two years, because of things like COVID, we’ve seen it come back. And I think what’s compounded that is that people then see what they can do with this data. Although we did this research and it was thirty percent using it, I still think it seems to be something of an art. A lot of our customers don’t know how to get access to their audio. So I think the fact that that’s changing, and they see what other people are doing, is fostering the idea that this is a lot of interesting data, and we can actually improve our sales, improve our customer service, and improve our relationship with our customer, without deploying these channels that perhaps are not giving us as close a relationship or as close an interaction as voice would, especially if you can make all your agents your best agents. That’s a big change from having the concern that an interaction with a poor agent is worse than an interaction, let’s say, with a bot.
Sam: Yeah, that’s a really interesting point. I had not seen that statistic that you cited before. However, I have been aware of this sort of seeming cultural narrative around why it is that people might move away from voice. You know, it’s more convenient, especially for the youngest generations, who don’t want to get on the phone and have a live conversation. Maybe they want to use a chatbot or something. And certainly, that movement helps certain kinds of products to improve themselves, but we didn’t see that it completely crushed voice. There is absolutely a stable need, and a need that may be growing over time, for really, like, quality voice
