
AI Minds #066 | Ilan Avner, Director of Product Management at AudioCodes

Ilan Avner, Director of Product Management at AudioCodes. AudioCodes is a global leader in unified communications, voice, contact center and conversational AI services and solutions for enterprises, enabling them to improve their customer experience (CX) and employee experience (EX) through enhanced communications and collaboration.
Ilan Avner is a seasoned product leader with deep expertise in voice, telephony and voice conversational AI. As Director of Product Management at AudioCodes, he drives the development of AI-powered voice solutions that connect contact centers with conversational platforms, enabling intelligent voice bots, live-agent assist tools, and seamless customer interactions. Ilan is passionate about transforming how businesses use voice to engage, automate, and scale customer communication.
Listen to the episode on Spotify, Apple Podcasts, Podcast Addict, or Castbox. You can also watch this episode on YouTube.
In this episode of the AiMinds Podcast, Ilan Avner, Director of Product Management at AudioCodes, unpacks the evolving landscape of voice AI in enterprise settings.
Ilan shares how organizations are embracing AI-driven voice technologies—from call summarization to full-scale conversational IVRs—to streamline operations and improve customer engagement.
He explores the practical challenges of integrating speech-to-text and text-to-speech in multilingual environments, and how AudioCodes helps bridge the gap between telephony infrastructure and AI platforms.
The conversation highlights the step-by-step journey enterprises take to scale voice AI, and how platforms like Voice AI Connect and Live Hub are accelerating adoption.
Listeners will gain a behind-the-scenes look at how voice AI is reshaping enterprise communication, from agent assist tools to real-time automation.
Show Notes:
00:00 Navigating New Tech Confusion
05:32 "Call Summarization Enhances Efficiency"
07:07 Contact Center AI Integration Solutions
10:23 Translation Challenges in Multilingual Contexts
15:24 "Bot Setup in Five Minutes"
17:07 Understanding AI Agents Terms
More Quotes from Ilan:
Demetrios:
Welcome to the AI Minds Podcast, a podcast where we explore the companies of tomorrow being built AI-first. I am your host, Demetrios, and this episode, like every other episode, is brought to you by Deepgram, the number one speech-to-text and text-to-speech API on the internet today, trusted by the world's top enterprises, conversational AI leaders and startups, some of which you may have heard of, like Spotify, Twilio, NASA and Citibank. In this episode, I am joined by Ilan, Director of Product Management at AudioCodes. Ilan, how are you doing?
Ilan Avner:
I'm doing great. Busy, I assume. Just like everyone else in this crazy changing world.
Demetrios:
I want to get deep into all of the AI stuff you're doing, and especially the voice AI stuff that has been on your mind. You're a product guy, so we can attack it from that angle. I know there are many ways you can incorporate voice into different products, and there are the advantages AI brings to voice in this new paradigm we now have, almost like a new toolkit to draw from. What are some ways that you've been seeing customers of yours get the maximum amount of ROI from AI?
Ilan Avner:
In general, I can say that everyone is hearing about new announcements, new technologies, new providers almost on a weekly or monthly basis. And when we're speaking with our customers, we see confusion, because they keep seeing things change and they want to start with something. They also see a lot of demos, very exciting demos. And then they have their own telephony system, their own contact center, and they're asking: I see a lot of things, where should I start? What we're seeing our customers do, and also our recommendation, is to start with something simple. Now, when you speak about voice conversational AI, and this is what we're doing, voice conversational AI, the first thing you may think about is voice bots, but you don't have to start with a voice bot specifically. If you are a large organization that has a contact center, one of the things you can start with, and we're seeing customers doing that, is call summarization, for instance.
Ilan Avner:
And the reason to start with call summarization is because it has a clear ROI. And also it's very easy to do. It's almost out of the box. Now, we have seen large customers starting with call summarization, and from the time they started testing, in a few months they're already in production. That said, it depends. Yes, there are challenges with the speech to text, and we will speak about that, and also with the call summarization.
Ilan Avner:
But LLMs can do summarization pretty easily. It's almost out of the box. There are challenges connecting to the telephony system, and there are challenges doing the speech to text. So this is one use case. It can then also be the basis for building voice bots, because you already have all the knowledge, all the data you can train on, all the transcripts. And then you can turn to voice bots and deploy conversational IVRs, and then do real-time assistance for agents, not only call summarizations. So these are the types of things we're seeing our customers do.
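To make the summarization step concrete, here is a minimal sketch of passing a finished call transcript to an LLM for an agent wrap-up summary. The endpoint URL, model name and prompt are illustrative placeholders for "your own LLM", assuming an OpenAI-style chat completions interface; they are not an AudioCodes or Deepgram API.

```python
# Minimal sketch: turn a call transcript into a wrap-up summary with an LLM.
# The endpoint, model name and prompt are illustrative assumptions, not an
# AudioCodes or Deepgram API.
import requests

LLM_URL = "https://llm.example.com/v1/chat/completions"  # hypothetical endpoint
API_KEY = "YOUR_LLM_KEY"

def summarize_call(transcript: str) -> str:
    """Ask the LLM for a short agent wrap-up summary of one call."""
    resp = requests.post(
        LLM_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "your-llm",  # placeholder model name
            "messages": [
                {"role": "system",
                 "content": "Summarize this contact-center call in 3 bullet "
                            "points: reason for call, resolution, follow-ups."},
                {"role": "user", "content": transcript},
            ],
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```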
Demetrios:
It's almost like a crawl, walk, run approach, where you start with the summarization, that's the crawl. And then when you want to get to walk, you can go a step further and say, I want to get a voice bot in. And the run is probably the agents that are taking action. Is that how I should understand it? Do you see it that way?
Ilan Avner:
You need to start with something, and the best is to start with something simple. In parallel, you can also start with something else. But you need to start with something, and then you have the experience of playing with that, with the technologies, with the speech to text, with LLMs, with the conversational AI platforms. Then you can take the next step of starting to walk and doing voice bots and conversational IVRs and other things.
Demetrios:
So there are a few use cases for call summarization that have become very common. I think one is customer support, and on the other side you have sales. And so plugging in call summarization there is fairly common. Have there been any use cases that have surprised you?
Ilan Avner:
About call summarization? I think what we're doing is no surprise, because we're speaking with, let's say, large contact centers, and they want call summarization for multiple reasons. But the main reason is the wrap-up time of the agent dealing with customer queries. After the call he may spend five minutes, or maybe more, to summarize the call, and call summarization helps him with that wrap-up time, so it can reduce the time consumed from, let's say, five minutes to one minute. And it will be more accurate, and it will be easier for the agent as well. So it's not that there is a specific use case for it; it can serve any contact center.
Demetrios:
It makes sense. And then you see that there's a clear ROI there because of the time saved.
Ilan Avner:
And then for contact centers you also have offline analytics. Once you do the real-time transcription, you can store it, and afterwards you can do not only call summarization, you can do any analytics to better understand what's going on in the contact center. And with LLMs, it's again very easy to do reason for the call, sentiment analysis, QA for the agents and so forth.
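The offline analytics Ilan mentions can be sketched the same way, asking the LLM for structured fields (call reason, sentiment, a QA flag) instead of free text. The endpoint, model name and field names below are the same kind of hypothetical placeholders as in the summarization sketch above.

```python
# Minimal sketch of post-call analytics over stored transcripts: ask the LLM
# for structured fields instead of free text. Endpoint and model are the same
# hypothetical placeholders as in the summarization sketch.
import json
import requests

LLM_URL = "https://llm.example.com/v1/chat/completions"  # hypothetical
API_KEY = "YOUR_LLM_KEY"

ANALYTICS_PROMPT = (
    "Return only JSON with keys call_reason, sentiment "
    "(positive/neutral/negative) and agent_followed_script (true/false) "
    "for this contact-center transcript."
)

def analyze_call(transcript: str) -> dict:
    """Extract structured analytics fields from one stored transcript."""
    resp = requests.post(
        LLM_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"model": "your-llm",
              "messages": [{"role": "system", "content": ANALYTICS_PROMPT},
                           {"role": "user", "content": transcript}]},
        timeout=30,
    )
    resp.raise_for_status()
    # A production system would validate this or use a structured-output mode.
    return json.loads(resp.json()["choices"][0]["message"]["content"])
```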
Demetrios:
Fascinating to think about. Now, what are some ways that you're plugging in with customers?
Ilan Avner:
Let's say you have a customer that wants to do call summarization and he has a contact center. One of the first things he will ask himself is: how do I integrate? Let's say I want to use Deepgram as the speech to text, I have my own LLM, and I want to integrate that with my contact center. And this is where customers approach us, because we are experts in voice over IP, telephony and voice in general, and we can do all the connections needed. So we can do the voice acquisition in real time from any contact center, then use Deepgram as the speech to text and send it to the LLM for call summarization. And then this is how the integration is done. Of course, this is not the only use case. We help with other options as well.
Ilan Avner:
We help with voice bot implementations, conversational IVRs and also real-time translations. And the main reason customers use our services is for all the telephony infrastructure, the connectivity, the voice, the orchestration. And maybe afterwards we can also speak a little bit about Live Hub, which gives you even more options.
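As a rough sketch of the middle of the call summarization integration Ilan describes (real-time voice acquisition from the contact center, Deepgram for speech to text, an LLM for the summary), here is what transcribing a recorded call with Deepgram's pre-recorded /v1/listen endpoint and handing the result to a summarizer might look like. The query parameters and file name are illustrative, the voice acquisition itself (the part AudioCodes handles) is out of scope here, and Deepgram's current documentation should be checked for the exact options you need.

```python
# Minimal sketch: recorded call audio -> Deepgram transcript -> summary.
# Query parameters and file paths are illustrative; check Deepgram's docs for
# the options you actually need (model, language, diarization, etc.).
import requests

DEEPGRAM_KEY = "YOUR_DEEPGRAM_KEY"

def transcribe_call(wav_path: str) -> str:
    """Send one recorded call to Deepgram's pre-recorded endpoint."""
    with open(wav_path, "rb") as audio:
        resp = requests.post(
            "https://api.deepgram.com/v1/listen",
            params={"model": "nova-3", "smart_format": "true"},
            headers={"Authorization": f"Token {DEEPGRAM_KEY}",
                     "Content-Type": "audio/wav"},
            data=audio,
            timeout=60,
        )
    resp.raise_for_status()
    return resp.json()["results"]["channels"][0]["alternatives"][0]["transcript"]

# transcript = transcribe_call("call-1234.wav")
# summary = summarize_call(transcript)  # from the earlier summarization sketch
```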
Demetrios:
You mentioned before that the speech to text aspect is still in that run phase and there are issues there, or there are caveats there. Can you go into what you've been seeing in the wild?
Ilan Avner:
I can give you an example of a call that I had just half an hour ago. In this call, someone is trying to do call summarization. I will not mention the name of the customer; it's a large bank in Europe, and they have multiple languages, and one of them is Latvian. And for the speech to text, they're trying to find the best speech to text for Latvian. It's not easy.
Ilan Avner:
And there are more things, even if it's Spanish, for instance. Then you have the enterprise jargon. Sometimes you may use names and alphanumerics, and we see issues with that as well, that the transcription is not that accurate. And then we're seeing customers using various providers coming to us, asking how they can optimize their speech to text detection. And this is when we involve providers like Deepgram.
Demetrios:
And I can imagine the jargon in the business context. You have not only the vertical-specific jargon, but, I lived in Spain for a while, and I know the Spanish love to use English words but say them with a very Spanish accent. So for a speech to text provider to be able to figure that out, it's not like they're speaking in English, they're not speaking English. They're just saying an English word with their own flavor to it.
Ilan Avner:
I can tell you the most complicated part. Another thing we're doing now is real-time translations, and this is the most complicated one, because you have multiple things that may fail. It may be the speech to text, the LLM and then the text to speech. And one of the things we see, and this is why we're considering using Nova-3 for that, is exactly what you're saying: a customer may be speaking Spanish, but he may speak some English words within his Spanish, and then he can say his email address, for instance. So there are multiple challenges, just like the ones you mentioned. And I assume this will be solved, and maybe is already solved, with Nova-3, which has the multilingual option.
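For the code-switching case Ilan describes, the relevant knob in a request like the one sketched earlier is the language setting. At the time of writing, Deepgram documents a multilingual mode for Nova-3; the parameter values below should be verified against the current docs before relying on them.

```python
# Sketch: ask Nova-3 for multilingual code-switching by swapping these params
# into the earlier /v1/listen request. Verify the exact values against
# Deepgram's current documentation.
params = {
    "model": "nova-3",
    "language": "multi",     # allow switching between languages mid-utterance
    "smart_format": "true",  # helps with emails, numbers and other entities
}
```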
Demetrios:
I know that folks in India were mentioning this too. Hindi is a very hard one, because in Hindi there's such a mix of English and Hindi, and you have one or two words coming in English, or sometimes it's an entire sentence that comes in English and then you switch right back to Hindi. And so a lot of times the speech to text model will get triggered when something switches to English, and then it will have the whole rest of the conversation be totally off when it switches back to Hindi.
Ilan Avner:
That's exactly like you said for Spanish, but much worse for Hindi.
Demetrios:
Now the part about making text actually speak, what are some things you've been seeing in that realm?
Ilan Avner:
So also here we're seeing issues. Going back to the real-time translations: in some cases you need to say, let's say, numbers. We have multiple options to say the numbers, and you may use SSML tags, but it works with some of the providers, and maybe some of the voice names, and it doesn't work for everything. So you need to understand how to fine-tune the text to speech engine so that it will say exactly what you want it to say. Another thing we see, by the way: in Hebrew, for instance, we don't have a good text to speech engine. So we also have issues with languages, issues with how to say things. And we're seeing very good voices lately from multiple providers, including Deepgram.
Ilan Avner:
So it's getting better and better.
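For the numbers-and-alphanumerics problem Ilan describes, the SSML he mentions might look like the snippet below. As he notes, SSML support varies by TTS provider and voice, so this markup, and the plain-text fallback, are illustrative rather than something every engine will honor.

```python
# Illustrative SSML for reading an alphanumeric reference and a phone number
# aloud. Support for these tags varies by TTS provider and voice, so the
# markup may be ignored or rejected depending on the engine you use.
ssml = """
<speak>
  Your case number is
  <say-as interpret-as="characters">A4T9Z</say-as>.
  We will call you back on
  <say-as interpret-as="telephone">+34 612 345 678</say-as>.
</speak>
"""

# A common fallback when SSML isn't honored is to pre-format the text itself
# before sending it to the text to speech engine.
fallback_text = "Your case number is A 4 T 9 Z."
```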
Demetrios:
The voices are becoming more natural. But again, there are these, almost like hacks, that you need to do and you need to know about. There are still roadblocks you hit as you're trying to implement it, like we were talking about with the language switching, or if you have certain accents you want to be able to understand, you might have to optimize for them. It feels like if it were easy, everyone would be doing it.
Ilan Avner:
It's not easy at all. Even customers that have been working with speech to text for a while see issues from time to time, and they need to understand that there are ways to optimize the speech to text detection, and they need to figure out exactly how to optimize it. There are multiple ways to do that with each of the providers, including Deepgram. And sometimes they need to optimize.
Demetrios:
Now, you mentioned before a hub that you're working on. Can you tell me more about that?
Ilan Avner:
What AudioCodes is doing in the area of voice conversational AI is called Voice AI Connect. Voice AI Connect is the option to connect anything related to voice conversational AI, from agent assist, call summarization, voice bots and real-time translations, to your telephony system, to your contact centers, to your business flows, and also doing all the orchestration. We can offer that on premises, let's say as a dedicated instance for a customer; that's called Voice AI Connect Enterprise. But we also have a SaaS platform, and the SaaS platform is called Live Hub. And with Live Hub, you can connect bots that you build on any conversational AI platform. For instance, you can build a bot with Copilot Studio from Microsoft, with Google Dialogflow, with Amazon Lex, with Rasa, and also other providers, we've integrated with tens of them. You can develop your bots with these platforms and then connect them to telephony and also to speech to text and text to speech.
Ilan Avner:
And you can do it in five minutes. In five minutes you can go and allocate the phone number, make a connection to your bot, define your speech providers and the languages, and then you can start speaking with your bot. Lately we also introduced the AI Agents framework, so you can build everything on Live Hub. So you have the choice: either work with your own platform, if you developed one, because we also have public APIs, or work with a third party like the ones I mentioned before, or develop everything on Live Hub. And then you can also choose your providers for the speech services, including Deepgram. And we're seeing a lot of customers asking us about Deepgram for both text to speech and speech to text. And our expertise is connecting you to any voice channel. So whenever you develop a bot, whether you build it on Live Hub or connect it to a third party, it's not only that connection.
Ilan Avner:
We can also connect you to any voice channel, and this is our expertise; this is what we've been doing for the last 30 years. We can connect it to any contact center. We can connect it to Microsoft Teams. You can do web calling, you can do WhatsApp calling, which is coming very soon. You can buy phone numbers on Live Hub and connect them to your bot. You can do escalations to human agents.
Ilan Avner:
So it's a platform that can serve all your voice connectivity on a single platform, and you can also build multiple voice conversational use cases on the same platform.
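Purely as an illustration of the five-minute flow Ilan describes (allocate a number, connect the bot, pick speech providers and languages, allow escalation), a configuration along these lines is imaginable. The field names below are invented for this sketch and are not the actual Live Hub or Voice AI Connect API; real setups go through the Live Hub portal or AudioCodes' public APIs.

```python
# Hypothetical configuration mirroring the setup steps Ilan describes for
# Live Hub. Field names are invented for illustration and are NOT the actual
# Live Hub or Voice AI Connect API.
bot_connection = {
    "phone_number": "+1-555-0100",             # number allocated for the bot
    "bot": {
        "platform": "copilot-studio",          # or dialogflow, lex, rasa, ...
        "endpoint": "https://bot.example.com", # where the bot framework listens
    },
    "speech": {
        "stt": {"provider": "deepgram", "model": "nova-3", "language": "en"},
        "tts": {"provider": "deepgram", "voice": "example-voice"},
    },
    "escalation": {"to_human_agent": True},    # hand off to a live agent
}
```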
Demetrios:
How do you view the difference between bots and agents? Or are they the same thing in your eyes?
Ilan Avner:
There are so many terms, and anyone saying a specific term may mean something different. When I'm saying AI agents, or anything I said, it's about building a voice bot that can serve as a conversational IVR, or maybe that can serve someone who wants to check their balance at the bank or set an appointment. This is what I mean when I say AI agent or voice bot. For me it's the same thing.
Demetrios:
Makes sense.
Hosted by

Demetrios Brinkmann
Host, AI Minds
Demetrios founded the largest community dealing with productionizing AI and ML models.
In April 2020, he fell into leading the MLOps community (more than 75k ML practitioners come together to learn and share experiences), which aims to bring clarity around the operational side of Machine Learning and AI. Since diving into the ML/AI world, he has become fascinated by Voice AI agents and is exploring the technical challenges that come with creating them.