This is the transcript for “The Evolution of Conversational AI in the Car and Beyond,” presented by Shyamala Prayaga, Senior Software Product Manager at Ford, on day one of Project Voice X.
The transcript below has been modified by the Deepgram team for readability as a blog post, but the original Deepgram ASR-generated transcript was 94% accurate. Features like diarization, custom vocabulary (keyword boosting), redaction, punctuation, profanity filtering and numeral formatting are all available through Deepgram’s API. If you want to see if Deepgram is right for your use case, contact us.
[Shyamala Prayaga:] Thank you so much, Bradley. Hello, everyone. I know a couple of you, and it’s great to see you guys. And for the new folks who are out there, welcome. It’s a pleasure meeting you all. My name is Shyamala Prayaga, and I am the Product Owner slash Manager for Ford Motor Company. I lead the autonomous digital assistant effort, so I look at all of the conversational AI work for Ford. I launched the gen four of SYNC and worked on the Alexa integration, and I am now working on the autonomous digital assistant. When I’m not working, I love gardening, and that’s my garden on the left-hand side and the produce from my garden. And fun fact: I don’t own a pet, but I love petting dogs. So if you see me outside petting any dog, that’s because I love dogs a lot, but I don’t own one.
Anyways… conversational AI products are everywhere, and we have seen it. It’s not new. We have been talking about conversational AI products throughout the day today: how the evolution has happened, all the great things happening, the technology, and all of those things. We’ve also seen that conversational AI products are out there in so many different locations and areas, from your home to the appliances you use and also in the drive-through or kiosk. Retail is also picking up. So conversational AI is definitely picking up a lot, and the car is not a new space either. We have seen voice assistants in the car. It’s interesting to look at these statistics, which say that out of the 259 million people in the United States, 127.5… or one million folks are using voice in the car, and this is more than the people using smart speakers in the house.
But I would tell you one thing. Voice assistants in the car are not new, and you’ll be surprised to see from the statistics and the history that voice has been there even before Amazon’s Alexa was launched or even before Siri was launched. So let’s look at some of those things. The first voice assistant was launched in 2004, and this was Honda in collaboration with IBM. They were the first OEM to launch a voice assistant, although it was not a full-fledged voice assistant; it was a voice navigation system. Basically, it was more of a command-and-control system. It allowed the user to call someone or go to a POI and stuff like that. The biggest drawback with the system was that natural language was missing, so it was more like a command-based thing.
So you have to say something exactly as it is entered here, like “call” plus the name, or “dial” plus the number, or “go home,” “go back.” So it was a much more structured, command-based system. The second drawback with the system back in 2004 was that it was too screen dependent. People did not know how to design a multimodal system back then, so it relied too much on the screen, which means you are distracted. But the whole point of voice in the car is to reduce distraction, so you have to think about how to do that. So Honda tried, and they kept improving. In 2007, Ford launched SYNC in collaboration with Microsoft, and this is when they actually designed an assistant called Samantha, which was able to make calls, text, control music, and do all sorts of things. Again, with Samantha, the big problem was that it was not natural language.
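As an editorial illustration (not code from the talk or from any of the systems mentioned), the rigid command-and-control style described above can be sketched in Python. The command list and the `match_command` helper are hypothetical; the sketch only shows that an utterance has to match a fixed pattern exactly, so a natural rephrasing is rejected.

```python
import re

# Hypothetical fixed grammar: every pattern must match the WHOLE utterance.
COMMANDS = [
    (re.compile(r"call (?P<name>[a-z ]+)"), "call"),
    (re.compile(r"dial (?P<number>[0-9 ]+)"), "dial"),
    (re.compile(r"go home"), "go_home"),
    (re.compile(r"go back"), "go_back"),
]


def match_command(utterance):
    """Return (action, slots) for an exact command, or None if nothing matches."""
    text = utterance.strip().lower()
    for pattern, action in COMMANDS:
        match = pattern.fullmatch(text)
        if match:
            return action, match.groupdict()
    return None  # no natural language fallback: the user must rephrase


if __name__ == "__main__":
    print(match_command("Call John"))                   # accepted command
    print(match_command("Could you call John for me"))  # rejected: not in the grammar
```

A natural language system would instead try to recover the intent ("call") and the slot ("John") from the second phrasing, which is the gap the talk points to.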
The systems and technology were still improving, so natural language understanding was something the industry was still evolving, learning, and improving. So that was another thing. After that, it’s not that nothing happened from 2007 to 2013; all the companies were trying and improving. There were a lot of other things happening, and everyone was trying their own assistant. But in 2013, Apple changed the game when they launched CarPlay for Siri in the car, and people started using the voice assistant in the car like they would on their phone. And this is when the embedded systems technology came into the picture, where you can just embed your phone. Android Auto was another thing which came eventually. This gave people so much flexibility and freedom to use their phone in the car the way they normally use it, including the voice. Now 2013… and we know 2014 was a game changer in so many ways.
And it is worth mentioning Amazon here because this changed the entire dynamics for a lot of different industries. Amazon launched Alexa in 2014, and after that, a lot of things happened. After the launch of Alexa, companies were still working, and there were a lot of OEMs who were trying to think about what to do. Ford was the first company that actually integrated Alexa into this navigation system where people can… it was not exactly embedded in the system. It was a cloud-based system, so…
Ford was able to make a call to the cloud-based system. Because the cars became smarter and had connectivity, they were able to do that. At the same time, users were now able to say “turn on my car” or “open the garage” or “lock my house,” and do all the things Amazon is capable of doing from the comfort of their house, but also through the car. So that was a game changer.
And then Amazon and Ford had a lot of other integrations as well, including FordPass and stuff like that. After that, 2018… because by now, voice had become the forefront of everything. Almost everyone attempted to do something in the conversational AI space. So almost every OEM started building voice assistants in collaboration with Cerence or SoundHound or any other player out there, and that’s when Mercedes also launched Mercedes-Benz User Experience, which was their voice assistant. So a lot started happening. Not that Cerence did not exist before that or SoundHound did not exist. They were also working with these OEMs and trying to do a lot of things, and were improving back and forth along with the OEMs. But 2018 really started seeing a lot more adoption and a lot more acceptance in terms of making the assistants better. Now 2018 was interesting because Amazon was like, we have a device at home, and it’s great. But now we want people to use voice in the car. They also did some studies, and I think there was a survey they did with J.D. Power, where people said they want the same voice assistant in the car that they use in the home.
So Amazon thought, what should we do? And they designed something called Echo Auto, which was a small device which you can plug into the car, and you can use all of the Amazon features with it. But the biggest drawback with this thing is that, although it is amazing, and it can play music, open your garage, control your home system, and all of those things, it would still not be able to do the in-car controls like changing the climate in the car or starting the car or all of the things a car would need, not just by bringing in that device. Still, people used it, and it was an invite-only program for a couple of people, which went well. But after that, Amazon did realize that this industry is growing bigger, and a lot of different companies are also getting into the space. By now, there were a lot more companies in the conversational AI space who were trying to get into the cars, so the space became bigger.
So they did a partnership with a lot of different companies, including Cerence, Baidu, BMW, Bose, and a lot more, and they created an initiative called Voice Interoperability, where the idea was that multiple assistants can reside in the vehicle. It doesn’t have to be a competition. They all can reside together, and depending on the wake-up word, one of them can be invoked. This is an effort which is still going on, and they are working together to come up with a solution to make the system more interoperable. Now as this was happening, Amazon also realized that the plug-in device will not work in the car because it is not able to control the system, so they launched something called Alexa Auto SDK, which is a software development kit deployed in the vehicle. Another problem with automotive is that not all vehicles come equipped with connectivity. Right? So then how do you make the system work? You need to have embedded things in the vehicle which are able to control your in-car systems, like climate controls, or your car in general, like starting the car and stuff like that. So they came up with the Alexa Auto SDK, which is able to do all of the things Amazon is capable of doing, plus additional things which are embedded in the system. This SDK resides in the vehicle and is able to control a lot of things because it connects directly to the vehicle modules.
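As an editorial illustration of several assistants living side by side and being invoked by their wake-up word, here is a minimal Python sketch. The assistant names, the wake-up words, and the `route` function are all hypothetical; this does not reflect the actual Voice Interoperability design or the Alexa Auto SDK API, and it works on already-transcribed text rather than audio.

```python
# Hypothetical registry: wake-up word -> assistant that should handle the request.
ASSISTANTS = {
    "alexa": "CloudAssistant",
    "hey car": "EmbeddedCarAssistant",
}


def route(utterance):
    """Return (assistant, request) for the first matching wake-up word, else None."""
    text = utterance.strip().lower()
    # Try longer wake-up words first so a short one cannot shadow a longer one.
    for wake_word in sorted(ASSISTANTS, key=len, reverse=True):
        if text == wake_word or text.startswith((wake_word + " ", wake_word + ",")):
            request = text[len(wake_word):].lstrip(" ,")
            return ASSISTANTS[wake_word], request
    return None  # no wake-up word heard: no assistant is invoked


if __name__ == "__main__":
    print(route("Alexa, open the garage"))
    print(route("Hey car set the temperature to 70"))
    print(route("open the garage"))
```

In this sketch the embedded assistant would take in-car requests such as climate control, while the cloud assistant would take home-control requests, which mirrors the split between cloud and embedded capabilities described in the talk.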
So this was interesting. When they launched it, Lamborghini was the first brand to leverage this SDK and create their voice assistant. Although Ford was the first to use the cloud-based system, Lamborghini was the first one to integrate the in-car voice assistant using the SDK. After that, General Motors in 2021 also got a voice assistant in collaboration with Google, because Google also came up with something called Google Auto Services, which is another SDK in the vehicle, using which people are able to get the entire infotainment system, but also the Google Assistant, and create these actions in the system. So this has been a great evolution so far, as you can see, from 2004, where we did not have a natural language system. It was so screen heavy, and everything was embedded. Software updates were so difficult back then, practically not possible at all. You had to get a USB to get an update, and all of those things were complicated, so you were stuck with the only voice assistant which was there, without any updates. Now we are at the point where we have software updates. We have hundred percent connectivity. Users have so many options to choose from.
We have come a long way. And I would say, as we get into the future, there are many platforms, and we heard from a couple of vendors today about their technologies and what they have been working on. That is good news, because that is going to help feed into the automotive segment as well as other segments to create a much better system. There’s a lot of technology advancement happening, and we have seen that as well. Right? Systems are becoming more natural language. Speech recognition has much more accuracy, and it is able to understand people better, more accents, and stuff like that. We’ve also started to see that there are many more use cases, like voice commerce and voice advertisements, and these kinds of things will also pick up pretty fast.
And as we start to think about these things, at some point, we have seen this, and we have heard this: how can we start using sentiment analysis or emotions and stuff like that? As the technology has matured to this point, we’ll start to see emotion control units, which are small chips deployed in the vehicle that are able to read the user’s emotions through the cameras in the car and then help the user, because now these systems are smart enough to tune and change the responses based on the user’s emotion, like a normal human being would do. So it’s pretty much possible that the evolution of the car would start to have these kinds of things as well. And then we’ll also start to see more humanized technologies. There are text-to-speech engines out there which are becoming more humanized, and you are seeing a lot more humanistic voices in there, a lot of different emotive voices, and stuff like that. And the same thing will start to happen where technologies will evolve more and become more inclusive in terms of how they understand or interpret different kinds of users. So with that, I wrap my talk, and thank you so much.