AI Summit NYC Recap: Deepgram (and RoboDoge) take on the Big Apple
This year, Deepgram was a proud sponsor of the AI Summit in New York, a leading industry event that highlights the impact of artificial intelligence on businesses and how to unleash rapid transformation.
We were extremely excited to meet everyone, and we were so grateful to encounter people whose enthusiasm matched (and even exceeded) ours.
However, if you weren’t able to make it to the Big Apple this year, here are some highlights from the Deepgram team.
🎙️ Exploring endless applications with Speech AI
In case you missed it, we showcased how developers can integrate Speech AI into their products across a wide range of use cases. For example, XRAI Glass developed AR glasses powered by speech-to-text AI, so that anyone—including the deaf and hard-of-hearing community—can add subtitles to their everyday conversations. Meanwhile, Badger Global uses speech-to-text to provide real-time translations in over a dozen languages.
Moreover, we had the chance to reveal our latest product, Aura! Aura is our first text-to-speech model for real-time voice AI agents. You can read more about the latest release here.
(P.S. If you’d like to see some cool AI apps—especially ones that revolve around audio and voice technology—check out this apps page.)
📽️ On-stage at the summit: Our take on LLMs
Our VP of Research, Andrew Seagraves, had the opportunity to join several industry experts from Fidelity, NATO, and Vanguard on the AI at Scale stage to discuss the Promise and Perils of LLMs (large language models).
Here’s our take on the promise of LLMs:
“One of the unique aspects of LLMs is as you scale up the model, increase the number of parameters, and train it in a self-supervised way, the model starts to acquire incredible abilities that you didn’t train it to have, resulting in LLMs being able to perform an incredible number of tasks, which would have been done by specialized models before, making them tremendously useful for businesses.” –Andrew Seagraves, VP of Research
Deepgram’s approach to leveraging LLMs has primarily been to build specialized models for speech analytics, allowing businesses to derive value from human conversations. We offer several models that companies can use for summarization, topic detection, and intent recognition. Our customers get the most out of LLMs in voice applications when high-quality LLMs are paired with accurate, scalable ASR (automatic speech recognition) models like Nova-2.
Several of the application areas include:
Post-call analysis – LLMs are used in place of specialized NLP models. Enterprises are leveraging Deepgram’s scalable and highly accurate transcription capabilities in combination with LLMs to analyze massive amounts of call center audio.
“Co-pilot” tools – LLMs are used to augment or enhance human capabilities for applications like real-time agent assist systems, where ASR and LLM models run in real-time during a customer call, potentially in conjunction with a RAG system. In general, the panelists all agreed that co-pilot tools will drive enormous business value in the next 6 months to a year.
Conversational AI agent systems – LLMs are used alongside ASR and TTS (text-to-speech) systems in the full Language AI loop. Speech-to-text is the first step in this loop, which comprises three stages: perception, understanding, and interaction.
This has also led to the development of chatbots that can interact with humans, and we believe these intelligent systems will power the next generation of autonomous voice bots.
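To make that loop a little more concrete, here's a purely illustrative sketch of the three stages in Python. Every function below is a placeholder we made up for this post, not a real SDK call; a production agent would stream audio continuously and plug real ASR, LLM, and TTS services into each stage.

```python
# Illustrative sketch of the Language AI loop: perception -> understanding -> interaction.
# All function bodies are placeholders, not real API calls.

def perceive(audio_chunk: bytes) -> str:
    """Perception: convert speech to text with an ASR model (e.g. Nova-2)."""
    return "what are your support hours?"  # placeholder transcript

def understand(transcript: str) -> str:
    """Understanding: reason over the transcript with an LLM (optionally with RAG context)."""
    return "Our support team is available 24/7."  # placeholder LLM response

def interact(response_text: str) -> bytes:
    """Interaction: turn the response back into speech with a TTS model such as Aura."""
    return response_text.encode("utf-8")  # placeholder for synthesized audio

def language_ai_loop(audio_chunk: bytes) -> bytes:
    """Run one pass through the loop: hear, think, speak."""
    transcript = perceive(audio_chunk)
    reply = understand(transcript)
    return interact(reply)

if __name__ == "__main__":
    print(language_ai_loop(b"...caller audio..."))
```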
On the “perils” of LLMs, there was strong consensus that AI regulation is coming in some form, but no one really knows what that will look like.
The panel also discussed risks around privacy and the fact that public data is used to train models. We made the point that as LLMs improve, inevitably we will use them to generate synthetic data and that eventually we may reach a point where models are trained entirely using synthetic data.
This led to an interesting discussion about hallucinations. In a world where LLMs are trained with synthetic data, won’t they hallucinate more? Our view on hallucinations is that they are a fundamental “feature” of current LLMs (particularly of causal LMs) rather than a “bug”.
The next-token prediction training objective and its associated inference procedure basically guarantee that LLMs have the potential to hallucinate.
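To spell that out with the standard training objective: a causal LM is trained to maximize the likelihood of each next token given only the tokens before it, and inference samples from that same distribution, so the model is rewarded for plausible continuations rather than verified facts.

```latex
% Standard next-token (causal) training objective: minimize the negative
% log-likelihood of each token given its prefix. Nothing here ties a sampled
% continuation to facts outside the training text, which is why fluent but
% ungrounded output remains possible.
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\!\left(x_t \mid x_{<t}\right)
```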
In the near future, we will probably see more advanced architectures that will eliminate the potential for hallucination by design.
You can find the video of Andrew’s panel (and many other Deepgram presentations) on our YouTube channel!
🐶 The talk of the town: RoboDoge, our now-famous mascot!
Last but not least, we brought some fun to the AI Summit this year with our beloved mascot. RoboDoge, our voice-activated robot dog, stole the spotlight with several tricks, from rolling over on command to doing backflips. If you captured RoboDoge’s antics on video, be sure to share the joy with us on social media by tagging @Deepgram on LinkedIn and @DeepgramAI on X!
Shoutout to @DATAcated for taking this video of RoboDoge and Jose Francisco at our booth!
And if you’re curious about how a robot dog could possibly understand voice commands, here’s a high-level overview:
The dog comes from Unitree (the Chinese equivalent of Boston Dynamics). When we bought him, he had no ability to hear anything. That’s where Deepgram technology comes in.
RoboDoge has three computers inside his torso. The first is inaccessible to the user, while the other two are essentially playgrounds for anyone who knows how to code.
The “forbidden” computer is the one that controls exactly how the robot's joints move. For example, the engineers at Unitree have to control things like “Should the back-left leg bend at a 30-degree angle while walking? Or a 37.5-degree angle?”
… Most users don’t want to worry about programming the robot’s actions at such a low level.
As such, the dog comes with a series of pre-programmed actions stored on the forbidden computer—rolling over, backflipping, and dancing, for example. So while users can't control the angles of RoboDoge’s joints directly, they can tell the dog which pre-programmed action to perform.
And so, to make the dog follow voice commands, just follow these steps:
On a Raspberry Pi, use Deepgram’s API in tandem with the dog’s API to map transcribed words onto robotic actions. (For example, when Deepgram’s AI hears the words “roll over,” the transcription should hit an endpoint that maps those words onto the command for rolling over; there's a rough sketch of this mapping after these steps.)
Hook up that Raspberry Pi to one of the two user-accessible computers on the dog’s back.
Connect that Raspberry Pi to a Bluetooth microphone.
Speak into the microphone, and the dog should take the appropriate action!
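To make step 1 a bit more concrete, here's a minimal sketch of the mapping logic, with a couple of simplifying assumptions: the audio has already been captured from the Bluetooth microphone into a short WAV clip, and `send_action` (along with the command names and phrases) is a made-up placeholder for whatever call the dog's API actually exposes. The exact Deepgram request parameters may also differ depending on your setup.

```python
# Sketch: transcribe a short clip with Deepgram, then map keywords to a
# pre-programmed RoboDoge action. `send_action` is a placeholder, not the
# dog's real API.
import requests

DEEPGRAM_API_KEY = "YOUR_DEEPGRAM_API_KEY"

# Phrases to listen for, mapped to pre-programmed actions (names are illustrative).
COMMANDS = {
    "roll over": "ROLL_OVER",
    "backflip": "BACKFLIP",
    "dance": "DANCE",
}

def transcribe(wav_path: str) -> str:
    """Send a short WAV clip to Deepgram's speech-to-text API and return the transcript."""
    with open(wav_path, "rb") as audio:
        response = requests.post(
            "https://api.deepgram.com/v1/listen?model=nova-2",
            headers={
                "Authorization": f"Token {DEEPGRAM_API_KEY}",
                "Content-Type": "audio/wav",
            },
            data=audio,
        )
    response.raise_for_status()
    return response.json()["results"]["channels"][0]["alternatives"][0]["transcript"]

def send_action(action: str) -> None:
    """Placeholder for the call that tells the dog to run a pre-programmed action."""
    print(f"Sending action to RoboDoge: {action}")

def handle_clip(wav_path: str) -> None:
    """Transcribe a clip and trigger the first matching command, if any."""
    transcript = transcribe(wav_path).lower()
    for phrase, action in COMMANDS.items():
        if phrase in transcript:
            send_action(action)
            break

if __name__ == "__main__":
    handle_clip("command.wav")  # e.g. a clip of someone saying "roll over"
```

In practice you'd more likely stream live audio from the microphone to Deepgram's real-time endpoint instead of saving clips to disk, but the keyword-to-action mapping stays the same.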
And the results would look something like this. 😉
🔭 What lies ahead for Speech AI
The potential applications of Speech AI keep expanding, promising innovative solutions that reshape the way we interact and unlock a new world of business transformation: creating inclusive learning experiences, improving accessibility, enabling faster real-time customer resolution, and powering personalized interactions through virtual agents.
Whether it’s text-to-speech or speech-to-text, the world is your oyster.
And if you’re interested in exploring how Deepgram’s solutions can elevate your AI initiatives in 2024 and beyond, you can contact our experts. Whether you have specific questions or want to brainstorm ideas, we’re here to help you navigate the possibilities of Speech AI. Lastly, if you want to give our speech-to-text API a try, you can sign up and get started with $200 in free credit. Or, if you’re interested in joining our waitlist for Aura text-to-speech, you can sign up here.
See you next year!