Leaving You Speechless: The Top 7 AI Text-to-speech and Voice-cloning Startups of 2023
Jose Nicholas Francisco
Here at Deepgram, we don’t just enjoy making our own AI, we also love to admire the innovative models that our fellow startup peers build. And as a foundational AI company whose roots lie in audio data, we have a soft spot in our hearts for speech AI in general.
Specifically, voice-synthesis AI is booming. These vocal models have gone viral in numerous ways, whether it’s hearing an AI version of Taylor Swift sing songs by The Weeknd, tuning in to Drake singing an entire album he never wrote, or even listening to a synthesized Morgan Freeman translate French radio.
Voice synthesis AI has gone so viral that even people who are barely involved in the AI community are starting to turn their heads. Just read the comments underneath the aforementioned album:
But alright, enough talking about the hype. Let’s actually see what the hype is all about. Below is a list of seven voice-synthesis AI startups that have caught the collective eye of the community.
ElevenLabs presents itself as a leading provider of advanced voice synthesis solutions, boasting state-of-the-art algorithms and cutting-edge neural networks. The moment you land on their homepage, you’ll instantly see their showcase of diverse, lifelike voices, all generated by nothing more than fancy linear algebra and strong statistical analysis (read: AI).
What sets ElevenLabs apart is their commitment to customization and personalization. Their platform allows developers and businesses to tailor the voices to match their specific requirements, ensuring a unique and immersive experience for end-users. Whether it's creating voice assistants, voice-overs for media content, or interactive dialogue systems, ElevenLabs empowers organizations to integrate natural-sounding voices seamlessly.
Their big goal is to “instantly convert spoken audio between languages.” However, along the way, they’ve developed an array of impressive text-to-speech technologies. Check out their showcase here! (As a sci-fi nerd, my personal favorite is this one.)
Furthermore, ElevenLabs emphasizes the importance of democratizing access to voice synthesis technology. By offering an intuitive API and comprehensive documentation, they make it easier for developers to integrate their solutions into various applications. This inclusivity and user-centric approach set ElevenLabs apart as they strive to make high-quality voice synthesis accessible to a wider audience.
As the impact of voice technology continues to grow, ElevenLabs also continues to amplify the possibilities for human-machine interactions. From entertainment and gaming to virtual assistants and accessibility applications, ElevenLabs' innovative solutions are set to shape the future of voice synthesis and enhance our digital experiences in remarkable ways.
And now we move on to the startup with my personal favorite name among them all: Lyrebird.
If you’ve never heard or seen an actual lyrebird in nature, check out this video. This incredible bird can mimic any noise with nothing more than the larynx in its throat and the feathers on its back, from camera shutters and chainsaws to the calls of other birds nearby.
But alright, enough National Geographic talk. Let’s get techie.
The first thing you should know about Lyrebird is that they were bought out by Descript in 2020. Lyrebird is no longer its own startup, but rather a research division within Descript. Check out their site here.
Upon visiting Lyrebird's dedicated homepage within Descript, you’ll find that they are extremely straightforward—both in terms of their technology and in their company’s personality. Right above the link to their Ethics page—which states that “with great innovation comes great responsibility”—you’ll find an interactive demo where you can try out their AI by simply writing some text.
Lyrebird's technology offers users the ability to generate realistic, human-like voices and manipulate speech with remarkable precision. Through the seamless integration of Lyrebird's capabilities within Descript's audio editing platform, users are equipped with powerful tools for editing, remixing, and refining their audio content.
From voice-overs in videos and podcasts to creating engaging virtual characters and interactive storytelling experiences, the possibilities are extensive. The integration of Lyrebird's technology within Descript's platform enhances the user experience, offering a robust suite of tools to manipulate audio content with ease and precision.
By leveraging Lyrebird's voice synthesis capabilities, Descript has expanded its offering to include cutting-edge audio editing features. Users can now craft immersive audio experiences, experiment with different voices, and achieve unprecedented levels of creativity in their projects.
(Note: my favorite Lyrebird voice is the one named “Don.” Check him out in the demo on this page.)
Lyrebird's research division continues to innovate within Descript, driving advancements in audio technology and pushing the boundaries of what can be achieved with voice synthesis. These innovative companies have laid the foundation for audio content creators, podcasters, and filmmakers to unleash their creativity in unprecedented ways. As they continue to explore the possibilities of voice synthesis, the collaboration between Lyrebird and Descript promises an exciting future for the world of audio content creation and manipulation.
WellSaid Labs’ mission is to empower businesses and content creators with the ability to generate lifelike speech that captivates audiences. WellSaid Labs' cutting-edge technology combines the power of deep learning, natural language processing, and advanced audio engineering techniques to produce voices that are virtually indistinguishable from human speech.
What sets WellSaid Labs apart is their focus on teams and collaborative work. As they say themselves, WellSaid Labs is “built for collaboration,” and everyone on a given team can work “together to tell a unified story—with one voice or many.”
The homepage highlights WellSaid Labs' commitment to flexibility and ease of integration. With their robust API, developers can seamlessly incorporate WellSaid Labs' voice synthesis technology into their applications, unlocking a world of possibilities for voice-driven user experiences. The platform's intuitive controls and comprehensive documentation make it accessible to both seasoned developers and those new to voice synthesis.
Have you ever watched dubbed anime? Did you stream Squid Game in English? Or are you perhaps a fan of old Bruce Lee movies?
If you’ve answered “Yes” to any of the questions above, then Papercup’s product should feel rather familiar to you!
Exploring the Papercup homepage reveals a company dedicated to solving the challenges of multilingual communication through their groundbreaking voice-to-text transcription technology. Their advanced machine learning algorithms enable real-time, accurate transcription and translation of spoken content, regardless of language or accent.
Or, to put it in three words, Papercup specializes in “AI Powered Dubbing.” (Their words, not mine.)
Papercup's technology is designed to facilitate seamless multilingual experiences. They outline it in a few simple steps:
First, you submit an existing video for translation and voiceover.
Then, Papercup’s AI automatically transcribes, translates, and creates a human-sounding voiceover.
Then, Papercup’s QA team verifies the quality and makes any necessary adjustments.
Finally, you receive a dubbed version of your video!
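For the programmers in the room, here is a minimal sketch of how a pipeline like the one described above could be wired together. To be clear, this is not Papercup's code or API. Every name here (`DubbingJob`, `transcribe`, `translate`, `synthesize`, `human_qa`, `run_dubbing_pipeline`) is hypothetical, and each stage is a placeholder standing in for a real model or a human reviewer.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class DubbingJob:
    """Tracks one video as it moves through the (hypothetical) dubbing stages."""
    video_path: str
    target_language: str
    transcript: str = ""
    translation: str = ""
    dubbed_audio: bytes = b""
    qa_approved: bool = False
    log: List[str] = field(default_factory=list)


# Placeholder stages: each one stands in for a real model or a human step.
def transcribe(job: DubbingJob) -> None:
    job.transcript = f"<transcript of {job.video_path}>"
    job.log.append("transcribed")


def translate(job: DubbingJob) -> None:
    job.translation = f"<{job.target_language} translation of {job.transcript}>"
    job.log.append("translated")


def synthesize(job: DubbingJob) -> None:
    # A real system would call a text-to-speech model here.
    job.dubbed_audio = job.translation.encode("utf-8")
    job.log.append("synthesized")


def human_qa(job: DubbingJob) -> None:
    # A human reviewer would listen and adjust; this only checks that audio exists.
    job.qa_approved = len(job.dubbed_audio) > 0
    job.log.append("qa_checked")


# The order of this list mirrors the order of the steps in the text.
STAGES: List[Callable[[DubbingJob], None]] = [
    transcribe,
    translate,
    synthesize,
    human_qa,
]


def run_dubbing_pipeline(video_path: str, target_language: str) -> DubbingJob:
    """Run every stage in order and return the finished job."""
    job = DubbingJob(video_path=video_path, target_language=target_language)
    for stage in STAGES:
        stage(job)
    return job


job = run_dubbing_pipeline("interview.mp4", "es")
print(job.log)
```

Keeping the stages in a plain list makes the ordering explicit, and it leaves room for the human QA step to sit alongside the automated ones rather than being bolted on afterward.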
Whether you’re an enterprise or an individual content creator, Papercup offers services for you. Sky News, Bloomberg, and Insider have all used Papercup in the past to expand their viewership beyond English speakers. And now you can too! 👀
With Papercup's technology, businesses can unlock the full potential of their audio content. Papercup empowers organizations to break down language barriers, connect with global audiences, and make their audio data more valuable and actionable.
As the world becomes increasingly interconnected, Papercup's dubbing technology is poised to play a pivotal role in facilitating effective communication and understanding across linguistic boundaries. By harnessing the power of audio data and machine learning, Papercup is shaping the future of multilingual communication and transforming the way businesses leverage spoken content.
And now we move on to the text-to-speech company with my second-favorite name (after Lyrebird): Murf.ai.
Murf.ai’s focus lies in creating human-like virtual agents capable of understanding and responding to natural language. Not only do they synthesize AI voices, but they also let you clone an actual human’s voice. As Murf puts it:
“Tired of hearing machine-like, monotonous-sounding voice clones? Not anymore. With Murf, generate an AI voice clone that mimics real human emotions like anger, happiness, sadness, and more.”
On this page, you can even compare the human voice and their AI clone!
Much like its peers mentioned above, Murf emphasizes Ethical AI and Data Security. Not to mention, they offer voiceovers in 20+ languages, from Arabic to Scottish to Hindi, and even Portuguese!
As the demand for personalized and interactive customer experiences grows, Murf.ai's natural language understanding technology is paving the way for more intuitive and human-like virtual agents. By combining the power of AI with effective communication, Murf.ai is transforming the landscape of conversational AI, enabling businesses to connect with their customers in more meaningful and impactful ways.
Exploring deepdub.ai's homepage reveals a company dedicated to simplifying the localization process and expanding the reach of audiovisual content. This company’s main focus is the entertainment experience. And, as the name implies, they specialize in dubbing!
And to showcase their dedication to breaking language barriers internationally, deepdub features coverage of the company on N12 Israel News, with a demo of the reporter speaking in different languages.
Their advanced AI-powered voice dubbing technology empowers content creators, filmmakers, and media organizations to efficiently and accurately translate and dub content into different languages, unlocking the potential of global markets.
deepdub.ai's technology leverages deep learning algorithms and neural networks to analyze and replicate the nuances of human speech. By capturing the essence of the original audio, their automated voice dubbing solution ensures that the translated content maintains a high level of authenticity and naturalness, preserving the intended emotional impact and cultural nuances.
With deepdub.ai, businesses can unlock the global potential of their audiovisual content. By breaking language barriers and delivering high-quality voice dubbing, deepdub.ai enables organizations to engage with diverse audiences, expand their reach, and create a truly global presence.
Finally, Typecast AI's homepage reveals their dedication to helping storytellers, marketers, and educators create content as efficiently as possible. Whether you’re a YouTuber or a CMO (or both), Typecast offers something for you.
And just like many of its peers above, Typecast offers a free version of its services so that you can test out the product to see if it’s right for you. But perhaps the best part of Typecast is its, well, cast of original characters. Yes, they offer classic humans (like Nathan, whose “tone is great for communicating short stories,” or Julia, whose “clear and crisp voice can make any story lively”), but my personal favorite is Killian the Vampire, pictured below.
You can try out Killian on this page. But there are many more Typecast characters readily available for you upon signup! (If you have a minute to spare, check out the bearded and muscular eLearning expert Rex on this page.)
If you’re in the business of content creation, the world of marketing, or the community of influencers out there, you’ll surely find a text-to-speech AI model useful at some point in your journey. Whether you’re a NewTuber or a marketing veteran, there’s always something to appreciate about the fact that a bunch of linear algebra and probabilities can come together to form human-like voices in multiple languages.
So if you want to create a new synthesized Taylor Swift album, or if you simply want to have Killian the Vampire read you a bedtime story, check out these companies above!
(And if you want to go in the other direction—utilizing speech-to-text rather than text-to-speech—check out Deepgram 😉)