Accuracy Matters: Improving Speech Recognition Through Data Processes - Esteban Gorupicz, CEO, Atexto - Project Voice X
This is the transcript for “Accuracy Matters: Improving Speech Recognition Through Data Processes,” presented by Esteban Gorupicz, CEO of Atexto, on day one of Project Voice X.
The transcript below has been modified by the Deepgram team for readability as a blog post, but the original Deepgram ASR-generated transcript was 94% accurate. Features like diarization, custom vocabulary (keyword boosting), redaction, punctuation, profanity filtering and numeral formatting are all available through Deepgram’s API. If you want to see if Deepgram is right for your use case, contact us.
[Esteban Gorupicz:] Hi, everyone. I am Esteban Gorupicz, founder and CEO of Atexto. Let me introduce myself. I started working with speech processing technologies fifteen years ago, and later with crowdsourcing technologies, and I founded Atexto four years ago to help companies solve the main problem related to voice technology adoption worldwide. That is the problem of accuracy… or better said, the lack of accuracy. And when I speak about accuracy, I’m not only speaking about the word error rate.
I’m talking about bias, or fairness, and also about language support. These kinds of problems are the ones our company is trying to solve, and to do that, we built a software platform: a code-free web tool for machine learning teams and data science teams to help them visualize, label, and collect speech training data faster. For example, related to labeling speech, we are the only platform that allows data managers to label not only text but also sounds in recordings. And these teams can do it by themselves.
But we also have a crowdsourcing facility, a fully automated crowdsourcing platform with more than one point five million registered users from more than fifty countries, who perform micro-tasks related to speech annotation, text annotation, and audio transcription to build curated datasets to train machine learning models and improve the accuracy of these kinds of products. Also, through our data manager platform, data managers can run voice data collection projects. They can, for example, select the country of residence of the people they need to repeat different training phrases: pronouncing brands, pronouncing product names, and this kind of information. And they can select their mother tongue, their country of residence, the gender distribution, and also the kind of frequency they need for the recordings.
For example, as you know, if you need data to train a speech recognition model for a call center, you need recordings from the telephone. That is not only related to an eight kilohertz sampling frequency; it is also related to the distance to the microphone. It is very important for training the model well. But they can also select a collection project using a desktop computer, where you can use another kind of microphone, which is useful for voice assistants in the car, for example.

The cool thing about our platform is that our clients don’t need to design a UI to perform this kind of task. They don’t need to design a workflow. They don’t need to set up golden questions or consensus algorithms to curate the data we are collecting. They only need to provide the prompt. Our users will pronounce it, and our software will curate… will filter the correct utterances and send only the best ones to our clients.

We also have another module, an ASR benchmark module, where you can run experiments to measure not only the word error rate but also the token error rate related to punctuation, capitalization, and inverse text normalization. And you can compare different versions of your own speech recognition engine against the main vendors, that is, Amazon Transcribe, IBM Watson, Google, and all of them, not only by recording but also by gender, by age, by ethnic origin. So our platform is helping big companies measure the fairness of the voice-based products they are releasing to production environments, in a way that is fair to all of their clients, and that is a very important thing, I think.

Lastly, I will be very, very brief because I am the last one on the stage today, and brevity is… is the soul of wit. We are helping our clients, and this is very important for us, to build their own long-term defensibility around data.
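To make the metric concrete: word error rate is the word-level edit distance (substitutions, insertions, deletions) between a reference transcript and an ASR hypothesis, divided by the number of reference words. Here is a minimal sketch of that computation, plus the kind of per-demographic slicing described above; this is an illustration of the metric, not Atexto’s or any vendor’s actual implementation, and the sample data is made up.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dp[i - 1][j] + 1
            insertion = dp[i][j - 1] + 1
            dp[i][j] = min(substitution, deletion, insertion)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)


# One substitution ("the" -> "a") over six reference words.
print(wer("the cat sat on the mat", "the cat sat on a mat"))  # -> 0.1666...

# Fairness check: slice WER by a demographic attribute (hypothetical data).
samples = [
    {"ref": "turn on the lights", "hyp": "turn on the light", "gender": "female"},
    {"ref": "play some music", "hyp": "play some music", "gender": "male"},
]
by_group: dict[str, list[float]] = {}
for s in samples:
    by_group.setdefault(s["gender"], []).append(wer(s["ref"], s["hyp"]))
for group, scores in by_group.items():
    print(group, sum(scores) / len(scores))
```

Comparing these per-group averages is what reveals whether a model is systematically less accurate for one population than another, which is the fairness measurement the benchmark module is described as supporting.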
We envision a future where every company will be a voice-based company. And in this future world, we will not only accelerate voice technology adoption, we will also democratize this kind of technology for every company. This is our vision. This is Atexto. Thank you for listening. Thank you.
If you have any feedback about this post, or anything else around Deepgram, we'd love to hear from you. Please let us know in our GitHub discussions.