AIMinds #014 | Neil Chudleigh, Creator and Founder of SuperWhisper
About this episode
Neil Chudleigh is the Creator and Founder of SuperWhisper, an AI voice-to-text application for Mac and iPhone. He is bringing local AI models to your Mac so you can write better and faster with your voice. Before that, he co-founded and built PartnerStack, an affiliate network for B2B SaaS companies that drives the affiliate programs for companies like Monday.com, Intuit, and Deepgram.
Listen to the episode on Spotify, Apple Podcasts, Podcast Addict, or Castbox. You can also watch this episode on YouTube.
AIMinds episode 14 features Neil Chudleigh and SuperWhisper. Neil explains what SuperWhisper brings to the table and discusses the technical roadblocks he encountered during development. Not only does he tell us how he tackled the challenges of shipping offline models, but he also shares structured tactics for handling the pitfalls of edge computing and real-time troubleshooting.
Here are some of the highlights in this episode:
SuperWhisper uses both cloud and local AI models for voice-to-text transcription, with local models giving users privacy and the ability to work offline. The real-time mode provides a comprehensive tool for transcribing notes and meetings.
The platform is adept at dictation, with features that can simplify and restructure your language. Both free and paid tiers are available, allowing full exploration of the platform.
The versatility and control provided by SuperWhisper make it desirable across numerous professions. It records voice notes and converts them into text, dictating emails, notes, and even code, making professional workflows more efficient.
Fun Fact: SuperWhisper serves a broad range of users, from professionals like lawyers and mental health therapists to productivity enthusiasts and people with specific medical conditions. The software's wide reach shows its usefulness across varying demographics and needs.
Show Notes:
00:00 Entrepreneur built high-quality voice-to-text software.
05:36 Deepgram makes affiliate tracking easier.
06:56 Advantages and disadvantages of local and cloud AI models.
10:37 Voice-to-text app converts speech to various formats.
15:30 Responding in own voice, magic calendar hyperlink.
19:15 Real-time mode provides transcript for note-taking.
22:57 App offers limited paid features for 15 minutes.
24:14 Language model allows for sentence restructuring.
29:53 Medical professionals and others use extensive documentation.
31:47 iOS app encourages recording thoughts during walks.
34:44 Expressing appreciation for involvement in startup ecosystem.
Transcript:
Demetrios:
Welcome back to the AI Minds podcast. This is a podcast where we explore the companies of tomorrow built AI-first. I'm your host, Demetrios, and this episode is brought to you by Deepgram, the number one text to speech and speech to text API on the Internet, trusted by the world's top conversational AI leaders, startups, and enterprises like Spotify, where you probably listen to some songs, Twilio, NASA, the one that sends rockets up into space, and Citibank. Today we're talking with Neil, the creator of SuperWhisper. What's going on, dude?
Neil Chudleigh:
Hey, how's it going?
Demetrios:
I'm so excited because your story is one that when we chatted a week ago, I thought, we have to get you on the podcast. Just for a little background for people, you came into the startup program, you're building SuperWhisper, and we're going to get into what exactly it is. I would love to start with some of your learnings over the years because you have such a rich history. So tell us about what you've been up to for the past couple of years.
Neil Chudleigh:
Yeah, I started out actually in the affiliate space building software, and recently built out a side project called SuperWhisper to do really high quality voice to text on Mac and iOS. I really just built it for myself. I wanted a high quality experience. I'm a huge user of voice to text on my phone, and I thought the Siri implementation, sorry, Apple, but it's very, very bad. It gets most words wrong, it doesn't work in a lot of apps, and seeing all the technology that was available, I just decided to throw something together. And it turns out it was really great.
Neil Chudleigh:
It worked amazingly well. I could get it to do things that Apple's dictation just wouldn't even come close to.
Demetrios:
So crazy.
Neil Chudleigh:
Yeah, yeah. It just went from there. The feedback's been incredible. So I put it out and just had hundreds of people gushing, saying how much they were loving it and how much they were converting over to using dictation daily, and decided to take what was really just a side project, a build-it-for-yourself sort of thing, and turn it into a real commercial product.
Demetrios:
Yeah. You scratched your own itch.
Neil Chudleigh:
Yeah. It's been a huge learning experience, too. I mean, I have a lot of software development experience, but this is my first time building something that requires so much compute on the user's computer. It's also my first time doing a Mac application that uses this level of the native APIs, and my first time doing a really heavy audio management application. So it's been really fun. I'm just in a state of learning. It's really great.
Demetrios:
Yeah. And I want to get into the fact that you are now a solo founder, a solopreneur, and how that learning, I'm sure, has compounded from your last venture. For those that are listening, this is not your first venture. You kind of glossed over it. But maybe we could talk about what you mean by saying you were in the affiliate space, especially because we at Deepgram just set up an affiliate program and we are using a tool called PartnerStack, which I think you know quite well.
Neil Chudleigh:
Yeah, yeah. So I founded PartnerStack along with three co-founders in 2014, we went through YC, and I worked on it for nearly ten years. We basically help software companies find people who love their products, have a significant marketing channel, and want to co-market with the products that they love. Somebody who writes about AI might be an affiliate of Deepgram and write about what they're able to do or what products are doing with Deepgram, or they might have an advertising engine or something along those lines. Maybe a software review site, maybe they make YouTube videos. The possibilities are kind of endless, and we basically help track what sales they're driving and then pay out their commission. So it's a very different business than what I'm doing now, but it is a very exciting one. I mean, the level of partnerships and the depth they go to is quite amazing.
Neil Chudleigh:
And the whole landscape of it is quite a fascinating thing. It really peels back the curtain.
Demetrios:
Yeah.
Neil Chudleigh:
On a lot of the stuff on the internet.
Demetrios:
It abstracts away lots of those headaches of tracking the affiliates and empowering your super users to be able to talk about you and get a little something in return. So if anyone out there is interested in talking about Deepgram and wants to become one of our partners in the ecosystem, feel free to go to Deepgram.com/affiliates and you can find all the information that you need there. But now, Neil, getting back to the inspiration for SuperWhisper and how you've learned over these last however many months building it. There's something that you said that I want to dig into, which is that you're pushing a lot of the compute out to the user, and presumably that is because most of the application, or most of the use cases, are happening on the phone. So let's start with just, what is SuperWhisper? What are some of the use cases you're seeing people using SuperWhisper for? And then maybe you can go into some of those complex details of pushing the compute out to the edge devices.
Neil Chudleigh:
Yeah, for sure. Alongside the cloud models that I have available for use in SuperWhisper, voice to text is one of the few use cases I've seen for local AI models, and I think there are advantages and disadvantages to each. If you're on a lower powered device, an older device, cloud models such as Deepgram are probably going to be your best bet in terms of getting a fast response. But if privacy is really important to you, or your connection is not good, local models are quite a promising solution. I mean, you can run quite large models these days on consumer hardware, and they'll give you quite good results. It's completely private, it'll run totally offline, and of course it does take up some significant system resources, so there's management along with that, making sure it doesn't overload your Mac. That's actually where SuperWhisper started out. It was completely offline, offline-first models, and I've been integrating more and more cloud models as time has gone on. But yeah, it's quite interesting.
Neil Chudleigh:
I mean, especially when you look at cases of users who are handling really sensitive information, health related, legal, government, just really private information. Obviously, if you type that on your keyboard, it's offline. If you speak it into your microphone, why not have it offline as well? It's been challenging to manage that, and I think one of the most difficult things has been building software around the experience of installing and using offline models. There's not really a playbook for that, right? This is a fairly new phenomenon, that these things exist, that you can run them yourself, and that something like that would go into a piece of consumer software. So that's been on top of the technical challenge of actually building software for it.
Neil Chudleigh:
I think the user experience and the UI surrounding those models, and explaining it to the user, has been an interesting design challenge and something I'm still battling with.
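To make the local-versus-cloud trade-off Neil describes concrete, here is a minimal Swift sketch of the kind of routing decision involved. The type names, thresholds, and model names are illustrative assumptions, not SuperWhisper's actual code.

```swift
import Foundation

// Illustrative only: routing between an on-device model and a cloud API based on
// the constraints Neil mentions (privacy, connectivity, available hardware).
enum TranscriptionEngine {
    case localModel(name: String)   // runs entirely on-device, uses significant RAM/CPU
    case cloudAPI(provider: String) // fast even on weak hardware, but needs a connection
}

struct DevicePolicy {
    var requiresOffline: Bool      // sensitive data, or the user simply wants privacy
    var availableMemoryGB: Double  // rough proxy for whether a large local model fits
    var hasNetwork: Bool
}

func chooseEngine(for policy: DevicePolicy) -> TranscriptionEngine {
    // Hypothetical thresholds for illustration only.
    if policy.requiresOffline || !policy.hasNetwork {
        // Prefer a smaller local model on constrained hardware so the Mac isn't overloaded.
        let model = policy.availableMemoryGB >= 8 ? "large-v3" : "base.en"
        return .localModel(name: model)
    }
    return .cloudAPI(provider: "deepgram")
}

let engine = chooseEngine(for: DevicePolicy(requiresOffline: true,
                                            availableMemoryGB: 16,
                                            hasNetwork: true))
print(engine) // localModel(name: "large-v3")
```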
Demetrios:
Yeah, presumably you're trying not to brick the consumer's computer or their phone, and that can be a fun challenge because, like you said, you're forging your own path and you may run into questions that don't really have an answer, or it's not clearly laid out. You probably have to go troubleshoot and dig deep into different communities or Reddit forums or whatever it may be, to figure out whether anyone else has encountered this problem. And you're probably finding a lot of obscure GitHub repos that are like, oh, this is to fix that one little problem, so that you don't brick the computers. So let's just get the breakdown. What does SuperWhisper do?
Neil Chudleigh:
So SuperWhisper takes your voice recording. It takes the form of a Spotlight bar, kind of like Raycast, the Spotlight bar that's built into macOS, or Alfred, if you're familiar with any of those. You open it up with a keyboard shortcut, say what you're trying to say, and it'll take your voice and turn it into text perfectly. If you want to take that a step further and transform that text into an email or notes, or even, I've used it for code, people have used it for all sorts of things, it can transform it using AI models. So you can run your voice to text results through an AI model immediately afterwards, and you can configure it to do that automatically. It'll automatically paste that into whatever app you're using. So you can imagine you have Gmail open, you're reading someone's email, and you just respond to it aloud and have the resulting perfectly formatted email with, say, an opening paragraph, bullet points, a couple of highlighted questions, a summary paragraph, and then the sign off. And all of that is done off of maybe a quarter of
Neil Chudleigh:
the amount of words that actually end up in the email. You can configure this to whatever style you like. I have modes where it's extremely informal, how I would write instant messages to friends or family, and then stuff that's much more formal, such as writing documents, prose, notes, that kind of thing, and then business email, personal email. So it's quite flexible, you can do a lot with it. One of the other big use cases is recording meetings, and the actual meeting recording has been in the product for a while. It'll give you a live transcript of the meeting, so whether you're on Google Meet, Zoom, a Teams call, or Slack, you can record a meeting and have all those transcripts end up in the same place.
Neil Chudleigh:
And a feature I'll be launching today, actually, is speaker separation. So it'll be able to identify speaker one, speaker two, speaker three, and then you'll be able to label those chunks as a certain person. I think that's interesting. A lot of meeting recorders have that utility built in already, but the difference with SuperWhisper is you'll be able to have that transcript in the same format regardless of which meeting software you're using. And then on top of that, you can have the transcript pushed over to the language model and have it summarized, with key takeaways and notes, again in a consistent format regardless of how you're having that conversation, software-wise. So it's a big tool. There's a lot you can do with it. And I think it's really exciting finding the ways that people are using it.
Neil Chudleigh:
My philosophy in building it is to give people control over the tool. I'm not building a very specific meeting mode; I show you how to configure it for meetings. So give people the power to do those things, give them the power to configure the underlying tools, and provide guides on how to do that. And I've been surprised by all the different ways that people have found to use the tool to be productive in their daily lives.
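For a sense of how the dictate, transform, and paste flow Neil describes might hang together, here is a rough Swift sketch. The Mode type, the prompt wording, and the sample dictation are hypothetical; the actual language-model call and the paste into the frontmost app are deliberately left out.

```swift
import Foundation

// Illustrative only: raw dictation text is wrapped in a per-mode prompt, sent to a
// language model, and the model's output is pasted into the active app.
struct Mode {
    let name: String
    let instructions: String   // e.g. formal business email vs. casual message style
}

let businessEmail = Mode(
    name: "Business Email",
    instructions: """
    Rewrite the dictation as a well-formatted email: a short opening paragraph,
    bullet points for action items, highlight any questions, then a summary and sign-off.
    """
)

func buildPrompt(mode: Mode, dictation: String) -> String {
    """
    \(mode.instructions)

    Dictation:
    \(dictation)
    """
}

let rawDictation = "hey just confirming thursday works also can you send the deck over and do we still need legal to review it"
let prompt = buildPrompt(mode: businessEmail, dictation: rawDictation)
print(prompt)
// In the real flow, the model's reply (a formatted email) would be pasted into Gmail automatically.
```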
Demetrios:
So it's almost like the design choice is flexibility over that opinionated type of view.
Neil Chudleigh:
Exactly. I think there are always ways to bring that flexibility, and the complexity that comes with it, down for people as they're onboarding. But one of the core things I believe with the app is that the way you write across all of the software you use in your daily life is more similar than it is different, and you kind of want to take that with you. And the way that individuals write is so different as well, right? The way that you write your messages is almost like a fingerprint. You can say, oh, I don't even need to see who's writing it.
Neil Chudleigh:
Like, oh, that was Demetrios, right? You have a style. I think not giving people that control is ultimately how you build something that they're not going to stick with for a long time.
Demetrios:
Yeah.
Neil Chudleigh:
Yeah.
Demetrios:
And there is one thing that I want to call out, which I loved from some of the videos that I've seen, is that when you respond to an email, it is in your voice. Not just because you literally are speaking it, so of course it's your voice, but when the email gets written, it's in your voice. And then you did something pretty sneaky that made me think, wait, what was that black magic I just saw? You were saying, hey, if a time works, book something on my calendar. And then you said some special words, something like insert calendar link or Calendly link, I can't remember exactly what it was. And it hyperlinked your Calendly link inside of Gmail and everything. So I thought that was pretty fancy.
Neil Chudleigh:
Yeah. So there are lots of features like that in SuperWhisper. That one in particular, if you're familiar with tools like TextWrangler, is basically just looking for keywords in your transcript and then replacing them with something that you've set up. I've set them up for all my social media account links, my calendar link, my email address, stuff like that. So you can very quickly make sure that it goes in perfectly, because sometimes voice to text is not going to get a URL exactly perfect if you dictate it.
Neil Chudleigh:
And dictating it is kind of cumbersome.
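The keyword-replacement feature Neil mentions is essentially a lookup table applied to the finished transcript. A minimal Swift sketch, assuming made-up trigger phrases and URLs rather than anything configured in the real app:

```swift
import Foundation

// Illustrative only: scan the finished transcript for configured trigger phrases and
// swap in exact snippets (URLs, handles, email addresses). Triggers and links are made up.
let replacements: [String: String] = [
    "my calendar link": "https://calendly.com/example/30min",
    "my email address": "neil@example.com"
]

func applyReplacements(to transcript: String, using table: [String: String]) -> String {
    var result = transcript
    for (trigger, snippet) in table {
        // Case-insensitive replacement of each trigger phrase wherever it appears.
        result = result.replacingOccurrences(of: trigger, with: snippet, options: .caseInsensitive)
    }
    return result
}

let spoken = "If a time works for you, grab a slot at my calendar link and confirm to my email address."
print(applyReplacements(to: spoken, using: replacements))
// "If a time works for you, grab a slot at https://calendly.com/example/30min and confirm to neil@example.com."
```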
Demetrios:
So painful.
Neil Chudleigh:
Yeah.
Demetrios:
Nobody's going to be sitting there saying like LinkedIn.com/8569.
Neil Chudleigh:
Yeah. I mean, you're not going to remember it.
Demetrios:
Yeah, there's all those things.
Neil Chudleigh:
It might get a few characters wrong. One of the other things that's worth pointing out, different from the dictation tools that people might be used to on Android or iOS, is that you don't have to dictate punctuation. SuperWhisper is going to pick it up off of the pauses and intonation in your voice. And even if you take a long pause in the middle of a sentence, it's going to take the context of that sentence and the words that it's found and decide, okay, is that two sentences or is that one? And it's going to join them or separate them accordingly. It's trained off of video and subtitles as well as recordings of people dictating, so the models really do have an understanding of your intent as a speaker. So it's much less fiddly.
Neil Chudleigh:
I know a lot of people who are like, oh, I've tried dictation in the past and I just find it too nitpicky, too fiddly. I can't get out what I'm wanting to say faster, and I'm having to go back and edit. And I find if I can convince those people to try it, they often come away very excited about the improvement over what they're used to.
Demetrios:
Well, because what happens to me a lot is that I'll be speaking and dictating and then reading what is coming out. And because what comes out is incorrect, it throws off my dictation. Then I have to stop dictating and go manually type in and correct something, or whatever it may be. So it totally is that frustrating moment of, oh, where was I? And then you start back up the dictation and go for it. So you're trying to avoid that, I'm guessing.
Neil Chudleigh:
Yeah, yeah, 100%. Like I said, there is a real-time mode that gives you a transcript of what's being said. Typically you're using that if you're taking notes on a video or you're in a meeting. But a lot of people, once they get into the flow of using it and trust the tool, prefer to have that off when they're just writing, because not seeing how the computer is interpreting what you're saying until you're done frees up your brain to continue thinking about whatever you're talking about. So 100% focus on your message and what you're trying to write. And I think that's important, especially when you have a language model cleaning up the punctuation, grammar, and sentence structure, reorganizing the ideas, and making sure that you're not so constrained that each word placement has to be perfect, as long as the general idea gets across. You can even do mid-sentence corrections. If you say something that's not quite right and then say, oh, wait, sorry, no, I meant 3:00 p.m.
Neil Chudleigh:
Not four. You can set up the language model to go back, take that 3:00 p.m. piece of information, and replace the four, because it knows that is the correct piece of information. It really does release you from the nitpicky nature of traditional voice to text.
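The mid-sentence correction behavior comes down to what the cleanup instructions ask the language model to do. A hedged Swift sketch of such a prompt, not SuperWhisper's actual wording:

```swift
import Foundation

// Illustrative only: the kind of cleanup instruction a user might configure so the
// language model applies spoken self-corrections and fixes punctuation.
let cleanupInstructions = """
Clean up the dictated text below. Fix punctuation and sentence structure.
If the speaker corrects themselves mid-sentence, keep only the corrected version.
Return only the cleaned text.
"""

let dictated = "let's meet at 4 pm on friday oh wait sorry no I meant 3 pm not four and bring the draft"

let prompt = cleanupInstructions + "\n\nDictated text:\n" + dictated
print(prompt)
// A capable model's reply might read: "Let's meet at 3 p.m. on Friday, and bring the draft."
```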
Demetrios:
And how funny is that, that the user experience is actually better when you don't see what you're saying. It makes complete sense to me. While we're talking right now, there is no text coming up showing us what we're saying in this conversation. I think it would be way more distracting if I had subtitles happening in real time as we're talking, and I wouldn't be able to follow the conversation or give you my full attention. So it makes a whole lot of sense that you wouldn't necessarily want that, especially not 100% of the time.
Neil Chudleigh:
Yeah, I mean, it's going to be up to user preference, and I think a lot of people initially are maybe uncomfortable with it, so there is the option there. I think there is some utility in it as well. Being able to go back and reference, say it's a many-person meeting and maybe you get pulled over to something else, being able to scroll back up and read through the transcript to catch back up, that's utility and it's in the tool. So you can choose. But for me personally, if I'm writing something, I don't want to see the immediate feedback. I find it's too much.
Demetrios:
Well, let's talk about the writing, because I think that is a huge use case, and I'm seeing more and more people talk about how they are writing their blog posts by dictating. I personally have the hardest time continuing my train of thought when I dictate. I feel like it's great to get almost like a first pass and try to get everything out there really quickly, but at the same time, I get to a certain place and then I forget where I wanted to go, or my words get ahead of me. So maybe it's more of an art form, and if you practice it, you get better at it. But I would love to hear how you've been using it for writing.
Neil Chudleigh:
I have the app structured in a way where you have access to all the features, every single paid feature, for 15 minutes of recordings, and then it bumps you back to the free tier, which is still a pretty good tool. There are tons of people who just use the free tier and continue on with it, and it's great, though I think it misses a lot of the big power advantages of the tool. But if you sit down with it for 15 minutes, break through maybe your past experiences with dictation, do those first passes, and build that trust with it, that it's going to capture everything, that's great. A lot of people will convert at that point. Typing will start to feel slow and they'll want to be dictating everything. As for the process of writing with it, the tool is not necessarily meant to be used as one big recording that then gives you a whole document. It can do that.
Neil Chudleigh:
But I think it's best used in conjunction with whatever you're writing with. Pop open Word or Notes, and if you have an idea for something, just get it out there on the page and go paragraph by paragraph, or two paragraphs at a time. With the language model, you can get it to restructure your sentences. If you want to do things like simplify your language, alternate the vocabulary, or alternate hard-hitting sentences with softer, more comfortable sentences to create interesting writing, those sorts of transformations on what you've said are possible. So I think what it can do for a writer is quite powerful, something that should free them up to not think so much about the mechanics of what they're writing and more about the idea they're trying to get across. So, yeah, I would just say it does take a bit of effort. The first time you sat down at a keyboard, were you instantly comfortable typing? My guess is no. So it is learning to use a new tool.
Demetrios:
And now that you say that, it's funny, because maybe the way that I was trying to do it in the past wasn't the optimal way. And I like this idea of, hey, maybe you're going through a document. Like, I'm reading a paper about AI, for example, and I want to give my thoughts on that paper. I want to highlight some of the key sentences or key points in my own words, as opposed to just clipping and highlighting it. And so I have it on.
Neil Chudleigh:
Right.
Demetrios:
I have SuperWhisper with me, and it is on, listening while I am scanning this document, and it is taking notes as I'm speaking them. Oh, wow. Isn't this interesting? They're doing this, or they're trying that. Okay, they're using these words, or they're using this formula. Definitely not going to try to speak a formula into transcription yet. I don't think that's going to be there at this point in time. I don't even know.
Demetrios:
So a lot of the different letters that they use, yeah, I could get lost. I could get very lost. But yeah, having it there, again, it's hanging out, listening to me, and taking down everything that I have, as opposed to what I was doing in the past, where it was like, okay, now it's time to take all of the ideas that I have in my head and get them onto a piece of paper, so I have to have them formulated, and I have to know exactly how I want them to come out, and the structure and the organization. It's a different way of using it and a different way of thinking about how it can be used. Yep.
Demetrios:
Yep.
Neil Chudleigh:
100%. I think, again, that flexibility exists to fill the gap. I want people to understand what the utilities are under the hood and to apply them to their daily lives in ways that they find useful. I want to show some examples, but not be prescriptive, because the workflows for a lawyer and a student taking notes are very different.
Neil Chudleigh:
And even between two lawyers, the way that they want to interact with voice is probably pretty different. They're different people, different experiences, maybe different skill sets. Maybe one of them has dictated their whole career and has a workflow that they like, and the other one has never used dictation before. So there's got to be that flexibility to adapt to the workflow that they're used to, but hopefully elevate it. And then something different for someone who's
Neil Chudleigh:
just starting to build it into their daily workflow.
Demetrios:
That's so funny you say a lawyer, too, because my dad is a lawyer, and I remember when I was growing up, going into his office, he had one of those recorders, and he would sit there and talk into it in very short snippets, right? He had his style and his workflow, and then later he would get whoever it was, some assistant, to type it up. And now all of that does not need to happen. He doesn't have to have somebody type it up. It can be in real time.
Demetrios:
He can see it as he's doing it, or right after he's done, he can get a summary of all of it, or maybe suggestions on where to make it better. And so I think that is super cool. I imagine you've seen a few other use cases. It sounds like for a lawyer, that's a no-brainer.
Neil Chudleigh:
Yep.
Demetrios:
Also a student, another no-brainer. What are some of the other ones that people have come to you with?
Neil Chudleigh:
Yeah, basically everybody in the medical field. I think a lot of people think of traditional medicine, like in hospitals, but even more heavily outside of that, outpatient clinics take an absolute mountain of notes every day, and their requirements to take notes are actually quite high because of the filing process they have afterwards to interact with insurance providers. They're having to document codes for different procedures, and they have to take very rigorous notes of every single session, because if they don't and they get audited, they're in trouble. So, yeah, a ton of different applications there, and not even necessarily physical health: psychologists, therapists, that sort of thing. Tons of productivity enthusiasts, people who are content creators, writers, a lot of people who are really into personal knowledge management, a lot of software engineers.
Demetrios:
Yeah, I could see it being fascinating for someone who just picks up their phone in the morning and speaks into it, like a diary in a way. Then you can have it all there, something that's able to capture your thoughts. I wish I could do it every morning when I woke up, and just capture how I'm feeling and what I'm getting ready for in the day. Almost document my life in that regard.
Neil Chudleigh:
Yeah, maybe more so than waking up in the morning, but with the iOS application I'm getting feedback that a lot of people are taking it on a walk. So they start recording and then just record thoughts as they come up on their walk, which is quite nice. I really like the idea of taking that time to have nothing in front of you and just be talking. One of the others, and actually the most motivating now that I'm thinking about it, is those with disabilities, both permanent and temporary. I've had people who have installed it on their parent's laptop if their parent has dementia. I have people with dyslexia telling me that it's helping them with organizing their thoughts; sometimes writing can be overwhelming. People with repetitive strain injuries.
Neil Chudleigh:
I had one guy who was getting back surgery, so he was in bed for three weeks. He had his laptop propped up and had to be in this one position, couldn't move his back basically, and was using dictation for that whole time. And one of my close friends actually broke his hand opening a jar of pickles, of all things. Snapped one of the bones in his hand. I know.
Demetrios:
Oh my God. Crazy.
Neil Chudleigh:
And he became a power user of SuperWhisper immediately after that. Yeah, it's been awesome hearing about how it's helping people, and that continues to expand as the product comes to the phone and people get their hands on it there.
Demetrios:
Excellent. Neil, this has been fascinating, man. I really appreciate you coming on here and talking about your journey and talking about SuperWhisper. Anything else you want to mention before we jump? I know there's all kinds of cool stuff that you're doing, and people can get involved by googling SuperWhisper. But you've also got really cool use cases and demos on your YouTube channel. So if anyone is at all curious, I encourage you to go check out SuperWhisper on YouTube and you'll see all the different ways that Neil is talking about right here.
Neil Chudleigh:
Yeah, it's superwhisper.com, and superwhisper app on Twitter. That's pretty much it. I would really recommend checking out the videos. They're the easiest way to really understand what you can do with it.
Demetrios:
Yep, yep. As it was for me, that was the mind-blowing piece. When I saw you put in that Calendly link code, I was just like, whoa, play that back. Did I see that?
Neil Chudleigh:
Right?
Demetrios:
So I love it, man. Well, this has been great, and I super appreciate you being part of the Deepgram startup ecosystem. It's really cool to see, and I wish you continued success with everything you're doing with SuperWhisper.
Neil Chudleigh:
All right, yeah. Thanks, man. Take care.