Podcast·Mar 28, 2024

AIMinds #011 | Laurin Wirth, Co-Founder at WhisperTranscribe

Listen on your favorite platform

Listen onApple Podcasts

Listen onSpotify

Listen onYoutube

AIMinds #011 | Laurin Wirth, Co-Founder at WhisperTranscribe

Demetrios Brinkmann

AIMinds #011 | Laurin Wirth, Co-founder at WhisperTranscribe AIMinds #011 | Laurin Wirth, Co-founder at WhisperTranscribe

Episode Description

Laurin Wirth talks about how they founded WhisperTranscribe and how it focuses on AI transcription and content creation.

Table of Contents

Show Notes:More Quotes from Laurin:Transcript:

Share this guide

Subscribe to AIMinds Newsletter 🧠Stay up-to-date with the latest AI Apps and cutting-edge AI news.SubscribeBy submitting this form, you are agreeing to our Privacy Policy.

Table of Contents

Show Notes:More Quotes from Laurin:Transcript:

About this episode

Laurin Wirth lives in Austria and founded WhisperTranscribe together with his Dutch and Ukrainian Co-Founders Anne-Albert and Tatiana. His hobby includes running enough for him to be called a running nerd and he is also a rookie in playing chess.

Listen to the episode on Spotify, Apple Podcast, Podcast addicts, Castbox. You can also watch this episode on YouTube.

In this episode of AIMinds, Laurin shares his journey from moving to the Netherlands to study economics, working in low code software development, to co-founding Whisper Transcribe with a focus on AI transcription and content creation.

Here are three key takeaways from this episode:

Value-based pricing: Whisper Transcribe has adopted a pricing model directly linked to transcription minutes, emphasizing transparency and value for the customer.
Focus on user experience and quality: The company differentiates itself by prioritizing user experience, leveraging high-quality AI models like GPT-4, and offering a privacy-focused, fast desktop app for content creation.
B2B growth strategy: As the market evolves, Whisper Transcribe is shifting towards a B2B focus, targeting reduced churn rates and leveraging regional advantages for growth in the German-speaking market and beyond.

Having built a bootstrapped startup while juggling freelance work, Laurin shares his thoughts into overcoming challenges and the pivotal role of partnerships, such as with Deepgram, in addressing customer needs.

Fun Fact: The inspiration for Whisper Transcribe came when Laurin witnessed a journalist transcribing an entire interview by hand, sparking the idea to leverage AI transcription models to address the time-consuming task.

Show Notes:

00:00 Start of Laurin’s journey.
04:04 Development of OpenAI Whisper.
07:43 Created image transcription tool inspired.
10:28 Efficient content creation for creators and companies.
16:57 Startup shifting focus from B2C to B2B.
19:44 Pricing strategy for AI-based products.
23:29 Founders juggled freelancing and delayed startup launch.
26:10 Wrap up.

More Quotes from Laurin:

Transcript:

Demetrios:

Welcome to AI Minds, the podcast. This is a podcast where we explore the companies of tomorrow being built AI first. I am your host, Demetrios, and in this episode, it is brought to you by Deepgram, the number one speech to text and text to speech API on the Internet, trusted by the world's top conversational AI leaders, startups, and enterprises like Spotify, the one where you listen to your music, Twilio, NASA, and Citibank. Today we are joined by none other than Laurin, the co founder of Whisper Transcribe. How you doing, man?

Laurin Wirth:

Yeah, thanks a lot for having me. I'm excited to be on a podcast and tell a little bit about how we use steepgram in our small startup and what our startup is to start out with.

Demetrios:

Before we get into all that fun stuff, let's start with your story. How did you get into tech?

Laurin Wirth:

Yeah, so I'm originally from Austria, but decided quite early on to move to the Netherlands and actually study something a bit unrelated, economics. Sometimes I wish I could go back and actually study computer science, software engineering or something like this. I do think it's something that helps you as a founder, but luckily I have a great co founder that helps us with all the technical things. And together we built whispertranscribe. He really focuses on the technical implementation. We do a bit like the product together, and I fill in all the other parts of what is needed to make this a success.

Demetrios:

Wait, just real fast, let me back up for a second. Because you moved from Austria to the Netherlands, studied business, got out of school, and jumped right into the startup life.

Laurin Wirth:

So I still did have a bit of like an in between stop. I thought, all right, so I've missed out on studying computer science. So what's the next best option? And there is quite one of the market leaders in low code software development sitting in Rotterdam called Mendix. And that was kind of my first stop. I thought, all right, if the road to computer science is a bit too long, then maybe I can kind of fast track it by working for a local provider. And yeah, that's what I did for a bit over two years. And then COVID hit and I thought, now is the time to try this out myself and see what kind of business I can start.

Demetrios:

Well, talk to me a little bit about the inspiration COVID hit. You realized you wanted to do something on your own. Were you throwing around a bunch of different ideas?

Laurin Wirth:

Yeah, absolutely. I think probably a lot of the listeners might be familiar with Pieter levels and his kind of like twelve months, twelve ideas, twelve startups. Kind of framework. So I've been doing that for around a year. And kind of the second part of the year I already joined forces with one of my old comrades from my studies and now co founder Albert, and we started just building some smaller projects together. And then really, I think things changed as the whole generative AI part really hit the market. First with the old stable diffusion, where we did some projects, some smaller projects in the image creation space. And then I think that really led into the whole whisper transcribe startup.

Laurin Wirth:

Yeah, I can give you a bit of a background of how that came to be, actually, because I was sitting in a co working space and we were working on one of our image startup ideas just to see and test if that's something that's working. And whilst I was speaking to people, actually, I noticed there was one journalist that was actually still transcribing by hand. And recently we had also played around with some of the transcription models, starting with OpenAI model. So the OpenAI whisper, and yeah, we just saw how much time and money still people spend on just transcribing simple text. And that's really then why we shifted our focus, just because this was such a clear value add to so many people that were directly around us. And the initial idea came from building a wrapper around the OpenAI whisper API, just because it was still actually, even now still is something that's not super accessible to a lot of people. Because if you want to use the OpenAI whisper model, you either need to run it on your own machine, which requires a lot of computing power, or you need to plug into the API. And I know you guys also offer like a whisper implementation, which is great.

Laurin Wirth:

And the alternative is of course to go to open AI directly, but either way you need to have some developmental skills, so you need to be able to interact with that API. And that was our first prototype of the idea. And also whisper transcribe, where we built desktop app, which was really just a wrapper around the OpenAI Whisper API. And we started selling that as a license, just a one time payment, launched it on product hunt, and really saw first traction, made like the first $1,000 by just selling licenses, but then did realize, hey, if we like, you know, just the transcription is actually not all that people need. It's quite powerful to combine it with large language models and have on the one hand direct the transcription and on the other hand really create content from the transcription. So that's where we at now, and we really integrate with obviously like Deepgram, which has really been a game changer for us because it offers speaker recognition, so we can really provide very cheap, very accurate and very fast transcripts. And then on the content generation side, we use GPT 4. And we've been working a lot with different prompts to really create awesome content for all content creators.

Laurin Wirth:

So regardless of whether you're a podcaster or whether you're like a YouTube creator, you can plug in your audio, your video into our app, and it will generate summaries, blog posts, show notes, titles, and it's really a little bit like a small chat GPT, but then for your own content. So that's a bit the story of how we started with Whisper transcribe, and also the steps we've taken since then to really continue to build it and to add more value to help create those, create content faster.

Demetrios:

Okay, so let me break this down real fast, see if I'm following you. 2020 hit, you decided you wanted to go out on your own, do something. And after messing around with a few ideas, you were trying what, one idea? A month after the inspiration from Pieter Levels, who, for those that do not know, is a famous personality on Twitter who is very big in, I think, the indie hacking community and making different products and not having to go out and raise vc money and do it in a way that you need to hire a big team. It's more solopreneur type style things. And so you were inspired by that and you created the stable diffusion type, or you created a product that was leveraging the image creation tools like stable diffusion, because that was hot and that had come out. And when you were in a co working spot, you realized, wait a minute, there's some cool things that we can do with this image creation, but what's probably more of a pain for a lot of people is just transcribing things. And what was it, you were working next to a journalist and you saw that this person wasn't able to transcribe what they were doing?

Laurin Wirth:

Yeah, absolutely. So I really watched someone transcribe a whole interview by hand and spent, I don't know, two, 3 hours on one interview to just get a transcript. And I thought, all right, there must be a better way, especially with all the advances in the AI transcription models. And that was really kind of what kicked all of this off. Yeah.

Demetrios:

So then launched on product hunt, saw initial traction, saw people were liking it and asking for it, and you're like, hey, wait a minute, there might be something here, so let's go deeper in it. And that's when the product journey really, I think, probably started to unfold for you and you started to recognize that we can add a lot more to this rather than just transcribe. Because if you're just doing transcribe, yeah, that's great. But what is the differentiating factor other than it's easier than sending it to the OpenAI API or the Deepgram API or whatever? And so then you thought, we're going to go after the creator niche. Is that how you landed on it? And how did you land on the creator niche to try and go after that specific slice?

Laurin Wirth:

Yeah. So indeed, we are currently really going after the creators and also companies that are creating content. And the reason why we've landed on that side is just because I think they can leverage, at the one hand, the transcript, but they also get a lot of value out of creating content. So a lot of these podcasters or content creators employ a small team of people just to create content from their existing audio video. So this will be a blog post, or to continue to post on Twitter, Instagram to write the show notes so you can add it to your Spotify. So all these things, they are a lot of work, and we have found that we can reduce that work by a lot. I mean, maybe the last final touch still makes sense if someone from that podcast production team does that, but we really save like 90% of the time that you would have to spend previously to create all that content. And that's really also what our vision is going forward, that we don't only want to concentrate on that written content, but we also want to continue to grow and add value to these content creators by adding video content to this whole flow so that you can, in our app, like with the help of large language models, figure out what are the most engaging parts of your audio or of your video, and that we help you create small videos for TikTok, YouTube shorts, Instagram, et cetera.

Laurin Wirth:

So to really be that one place for creators to repurpose content and all of it, of course, with the help of AI.

Demetrios:

One thing that I've noticed is there are a considerable amount of people that are attacking this problem set. Right. How do you look at differentiating yourself from the competition?

Laurin Wirth:

Yes. So the way we see this is that going forward, a lot of these models will become significantly cheaper. So our current philosophy is that we will provide the highest value, even if that means we don't have the highest profit margins. So what we see a lot of competitors do like, they use cheaper large language models to create the content so GPT 3.5, or like some of the Facebook models that they self host, which creates worse content, but of course is a lot cheaper. We do use GPT 4 to create all our content, which we find works a lot better, especially if you have specific requirements. So within our app, you can actually also use it similar to chat GPT, give it an example of what your tweets usually look like, and then ask it to create five additional tweets in the same kind of language and in the same kind of style as your previous tweets. And that really helps differentiate our service from what a lot of the competitors do, which still only offer, I guess, like, cheaper AI models. So that's one part of differentiating that we do at the moment.

Laurin Wirth:

The second part is we have different to actually all of our competitors. We have like, a desktop app that has the advantage that all your files are actually your files, so they live on your computer. We don't save any of your files. A lot of times, obviously, the audio is something that's personal or maybe not something that you want to share with any third party. So we do all of this on your machine, or save it, at least on your machine, and that helps us to be a bit more privacy focused, but also makes our app a lot faster, because there's not always like, if you have a large audio file that you need to process and that you first need to upload to the server and access, that oftentimes makes things a lot slower. So it's also like one of the feedback points we've gotten from a lot of our customers that did compare us to our competition. We're very easy to use, we're very fast, and we're privacy friendly, and we generate high quality content because we use GPT 4 instead of cheaper, large language models.

Demetrios:

So it feels to me like you're taking the approach of making sure that the user experience is top notch and you're differentiating yourself on user experience. To say, this is going to be fast. This is also going to be thinking about the quality of the output so that the creator doesn't need to spend as much time tweaking that output and polishing it so that it is at the level that they can send it out. Is that what I'm hearing?

Laurin Wirth:

Yes, definitely. And to put things into perspective, I think there was these comparisons between GPT 3.5 having around like a 70 iq, and GPT 4 having like, 140 iq. So there is really a significant difference in how well the content is written, how much it sounds like AI generated content or not. And yeah, we want to create long term lasting relationships with our customers. We want to make sure that they are happy, so we are happy to lower the margins that we have. Especially also because we're pretty sure a lot of these costs will go down in the long term. But if we already make our customers happy now, then I think they will stay with us. And yeah, that's just how we see the market and how we want to work together with our customers.

Laurin Wirth:

Excellent.

Demetrios:

So talk to me about growth and how you look at growing and basically making sustainable growth, because I'm assuming you're bootstrapped.

Laurin Wirth:

Yes, we are fully bootstrapped. I'm filling in, I guess like all the different roles that are not directly programming the product, so I'm also in charge of selling it at the moment. When we started out, we really focused on more B2C, so like smaller podcasts, people that run a podcast by themselves or with a team of two people or something like this. And that's been great to really validate the product. And I think going forward now, we will focus more and more on B2B with the reasoning that I think as OpenAI and GPT plus expand their offering, I think it will become just more and more difficult on that B2C level to really compete with chat GPT, as there's like GPTs that do something that is similar to what we do and a lot more services indirectly integrate into GPT plus. But of course, as soon as we target B2B, a lot of the other advantages that we have, such as privacy, an easy way to collaborate between team members, a lot of these things start to add more value. And I think it's also a bit more of a sustainable way to grow. Just because B2B tends to have a bit lower churn rates.

Laurin Wirth:

You don't need quite as many customers to really have significant or like revenue to grow your revenue. And funnily enough, a lot of these B2B clients are also less demanding in terms of what they need. As long as your service is up and running and you've not oversold over promised anything, then they tend to be quite happy with what they get.

Demetrios:

And are you going after certain markets or certain languages?

Laurin Wirth:

I mean, I have the advantage, of course, or we have the advantage of being in Europe. I mean, the largest market for sure is the US, but I think that's also, there is the most competition in the US market. So we will for sure partly go after the US market, but we are also going after the german speaking market because I'm austrian, so it's just a lot easier for me to kind of know the ins and outs of that market as well as the dutch and belgian market, because Albert is Dutch and will definitely have an advantage there.

Demetrios:

Yeah, that makes sense. And that's why I was thinking that. And when it comes to there's something that you said that I don't want to miss, I feel like you kind of glossed over it and now I'm having a hard time remembering exactly what it was because I asked the other question. But I think there's another big point that I love asking people about when it comes to building businesses on top of AI, since it's almost like the API calls are cost of goods in a way. And so when you're thinking about pricing your product, are you doing it purely on consumption? Do you know that the consumption is going to be x amount from API calls? And so you can charge a markup on that, which is that delta of your product, or are you looking at it as like a seat base type thing? And depending on the amount of people that you have, you have different pricing options. Because that was coming back around to what I thought I forgot, but I didn't remembered, is the ability to be able to collaborate with team members. And so looking at that and saying, okay, we're going to figure out a pricing here, our b to b pricing is going to be x amount. How do you factor in? We're going to be needing to have API calls for the transcription and then API calls for OpenAI's GPT 4.

Laurin Wirth:

Yes. So I mean, the way our pricing is set up, everything kind of links to transcription minutes. The amount of content is directly related to how much, the amount of GPT 4 usage is directly related to how many minutes you transcribe. So that's really the way we set our pricing. And then maybe, I think one thing that holds true for more or less, like all companies within this market, is that I think as this is still so novel, people don't really do cost based pricing. They have more of a value based pricing, I think. And that's probably something that will also change in the future as the market becomes even more competitive. Because at the moment, I think the highest costs are really customer acquisition costs.

Laurin Wirth:

And then a lot of the customers that we work with, they have seen these kind of services for the first time, so they don't really have a reference point or they haven't done a lot of comparison with other services that are on the market. But they just see, hey, this saves me 5 hours per week. Awesome. My time is valued at x, so I can definitely pay $30 per month for this. So that's how we approach pricing. And we'll also continue to approach pricing that it will be linked directly to the transcription. Just because it's most transparent, it's super easy to understand. And yeah, I think the same thing also holds true for B2B.

Laurin Wirth:

There is not really a lot of additional costs for adding additional team members. The only real cost is the API costs or the AI costs that we incur. And as long as we bake that into the contracts that we have, then that's just the easiest and most transparent.

Demetrios:

So what have been some of the major hurdles that you've had to overcome while building whisper transcribe?

Laurin Wirth:

I think one of the challenges was that all three of us, all three founders, were still working more or less full time on freelancing and other projects when we started this. And I think we could have been a lot faster to market if that would not have been the case. And I mean, as you mentioned, it is of course, like a competitive market, and I think first mover advantage is definitely a thing. So that's been somewhat of a challenge for us. And I think some of our competitors with products that are worse than what we offer have managed to capture a decent chunk of the market just because they were a little bit faster than us. So that's been one of the challenges. And of course also juggling these two things at the same time. So, like the freelancing and at the same time starting this company, that's been one of the challenges we've been facing, really.

Laurin Wirth:

And outside of that, I think we've been quite lucky with having good partners. Just like Deepgram, you guys have been like a pleasure to work with and quite easy to integrate. And we've been also quite lucky with a lot of very nice customers that have been providing us with good feedback and let us really build the product with their needs in mind.

Demetrios:

What have some of these main product features that people are asking for been?

Laurin Wirth:

Yeah, so one of the big features people requested was actually speaker recognition. So the diarization part, and that's also how we got to work with Deepgram, just because we're looking around on the market to look. Okay, what kind of services offer this? And one of the, I think, very cost effective and accurate services that does this is Deepgram. And yeah, it was, I think, quite simple to integrate, especially as we already had a lot of the front end to display everything. And I'm sure my co founder Albert would disagree with some of the things I say. Of course, he has been struggling with most of it, but from what I heard, it was quite pleasant and easy to just add that one additional API key change, a couple of things. And yeah, of course you guys were also quite supportive in setting everything up.

Demetrios:

Yeah, it's awesome to have you as part of the startup program too. It's worth mentioning that you all joined the startup program and you've been rocking ever since. Well, even before you joined the startup program, I think you were rocking, but hopefully we just poured a bit of kerosene on the fire and so you're able to hit escape velocity much easier. It's been a blast talking to you, Laurin. I appreciate you coming on here and sharing a little bit of this story with us.

Laurin Wirth:

Yeah, thanks a lot for having me, and I really enjoyed having our conversation and also using Deepgram and being part of the startup community. Thanks a lot.

Demetrios:

Excellent. We'll talk to you later. Bye.