Converting Speech to Text in Flutter Applications

In this tutorial, learn how to use Deepgram's speech recognition API with Flutter and Dart to convert speech to text on iOS and Android devices.

Contents: Before We Start; Create a Flutter Application; Add Device-Specific Permissions (Android, iOS); Add Your UI; Handling the Text State; Install the Dependencies; Handle Audio Input; Learn more about Deepgram

By Greg Holmes, Guest Author

Last Updated: Jun 13, 2024

In this tutorial, you'll learn how to transcribe speech in real time from your device's microphone using Deepgram's Speech Recognition API. The app captures audio as binary data and streams it live over a WebSocket to Deepgram's servers. Once the audio is transcribed, the result is returned in JSON format over the same WebSocket.
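To make the return leg of that round trip concrete, here is a minimal sketch of pulling the transcript out of one JSON message. It assumes the transcript sits under channel, then alternatives, then transcript, and extractTranscript is a hypothetical helper name, not part of any library.

```dart
import 'dart:convert';

/// Hypothetical helper: returns the transcript carried by one JSON message
/// received from the WebSocket, or an empty string if none is present.
String extractTranscript(String message) {
  final dynamic decoded = jsonDecode(message);
  if (decoded is! Map<String, dynamic>) return '';

  final dynamic channel = decoded['channel'];
  if (channel is! Map<String, dynamic>) return '';

  final dynamic alternatives = channel['alternatives'];
  if (alternatives is! List || alternatives.isEmpty) return '';

  final dynamic first = alternatives.first;
  if (first is! Map<String, dynamic>) return '';

  final dynamic transcript = first['transcript'];
  return transcript is String ? transcript : '';
}
```

Passing it a trimmed, illustrative message such as {"channel":{"alternatives":[{"transcript":"hello world"}]}} yields the transcript text, while messages without a channel field yield an empty string. Real responses carry more fields than this.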

Before We Start

You will need a Deepgram API key for this project, which you can create in the Deepgram Console.

Next, follow the installation instructions in Flutter's documentation to install Flutter on your machine.

Create a Flutter Application

Depending on which IDE you use to develop your Flutter application, you'll need to configure it before you can create a new Flutter project. Follow the instructions for your IDE on the Flutter documentation page, Set up an editor, and then create a new project.

Add Device-Specific Permissions

Android

For your application to perform certain tasks on Android, such as accessing the internet or recording audio, it needs to request permission for them. Open the file android/app/src/main/AndroidManifest.xml and, inside the <manifest ...> element, add the following lines:
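A minimal sketch of those lines, assuming the two permissions the text names (internet access and audio recording) are all this app needs:

```xml
<!-- android/app/src/main/AndroidManifest.xml, directly inside <manifest ...> -->
<!-- Needed to open the WebSocket connection. -->
<uses-permission android:name="android.permission.INTERNET" />
<!-- Needed to capture audio from the microphone. -->
<uses-permission android:name="android.permission.RECORD_AUDIO" />
```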

While you're in the Android directory, you'll need to change the SDK versions the project compiles against and targets. This change meets the requirements of the third-party packages you'll install later. Open the file android/app/build.gradle and first find the line: compileSdkVersion flutter.compileSdkVersion. Replace this line with compileSdkVersion 32.

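Next, find the two lines that set minSdkVersion and targetSdkVersion.

Update these to versions that satisfy the packages you'll install later; check each package's documentation for its current requirements.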

iOS

For your application to access the microphone on your iPhone or iPad, you'll need to enable the microphone permission. Inside your Podfile, locate the line: flutter_additional_ios_build_settings(target) and below this add the following:
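The sketch below shows how the post_install block might look afterwards, assuming the permission is enabled through the permission_handler package's PERMISSION_MICROPHONE build macro. The surrounding lines are what a generated Flutter Podfile typically contains and may differ in your project.

```ruby
post_install do |installer|
  installer.pods_project.targets.each do |target|
    flutter_additional_ios_build_settings(target)

    # Added lines: compile permission_handler with microphone support.
    target.build_configurations.each do |config|
      config.build_settings['GCC_PREPROCESSOR_DEFINITIONS'] ||= [
        '$(inherited)',
        'PERMISSION_MICROPHONE=1',
      ]
    end
  end
end
```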

Then inside your Info.plist, within the <dict></dict> block, add the following two lines:
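A minimal sketch of those two lines, assuming the standard iOS microphone usage key. The description string is a placeholder; write one that explains your app's use of the microphone.

```xml
<!-- ios/Runner/Info.plist, inside the top-level <dict> block -->
<key>NSMicrophoneUsageDescription</key>
<string>This app needs microphone access to transcribe your speech.</string>
```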

Add Your UI

The first thing you're going to need is a UI to be displayed on the mobile device; this UI will need three components:

A Text area to display all transcribed wording,
a "start" OutlinedButton to begin the transcription,
and a "stop" OutlinedButton to stop live transcription.

Open the file lib/main.dart. In the _MyHomePageState class, replace the contents of this class with the build widget example shown below containing these three components:
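One way to lay out these three components is sketched below. It assumes the default template's MyHomePage widget and material import are still in place; the app bar title, padding, and layout are illustrative choices rather than requirements.

```dart
// Relies on `import 'package:flutter/material.dart';` and the MyHomePage
// widget that the default template already provides in lib/main.dart.
class _MyHomePageState extends State<MyHomePage> {
  @override
  Widget build(BuildContext context) {
    return Scaffold(
      appBar: AppBar(title: const Text('Live Transcription')),
      body: Column(
        children: <Widget>[
          // Text area that displays the transcribed wording.
          Expanded(
            child: Padding(
              padding: const EdgeInsets.all(16),
              child: SingleChildScrollView(
                child: Text(
                  "This is where your text is output",
                  textAlign: TextAlign.center,
                ),
              ),
            ),
          ),
          // Start and Stop buttons; their handlers are filled in later.
          Row(
            mainAxisAlignment: MainAxisAlignment.spaceEvenly,
            children: <Widget>[
              OutlinedButton(
                onPressed: () {},
                child: const Text('Start'),
              ),
              OutlinedButton(
                onPressed: () {},
                child: const Text('Stop'),
              ),
            ],
          ),
        ],
      ),
    );
  }
}
```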

You can test that your changes work by opening a new Terminal session and running flutter run. If you have connected your mobile device to your computer, the application will be installed on it, and you will see a screen containing the text area and the Start and Stop buttons.

Handling the Text State

Next, your application needs to display text held in state rather than a hard-coded string, so that it can change while the app is running. Find the line: class _MyHomePageState extends State<MyHomePage> { and just below this add the definition of the variable myText, initialized with the default text:
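A minimal sketch of that definition. The default wording here is a placeholder; any prompt text will do.

```dart
// Inside _MyHomePageState: the text currently shown in the text area.
String myText = 'To start transcribing your voice, press Start.';
```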

In your _MyHomePageState class's Widget build() method, find the line: "This is where your text is output". Replace this string with myText, your new variable, which will update whenever a response comes back from your transcription requests.

Two new functions are now needed to manipulate this variable. The first one (updateText) updates the text with a predefined piece of text, while the second (resetText) resets the variable's value, clearing the text from the user's screen.

Within the _MyHomePageState class, add these two new functions:
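The sketch below is one possible shape for these functions. It assumes updateText appends to the existing text so that successive transcripts accumulate on screen; replacing the text outright would work the same way.

```dart
// Inside _MyHomePageState. setState tells Flutter to rebuild the widget
// so the new value of myText appears on screen.
void updateText(String newText) {
  setState(() {
    myText = '$myText $newText';
  });
}

void resetText() {
  setState(() {
    myText = '';
  });
}
```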

These functions aren't used at the moment. To rectify this, find the OutlinedButton with the text Start, and populate the empty onPressed: () {} function with the following:
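As a temporary handler for checking that the state updates, something like the following would do. The message string is purely illustrative.

```dart
// Start button, for now: append a fixed piece of text to the display.
onPressed: () {
  updateText('Transcription will appear here.');
},
```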

Install the Dependencies

Three third-party libraries are needed throughout this project:

sound_stream handles the microphone input and converts it to data ready for streaming over a WebSocket.
web_socket_channel provides functionality to make WebSocket connections, which is how your application will communicate with Deepgram's servers.
permission_handler handles the mobile device's permissions, such as accessing the microphone.

In the root directory of your project, open the file that handles the importing of these libraries, pubspec.yaml. Now locate the dependencies: line and below this add the three libraries:
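A sketch of the resulting section is shown below. Version constraints are deliberately left open here because the versions the tutorial was written against are not stated; pin each package to a release you have tested.

```yaml
dependencies:
  flutter:
    sdk: flutter

  # "any" accepts whatever version the resolver picks; replace with
  # explicit constraints once you know which releases work for you.
  sound_stream: any
  web_socket_channel: any
  permission_handler: any
```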

Open a new Terminal session and navigate to your project directory. Run the following command to install these three libraries:
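Assuming the dependencies were added to pubspec.yaml by hand as described, the standard Flutter command to fetch them is:

```shell
flutter pub get
```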

Handle Audio Input

All of the configuration is now complete, so it's time to add the transcription functionality. Back in your main.dart file, add the following imports at the top for the libraries you'll be using in this application (including your three newly installed third-party libraries):
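The list below is a sketch of the imports the later steps rely on. It assumes the IO variant of web_socket_channel is used so that an authorization header can be sent when connecting.

```dart
import 'dart:async'; // StreamSubscription
import 'dart:convert'; // jsonDecode

import 'package:flutter/material.dart';
import 'package:permission_handler/permission_handler.dart';
import 'package:sound_stream/sound_stream.dart';
import 'package:web_socket_channel/io.dart';
```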

Below these imports, add two constants that you'll be calling in this application:
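A sketch of the two constants follows. The query parameters are assumptions: they describe 16-bit linear PCM at 16 kHz and must match whatever audio format your recorder actually produces. The key value is a placeholder.

```dart
// Deepgram's live transcription endpoint. The encoding and sample_rate
// values are assumed to match the recorder's output format.
const serverUrl =
    'wss://api.deepgram.com/v1/listen?encoding=linear16&sample_rate=16000';

// Placeholder: substitute your own key, and keep real keys out of
// source control.
const apiKey = 'YOUR_DEEPGRAM_API_KEY';
```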

These two constants are:

serverUrl, which defines the URL the WebSocket will connect to (Deepgram's API server in this instance). For more information on the parameters available to you, please check the API reference.
apiKey, your Deepgram API key, which authenticates your requests.

Your app needs permission to access the microphone before it attempts to transcribe any speech. You'll request this when the app has loaded (the system will only prompt the user once). Add the following initState() function, which arranges for onLayoutDone to be called once the layout has loaded on the screen:
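One way to write this, sketched under the assumption that a post-frame callback is an acceptable signal that the layout has loaded:

```dart
// Inside _MyHomePageState.
@override
void initState() {
  super.initState();
  // Run onLayoutDone once, after the first frame has been rendered.
  WidgetsBinding.instance.addPostFrameCallback(onLayoutDone);
}
```

On Flutter versions older than 3.0, WidgetsBinding.instance is nullable and the call needs a null-aware operator.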

Now below this initState() function add a new one called onLayoutDone, which is where your app will request permission:
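A minimal sketch of that function, using permission_handler's microphone permission. What to do when the user declines is left out here and is worth handling in a real app.

```dart
// Inside _MyHomePageState. Receives the frame timestamp from
// addPostFrameCallback, which is unused here.
Future<void> onLayoutDone(Duration timeStamp) async {
  // Shows the system prompt the first time; afterwards it returns the
  // stored decision without prompting again.
  await Permission.microphone.request();
  if (mounted) setState(() {});
}
```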

It's now time to introduce the WebSocket and sound_stream to the project. First, you'll need to declare the objects you'll be using: the recorder that captures sound, the subscriptions to its streams, and the WebSocket channel itself. Below your line String myText ... add the following:
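The declarations might look like the sketch below. The subscriptions and channel are nullable here as a design choice, so that cleanup code is safe to run even if recording was never started.

```dart
// Inside _MyHomePageState.
final RecorderStream _recorder = RecorderStream();

// Subscriptions to the recorder's status and audio streams.
StreamSubscription<dynamic>? _recorderStatus;
StreamSubscription<dynamic>? _audioStream;

// The WebSocket connection to Deepgram, created when recording starts.
IOWebSocketChannel? channel;
```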

When the application closes, it's good practice to close any long-running connections, whether they are to components in your device or over the Internet. So, create the dispose() function, and within this function cancel all audio handling and close the WebSocket channel:
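A sketch of that cleanup, assuming the subscriptions and channel are declared as nullable fields named _recorderStatus, _audioStream, and channel:

```dart
// Inside _MyHomePageState.
@override
void dispose() {
  // Null-aware calls make this safe if recording never started.
  _recorderStatus?.cancel();
  _audioStream?.cancel();
  channel?.sink.close();
  super.dispose();
}
```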

Next, you need to initialize your web socket by providing your serverUrl and your apiKey. You'll also need to receive the audio stream from your microphone, convert it into binary data, and then send it over the WebSocket for Deepgram's API to transcribe. Because this is live transcription, the connection will remain open until you request it be closed. Add your new _initStream() function to your _MyHomePageState class.
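The following is a sketch of such a function rather than a definitive implementation. It assumes the API key is sent as a Token authorization header, that server messages arrive as JSON text with the transcript under channel, alternatives, transcript, and that the fields and updateText function from the earlier steps exist under those names.

```dart
// Inside _MyHomePageState.
Future<void> _initStream() async {
  // Drop any subscriptions left over from a previous recording session.
  await _audioStream?.cancel();
  await _recorderStatus?.cancel();

  // Open the WebSocket, authenticating with the API key in a header.
  final socket = IOWebSocketChannel.connect(
    Uri.parse(serverUrl),
    headers: {'Authorization': 'Token $apiKey'},
  );
  channel = socket;

  // Each server message is JSON; display its transcript when present.
  socket.stream.listen((event) {
    final parsed = jsonDecode(event as String);
    final transcript =
        parsed['channel']?['alternatives']?[0]?['transcript'];
    if (transcript is String && transcript.isNotEmpty) {
      updateText(transcript);
    }
  });

  // Forward raw microphone audio to the server as binary frames.
  _audioStream = _recorder.audioStream.listen((data) {
    socket.sink.add(data);
  });

  // Rebuild the widget whenever the recorder's status changes.
  _recorderStatus = _recorder.status.listen((status) {
    if (mounted) setState(() {});
  });

  await _recorder.initialize();
}
```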

This functionality doesn't yet do anything; add a new _startRecord function, and within this, add the call to _initStream(). Calling this function tells sound_stream to switch on your microphone for streaming.
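A minimal sketch of _startRecord. Clearing the previous text first is an assumption about the desired behavior, not something the steps above require.

```dart
// Inside _MyHomePageState.
Future<void> _startRecord() async {
  resetText(); // start each session with an empty text area
  await _initStream(); // open the WebSocket and wire up the streams
  await _recorder.start(); // begin capturing microphone audio
  if (mounted) setState(() {});
}
```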

Also add the following _stopRecord() function to stop the _recorder:
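A sketch that does only what the text describes, which is stopping the recorder. The WebSocket is left to be closed in dispose().

```dart
// Inside _MyHomePageState.
Future<void> _stopRecord() async {
  await _recorder.stop(); // stop capturing microphone audio
  if (mounted) setState(() {});
}
```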

In the first OutlinedButton, with the text Start, find the onPressed: () {} function and add the following to call your _startRecord function:
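Assuming the function is named _startRecord as above, the handler can be as small as:

```dart
// Start button.
onPressed: () {
  _startRecord();
},
```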

In the next OutlinedButton, with the text Stop, find the onPressed: () {} function and add the following to call your _stopRecord function:
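Likewise, assuming the function is named _stopRecord:

```dart
// Stop button.
onPressed: () {
  _stopRecord();
},
```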

Your application is ready to test once you have added functionality to start and stop the transcribing. If you go back to your Terminal and run flutter run, you'll see the application refresh on your mobile device. You may be prompted to give microphone access, so be sure to approve this. You can now start transcribing!