How to Transcribe Only What You Need with Python: Listening Before Connected
Imagine a fast-food restaurant taking orders in real-time using a speech-to-text API.
The challenge is that the customer will start speaking, and the microphone will start producing audio data, before the WebSocket connection opens. We need a way to capture that early audio and then keep transcribing whatever the customer says after the WebSocket has opened, until they finish speaking their order.
One solution is using a buffer, or a queue, to store the audio data before the WebSocket is connected. In Python, we can implement a buffer by using a list. We can add the audio data in bytes to the queue before the WebSocket connection is made and even continue using the buffer during the speech-to-text transcription after the connection is made.
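To make the idea concrete, here is a minimal sketch of buffering audio before a connection exists. It assumes an asyncio.Queue as the buffer (a plain list would also work for a simple case), and the chunk values and the short sleep standing in for the WebSocket handshake are made up for illustration.

```python
import asyncio


async def demo():
    # The buffer: chunks wait here until something is ready to send them.
    audio_queue = asyncio.Queue()

    # Audio arrives while we are still "connecting" (hypothetical chunks).
    for chunk in (b"chunk-1", b"chunk-2"):
        audio_queue.put_nowait(chunk)

    # Stand-in for the WebSocket handshake delay.
    await asyncio.sleep(0.01)

    # More audio arrives after the connection is open.
    audio_queue.put_nowait(b"chunk-3")

    # Drain the buffer in arrival order, as a sender would.
    sent = []
    while not audio_queue.empty():
        sent.append(await audio_queue.get())
    return sent


sent_chunks = asyncio.run(demo())
print(sent_chunks)
```

Because the queue is first-in, first-out, the words spoken before the connection opened are sent first, so nothing is lost or reordered.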
In the next section, we will see how to implement this solution using Python and the Deepgram speech-to-text API.
Using a Buffer in Python to Store Audio Data from Speech-to-Text Transcription
To run this code, you’ll need a few things.
Grab a Deepgram API key from Deepgram
Install the following packages using pip
The following is the solution implemented in Python with a quick explanation of the code:
Python Code Explanation for Using a Buffer with Speech-to-Text Transcription
Since we’re working with Python’s asyncio, we need to create a callback function as defined by PyAudio. This callback puts an item into the queue without blocking.
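As a rough sketch of what such a callback can look like, the example below follows PyAudio's callback signature (input data, frame count, time info, status flags) and returns a tuple of data and a continue flag. To keep it runnable without a microphone, it does not import PyAudio: PA_CONTINUE is a local constant with the same value as pyaudio.paContinue, and make_callback is a hypothetical helper we introduce here so the queue can be created inside the event loop.

```python
import asyncio

# Same value as pyaudio.paContinue; defined locally so this sketch
# runs without PyAudio or a microphone (an assumption of the example).
PA_CONTINUE = 0


def make_callback(audio_queue):
    """Build a PyAudio-style stream callback bound to audio_queue (hypothetical helper)."""

    def callback(input_data, frame_count, time_info, status_flags):
        # put_nowait() adds the chunk without blocking the audio callback.
        audio_queue.put_nowait(input_data)
        return (input_data, PA_CONTINUE)

    return callback


async def demo():
    audio_queue = asyncio.Queue()
    callback = make_callback(audio_queue)

    # Simulate PyAudio handing us one chunk of 4 frames of 16-bit audio.
    fake_chunk = b"\x00\x01" * 4
    result = callback(fake_chunk, 4, {}, 0)
    return result, audio_queue.qsize(), audio_queue.get_nowait()


result, size, first = asyncio.run(demo())
print(result, size, first)
```

With real hardware, the callback would be passed as stream_callback when opening the PyAudio input stream. PyAudio invokes that callback from its own thread, so a more careful version might hand the chunk to the event loop with loop.call_soon_threadsafe instead of calling put_nowait() directly.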
Next, we define an outer function called process() that sets up the authorization for Deepgram. Inside it, we use async with websockets.connect as a context manager to open the connection to the Deepgram WebSocket server and close it cleanly when we are done.
The sender() function sends audio to the WebSocket. Calling audio_queue.get() removes and returns the oldest item from the buffer. If the queue is empty, it waits until an item is available.
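The sender loop can be sketched as follows. FakeWebSocket is a hypothetical stand-in that records what it is asked to send, so the example runs without a network connection, and the None sentinel used to stop the loop is our own assumption rather than something taken from the original code.

```python
import asyncio


class FakeWebSocket:
    """Hypothetical stand-in for a WebSocket connection; records sent data."""

    def __init__(self):
        self.sent = []

    async def send(self, data):
        self.sent.append(data)


async def sender(ws, audio_queue):
    while True:
        # Waits here whenever the buffer is empty.
        chunk = await audio_queue.get()
        if chunk is None:  # assumed sentinel meaning "no more audio"
            break
        await ws.send(chunk)


async def demo():
    audio_queue = asyncio.Queue()
    ws = FakeWebSocket()

    # Audio captured before the sender started.
    audio_queue.put_nowait(b"early-audio")

    task = asyncio.create_task(sender(ws, audio_queue))
    await asyncio.sleep(0)  # let the sender drain the early audio

    # Audio captured while the sender is already waiting on the queue.
    audio_queue.put_nowait(b"later-audio")
    audio_queue.put_nowait(None)
    await task
    return ws.sent


sent = asyncio.run(demo())
print(sent)
```

The same loop handles both cases: audio that was buffered before the connection and audio that arrives afterwards.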
The receiver() function receives the transcript, parses the JSON response, and prints the transcript to the console.
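Here is a small sketch of that parsing step. It assumes each message is a JSON string in which the transcript sits under channel, then alternatives, then transcript; treat that shape as an assumption and check it against the responses you actually receive. FakeWebSocket is again a hypothetical stand-in that replays canned messages.

```python
import asyncio
import json


class FakeWebSocket:
    """Hypothetical stand-in that yields canned JSON messages."""

    def __init__(self, messages):
        self._messages = messages

    def __aiter__(self):
        return self._iterate()

    async def _iterate(self):
        for message in self._messages:
            yield message


async def receiver(ws):
    transcripts = []
    async for message in ws:
        response = json.loads(message)
        # Assumed response shape: channel -> alternatives[0] -> transcript.
        alternatives = response.get("channel", {}).get("alternatives", [])
        transcript = alternatives[0].get("transcript", "") if alternatives else ""
        if transcript:  # skip empty results and non-transcript messages
            print(transcript)
            transcripts.append(transcript)
    return transcripts


messages = [
    json.dumps({"channel": {"alternatives": [{"transcript": "a cheeseburger and fries"}]}}),
    json.dumps({"channel": {"alternatives": [{"transcript": ""}]}}),
    json.dumps({"type": "Metadata"}),
]
transcripts = asyncio.run(receiver(FakeWebSocket(messages)))
```

Using .get() with defaults keeps the receiver from crashing on messages that carry no transcript.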
Lastly, we run the program using asyncio.run(run()) inside of main.
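Putting the pieces together, the overall shape of the program might look like the sketch below. It assumes the sender and receiver run concurrently with asyncio.gather, and it swaps the real websockets.connect for fake_connect, a hypothetical context manager that simulates a slow handshake, so the example runs offline. The URL and the None sentinel are placeholders as well.

```python
import asyncio
from contextlib import asynccontextmanager


@asynccontextmanager
async def fake_connect(url):
    """Hypothetical stand-in for websockets.connect with a slow handshake."""
    await asyncio.sleep(0.01)  # the customer is already talking during this
    sent_log = []
    yield sent_log


async def process(audio_queue):
    async with fake_connect("wss://example.invalid/listen") as sent_log:

        async def sender():
            while True:
                chunk = await audio_queue.get()
                if chunk is None:  # assumed sentinel: end of audio
                    break
                sent_log.append(chunk)

        async def receiver():
            # A real receiver would read and print transcripts here.
            await asyncio.sleep(0)

        # Run both directions of the connection at the same time.
        await asyncio.gather(sender(), receiver())
        return list(sent_log)


async def run():
    audio_queue = asyncio.Queue()
    # Audio captured before the connection opened.
    audio_queue.put_nowait(b"spoken-before-connect")
    audio_queue.put_nowait(None)
    return await process(audio_queue)


def main():
    return asyncio.run(run())


delivered = main()
print(delivered)
```

Even though the connection takes time to open, the audio queued beforehand is still delivered once the sender starts.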
Conclusion
We hope you enjoyed this short project. If you need help with the tutorial or running the code, please don’t hesitate to reach out to us. The best place to start is in our GitHub Discussions.
If you have any feedback about this post, or anything else around Deepgram, we'd love to hear from you. Please let us know in our GitHub Discussions.