Imagine being in a virtual veterinary appointment. Your beloved pet isn't feeling well, and the vet starts explaining your pet's condition, treatment, and necessary medications using complex medical jargon. You're stressed and don't fully understand what is being said, let alone remember all the instructions for caring for your pet at home. In our digital age, these situations are common in online appointments.

Sometimes, you may have follow-up questions for the vet, but they’ve already disconnected from the call. Other times, there might be scenarios where you want to recall what the vet has discussed with you. Regardless of the scenario, telehealth apps highlight the need for clear, unambiguous communication. 

Misunderstandings can have dire consequences, leading to inaccurate diagnoses, incorrect treatment, and overall client dissatisfaction. Wouldn’t it be great if we could record these conversations and refer back to them whenever we need to?

In this article, you will build the client-side of a virtual veterinary application that breaks down complex vet terminology into simpler language. Ready? If you are familiar with Deepgram, consider building a similar app and then coming back to this article to check the solution. Not familiar with Deepgram? Delve right in! 🚀

What is Deepgram?

Deepgram, an automatic speech recognition (ASR) platform for developers, offers more than just transcription services. It leverages state-of-the-art machine learning (ML) models to turn spoken words into accurate text in real time. Deepgram claims up to a 3x performance improvement over OpenAI’s Whisper.

Beyond transcription, Deepgram's speech-to-text and understanding API is enriched with audio intelligence technology. This suite includes sentiment analysis, which can discern the emotional undertones in speech, a critical feature for applications requiring a deep understanding of user sentiment.

Additionally, Deepgram simplifies the management of long conversations through a summarization feature, ensuring key points are retained. Its topic generation capability further enhances comprehension by identifying and emphasizing the main topics within complex dialogues.

These features position Deepgram’s API for developing solutions where understanding and clarity in virtual veterinary consultations are paramount. Let’s learn more about this project.

The Idea: A Virtual Vet/Client App

One area where Deepgram can make a real difference is telehealth apps, such as telehealth for pets. The healthcare industry, whether for pets or humans, is full of medical jargon that can confuse patients and pet owners. This gap in understanding calls for an innovative solution.

Consider an application that uses Deepgram's audio intelligence in a virtual vet setting. It transcribes the vet-client conversation in real-time, highlighting key topics such as diagnosis, symptoms, and treatments. Beyond transcription, Deepgram's sentiment analysis detects emotional cues in the vet’s voice, subtly emphasizing specific advice.

The summarization tool captures the essence of the advice, sparing clients the need to take extensive notes or struggle to reference the call later. This approach can significantly simplify veterinary dialogue, reduce misunderstandings, and ensure pets receive the accurate care they need. Deepgram can change how online consultations with vets work by making them less stressful and more useful.

Setting Up to Use Deepgram

Before integrating Deepgram into your application, you must register on the Deepgram website and obtain an API key. This key lets your application connect to Deepgram’s servers and use their automatic speech transcription services.

To get a free API key, follow the steps below:

Step 1: Sign up for a Deepgram account on this page.

Step 2: Click on “Create a New API Key.”

Step 3: Add a title to your API key and copy and paste it somewhere safe for later use.
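Since the code later loads the key with python-dotenv, one convenient place to keep it is a `.env` file at your project root (the variable name `DG_API_KEY` matches what the code below expects; replace the placeholder with your actual key):

```
# .env
DG_API_KEY=your_api_key_here
```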

Install the Required Libraries

Next, we’ll create a virtual environment. Feel free to use any virtual environment manager (e.g., conda). Make sure to use Python 3.10 or above, as the deepgram-sdk requires it:

conda create --name deepgram python=3.10
conda activate deepgram
pip3 install deepgram-sdk python-dotenv

🚨To follow along, find the complete code walkthrough in this GitHub repository.

Using Deepgram's Features in the App

Before jumping into the code, make sure you have the following:

  • Your Deepgram API key.

  • An audio file with a recording of a vet diagnosis, or any voice recording you wish to test with. There is a sample audio file in the repo you can follow along with.

Once you have the prerequisites above, we can import the necessary libraries and define our API key and files for our sample implementation:

import os
import json
from dotenv import load_dotenv

from deepgram import (
    DeepgramClient,
    PrerecordedOptions,
    SpeakOptions,
    FileSource,
)

# Load environment variables
load_dotenv()

# Optionally, you can import your API_KEY via system variable as best practice.
# Retrieve API Key from environment variables
API_KEY = os.getenv("DG_API_KEY")
if not API_KEY:
    raise ValueError("Please set the DG_API_KEY environment variable.")

# Path to the audio file and API Key
AUDIO_FILE = "your_audio_file.m4a"
deepgram = DeepgramClient(API_KEY)

Next, let’s define some convenient helper functions that make it easy to reference components of the JSON returned by Deepgram’s API. These helpers handle the following tasks:

  • get_transcript: Easily request Deepgram’s JSON response back for use by passing in the audio file and model options.

  • get_topics: Shows all the unique tags of a transcript. This can be useful for tagging visits, which can be used to search for relevant visits that are also similarly tagged.

  • get_summary: Get the summary portion only of the JSON as a string. Some applications can parse this to briefly display the visit.

  • save_speech_summary: Writes an audio summary to disk. This is most useful when saving audio summaries for the user to replay.

def get_transcript(payload, options):
    """
    Returns a JSON of Deepgram's transcription given an audio file.
    """
    response = deepgram.listen.prerecorded.v("1").transcribe_file(payload, options)
    return json.loads(response.to_json(indent=4))

def get_topics(transcript):
    """
    Returns back a list of all unique topics in a transcript.
    """
    topics = set()  # Initialize an empty set to store unique topics

    # Traverse through the JSON structure to access topics
    for segment in transcript['results']['topics']['segments']:
        # Iterate over each topic in the current segment
        for topic in segment['topics']:
            # Add the topic to the set
            topics.add(topic['topic'])
    return topics

def get_summary(transcript):
    """
    Returns the summary of the transcript as a string.
    """
    return transcript['results']['summary']['short']

def save_speech_summary(transcript, options):
    """
    Writes an audio summary of the transcript to disk.
    """
    s = {"text": get_summary(transcript)}
    filename = "output.wav"
    deepgram.speak.v("1").save(filename, s, options)

Now, we can efficiently use these functions in our main loop, allowing us to extract information as needed from the API response. We’ll write code logic to ingest the audio file and then define our settings using Deepgram’s audio intelligence models. 

Lastly, we’ll use the get_transcript() function to pass this information to get the API response. The JSON can now be used with the helper functions defined above to extract information for our app.

def main():
    try:
        # STEP 1: Ingest the audio file
        with open(AUDIO_FILE, "rb") as file:
            buffer_data = file.read()

        payload: FileSource = {
            "buffer": buffer_data,
        }

        # STEP 2: Configure Deepgram options for audio analysis
        text_options = PrerecordedOptions(
            model="nova-2",
            language="en",
            summarize="v2", 
            topics=True, 
            intents=True, 
            smart_format=True, 
            sentiment=True, 
        )

        # STEP 3: Call the transcribe_file method with the text payload and options
        r = get_transcript(payload, text_options)

        # STEP 4: Print responses that can be used for integration with an app
        print('Topics:', get_topics(r))
        print('Summary:', get_summary(r))

        # STEP 5: Additionally, these summaries can also be spoken back to you
        speak_options = SpeakOptions(
            model="aura-asteria-en",
            encoding="linear16",
            container="wav"
        )
        save_speech_summary(r, speak_options)

    except Exception as e:
        print(f"Exception: {e}")


if __name__ == "__main__":
    main()

Now run the script from the command line:

python demo.py

Running the script, you should get an output similar to the following:

Topics: {'Monitoring', 'Symptoms', 'Gastroenteritis', 'Nutritional monitoring', 'Gastritis', 'Anti nausea medication', 'Pregnancy diet'}
Summary: The speaker discusses their patient, who is experiencing signs of bowel preparation and GI issues. They recommend a meal plan and a drug injection for GI issues, and suggest feeding the patient a balanced diet for a few days to improve GI health. The speaker also mentions sending home equipment for monitoring the patient's well-being.
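We also enabled `sentiment=True` in the options, so the response carries sentiment data that the printed output above doesn’t show. A small helper, sketched against the segment layout documented for Deepgram’s sentiment analysis feature (treat the exact field names as an assumption to verify against your SDK version), could surface it:

```python
def get_sentiments(transcript):
    """
    Returns (average_sentiment, segments) from a Deepgram response dict.
    Each segment is a (text, sentiment, score) tuple.
    Assumes the 'results.sentiments' layout of Deepgram's sentiment feature.
    """
    sentiments = transcript["results"]["sentiments"]
    average = sentiments["average"]["sentiment"]
    segments = [
        (seg["text"], seg["sentiment"], seg["sentiment_score"])
        for seg in sentiments["segments"]
    ]
    return average, segments
```

An app could use this, for example, to visually flag segments where the vet’s tone turns urgent, subtly emphasizing that advice for the client.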

That’s it! You have a working script you can integrate with your interface or applications. See more samples and open source community showcases on the Deepgram repository.

👉 Remember to find the complete code walkthrough in this GitHub repository.

Further Improvements

Our implementation transcribes and summarizes veterinary consultations, only scratching the surface of what we can do to improve our vet telehealth application. Integrating large language models (LLMs) like GPT-4 could open up new possibilities for further improvements.

This integration could facilitate a chat feature where users can ask the system questions about the vet consultation using natural language, addressing questions and clarifying information. 
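As a minimal sketch of how such a chat feature might work, the snippet below grounds an LLM in the summary we already generate. The `openai` package, the model name, and the prompt wording are all assumptions for illustration, not part of the original implementation:

```python
def build_vet_qa_messages(summary, question):
    """
    Builds a chat prompt that grounds the LLM in the consultation summary.
    The system prompt wording here is illustrative.
    """
    return [
        {"role": "system",
         "content": "You are a helpful assistant that answers questions "
                    "about a veterinary consultation in plain language. "
                    "Base your answers only on this summary:\n" + summary},
        {"role": "user", "content": question},
    ]

def ask_about_visit(summary, question, model="gpt-4"):
    """
    Sends the grounded prompt to an LLM. Requires the `openai` package and
    an OPENAI_API_KEY environment variable; the model name is an example.
    """
    from openai import OpenAI  # imported here so the prompt helper stays dependency-free
    client = OpenAI()
    response = client.chat.completions.create(
        model=model,
        messages=build_vet_qa_messages(summary, question),
    )
    return response.choices[0].message.content
```

Grounding the prompt in the Deepgram summary keeps the LLM’s answers tied to what the vet actually said, rather than letting it speculate about the pet’s condition.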

Furthermore, predictive analytics can significantly elevate our application's capabilities. By using ML algorithms, we can analyze patterns in pet health data to predict potential future health complications. This could create opportunities to suggest preventative actions. For instance, observing changes in a pet's weight or behavior could flag early symptoms of health issues like diabetes or arthritis, triggering timely vet consultations.

Moreover, analyzing a pet's health data can allow the application to customize a specific health maintenance plan, suggesting diet adjustments or ideal times for vaccinations. This method offers proactive health management, which increases the app's usefulness as a complete health and wellness tool for pets.
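As a deliberately simple illustration of this kind of rule-based health monitoring, the function below flags sustained weight loss from a list of weigh-ins. The threshold and logic are placeholders, not veterinary guidance; a real system would use richer models and more signals:

```python
def flag_sustained_weight_loss(weights_kg, threshold_pct=10.0):
    """
    Flags a pet whose weight has dropped more than `threshold_pct` percent
    from the first to the most recent recorded weigh-in.
    Illustrative only; the 10% default is an arbitrary example threshold.
    """
    if len(weights_kg) < 2 or weights_kg[0] <= 0:
        return False
    drop_pct = (weights_kg[0] - weights_kg[-1]) / weights_kg[0] * 100
    return drop_pct > threshold_pct
```

Even a simple rule like this could trigger a prompt in the app suggesting the owner book a timely consultation.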

Conclusion

Deepgram's advanced automatic speech recognition technology can change how virtual veterinary appointments work. By turning complicated, jargon-filled conversations into simple, easy-to-understand text, we made it easier for veterinary professionals and pet owners to communicate.

This ensures that vets can convey crucial medical information effectively and empowers pet owners to understand and care for their loved pets appropriately.

Here are several helpful references and resources for learning more about Deepgram:

  • Join the Deepgram Discord.

  • Unlock language AI at scale with an API call.

  • Get conversational intelligence with transcription and understanding on the world's best speech AI platform.
