Leading Cloud Communications Platform


About the Customer
The customer is a prominent global provider of cloud-based phone system software, delivering seamless customer communication solutions for support and sales teams worldwide. They offer virtual phone numbers, interactive voice response (IVR), call routing, and integrations with popular CRM platforms. With over 19,000 business clients globally, their software facilitates efficient remote communications and enhances customer experiences for businesses ranging from SMBs to large enterprises.
Customer Challenge
The company initially implemented an in-house transcription solution powered by an open-source AI model (Whisper) to process customer support and sales calls. However, limitations quickly emerged, including:
Lack of real-time capabilities: Whisper only transcribed pre-recorded audio, hindering the company's ability to provide real-time agent support.
Limited scalability: With variable call volumes due to seasonal fluctuations and business growth, the existing system struggled to scale effectively.
Insufficient accuracy and latency: The previous solution lacked the precision required for high-quality customer interactions and could not meet latency expectations for real-time use cases.
Excessive resource demands: Internal development and maintenance consumed significant engineering resources, diverting attention from strategic product developments.
These shortcomings impacted customer experience, prevented the launch of critical new AI-powered products, and increased operational expenses.
Proposed Solution & Architecture
To overcome these challenges, the company migrated to Deepgram's AI-powered speech-to-text platform integrated with AWS infrastructure, resulting in a robust, scalable, and real-time transcription solution.
Technical Architecture and AWS Services Implemented:
Real-time Speech-to-Text Pipeline:
Incoming Call Routing (Amazon Route 53):
Calls are securely routed using Amazon Route 53, enabling reliable and performant DNS management of inbound traffic.
Virtual Private Cloud (AWS VPC):
All audio streams are securely transmitted within a dedicated AWS VPC, providing isolated, secure network connectivity between the customer’s call system and Deepgram's transcription API.
Speech-to-Text Processing (Deepgram Listen API):
Deepgram's real-time transcription service receives live audio feeds, transcribing them with extremely low latency and high accuracy.
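To make the integration pattern concrete, the sketch below streams raw telephony audio to Deepgram's real-time Listen WebSocket endpoint and prints transcripts as they arrive. This is a minimal illustration, not the customer's production code: the audio source, encoding parameters, the stream-closing control message, and the header-passing keyword are assumptions that may differ by API and library version.

```python
# Minimal sketch: stream 8 kHz PCM call audio to Deepgram's real-time Listen
# endpoint and print transcripts as they arrive. Parameters and control
# messages are illustrative and may vary by API/library version.
import asyncio
import json
import os

import websockets  # pip install websockets

DEEPGRAM_API_KEY = os.environ["DEEPGRAM_API_KEY"]
LISTEN_URL = (
    "wss://api.deepgram.com/v1/listen"
    "?encoding=linear16&sample_rate=8000&channels=1&interim_results=true"
)

async def stream_call_audio(audio_chunks):
    """Send small PCM frames from the call bridge and print live transcripts."""
    async with websockets.connect(
        LISTEN_URL,
        # Keyword may be 'additional_headers' in newer websockets releases.
        extra_headers={"Authorization": f"Token {DEEPGRAM_API_KEY}"},
    ) as ws:

        async def sender():
            for chunk in audio_chunks:                 # e.g. 20 ms audio frames
                await ws.send(chunk)
            await ws.send(json.dumps({"type": "CloseStream"}))  # signal end of stream

        async def receiver():
            async for message in ws:
                result = json.loads(message)
                alts = result.get("channel", {}).get("alternatives", [])
                if alts and alts[0].get("transcript"):
                    print(alts[0]["transcript"])

        await asyncio.gather(sender(), receiver())
```

In this pattern the transcript arrives while the caller is still speaking, which is what enables downstream real-time agent-assist features.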
Scalable Compute Infrastructure (Amazon EC2 and EKS):
Amazon EC2 C5a instances: Handle API load balancing, ensuring rapid request-response cycles and effective management of transcription requests.
Amazon EC2 G5 GPU instances: Run GPU-intensive inference workloads required by Deepgram’s speech models, enabling ultra-fast, accurate real-time transcription.
Amazon Elastic Kubernetes Service (EKS): Manages container orchestration, automating the horizontal scaling of both C5a and G5 instances in response to varying workloads, efficiently handling peak call volumes without manual intervention.
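As an illustration of how such autoscaling can be declared, the sketch below registers a HorizontalPodAutoscaler for a hypothetical "deepgram-inference" Deployment using the official Kubernetes Python client. The deployment name, replica bounds, and CPU-utilization target are assumptions rather than the customer's actual configuration; a production setup on G5 nodes might instead scale on GPU or request-concurrency metrics.

```python
# Illustrative sketch: declare an HPA so EKS adds or removes inference pods
# (and, with the cluster autoscaler, EC2 nodes) as call volume changes.
# Names and thresholds are assumptions, not the customer's real settings.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside EKS

hpa = client.V2HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="deepgram-inference"),
    spec=client.V2HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V2CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="deepgram-inference"
        ),
        min_replicas=2,
        max_replicas=20,
        metrics=[
            client.V2MetricSpec(
                type="Resource",
                resource=client.V2ResourceMetricSource(
                    name="cpu",
                    target=client.V2MetricTarget(
                        type="Utilization", average_utilization=70
                    ),
                ),
            )
        ],
    ),
)

client.AutoscalingV2Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="default", body=hpa
)
```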
Model Storage and Versioning (Amazon EFS):
AI models used by Deepgram for speech transcription are stored on Amazon Elastic File System (EFS), providing highly scalable, shared storage that supports rapid model retrieval and deployment updates.
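One way to picture the role EFS plays is the version-lookup sketch below: every inference pod mounts the same file system and resolves the newest model directory at startup. The mount path and directory layout are hypothetical and are not Deepgram's actual on-disk format.

```python
# Hypothetical sketch of model version resolution on a shared EFS mount.
# The mount point and directory layout are illustrative assumptions.
from pathlib import Path

MODEL_ROOT = Path("/mnt/efs/models")  # EFS mounted into every inference pod

def latest_model_dir(model_name: str) -> Path:
    """Return the newest version directory, e.g. /mnt/efs/models/<name>/2024-06-01/."""
    versions = sorted(p for p in (MODEL_ROOT / model_name).iterdir() if p.is_dir())
    if not versions:
        raise FileNotFoundError(f"No versions found for model '{model_name}'")
    return versions[-1]

# Every pod loads the same, most recent model files without baking them into
# the container image, so rolling out a model update is just adding a directory.
```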
With these AWS services, the company was able to achieve seamless, real-time transcription capabilities and scalability without investing heavily in internal maintenance.
Metrics for Success
The new Deepgram-powered AWS solution delivered substantial, measurable benefits:
Accuracy: Achieved approximately 30% lower word error rate (WER) than both the prior Whisper-based solution and competing services, significantly enhancing transcription quality (a brief WER illustration follows this list).
Real-Time Latency: Delivered real-time transcription with latency as low as a few hundred milliseconds, roughly 40 times faster than the previous batch-processing workflow.
Scalability: Effortlessly accommodated fluctuations in call volume, scaling horizontally using AWS EKS-managed EC2 instances.
Cost Efficiency: Reduced transcription operating costs by a factor of roughly 3-5, significantly lowering total cost of ownership compared with the prior Whisper-based solution.
Resource Optimization: Reduced the engineering commitment from an entire team maintaining the previous solution to fewer than two full-time engineers, freeing critical resources for strategic development.
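For context on the accuracy figure: word error rate is the word-level edit distance between a reference transcript and the system output, divided by the number of reference words, so a 30% relative improvement means, for example, WER dropping from 0.20 to 0.14. The sketch below computes WER directly with illustrative, hypothetical sentences; in practice a dedicated library such as jiwer is typically used.

```python
# Illustrative WER computation: word-level edit distance divided by the
# number of reference words. The example transcripts are hypothetical.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,          # deletion
                d[i][j - 1] + 1,          # insertion
                d[i - 1][j - 1] + cost,   # substitution or match
            )
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("please hold while i transfer your call",
                      "please hold while i transfer you call"))  # ~0.14
```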
Lessons Learned & Outcomes
Key insights from deploying Deepgram’s transcription solution on AWS infrastructure included:
Seamless Migration: Deepgram’s straightforward API, extensive documentation, and proactive applied engineering support dramatically simplified the migration from Whisper, reducing transition risk and disruption.
Strategic Focus: By offloading operational management of transcription to Deepgram’s managed solution, the company's engineering resources were redirected to product development, leading to accelerated innovation and faster time-to-market for strategic AI products, such as real-time agent-assist solutions.
Future-Proof Infrastructure: Utilizing AWS infrastructure provided a scalable, secure, and enterprise-grade environment, enabling easy future expansion and adaptability to evolving customer demands and new feature integration.