Article·AI Engineering & Research·Sep 3, 2025
20 min read

Deploy a Serverless Transcription Workflow with AWS Lambda + Deepgram STT

By Stephen Oladele

Audio transcription powers modern products (e.g., podcast platforms, customer support analytics, accessibility features, knowledge search) and usage is rarely steady. Some hours are quiet; others spike with uploads. Traditional, server‑based pipelines force you to provision for peaks, maintain machines, and pay for idle time.

Enter serverless. Pairing AWS Lambda with Deepgram’s speech‑to‑text (STT) API, you get a push‑button, event‑driven workflow that scales to meet bursts and drops to zero when idle. Instead of running a fleet, you wire an S3 upload to trigger a Lambda function; that function calls Deepgram for accurate, low‑latency transcription and saves the results right back to S3.

In this guide, you’ll deploy a production‑ready foundation: when audio lands in S3, Lambda sends a secure presigned URL to Deepgram’s /v1/listen endpoint and writes both the raw JSON response and the cleaned transcript text to a transcripts/ prefix.

Along the way, you’ll see how to keep costs predictable, add retries and dead‑letter queues for resilience, and extend the pipeline for search or analytics (all without maintaining servers or paying for idle compute time).

What you’ll build: A production‑ready pattern: S3 (incoming audio) → S3 Event → Lambda → Deepgram REST → S3 (transcripts), plus tips for costs, observability, and hardening.

Who it’s for: Platform engineers and developers who want to build a hands‑off, speech‑to‑text serverless transcription app on AWS.

Why Serverless STT on AWS (Deepgram + Lambda)?

Modern audio pipelines must transcribe at burst scale and sleep at idle. Yet most teams don’t want to babysit servers, autoscaling groups, or Kubernetes clusters just to move bytes from A to B.

Here are the key advantages in practice:

1️⃣ Event-Driven by Design

  • S3 Event Notifications fire the moment a file lands (no polling loops or cron jobs).

  • Each invocation handles one object, so concurrency naturally matches workload.

2️⃣ Predictable, Minimal Cost

  • AWS free tier: 1 M Lambda requests + 400k GB-s monthly.

  • Typical 5-min MP3 (≈5 MB) ≈ $0.022.

3️⃣ Built-in Resilience and Burst Control

  • Optional SQS buffer smooths sudden floods; a DLQ captures hard failures for replay.

  • Automatic retries on Lambda errors; you can add exponential back-off for Deepgram 429/5xx responses.

4️⃣ Zero-Ops Scaling

  • Warm invocations start in under 100 ms; cold starts are a minor penalty for I/O-bound jobs.

  • No autoscaling rules or idle EC2 instances to watch.

5️⃣ Single-Purpose Functions = Clean Code

  • One Lambda = one responsibility: fetch audio, call Deepgram, persist transcript.

  • Easy to swap languages (Python, Node.js, Go) or hand off to Step Functions if you bolt on post-processing.

6️⃣ Deepgram Developer Experience

  • /v1/listen REST accepts URLs or byte streams, returns JSON you can drop into S3.

  • Choose models (e.g., nova-2, nova-3), languages, smart formatting, summarisation (all via query params).

  • Generous (200 USD) trial credits let you test thousands of minutes for free.

Scenarios and Cost Assumptions for Serverless STT on AWS (Deepgram + Lambda)

Pricing References:

  • Lambda duration: ~$0.0000166667 per GB‑second; requests ~$0.20 per 1M. Free tier: 1M requests + 400k GB‑s/month. (Source: AWS)

  • Deepgram (Nova‑3, pre‑recorded): ~$0.0043/minute (varies by plan/volume). (Source: Deepgram)

  • S3 egress (to internet): starts ~$0.09/GB in us‑east‑1; S3 request costs are tiny (PUT ~$0.005/1k, GET ~$0.0004/1k). (Source: AWS)

📝 Important: For clips ≥ ~1–2 minutes, prefer async Deepgram + webhook so Lambda runs for hundreds of ms (submit job) rather than seconds/minutes (wait for transcript).

Case A: Async Workflow (recommended for 5‑minute files)

  • Assumptions: Lambda memory 1024 MB, runtime 600 ms to create a presigned URL + submit async job; Deepgram processes in the background and posts results to your webhook (or you poll).

  • Lambda compute: 1.0 GB × 0.6 s × $0.0000166667 ≈ $0.000010 per file (plus $0.0000002 request).

  • Deepgram: 5.0 min × $0.0043/min ≈ $0.0215 per file.

  • S3 egress (example): 5‑min MP3 @ 128 kbps ≈ ~4.8 MB → 0.0048 GB × $0.09/GB ≈ $0.00043.

👉 Estimated total per 5‑minute file: ≈ $0.022

Case B: Sync Workflow (okay for short clips)

  • Assumptions: 30‑second clip; Lambda memory 1536 MB; Lambda waits for synchronous /v1/listen to return—~2 s Lambda time end‑to‑end.

  • Lambda compute: 1.5 GB × 2.0 s × $0.0000166667 ≈ $0.000050 per file (plus $0.0000002 request).

  • Deepgram: 0.5 min × $0.0043/min ≈ $0.00215 per file.

  • S3 egress (example): 30‑sec MP3 @ 128 kbps ≈ 0.5 MB → 0.0005 GB × $0.09/GB ≈ $0.000045.

Estimated total per 30‑sec file: ≈ $0.00225.

👉 Takeaway: In both workflows, Deepgram usage is the slightly dominant cost; Lambda duration and S3 request charges are negligible at this scale. Data egress is tiny for compressed audio but non‑zero.
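
The per-file arithmetic above is easy to sanity-check with a few lines of Python (constants taken from the pricing references in this section):

```python
GB_SECOND = 0.0000166667   # Lambda duration price, USD per GB-second
REQUEST = 0.0000002        # Lambda request price, USD per invocation
DEEPGRAM_MIN = 0.0043      # Deepgram Nova-3 pre-recorded, USD per minute
EGRESS_GB = 0.09           # S3 internet egress, USD per GB (us-east-1 first tier)

def per_file_cost(mem_gb, seconds, audio_min, audio_gb):
    """Total USD cost to transcribe one file."""
    lambda_cost = mem_gb * seconds * GB_SECOND + REQUEST
    return lambda_cost + audio_min * DEEPGRAM_MIN + audio_gb * EGRESS_GB

case_a = per_file_cost(1.0, 0.6, 5.0, 0.0048)   # async, 5-minute MP3
case_b = per_file_cost(1.5, 2.0, 0.5, 0.0005)   # sync, 30-second clip

print(f"Case A ≈ ${case_a:.4f}")  # about $0.022 per 5-minute file
print(f"Case B ≈ ${case_b:.4f}")  # about $0.0022 per 30-second clip
```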

Architecture Overview

When a client uploads an audio file to Amazon S3 (for example, under audio-incoming/), S3 emits an ObjectCreated event. That event invokes an AWS Lambda function, which generates a presigned S3 URL for the object and calls Deepgram’s /v1/listen REST API with that URL.

Deepgram transcribes the audio; the function then writes both the raw JSON response and a clean text transcript to a transcripts/ prefix in the same bucket.

Why presigned URLs? The audio stays in your private bucket; Deepgram fetches it securely via a time-limited URL. That keeps Lambda fast and memory-light and avoids base64 overhead.

What happens, step by step

  1. A client uploads an audio file to the audio-incoming/ prefix in S3.

  2. S3 emits an ObjectCreated event (optionally buffered through SQS).

  3. Lambda is invoked, generates a time-limited presigned URL for the object, and calls Deepgram’s /v1/listen with it.

  4. Deepgram fetches the audio via the presigned URL and returns the transcription JSON.

  5. Lambda writes both the raw JSON response and a cleaned .txt transcript to the transcripts/ prefix.

Why this Architecture?

  • Event-driven + pay-per-use: scales to spikes, drops to zero at idle.

  • Low latency and cost: no double-handling of bytes in Lambda; Deepgram fetches directly.

  • Operationally simple: S3 stores, Lambda orchestrates, Deepgram transcribes; SQS/DLQ make bursts and failures easy to manage.

Get Prerequisites (S3 Bucket, API Keys, Tooling)

Before wiring events and code, make sure you have the following in place.

Account Setup and Tools

💡Tips:

  • Throughout this guide, we use us-east-1; use a consistent region for all your resources.

  • Keep Lambda outside a private VPC unless you have a NAT gateway. The function must reach Deepgram over the public internet.

Deepgram Credentials

Create an API key and keep it server-side only. We’ll pass it to Lambda via:

  • Simple: Encrypted environment variable (good for demos)

  • Best: AWS Secrets Manager (Lambda reads it at runtime)

Stage 1: Set Up S3 Bucket with Source Audio and Transcript Prefixes/Folders

From the Console

Create an S3 bucket. You can use one bucket (e.g., serverless-audio-transcription) with separate prefixes or two buckets—your choice.

  • Input prefix: audio-incoming/ (where you upload audio)

  • Output prefix: transcripts/ (where the function writes results)

Here’s the naming convention this guide uses:

📝 Notes: 

  • You don’t need a bucket policy for presigned URLs. Do not add Deny rules that restrict GetObject by VPC endpoint or IP, or Deepgram won’t be able to fetch the audio via the presigned link.

  • We’ll filter S3 events to only trigger on audio types and the audio-incoming/ prefix so transcript writes don’t re-trigger Lambda.

Or Use Your CLI
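
Assuming the bucket name serverless-audio-transcription from above, the CLI version might look like this (bucket and region are this guide’s examples, not requirements):

```shell
# Create the bucket (us-east-1 needs no LocationConstraint)
aws s3 mb s3://serverless-audio-transcription --region us-east-1

# S3 has no real folders; create the prefixes as zero-byte keys
aws s3api put-object --bucket serverless-audio-transcription --key audio-incoming/
aws s3api put-object --bucket serverless-audio-transcription --key transcripts/
```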

Stage 2: Create and Use Amazon SQS as an S3 Buffer and Add DLQ

Step 1: Create two queues

Head over to Amazon SQS to create two queues:

Dead-letter queue: audio-dlq (Standard is fine). Failed messages will be redriven here for inspection and replay.

Main queue: audio-incoming-queue (a Standard queue for maximum throughput)

  • Visibility timeout: set > your Lambda timeout (e.g., Lambda 60s ⇒ visibility 120–180s)

  • Redrive policy (DLQ): send to audio-dlq with MaxReceiveCount (start with 5)

📝 Note: Queue visibility timeout > your Lambda timeout because if your function runs near its timeout or you add retries/backoff inside, SQS needs enough time to keep the message hidden from other pollers. Too short and the same message can get delivered to another Lambda while the first is still working.

Step 2: Allow S3 to send messages to the main queue

S3 can only publish to SQS if the queue access policy allows it. In the SQS console → your main queue → Access policy → add a statement like:

> Replace REGION, ACCOUNT_ID, YOUR_BUCKET with the actual values.
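
A typical statement follows AWS’s documented S3-to-SQS pattern; the queue and bucket names below are this guide’s examples:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowS3Publish",
      "Effect": "Allow",
      "Principal": { "Service": "s3.amazonaws.com" },
      "Action": "sqs:SendMessage",
      "Resource": "arn:aws:sqs:REGION:ACCOUNT_ID:audio-incoming-queue",
      "Condition": {
        "ArnLike": { "aws:SourceArn": "arn:aws:s3:::YOUR_BUCKET" },
        "StringEquals": { "aws:SourceAccount": "ACCOUNT_ID" }
      }
    }
  ]
}
```

The two Condition keys prevent other accounts’ buckets from publishing to your queue (the “confused deputy” problem).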

Step 3: Point S3 Event Notifications to the SQS queue

  • S3 bucket → Properties → Event notifications → Create event

  • Event types: All object create events (s3:ObjectCreated:*), or narrow to PUT and CompleteMultipartUpload.

  • Prefix: audio-incoming/

  • Suffix: .mp3, .wav, .m4a (you can make one per suffix or one that’s broad)

  • Destination: SQS queue → select audio-incoming-queue

This step sends “S3 event JSON” into your SQS queue as each audio file lands.

CLI Option

1) Create queues

2) Allow S3 to send to SQS

3) Wire S3 → SQS
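
The three CLI steps above can be scripted roughly as follows (replace REGION, ACCOUNT_ID, and YOUR_BUCKET; the 180 s visibility timeout assumes a 60 s Lambda timeout):

```shell
# 1) Create the DLQ, then the main queue with visibility timeout > Lambda timeout
aws sqs create-queue --queue-name audio-dlq
aws sqs create-queue --queue-name audio-incoming-queue \
  --attributes '{"VisibilityTimeout":"180"}'

# Attach the redrive policy so failures land in the DLQ after 5 receives
aws sqs set-queue-attributes \
  --queue-url "https://sqs.REGION.amazonaws.com/ACCOUNT_ID/audio-incoming-queue" \
  --attributes '{"RedrivePolicy":"{\"deadLetterTargetArn\":\"arn:aws:sqs:REGION:ACCOUNT_ID:audio-dlq\",\"maxReceiveCount\":\"5\"}"}'

# 2) Allow S3 to send to SQS: apply the access policy from Step 2 above
#    via set-queue-attributes with a "Policy" attribute.

# 3) Wire S3 -> SQS event notifications on the input prefix
aws s3api put-bucket-notification-configuration \
  --bucket YOUR_BUCKET \
  --notification-configuration '{
    "QueueConfigurations": [{
      "QueueArn": "arn:aws:sqs:REGION:ACCOUNT_ID:audio-incoming-queue",
      "Events": ["s3:ObjectCreated:*"],
      "Filter": {"Key": {"FilterRules": [
        {"Name": "prefix", "Value": "audio-incoming/"},
        {"Name": "suffix", "Value": ".mp3"}
      ]}}
    }]
  }'
```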

Stage 3: Set Up AWS Lambda for Serverless Transcription

Create the AWS Lambda function that orchestrates audio transcription requests to Deepgram.

Step 1: Log in and Navigate to Lambda

  • Log in to your AWS Console.

  • Search for "Lambda" in the top search bar and open the Lambda service page.

Step 2: Create a New Function

  1. Click "Create function."

  2. Choose "Author from scratch."

  3. Set the following parameters:

  4. Function Name: audio-transcriber

  5. Runtime: Choose Python 3.10 (or later)

  6. Architecture: Choose arm64 (recommended for ~30% lower cost) or x86_64.

  7. Under Execution Role, select "Create a new role with basic Lambda permissions." (You’ll attach S3/IAM bits next)

  8. Click "Create function."

Step 3: Configure Function Settings

Once created, adjust these important settings under the Configuration tab:
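
The key settings for this guide are the timeout and memory size used in the cost section (60 s and 1024 MB). You can also set them from the CLI:

```shell
# Values assumed from this guide's cost estimates; tune for your workload
aws lambda update-function-configuration \
  --function-name audio-transcriber \
  --timeout 60 \
  --memory-size 1024
```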

Step 4: Add Environment Variables (Lambda)

Under Configuration → Environment variables, add:

📝 Note: We’ll set sensible defaults in the Lambda code if these aren’t provided.
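
The same variables can be set from the CLI. The names below (DEEPGRAM_API_KEY, DG_MODEL, OUTPUT_PREFIX) are illustrative; match whatever your handler reads:

```shell
aws lambda update-function-configuration \
  --function-name audio-transcriber \
  --environment 'Variables={DEEPGRAM_API_KEY=dg_xxx,DG_MODEL=nova-3,OUTPUT_PREFIX=transcripts/}'
```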

Step 5: Connect the main SQS queue to your Lambda Function (event source mapping)

  • Lambda → Add trigger → SQS → choose audio-incoming-queue

  • Batch size: start with 5

  • Batch window: 0–1s

  • Maximum concurrency (per trigger): set a cap (e.g., 10) to throttle burst spend

  • Visibility timeout (on the queue) should be greater than Lambda timeout × expected retries. Ensure it’s 2–3× Lambda timeout.

That’s it. Now S3 drops notifications in SQS; Lambda polls SQS at your chosen rate, and failures get retried up to MaxReceiveCount then land in audio-dlq for inspection.

Or use the CLI to connect the main SQS queue to your Lambda func:
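
One way to create the same event source mapping from the CLI (replace REGION and ACCOUNT_ID):

```shell
aws lambda create-event-source-mapping \
  --function-name audio-transcriber \
  --event-source-arn arn:aws:sqs:REGION:ACCOUNT_ID:audio-incoming-queue \
  --batch-size 5 \
  --maximum-batching-window-in-seconds 1 \
  --scaling-config '{"MaximumConcurrency":10}'
```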

Stage 4: Add Required Permissions to IAM Execution role for Lambda 

Step 1: Get your inline policies ready

Below are the inline policies you’ll attach to the Lambda execution role, scoped to least privilege for this guide:

CloudWatch Logs (basic Lambda logging):
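
A minimal logging policy (this is what the “basic Lambda permissions” role option also grants):

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
    "Resource": "arn:aws:logs:*:*:*"
  }]
}
```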

S3 input (read audio for presigned auth and optional bytes path):
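
Read access scoped to the input prefix only (replace YOUR_BUCKET):

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["s3:GetObject"],
    "Resource": "arn:aws:s3:::YOUR_BUCKET/audio-incoming/*"
  }]
}
```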

S3 outputs (read for idempotency + write transcripts):
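
Read plus write on the output prefix; GetObject is what lets the idempotency HeadObject check succeed:

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["s3:GetObject", "s3:PutObject"],
    "Resource": "arn:aws:s3:::YOUR_BUCKET/transcripts/*"
  }]
}
```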

SQS (buffer or DLQ)
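
The permissions the SQS event source mapping needs on the main queue:

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["sqs:ReceiveMessage", "sqs:DeleteMessage", "sqs:GetQueueAttributes"],
    "Resource": "arn:aws:sqs:REGION:ACCOUNT_ID:audio-incoming-queue"
  }]
}
```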

👉 Shortcut: you can also attach AWS’s managed policy AWSLambdaSQSQueueExecutionRole instead of writing this inline.

(Optional) Secrets Manager
If you store the Deepgram API key in Secrets Manager:

Step 2: Go to the AWS Console → Lambda → open your function.

Step 3: Configuration → Permissions → under Execution role, click the blue role link (e.g., audio-transcriber-role-abc123).

Step 4: You’re now on the IAM Role page. Click Add permissions → Create inline policy.

Step 5: Click the JSON tab.

Step 6: Paste the policy JSON you need (e.g., the “CloudWatch Logs” block, or the “S3 access” block) and Save.

  • If you already have an inline policy with a "Statement": [ ... ] array, you can add another statement to that array instead of creating a separate policy.

Step 7: Wait ~30–60 seconds for IAM to propagate. Re-test your Lambda.

💡 Common issues to watch out for:

  • Wrong role: Always click the role link from the Lambda page to edit the correct one.

  • Wrong ARN: S3 ARNs are arn:aws:s3:::BUCKET/PREFIX/* (no region in S3 ARNs).

  • Policy shape: JSON must have a single "Version" and a "Statement" array; don’t paste multiple top-level objects.

  • Inline policy limit: Keep each inline policy under the IAM size limits; create multiple policies if needed.

  • Bucket policy Deny beats Allow: If there’s an S3 bucket policy with a Deny, it will override your role’s Allow (not needed for this guide; avoid Deny rules that restrict GetObject on input).

  • Propagation delay: IAM changes can take ~30–60 seconds to take effect. Refresh and retry.

Stage 5: Add Handler Code to Lambda Function

This handler code (see repo):

  • Accepts S3 events directly or SQS→S3 wrapped events.

  • Generates a presigned URL and calls Deepgram /v1/listen.

  • Writes JSON and TXT to transcripts/.

  • Is idempotent (checks existing outputs).

  • Includes useful logs and retry on 429/5xx.

Paste the code in the Lambda Code editor and click Deploy.
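
The full handler lives in the repo; the sketch below shows the same flow with the idempotency check and structured logging elided, and with assumed environment variable names (DEEPGRAM_API_KEY, DG_MODEL):

```python
import json
import os
import time
import urllib.error
import urllib.parse
import urllib.request

DG_ENDPOINT = "https://api.deepgram.com/v1/listen"

def output_keys(object_key: str, prefix: str = "transcripts/") -> tuple:
    """Map audio-incoming/call.mp3 -> (transcripts/call.json, transcripts/call.txt)."""
    stem = os.path.splitext(os.path.basename(object_key))[0]
    return f"{prefix}{stem}.json", f"{prefix}{stem}.txt"

def s3_records(event):
    """Yield S3 records from a direct S3 event or an SQS-wrapped one."""
    for record in event.get("Records", []):
        payload = json.loads(record["body"]) if "body" in record else {"Records": [record]}
        yield from payload.get("Records", [])

def transcribe(presigned_url: str, api_key: str, model: str) -> dict:
    """Submit the presigned URL to Deepgram, retrying on 429/5xx with back-off."""
    query = urllib.parse.urlencode({"model": model, "smart_format": "true"})
    req = urllib.request.Request(
        f"{DG_ENDPOINT}?{query}",
        data=json.dumps({"url": presigned_url}).encode(),
        headers={"Authorization": f"Token {api_key}", "Content-Type": "application/json"},
    )
    for attempt in range(4):
        try:
            with urllib.request.urlopen(req, timeout=55) as resp:
                return json.load(resp)
        except urllib.error.HTTPError as err:
            if err.code not in (429, 500, 502, 503) or attempt == 3:
                raise
            time.sleep(2 ** attempt)  # exponential back-off

def handler(event, context):
    import boto3  # provided by the Lambda runtime
    s3 = boto3.client("s3")
    for rec in s3_records(event):
        bucket = rec["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(rec["s3"]["object"]["key"])
        json_key, txt_key = output_keys(key)
        # Time-limited URL so Deepgram can fetch from the private bucket
        url = s3.generate_presigned_url(
            "get_object", Params={"Bucket": bucket, "Key": key}, ExpiresIn=900
        )
        result = transcribe(url, os.environ["DEEPGRAM_API_KEY"],
                            os.environ.get("DG_MODEL", "nova-3"))
        text = result["results"]["channels"][0]["alternatives"][0]["transcript"]
        s3.put_object(Bucket=bucket, Key=json_key, Body=json.dumps(result).encode())
        s3.put_object(Bucket=bucket, Key=txt_key, Body=text.encode())
```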

💡 Bytes fallback (optional): If your org enforces very strict bucket policies that block presigned URLs, add a temporary env var UPLOAD_BYTES=true and send bytes instead. Replace the Deepgram call in the handler code to post raw S3 bytes (Content-Type from the file extension) instead of sending a {"url": ...} JSON payload.

Stage 6: Monitor and Alert (5 minutes)

What to watch

  • Lambda: Invocations, Errors, Throttles, Duration (avg/p95), IteratorAge (if SQS)

  • SQS: Visible messages, AgeOfOldestMessage

  • DLQ: Visible messages

Alarms (typical thresholds)

  • Lambda Errors ≥ 1 for 2×5-min periods

  • Lambda p95 Duration > 5s for 2×15-min periods

  • SQS AgeOfOldestMessage > 60s for 2×5-min

  • DLQ Messages ≥ 1 (alert immediately)

Useful Logs Insights query (p50/p95 over time)
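
One query that buckets Lambda duration percentiles into 5-minute bins; paste it into CloudWatch Logs Insights against the function’s log group:

```
filter @type = "REPORT"
| stats pct(@duration, 50) as p50_ms,
        pct(@duration, 95) as p95_ms,
        max(@duration) as max_ms
  by bin(5m)
```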

Troubleshooting Tips for the Serverless Transcription App

  • No Lambda logs after upload? S3 event miswired; check bucket Properties → Event notifications and Lambda Resource-based policy.

  • KeyError 's3': You’re using SQS; unwrap record.body (the handler in the repo does this).

  • ModuleNotFoundError: requests: add a layer/vendor dependency.

  • 403 on HeadObject (idempotency): add s3:GetObject on transcripts/*.

  • Deepgram REMOTE_CONTENT_ERROR 403: Lambda role lacks s3:GetObject on audio-incoming/* so presigned URL isn’t authorized; fix IAM.

  • Timeouts/no network: Keep presigned pattern, increase memory (faster CPU), switch to SQS trigger and raise max concurrency.

  • Trigger creation error (SQS visibility < Lambda timeout): Set visibility to ≥ 2× Lambda timeout (e.g., 120–180s).

Conclusion: Serverless Audio Transcription with AWS Lambda and Deepgram STT

You just stood up a production-ready, pay-per-use transcription pipeline: S3 catches audio, Lambda orchestrates, Deepgram transcribes, and results land back in S3; no servers to babysit, no idle spend. The architecture scales from a trickle to a flood, and every moving part is observable, permissioned with least privilege, and easy to extend.

A few takeaways worth underlining:

  • Cost tracks usage. Lambda runs for sub-second submit work while Deepgram minutes do the heavy lifting. You’re not paying for idle CPUs or overprovisioned nodes.

  • Operationally boring (in a good way). S3 events or SQS buffering handle bursts; DLQs and CloudWatch alarms give you fast feedback loops; idempotency prevents dupes.

  • Security stays first-class. Private buckets + presigned URLs keep audio in your account; IAM scopes exactly what the function can read/write. (If you tighten bucket policies later, bytes-upload fallback is a safe escape hatch.)

  • Developer-friendly. The code is small, explicit, and testable with sample events. Swapping models or languages is an env-var edit, not a rewrite.

Where to go next

  • Long files/live traffic: Switch to Deepgram’s async + webhook flow to keep Lambda < 1s end-to-end for multi-minute content or batch jobs.

  • Post-processing: Enrich transcripts with timestamps/diarization, push summaries/keywords to DynamoDB or OpenSearch, or run Comprehend for entities and sentiment.

  • Lifecycle & analytics: Partition transcripts by date, add S3 lifecycle rules, and query with Athena for reporting.

  • Hardening: Add WORM/versioning on transcripts, KMS for outputs, and formalize everything with Terraform/SAM and a CI pipeline.

  • Observability: Pin the Log Insights query + p95 duration on a dashboard; keep DLQ alarms on at all times.

If you’re integrating this into an existing product, start by pointing your current upload path at audio-incoming/, enable the SQS buffer, and let the pipeline shoulder the load. From there, it’s a few env vars to tune accuracy, language, and formatting—and you’ve got accurate, fast, and massively scalable speech-to-text without the ops tax.

Ready to ship it? Sign up for Deepgram and start building with a developer-focused STT API—and you’ll be transcribing reliably at serverless scale in minutes.
