Improvements to Deepgram Redaction

Shir Goldberg

Published on Jan 24, 2023

We’re excited to announce an update to our Redaction feature.

Deepgram’s hosted API for pre-recorded audio now uses a powerful entity detection model for redaction. This model can find and tag sensitive information with a high degree of precision, including PCI, social security numbers, or any string of numbers.

When you use our hosted API for pre-recorded audio, the output of our redaction feature is now more descriptive. Previously, transcripts were returned with a * wherever an entity was redacted. Now, each redacted entity is replaced with a tag that names the type of entity detected and numbers its occurrence in the transcript.

For example, if you choose to redact social security numbers, the output for “My social security number is five five five two two one one one one and his is six six six two two one three three three” appears in your transcript as “My social security number is [SSN_1] and his is [SSN_2]”.
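If you post-process transcripts, the new tags are straightforward to pick out. The following is a minimal Python sketch, not an official client: it assumes every tag follows the bracketed LABEL_N shape seen in the [SSN_1] example above, and the names `TAG_PATTERN`, `find_redactions`, and `count_by_label` are hypothetical helpers introduced only for illustration.

```python
import re
from collections import Counter

# Assumed tag shape, inferred from the [SSN_1] example: an uppercase label,
# an underscore, and an occurrence number, wrapped in square brackets.
TAG_PATTERN = re.compile(r"\[([A-Z_]+)_(\d+)\]")


def find_redactions(transcript):
    """Return (label, occurrence) pairs for each tag, in order of appearance."""
    return [(label, int(number)) for label, number in TAG_PATTERN.findall(transcript)]


def count_by_label(transcript):
    """Count how many tags of each entity type appear in the transcript."""
    return Counter(label for label, _ in find_redactions(transcript))


transcript = "My social security number is [SSN_1] and his is [SSN_2]"
print(find_redactions(transcript))
print(dict(count_by_label(transcript)))
```

Labels other than SSN are not listed in this post, so the pattern is kept deliberately generic; tighten it once you have confirmed the exact labels in the developer documentation.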

If you are live streaming audio or using our on-premise product, redaction output will continue to use * in place of redacted information for the time being.
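Because the two output styles will coexist for a while, downstream code that consumes transcripts from both pre-recorded and streaming or on-premise sources may want to treat them uniformly. Here is a rough Python sketch under stated assumptions: tags look like [SSN_1], asterisk redactions appear as one or more consecutive * characters, and genuine transcript text contains no literal asterisks. The function name `normalize_redactions` and the `[REDACTED]` marker are hypothetical choices, not part of Deepgram's API.

```python
import re

# Assumption: entity tags follow the bracketed LABEL_N shape, e.g. [SSN_1].
TAG = re.compile(r"\[[A-Z_]+_\d+\]")
# Assumption: asterisk-style redactions show up as a run of one or more '*'.
STARS = re.compile(r"\*+")


def normalize_redactions(transcript, marker="[REDACTED]"):
    """Replace both tag-style and asterisk-style redactions with one marker."""
    without_tags = TAG.sub(marker, transcript)
    # The default marker contains no asterisks, so this pass only touches
    # redactions that were asterisks in the original transcript.
    return STARS.sub(marker, without_tags)


print(normalize_redactions("My social security number is [SSN_1]"))
print(normalize_redactions("My social security number is ****"))
```

Normalizing discards the entity type and occurrence number, so keep the original transcript if you need that detail later.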

Entity detection-based redaction is available for customers using our hosted API to send pre-recorded audio. We’re hard at work building out this functionality for live-streamed audio and on-premise customers—look out for another announcement when that’s ready.

Please see our developer documentation for more information about the Redaction feature.
