Deepgram's redaction just got better!
We're excited to announce some changes to how our redaction feature works for customers using our hosted API.
Deepgram’s redaction for pre-recorded audio is now backed by a powerful entity detection model. This model can find and tag sensitive information such as payment card information (PCI), social security numbers, or any other string of numbers that could be sensitive or subject to security policies.
After our entity detection model identifies sections of a transcript that may contain sensitive information, our API automatically redacts that information from the transcript. And don't worry, Deepgram's redaction capability is not a black box: we tag each redaction with the type of entity that was redacted before returning the transcript in our API output.
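To make the idea of tagged redaction concrete, here is a minimal Python sketch. It is not Deepgram's implementation: the `Entity` class, the `redact_with_tags` function, and the `[LABEL]` placeholder format are all hypothetical names chosen for illustration, and the entity spans are assumed to come from some upstream detection step and to be non-overlapping.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entity:
    """A detected span in the transcript (hypothetical structure)."""
    start: int   # index of the first character of the span
    end: int     # index one past the last character of the span
    label: str   # entity type, e.g. "SSN"


def redact_with_tags(transcript: str, entities: List[Entity]) -> str:
    """Replace each detected span with a tag naming its entity type.

    Assumes the spans do not overlap.
    """
    pieces = []
    cursor = 0
    for ent in sorted(entities, key=lambda e: e.start):
        pieces.append(transcript[cursor:ent.start])  # keep text before the span
        pieces.append(f"[{ent.label}]")              # hypothetical tag format
        cursor = ent.end
    pieces.append(transcript[cursor:])               # keep the remaining text
    return "".join(pieces)


if __name__ == "__main__":
    text = "my ssn is 123 45 6789 ok"
    print(redact_with_tags(text, [Entity(10, 21, "SSN")]))
```

Because the placeholder carries the entity type, a reader of the redacted transcript can still tell what kind of information was removed, even though the value itself is gone.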
Prior to this change, Deepgram’s redaction worked via heuristics. For example, we had built in the assumption that any string of sixteen digits was a credit card number. While this approach is capable of detecting many instances of sensitive data, our new model-based approach will deliver more accurate inference and even better protection of sensitive information.
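For a feel of what a heuristic like the sixteen-digit rule looks like, and why it can misfire, here is an illustrative Python sketch. The exact rules Deepgram used are not described in this post, so the pattern, the `redact_sixteen_digits` name, and the `[REDACTED]` mask are assumptions made for the example.

```python
import re

# Illustrative heuristic only: any run of exactly sixteen digits, optionally
# separated by single spaces or hyphens, is assumed to be a credit card number.
SIXTEEN_DIGITS = re.compile(r"(?<!\d)(?:\d[ -]?){15}\d(?!\d)")


def redact_sixteen_digits(transcript: str, mask: str = "[REDACTED]") -> str:
    """Mask every sixteen-digit run, whatever it actually represents."""
    return SIXTEEN_DIGITS.sub(mask, transcript)


if __name__ == "__main__":
    # A card number is caught...
    print(redact_sixteen_digits("card 4111 1111 1111 1111 thanks"))
    # ...but so is any other sixteen-digit value, such as an order number.
    print(redact_sixteen_digits("order 1234567890123456"))
```

A rule like this has no notion of context: it cannot tell a card number from an unrelated sixteen-digit identifier, and it misses sensitive numbers of any other length. That is the gap a model-based approach is meant to close.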
Best of all: No API changes are needed to use this new redaction!
With this latest release, entity detection-based redaction is available for customers using our hosted API to send pre-recorded audio. We’re hard at work building out this functionality for live-streamed audio and on-premises customers, so look out for another announcement when that’s ready.
If you need a refresher on how to enable redaction, check out our developer documentation. Happy coding!
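As a quick reminder of the general shape, the Python sketch below builds a request URL with redaction enabled. The endpoint and the `redact` values shown (`pci`, `ssn`) are assumptions on our part here, and `build_listen_url` is a hypothetical helper, so confirm the exact parameter names and accepted values in the developer documentation before relying on them.

```python
from typing import Iterable
from urllib.parse import urlencode

# Assumed endpoint for pre-recorded audio; verify against the documentation.
DEFAULT_BASE = "https://api.deepgram.com/v1/listen"


def build_listen_url(redact: Iterable[str], base: str = DEFAULT_BASE) -> str:
    """Build a request URL that repeats the `redact` parameter once per value."""
    query = urlencode([("redact", value) for value in redact])
    return f"{base}?{query}"


if __name__ == "__main__":
    # The audio itself would be sent in a POST request to this URL,
    # authenticated with your API key.
    print(build_listen_url(["pci", "ssn"]))
```

Since no API changes are needed, a request that already enables redaction this way should pick up the new model-based behavior without modification.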
If you have any feedback about this post, or anything else about Deepgram, we'd love to hear from you. Please let us know in our GitHub discussions.