I’m pleased to announce the on-prem availability of several Deepgram Speech Understanding features: Summarization, Language Detection, and Topic Detection, along with some compatibility updates for Deepgram Cloud and our SDKs.

Deepgram On-premises users can now choose between Deepgram’s Base and Enhanced models in an ASR request via the tier query parameter, where tier=base will select the Base model and tier=enhanced will select the Enhanced model.
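As a rough illustration, the tier can be appended to the request URL like any other query parameter. The sketch below only builds the URL and does not send a request. The host placeholder and the `/listen` route are assumptions modeled on Deepgram Cloud's API, and `build_listen_url` is a hypothetical helper rather than part of any SDK.

```python
from urllib.parse import urlencode

# Tiers named in this post.
VALID_TIERS = {"base", "enhanced"}

def build_listen_url(base_url: str, tier: str) -> str:
    """Return a transcription URL with the requested tier (hypothetical helper)."""
    if tier not in VALID_TIERS:
        raise ValueError(f"unknown tier: {tier!r}")
    # Assumption: the transcription route is `<base_url>/listen`.
    return f"{base_url.rstrip('/')}/listen?{urlencode({'tier': tier})}"

print(build_listen_url("http://ON_PREM_HOSTNAME:8080/v1", "enhanced"))
```

Rejecting unknown tier values on the client side is a design choice for this sketch, not documented server behavior.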

Deepgram On-prem supports the following Understanding features for batch ASR:


Summarization

Summarization enables users to generate meaningful, relevant summaries and insights from audio data. Each summary is produced automatically and includes references to the source transcript.


Language Detection

Language Detection enables users to identify the dominant language of an audio file and transcribe the output in the detected language. It is available in 16+ languages including Spanish, Hindi, French, German, Japanese and Polish.


Topic Detection

Deepgram's Topic Detection is based on an unsupervised topic modeling technique that enables users to detect the most important and relevant topics that are referenced in speech within the audio.


These Understanding features require the punctuate=true parameter in the ASR request. If you do not include it explicitly, the system adds it implicitly.
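A minimal sketch of that rule, for readers who build request options by hand. The parameter names `summarize`, `detect_language`, and `detect_topics` are assumed to match Deepgram Cloud's query parameters and should be checked against the API documentation; `understanding_options` is a hypothetical helper.

```python
def understanding_options(summarize=False, detect_language=False, detect_topics=False):
    """Build request options for the Understanding features (hypothetical helper)."""
    requested = {
        "summarize": summarize,
        "detect_language": detect_language,
        "detect_topics": detect_topics,
    }
    # Keep only the features that were actually requested.
    options = {name: True for name, enabled in requested.items() if enabled}
    if options:
        # Understanding features require punctuation, so set it explicitly
        # rather than relying on the server to add it.
        options["punctuate"] = True
    return options

print(understanding_options(summarize=True, detect_topics=True))
```

Setting punctuate explicitly keeps the request self-describing, even though the system would add it anyway.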

Please read each Deepgram Blog post about the features for more information.

Compatibility Updates for Deepgram Cloud and SDKs

Deepgram On-prem users who previously wanted to deploy Deepgram as a hybrid Cloud + On-prem solution had to implement support for two separate API schemas. Furthermore, the Deepgram SDKs do not support the legacy on-prem API schema, so until now developers could not take advantage of the SDKs' ease of use for rapid development.

Deepgram On-prem now supports Deepgram Cloud’s API schema for a seamless transition between Cloud and On-prem in hybrid deployments, and it supports all Deepgram SDKs for an easy developer experience.

New Deepgram On-prem configurations will default to using the /v1 endpoint. Legacy on-prem configurations can continue to support the /v2 endpoint, although this endpoint is now deprecated and users should migrate to the /v1 endpoint.

Here’s an example of how you can use the Deepgram Python SDK with an on-prem endpoint:

from deepgram import Deepgram
import asyncio, json

async def batch():
    # Point the SDK at the on-prem host instead of Deepgram Cloud.
    dg_client = Deepgram({'api_url': "http://ON_PREM_HOSTNAME:8080/v1", 'api_key': "DEEPGRAM_API_KEY"})
    with open("Bueller-Life-moves-pretty-fast.wav", 'rb') as audio:
        source = {'buffer': audio, 'mimetype': "audio/wav"}
        # Request the general model on the Enhanced tier.
        response = await dg_client.transcription.prerecorded(source, {'model': 'general', 'tier': 'enhanced'})
        print(json.dumps(response, indent=4))

# Run the coroutine; without this call nothing is transcribed.
asyncio.run(batch())


Deepgram On-prem now automatically enables half-precision floating-point format (aka “half precision”) if Engine detects that half-precision is supported by the NVIDIA GPU.

To explicitly enable or disable this feature, users can specify the state value within the [half_precision] section of their engine.toml file:

  [half_precision]
  state = "enabled"  # or "disabled" or "auto"

Lastly, Deepgram On-prem now supports the all-new “CloseStream” WebSocket message for closing your live audio streams. Please see the New Methods for Closing Streams changelog post for more information, or refer to the API documentation for Transcribing Live Streaming Audio.
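For orientation, a CloseStream message is a small JSON text frame. The sketch below only builds the payload. The `{"type": "CloseStream"}` shape follows Deepgram's streaming documentation as I understand it, so please confirm it against the API reference; the commented `send` call assumes a generic WebSocket client object that is not created here.

```python
import json

def close_stream_message() -> str:
    """Return the JSON text frame that asks the server to close the stream."""
    return json.dumps({"type": "CloseStream"})

# With an open WebSocket connection `ws` (hypothetical, not created here):
#     await ws.send(close_stream_message())
print(close_stream_message())
```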

To learn more about the latest release of Deepgram On-prem, please see the latest changelog posts.

If you have any feedback about this post, or anything else about Deepgram, we'd love to hear from you. Please let us know in our GitHub discussions.
