Deepgram released a new version of its on-premises solution.
On-Premises Release 221007: Docker Hub Images
deepgram/onprem-api:1.70.0
deepgram/onprem-engine:3.37.2
Minimum required NVIDIA driver version: >=450.80.02
deepgram/onprem-license-proxy:1.2.1
deepgram/onprem-billing:1.4.0
deepgram/onprem-metrics-server:2.0.0
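The Engine image requires NVIDIA driver 450.80.02 or newer. The Python sketch below shows one way to check a host's driver against that minimum. It assumes you already have the installed version string, for example from `nvidia-smi --query-gpu=driver_version --format=csv,noheader`. The function names are hypothetical. Components are compared as integers because a plain string comparison ranks "450.100.01" below "450.80.02".

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Split a dotted driver version such as "450.80.02" into integers."""
    return tuple(int(part) for part in version.strip().split("."))


# Minimum driver version stated for deepgram/onprem-engine:3.37.2.
MIN_DRIVER_VERSION = "450.80.02"


def driver_meets_minimum(installed: str, minimum: str = MIN_DRIVER_VERSION) -> bool:
    """Return True if the installed driver is at least the minimum version.

    Tuples compare component by component as integers, so "450.100.01"
    correctly ranks above "450.80.02".
    """
    return parse_version(installed) >= parse_version(minimum)


print(driver_meets_minimum("450.80.02"))  # True
print(driver_meets_minimum("440.33.01"))  # False
```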
Changes
Deepgram on-premises deployments now support the following Understanding features, provided the accompanying Understanding model is deployed on-prem and the required configuration changes are made:
Summarization automatically generates summaries of audio data. It returns a segment-level breakdown of the summary with start and end character positions, which customers can use to identify the start and end timestamps of each summarized section.
Query parameters: summarize=true&punctuate=true
This requires adding the following section to the api.toml file:
[features]
summarization = true
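Character positions become timestamps only when they are matched against word-level timings. The Python sketch below shows one way to do that mapping. It rests on assumptions and does not describe Deepgram's response format. The `Word` type and function names are hypothetical, the transcript is assumed to be the word texts joined by single spaces, and the end position is treated as exclusive.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical word-level timing record."""
    text: str
    start: float  # seconds
    end: float    # seconds


def word_char_offsets(words: list[Word]) -> list[int]:
    """Starting character offset of each word in ' '.join(w.text for w in words)."""
    offsets, pos = [], 0
    for word in words:
        offsets.append(pos)
        pos += len(word.text) + 1  # +1 for the separating space
    return offsets


def segment_timestamps(words: list[Word], start_char: int, end_char: int) -> tuple[float, float]:
    """Map the character range [start_char, end_char) to (start_time, end_time)."""
    offsets = word_char_offsets(words)
    # The word containing a position is the last word starting at or before it.
    first = max(bisect_right(offsets, start_char) - 1, 0)
    last = max(bisect_right(offsets, max(end_char - 1, start_char)) - 1, 0)
    return words[first].start, words[last].end


words = [Word("Hello", 0.0, 0.4), Word("world", 0.5, 0.9), Word("this", 1.2, 1.4),
         Word("is", 1.5, 1.6), Word("Deepgram", 1.7, 2.3)]
# In "Hello world this is Deepgram", characters 12-28 cover "this is Deepgram".
print(segment_timestamps(words, 12, 28))  # (1.2, 2.3)
```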
Language Detection identifies the dominant language of an audio file and transcribes the audio in that language. It determines the language from an initial sample of the audio.
Query parameters: detect_language=true&punctuate=true
This requires adding the following section to the engine.toml file:
[features]
language_detection = true
These Understanding features require the punctuate=true parameter in the ASR request. If you do not include it explicitly, the system adds it implicitly.
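Client code can set punctuate=true itself rather than rely on the implicit behavior. The Python sketch below does that when building a request URL. The host and port in the usage line are hypothetical. The /v1/listen path follows the Deepgram Cloud /v1 schema and is an assumption for your deployment. The function name is also hypothetical.

```python
from urllib.parse import urlencode

# Understanding features that, per these release notes, require punctuate=true.
UNDERSTANDING_FEATURES = ("summarize", "detect_language")


def build_listen_url(base_url: str, **params: str) -> str:
    """Build a request URL, forcing punctuate=true when an Understanding
    feature is enabled (the server would add it implicitly anyway)."""
    query = dict(params)
    if any(query.get(name) == "true" for name in UNDERSTANDING_FEATURES):
        query["punctuate"] = "true"
    return f"{base_url}?{urlencode(query)}" if query else base_url


# Hypothetical on-prem host and port.
print(build_listen_url("http://localhost:8080/v1/listen", summarize="true"))
# http://localhost:8080/v1/listen?summarize=true&punctuate=true
```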
Deepgram on-premises deployments now support Deepgram Cloud’s /v1 endpoint schema.
New on-prem configurations default to the /v1 endpoint schema.
Legacy on-prem configurations may continue to use the /v2 endpoint schema, although it is now deprecated.
On startup, Engine now automatically enables half-precision floating-point format (FP16) if the NVIDIA GPU supports it.
If you encounter issues with the ASR output, we recommend disabling this feature as a troubleshooting step. To do so, set the state parameter in a [half_precision] section of the engine.toml file:
[half_precision]
state = "disabled" # or "enabled" or "auto" (the default)
Engine will now return an explicit error message indicating if the model_manager search path is misconfigured:
Error: failure: Configuration contains inaccessible model search path(s)
Time duration values can now be specified in configuration files using a human-readable format such as “1h” to represent 1 hour, “2m” to represent 2 minutes, “60s” to represent 60 seconds, etc.
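The following Python sketch converts the single-unit forms quoted above into seconds, which can help when checking configuration values. These notes do not say whether compound values (such as "1h30m") or other units are accepted, so the sketch handles only h, m, and s. The function name is hypothetical.

```python
import re

# Seconds per unit for the suffixes shown in the release notes.
UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}
DURATION_RE = re.compile(r"(\d+(?:\.\d+)?)([hms])")


def parse_duration(text: str) -> float:
    """Convert a single-unit duration such as "2m" into seconds."""
    match = DURATION_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    value, unit = match.groups()
    return float(value) * UNIT_SECONDS[unit]


for example in ("1h", "2m", "60s"):
    print(example, parse_duration(example))
# 1h 3600.0
# 2m 120.0
# 60s 60.0
```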
The streaming connection timeout between API and Engine is now configurable via the streaming_conn_timeout parameter in the api.toml file:
[[driver_pool.standard]]
...
streaming_conn_timeout = "60s"
Fixes
Resolves an issue where WebSocket callbacks were shut down improperly, which prevented the WebSocket Close frame from being sent as required by RFC 6455 and may have caused partial loss of transcription data in the WebSocket callback.
We welcome your feedback; please share it with us at Product Feedback.