Convert Cantonese speech to text with high accuracy, low latency, and enterprise-grade scalability. Deepgram delivers real-time and batch transcription through a developer-first speech-to-text API.
Trusted by the world's top enterprises and startups
Get real-time Cantonese speech-to-text in under 300 ms while maintaining high accuracy in noisy, accented, or overlapping conversations.

Speakers: 85-86 million worldwide
Regions: China (Guangdong and Guangxi), Hong Kong, Macau, Malaysia, United States, Vietnam, Singapore, Australia, United Kingdom, Canada
Dialects: Guangzhou Cantonese (standard), Hong Kong Cantonese, Macau Cantonese, Taishanese (Toisanese)
Writing system: Traditional Chinese characters with Cantonese-specific colloquial characters
Language family: Sino-Tibetan, Yue branch of Chinese languages
Cantonese is widely used across Hong Kong, southern China, and global diaspora communities, making it a key language for call center analytics, customer support AI, media captioning, healthcare transcription, multilingual voice agents, financial services, and legal documentation.

Deepgram includes everything required to produce accurate, readable, and secure Cantonese transcripts out of the box.
Automatically detect and label who is speaking in multi-speaker Cantonese conversations.
Apply automatic capitalization, paragraphing, and clean transcript structure for Cantonese text.
Instantly find words or phrases inside long Cantonese recordings without reprocessing audio.
Segment streaming Cantonese audio into real-time sentence-level units for voice agents.
Add accurate punctuation and capitalization to Cantonese transcripts for easy reading.
Automatically remove sensitive data like credit cards, phone numbers, and PII from Cantonese transcripts.
Improve recognition of uncommon words, product names, and industry terms in Cantonese audio by boosting them during transcription.
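As a rough illustration, the features above are typically switched on with query parameters on the transcription request. The sketch below only builds the request URL and sends nothing. The endpoint, the `nova-2` model name, the `zh-HK` language code, and the availability of each option for Cantonese are assumptions to check against the current API reference; `build_listen_url` is a hypothetical helper, not part of any SDK.

```python
from urllib.parse import urlencode

# Assumed endpoint for pre-recorded audio; confirm against the API reference.
LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_listen_url(search_terms=(), keywords=(), redact=()):
    """Return a request URL that enables the features listed above (hypothetical helper)."""
    params = [
        ("model", "nova-2"),       # assumption: a model that covers Cantonese
        ("language", "zh-HK"),     # assumption: language code for Cantonese
        ("diarize", "true"),       # label who is speaking
        ("smart_format", "true"),  # formatting and paragraph structure
        ("punctuate", "true"),     # punctuation
    ]
    # Repeatable parameters: one entry per value.
    params += [("redact", kind) for kind in redact]
    params += [("search", term) for term in search_terms]
    params += [("keywords", f"{word}:{boost}") for word, boost in keywords]
    return f"{LISTEN_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_listen_url(search_terms=["八達通"],
                           keywords=[("Deepgram", 2)],
                           redact=["pci"]))
```

Building the URL separately from sending the request keeps the feature selection easy to inspect and test. Real-time sentence segmentation applies to streaming connections and is not covered by this sketch.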

Start with Cantonese speech-to-text, then expand to 45+ languages using the same API, models, and tooling.
Start transcribing Cantonese audio with Deepgram's speech-to-text API. It is fast, accurate, and built for real-time applications.
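A first batch request might look like the minimal sketch below, which uses only the Python standard library. It assumes a hosted audio file, token-based authorization, and a JSON response with the transcript nested under `results.channels[0].alternatives[0]`. The helper names, the default URL, and the `zh-HK` language code are illustrative assumptions, so verify them against the API reference before relying on them.

```python
import json
import urllib.request

# Assumed endpoint, model, and language code; confirm against the API reference.
DEFAULT_URL = "https://api.deepgram.com/v1/listen?model=nova-2&language=zh-HK"


def build_request(api_key, audio_url, listen_url=DEFAULT_URL):
    """Prepare (but do not send) a POST request for a hosted audio file."""
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        listen_url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
    )


def extract_transcript(response):
    """Pull the first transcript out of a parsed response (assumed response shape)."""
    return response["results"]["channels"][0]["alternatives"][0]["transcript"]


# Sending the request needs a valid API key and network access:
# req = build_request("YOUR_API_KEY", "https://example.com/cantonese.wav")
# with urllib.request.urlopen(req) as resp:
#     print(extract_transcript(json.load(resp)))
```

The network call is left commented out so the request can be inspected before any audio is submitted.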