Top 6 Dutch ASR Challenges: Diverse Dialects, Data, and Dictionaries
Dutch is an Indo-European language spoken in Northern Europe, primarily in the Netherlands and Belgium. It has about 25 million native speakers and another 5 million second-language learners. The term "Dutch" actually encompasses two language varieties: Nederlands, the variety spoken in the Netherlands, and Vlaams (Flemish in English), spoken mainly in northern Belgium. The two varieties are largely mutually intelligible and share the same International Organization for Standardization (ISO) code, although there are non-trivial differences in vocabulary and pronunciation between them. This blog post outlines some of the challenges in training a Dutch automatic speech recognition (ASR) model that works well on different varieties of the language, given the diversity that we see on the ground.
1. Inflection
Inflection refers to the addition of prefixes or suffixes (or, indeed, infixes) to root words to indicate things like person, number, or tense. English has a relatively limited number of these inflectional suffixes. Regular verbs, for example, get an -s in the present tense if the subject of the sentence is third person singular: I sing and you sing, but he/she/it sings. Nouns can also be inflected: regular nouns take an -s suffix to indicate the plural.
Dutch nouns and verbs, however, have more inflectional possibilities than their English counterparts. There are two options for the plural marker, for example (-en and -s), depending on the noun, as well as other suffixes. The diminutive -tje or -je 'small' is very common in Dutch (consider aardbei 'strawberry' and aardbeitje 'little strawberry') and is used productively, meaning that it can occur on almost any noun. In terms of ASR, all of this means that the dictionary a Dutch speech recognition model uses to tell it what words to expect needs to be much larger than an English dictionary.
2. Compound Words
Further complicating the issue with dictionary size, Dutch also has a large number of compounds (similar to German), and, like -tje/-je discussed above, these compounds can be created productively by speakers. In English, these would typically be written as separate words, each of which appears in the dictionary on its own. But with Dutch, if the compound a speaker creates isn't in the dictionary, the system can have trouble transcribing it correctly. Moreover, the Dutch speech recognition model will often transcribe a compound as its individual parts, which are more common, rather than joining them into a single word. For example, autoverzekering 'car insurance' is more likely to be transcribed as auto verzekering, as these individual words are more common than the compound.
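One way to picture a fix for the split-compound problem is a post-processing pass that re-joins adjacent words when their concatenation is a known compound. The sketch below is illustrative only: the function name `merge_compounds`, the `max_parts` limit, and the one-entry lexicon are assumptions made for this example, not part of any particular ASR system.

```python
def merge_compounds(tokens, lexicon, max_parts=3):
    """Greedily join adjacent tokens whose concatenation is a known compound.

    `lexicon` is a set of known compound words (hypothetical data here).
    `max_parts` caps how many adjacent tokens are tried as one compound.
    """
    merged = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first so longer compounds win.
        longest = min(max_parts, len(tokens) - i)
        for span in range(longest, 1, -1):
            candidate = "".join(tokens[i:i + span])
            if candidate in lexicon:
                merged.append(candidate)
                i += span
                break
        else:
            # No compound starts at this position; keep the token as-is.
            merged.append(tokens[i])
            i += 1
    return merged


# Toy lexicon containing the compound from the example above.
LEXICON = {"autoverzekering"}

hypothesis = "ik zoek een auto verzekering".split()
print(" ".join(merge_compounds(hypothesis, LEXICON)))
```

A lookup like this only helps with compounds that are already in the lexicon, so it does not address compounds that speakers coin on the fly.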
Even more so than with inflection, compounds create the potential for lots of unknown words to appear in speech that won't be present in the dictionary. And even if the compounds are known, they can still create vocabulary size issues. Numbers in Dutch are a great example of this. In English, larger numbers are written as a sequence of words: 221 is two hundred twenty one. But in Dutch, this number is the single word tweehonderdeenentwintig. In English, you only need about 30 entries in the dictionary to get the numbers from zero to 100, but in Dutch, you need 101 unique entries.
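The number comparison can be checked with a short script. This is a rough sketch under stated assumptions: English numbers are written as space-separated words with no hyphens or "and" (matching the "two hundred twenty one" style above), 100 is written "one hundred", and the helper names are made up for this example. Under those assumptions the English count comes out at 29 distinct words, close to the figure of 30 given above; the exact number depends on conventions such as whether "and" or "a" is counted.

```python
NL_UNITS = ["nul", "een", "twee", "drie", "vier", "vijf", "zes", "zeven",
            "acht", "negen", "tien", "elf", "twaalf", "dertien", "veertien",
            "vijftien", "zestien", "zeventien", "achttien", "negentien"]
NL_TENS = {2: "twintig", 3: "dertig", 4: "veertig", 5: "vijftig",
           6: "zestig", 7: "zeventig", 8: "tachtig", 9: "negentig"}

EN_UNITS = ["zero", "one", "two", "three", "four", "five", "six", "seven",
            "eight", "nine", "ten", "eleven", "twelve", "thirteen",
            "fourteen", "fifteen", "sixteen", "seventeen", "eighteen",
            "nineteen"]
EN_TENS = {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty",
           6: "sixty", 7: "seventy", 8: "eighty", 9: "ninety"}


def dutch_number(n):
    """Spell 0..100 in Dutch; every number is one unbroken word."""
    if not 0 <= n <= 100:
        raise ValueError("only 0..100 is supported in this sketch")
    if n == 100:
        return "honderd"
    if n < 20:
        return NL_UNITS[n]
    tens, unit = divmod(n, 10)
    if unit == 0:
        return NL_TENS[tens]
    # Units come first, linked with "en"; after "twee" and "drie" the
    # link is spelled with a diaeresis ("tweeëntwintig").
    link = "ën" if unit in (2, 3) else "en"
    return NL_UNITS[unit] + link + NL_TENS[tens]


def english_number(n):
    """Spell 0..100 in English as space-separated words (assumed style)."""
    if not 0 <= n <= 100:
        raise ValueError("only 0..100 is supported in this sketch")
    if n == 100:
        return "one hundred"
    if n < 20:
        return EN_UNITS[n]
    tens, unit = divmod(n, 10)
    if unit == 0:
        return EN_TENS[tens]
    return EN_TENS[tens] + " " + EN_UNITS[unit]


# Dutch: each number is its own dictionary entry.
dutch_entries = {dutch_number(n) for n in range(101)}
# English: the dictionary only needs the individual words.
english_entries = {w for n in range(101) for w in english_number(n).split()}

print(len(dutch_entries), len(english_entries))
```

The gap widens quickly for larger ranges, since every new Dutch number is a new single-word entry while English keeps reusing the same small set of words.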
3. Different Dialects
There are many dialects spoken across the Netherlands and Belgium, and they aren't always readily understood by people who speak a different variety. In the introduction, two large dialect groups were defined, Nederlands and Vlaams, but the situation is actually much more complicated and diverse, with each of those two varieties having multiple different dialects subsumed under it. And Frisian, technically a different language but extremely similar to Dutch (and English!), is also sometimes included in the general Dutch language area, further complicating matters. This creates challenges when training an ASR model due to different spellings of words, different pronunciations, and altogether different vocabulary items. Flemish, for example, tends to have more words borrowed from French because it is spoken in Belgium.
The dictionaries that models use have limited vocabularies, as discussed above, so you have two options. The first is to standardize everything to one type of Dutch, which requires laborious and careful preprocessing of data and can introduce ambiguities. This also calls into question whether it is actually correct to 'translate' a specific word in one variety of Dutch into another variety, especially if you want to create a model that can handle any kind of Dutch thrown at it. These changes can also introduce bias into your work, making the model favor some varieties of the language over others. The second option is simply to train your model on all of the data in the hope that it does decently well across all varieties. This is easiest to do with deep learning, as you can iterate through different data sets quickly and easily.
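To make the first option concrete, here is a minimal sketch of lookup-based standardization and of how it treats varieties unevenly. The two-entry mapping (frigo to koelkast, confituur to jam) and both function names are illustrative assumptions for this example; a real mapping would be far larger and would need review by speakers of each variety.

```python
# Hypothetical, hand-picked mapping from Flemish words to Netherlandic ones.
FLEMISH_TO_NETHERLANDIC = {
    "frigo": "koelkast",   # 'fridge'
    "confituur": "jam",    # 'jam'
}


def standardize(transcript, mapping):
    """Replace every token found in `mapping`; leave the rest untouched."""
    tokens = transcript.lower().split()
    return " ".join(mapping.get(token, token) for token in tokens)


def rewrite_rate(transcripts, mapping):
    """Fraction of tokens that standardization would change."""
    total = 0
    changed = 0
    for transcript in transcripts:
        for token in transcript.lower().split():
            total += 1
            if token in mapping:
                changed += 1
    return changed / total if total else 0.0


flemish = ["de confituur staat in de frigo"]
netherlandic = ["de jam staat in de koelkast"]

print(standardize(flemish[0], FLEMISH_TO_NETHERLANDIC))
print(rewrite_rate(flemish, FLEMISH_TO_NETHERLANDIC))
print(rewrite_rate(netherlandic, FLEMISH_TO_NETHERLANDIC))
```

Only the Flemish transcript is altered, which is the bias described above in miniature: the reference text for one variety no longer matches what its speakers actually said.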