Transfer learning is one of the hottest topics of natural language processingโand, indeed, machine learning in generalโin recent years. In this post, I want to share with you what transfer learning is, why itโs so helpful when thinking about language-related tasks, and how weโve used it to create a high-accuracy model for Portuguese based on the work that weโd already done for Spanish.ย
In this blog post, weโll discuss some of our specific logic here, including the intuition of picking Spanish for helping Portuguese model training and the similarities between these languages on many levels. But to get started, letโs talk about what transfer learning is and why itโs so valued at Deepgram before diving into the specifics of Spanish and Portuguese.
What is Transfer Learning? A Very Brief History
In short, we can say that transfer learning is taking a model that has been trained on one task, and using it for a different one by changing its training data. Transfer learning started its journey with word vectors, which are static vectors for each word in the corpus. For our purposes, you can think of a vector as a way of describing the relationship between words in a large, abstract space.
To get an idea of how this works, you can see an example of word vectors in Figure 1, below. If you look at the 1st box for the words man and woman, they’re the same, but the word king is different because he’s not an ordinary human; heโs the king. Also man, woman, and king share the same third box, because they’re all human. The fourth box of king is identical to man (baby blue), but woman has a different fourth box, pink. So king is more similar to man in this regard.
Figure 1.ย Word2vec, the ancestor of transfer learning, showing word similarities and differences (from https://jalammar.github.io/illustrated-word2vec/).
Next, contextual word vectors came with ELMo, or Embeddings from Language Models, which is โa type of deep contextualized word representation that models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy)โ (source). This provided a way to look at the relationship between words at a greater depth and with more .
Finally transformersโa type of deep learning model that differentially weights input dataโwere developed to generate contextual word vectors as well as a sentence-level vector, which is even better for linguistic analysis.
At each step of this process, though, the goal remained the sameโto look for good representations for our corpus words, which happen to be vectors. Both pretrained word vectors and transformers are trained on giant corpora, hence they know a lot about the target languageโs syntax and semantics. We feed pretrained vectors to our downstream models and the vectors bring what they know about the language, semantics of the words, and many surprising features to our models.ย
You can probably already see how this could be useful for language-related tasks. Speech recognition is the task of converting speech to text. Though speech recognition models are more sophisticated algorithms than text oriented statistical models, a neural network is still a neural network and weights are certainly used.ย
Hence, some weight re-using techniques are applicable to speech recognition, along with more sophisticated acoustic tricks. That is to say, once you have a model that works well for one language, itโs relatively easy to give that model data for another languageโespecially one thatโs closely relatedโand see better results than if you started from scratch.
Why We Use Transfer Learning
At Deepgram, transfer learning is highly valued. For a specific language, when we want to train a new version of a specific model, we donโt want to start from scratch. Instead, when we want to train a model for a brand new language, we want to transfer some knowledge from a similar languageโs model when possible.
To illustrate the power of these processes, weโll look at a specific case of transfer learningโgoing from Spanish to Portugueseโto show how you can train a model for a new language from scratch by the help of a similar languageโs model.ย
How Deepgram Works
Want to get more value out of your call center data, build the next game-changing voice feature, or save a lot of money on speech transcription? Learn why Deepgram is the platform to get you there.
Why Spanish to Portuguese
Letโs get started with the basic question of why we picked Spanish to help train our Portuguese model from scratch. Spanish and Portuguese come from the same language family and are closely related. Italian, French, Spanish, and Portuguese all belong to this language familyโthe Romance family. However, even if you didn’t know that, a quick glance is enough for one to see the similarities between Spanish and Portuguese just by looking at a piece of text. Hereโs an example text pair, taken from Alice in Wonderland, to explain what we mean:
Spanish
No habรญa nada tan notable en eso; Alicia tampoco pensรณ que fuera tan extraรฑo escuchar al Conejo decirse a sรญ mismo: ‘ยกDios mรญo! ยกOh querido! ยกLlegarรฉ tarde!’ (cuando lo pensรณ despuรฉs, se le ocurriรณ que deberรญa haberse preguntado por esto, pero en ese momento todo parecรญa bastante natural); pero cuando el Conejo sacรณ un reloj del bolsillo de su chaleco, lo mirรณ y siguiรณ corriendo, Alicia se puso de pie, porque le pasรณ por la mente que nunca antes habรญa visto un conejo con chaleco… bolsillo, o un reloj para sacar de รฉl, y ardiendo de curiosidad, corriรณ por el campo tras รฉl, y afortunadamente llegรณ justo a tiempo para verlo caer por una gran madriguera debajo del seto.
Portuguese
Nรฃo havia nada tรฃo notรกvel nisso; Alice tambรฉm nรฃo achou tรฃo estranho ouvir Rabbit dizer para si mesmo: ‘Meu Deus! Oh querida! Chegarei tarde!’ (Quando ele pensou sobre isso depois, ocorreu-lhe que deveria ter se perguntado sobre isso, mas na รฉpoca tudo parecia bastante natural); mas quando o Coelho tirou um relรณgio do bolso do colete, olhou para ele e correu, Alice levantou-se, porque lhe passou pela cabeรงa que nunca tinha visto um coelho com um colete… bolso, ou um relรณgio para levar. longe dele, e queimando de curiosidade, ela correu pelo campo atrรกs dele, felizmente chegando bem a tempo de vรช-lo cair em uma grande toca sob a cerca viva.
For non-speakers of Spanish and Portuguese, itโs totally reasonable to identify the above languages as the sameโand you can quickly and easily find a lot of similarities between the two texts above, even if you donโt know what they mean.
Similarities between Spanish and Portuguese
In this section, weโll compare Spanish and Portuguese phonetically, vocabulary-wise, and syntax-wise to better understand the ways that these languages are so similar in more depth.
Phonetic Similarities
The first major similarity between Spanish and Portuguese is acoustic similarity, which plays a key role for our transfer learning purposes. Here, acoustic or phonetic similarity means that the two languages use a set of sounds in their words that are very similar to one another.
Consonants are almost identical in both languages, although Spanish has three extra affricates that Portuguese does not. Other than that, the consonants look the same on paper and they sound the same. Below, in Figure 2, we can see the phonetic alphabet for consonants of both languages, taken from the SAMPA website.
Figure 2.ย Spanish and Portuguese consonants, represented in SAMPA.
Vowels are a bit different. The Portuguese phonetic system has more vowels, and perceptually, Portuguese vowels are described as sounding different by Spanish speakers. In Figure 3 below we can see the Spanish and Pt vowels side by side, again taken from SAMPA website.
Figure 3.ย Spanish and Portuguese vowels in SAMPA.
Here, Spanish vowels look more minimalistic and the Portuguese side looks more crowded, since it has nasal vowels, like French. Still, thereโs a certain level of similarity.
Vocabulary Similarities
The final similarity between Spanish and Portuguese we want to discover is vocabulary similarity. Here, similar vocabulary means vocabulary strings that are either common or differ by a small edit distanceโthat is to say, one or two different letters or sounds. Compare the vocabulary words below, in Table 1, with Spanish on the left and Portuguese given on the right.
Spanish | Portuguese | |
casa | casa | |
seรฑora | senhora | |
para | para | |
ahora | ahora | |
cabeza | cabeรงa | |
toma | toma | |
vamos | vamos | |
cama | cama | |
gano | gano | |
bonita | bonita | |
cara | cara | |
centro | centro | |
estados | estados | |
libros | livros | |
opiniรณn | opiniรฃo |
Table 1. Comparison of Spanish and Portuguese vocabulary items.
As we see, some words are literally the same and some words differ by an edit distance of one or two. This is great for transferring the word vectors from the Spanish model into the Portuguese model.ย
Syntactic Similarity
The final aspect of similarity between the two languages that weโll look at here is syntactic similarityโthat is, how similar is the way that sentences and phrases are constructed? This is also related to transferring weights, as syntactic constructions are one of the things learned by the model. Since weโd like to uncover how these two languages relate to each other, letโs compare the dependency trees for the phrases un perro pequeรฑo and um cachorro pequeno meaning โa small dogโ in each language. These trees show us how words within a phrase or sentence are related to each other. I generated both dependency trees with displaCy, shown below in Figure 4.
ย Figure 4.ย Dependency trees of โa small dogโ (literally, โa dog smallโ) in Spanish (top) and Portuguese (bottom).
In both dependency trees, we notice that the adjective comes after the noun. In both trees, the head is the noun perro/cachorro โdogโ and the adjective is attached to the head by the dependency relation adjective modifier, or amod. A determiner un/um โaโ is attached to the noun by the determiner, or det relation. We see that the above trees are identical; hence, if one learns the syntax of one Spanish/Portuguese pair, one can easily figure out the other languageโs syntax.ย
Putting together this information, now weโre ready to understand how we make use of transfer learning for training the very first speech recognition model of Portuguese from scratch.
Transferring our Modelโs Learning from Spanish to Portuguese
As weโve seen, Spanish and Portuguese are very similar to one another in terms of their sounds, their vocabulary, and their sentence structures. If we go back to our discussion of vectors and weighting at the beginning, you can see why starting with a Spanish model would be an effective choice for creating a Portuguese modelโthe weights and vectors are likely to be extremely similar, with only a few small points of difference that the model will learn when it sees Portuguese data, making small adjustments as it goes.
We can see how useful this is if we imagine instead starting with an English model to create one for Portuguese. Although this would better than starting from zero (Portuguese and English are, after all, both languages, and they are related to one other, albeit more distantly), the number and magnitude of the adjustments would be much greater than when starting with Spanish, requiring a longer training process and more data from Portuguese to get the model to the same level of accuracy.
Wrapping Up
I hope this article gives you a good sense of what transfer learning is and why it can be so impactful in a case like Spanish and Portuguese where the languages are very similar. Weโve got more transfer learning material coming in the next few months, so stay tuned to learn moreโyou can sign up for our newsletter below to keep up-to-date on whatโs happening at Deepgram.
Want to see the power of transfer learning first hand? Sign up for a free API key or contact us to get started using end-to-end deep learning for your speech recognition projects.
How to Evaluate a Deep Learning ASR Platform
Get the information you need about 1st generation, 2nd generation, and modern-day automatic speech recognition (ASR) solutions to ensure your evaluation experience is efficient and yields the data you need to make your purchasing decision.