Large Language Models (LLMs) have come a long way since the initial release of ChatGPT; they are now more capable, reliable, and flexible, offering a variety of free and proprietary options as well as impressive multimodal abilities. However, no matter how knowledgeable LLMs are, they remain far from perfect; when asked about cold, hard facts, they are known to be inconsistent.

The size of LLMs is increasing at a staggering rate. GPT-3, the model family behind the original ChatGPT, had 175 billion parameters (which was considered substantial in 2022), while GPT-4 is rumored to have around 1 trillion parameters, although OpenAI has not disclosed the figure. Yet even that is far from capturing humanity’s entire knowledge base, which is estimated at zettabytes of information. Furthermore, beyond the gaps in knowledge that LLMs may have in niche areas, they also rarely admit when their understanding of a topic is insufficient.

A quick example will illustrate ChatGPT’s “stupidity”: Ask ChatGPT (3.5) how many e’s are in the word ‘Ketchup’. Surprisingly, it may respond with “0” (or “2,” or anything but “1”)! This simple yet crucial mistake mainly highlights three flaws in most LLMs:

  1. When faced with something they don’t know, they typically resort to responding with a logically complete, but possibly incorrect, answer (at least without further prompting).

  2. We can reasonably assume that basic facts, such as the number of letters in a word (something humans grasp naturally from knowing English and applying logical reasoning), don’t appear in LLMs’ training data nearly as often as scientific journals and codebases do.

  3. LLMs don’t associate semantics, logic, and words the way we do; they simply learn the numerical relationships between them.
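For contrast, the Ketchup question is trivial for ordinary deterministic code, which is part of why handing such questions to an external tool is attractive. The snippet below is only an illustration; `count_letter` is a hypothetical helper name, not part of any LLM API.

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


print(count_letter("Ketchup", "e"))  # prints 1
```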

In the end, whether it’s a tricky math question or a rarely sought area of scientific research, LLMs don’t and can’t retain every single bit of human knowledge. This is where Retrieval-Augmented Generation (RAG) comes in.

RAG is a framework that allows LLMs to extract information from external knowledge databases. We have already seen it in action! GPT-4’s ability to use the browser and learn how to utilize different plugins is an example of RAG, where the model can decide when and how to use external tools to assist with its response.

But RAG doesn’t stop there. The framework gives the model the option to pull accurate information from external sources, so it is less likely to produce the incorrect responses it would have given without that information. RAG also reduces the need to retrain the language model just to keep its information up to date. Considering that training GPT-4 reportedly cost upward of $100 million, RAG can save serious money.

🏛️ Retrieval and Generation: The Twin Pillars

The foundational pillars of RAG are the retrieval and generation components. The retrieval component acts as the gateway to external information, tasked with fetching relevant data based on the input query. On the other hand, the generation component takes the baton from the retrieval component, utilizing the fetched information to construct coherent, accurate, and contextually enriched responses.

Retrieval Component:

The retrieval component embodies the essence of RAG’s external knowledge access capability. At the heart of this component lies a retriever model that sifts through vast knowledge sources to extract pertinent context passages in response to a given query or task. The retriever operates on a well-defined metric to gauge the relevance of information within the external sources, be it documents, paragraphs, or smaller text fragments.

Retrieval Sources:

  1. Training Corpus: The training corpus serves as a primary reservoir of knowledge. In this mode, the retriever accesses an explicit and accessible form of knowledge encapsulated within the training corpus during inference.

  2. External Data: Diverging from the training corpus, the retrieval from external datasets opens the door to a broader spectrum of information, especially beneficial for domain adaptation and knowledge updating.

  3. Unsupervised Data: An innovative approach where unsupervised datasets are leveraged, aligning source-side sentences and target-side translations in a dense vector space for machine translation tasks.

Retrieval Metrics:

  1. Sparse-vector Retrieval: Employing methods like TF-IDF and BM25, sparse-vector retrieval efficiently matches keywords through an inverted index.

  2. Dense-vector Retrieval: Dense-vector retrieval transcends lexical overlap, venturing into semantically relevant retrievals using pre-trained language models that encode text to low-dimensional dense vectors.

  3. Task-specific Retrieval: This retrieval modality unifies the memory retriever and downstream generation model in an end-to-end optimized manner, catering to task-specific objectives.
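To make the sparse-vector option concrete, here is a minimal sketch of BM25 scoring over a small inverted index, written in plain Python. The function names (`tokenize`, `build_index`, `bm25_search`), the regex tokenizer, and the defaults `k1=1.5` and `b=0.75` are assumptions chosen for illustration; a production system would more likely rely on a search library.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    lengths = []
    for doc_id, text in enumerate(docs):
        tokens = tokenize(text)
        lengths.append(len(tokens))
        for term, tf in Counter(tokens).items():
            index[term][doc_id] = tf
    return index, lengths


def bm25_search(query, index, lengths, k1=1.5, b=0.75, top_k=2):
    """Return up to top_k (doc_id, score) pairs, best match first."""
    n_docs = len(lengths)
    avg_len = sum(lengths) / n_docs
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term appears in no document
        # Rare terms receive a higher inverse document frequency.
        idf = math.log(1 + (n_docs - len(postings) + 0.5) / (len(postings) + 0.5))
        for doc_id, tf in postings.items():
            # Term frequency saturates and is normalized by document length.
            length_norm = 1 - b + b * lengths[doc_id] / avg_len
            scores[doc_id] += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```

Only documents that share at least one term with the query receive a score, which is exactly the lexical-overlap limitation that dense-vector retrieval is meant to address.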

Generation Component:

Post retrieval, the generation component comes into play, armed with the additional context garnered from the external sources. The generative model now has a richer range of information to draw from, allowing for a more nuanced and accurate response generation.

The brilliance of RAG lies in the seamless integration of the retrieved external memory with the generation process. There are several methods to actualize this integration, with data augmentation being a straightforward technique. Here, augmented inputs are constructed by concatenating spans from the retrieved data, enriching the input sequence for the generative model. This synergy empowers the LLMs to traverse beyond their intrinsic knowledge boundaries, delving into the expanse of external knowledge to generate more informed and precise responses.
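The concatenation approach described above can be sketched in a few lines. This is an illustration under stated assumptions, not a reference implementation: `build_augmented_prompt`, the instruction wording, and the `max_chars` budget are all hypothetical choices, and the resulting string would be passed to whatever generative model is in use.

```python
def build_augmented_prompt(question, passages, max_chars=1500):
    """Prepend retrieved passages to the question, within a character budget."""
    context_parts = []
    used = 0
    for i, passage in enumerate(passages, start=1):
        entry = f"[{i}] {passage.strip()}"
        if used + len(entry) > max_chars:
            break  # stop before exceeding the (assumed) context budget
        context_parts.append(entry)
        used += len(entry)
    context = "\n".join(context_parts)
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Numbering the passages is a design choice that lets the model, or a reader checking its answer, point back to the span a claim came from.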

🤖 RAG in ChatGPT

The operational mechanism of RAG can be witnessed in the capabilities of GPT-4, where the model decides when and how to utilize external tools to assist with its response. The utilization of browser tools and plugins by GPT-4 to extract information from external sources exemplifies RAG in action, demonstrating a practical application of the RAG framework.

In terms of external plugins, the framework used by OpenAI is a lot simpler with most of its mechanisms relying on GPT-4’s own intelligence.

Plugin creators simply need to create a manifest file describing the use of their plugin to the language model, and it’s expected that the language model itself will decide on when and how to utilize the plugin.
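As a rough illustration, a manifest of this kind might look like the following. The field names are modeled on the `ai-plugin.json` format from OpenAI’s plugin documentation and should be checked against it; every value here (the plugin name, descriptions, and `example.com` URLs) is hypothetical.

```python
import json

# Hypothetical plugin manifest; field names should be verified against
# OpenAI's plugin documentation before use.
manifest = {
    "schema_version": "v1",
    "name_for_human": "Ketchup Facts",
    "name_for_model": "ketchup_facts",
    "description_for_human": "Look up facts about condiments.",
    # This description is what the language model reads when deciding
    # whether and how to call the plugin.
    "description_for_model": (
        "Use this plugin when the user asks factual questions about "
        "condiments, such as ingredients or history."
    ),
    "auth": {"type": "none"},
    "api": {"type": "openapi", "url": "https://example.com/openapi.yaml"},
    "logo_url": "https://example.com/logo.png",
    "contact_email": "support@example.com",
    "legal_info_url": "https://example.com/legal",
}

manifest_json = json.dumps(manifest, indent=2)
```

Note that nothing in the manifest encodes routing logic: the natural-language description is the only signal the model has for choosing when to use the plugin.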

In terms of the browsing feature, it can be deduced that a similar approach is used but ChatGPT is provided with some more information. There is currently no specific explanation given by OpenAI on how the browsing feature works under the hood.

Furthermore, giving language models access to external databases translates to fewer hallucinations, since the model can search for information that was not present in its training data instead of coming up with nonsense.


As we stand on the cusp of technological advancement in the realm of Large Language Models, it becomes imperative to acknowledge both the milestones achieved and the chasms that still exist. The journey from the inception of ChatGPT to the sophisticated GPT-4 model showcases remarkable progress, yet the road doesn’t end here.

The integration of the Retrieval-Augmented Generation (RAG) framework is a testament to the evolutionary path of LLMs. By bridging the intrinsic knowledge gaps of language models, RAG ushers in a new era of information reliability and precision. The key lies in the harmonious collaboration between the retrieval of accurate, real-time data from external sources, and the nuanced, context-aware generation of responses. This not only enhances the model’s response accuracy but also significantly reduces instances of data hallucination, thereby elevating trust in machine-generated content.

However, the application of RAG within models like GPT-4 is not without its set of challenges. The simplification in the use of external plugins and the vague intricacies behind the browsing features underscore the need for more transparent operational mechanisms. As developers and researchers strive to optimize these functions, there remains a thin line between ensuring seamless information retrieval and maintaining the privacy and security of external databases accessed by these models.

Looking ahead, the dynamic landscape of LLMs promises further innovation. The future may witness more advanced iterations of RAG, potentially with adaptive learning algorithms that could autonomously update the model’s knowledge base in real time. Furthermore, the importance of reducing the training costs and environmental impact of these models cannot be overstated. As we venture into this future, the potential expansion into multimodal learning, incorporating visual, auditory, and sensory data, could revolutionize how LLMs understand and interact with the world.
