What is RAG in AI?
RAG helps AI move beyond its training limits by dynamically retrieving relevant information snippets, allowing language models to answer specific questions with precision and context.
AI models only know about what was in their training data.
What do we do if we want an AI model to work with information outside its training data, such as the latest news or company-specific product information?
It is similar to reading a newspaper for current events alongside textbooks for historical context.
This is where a technique called RAG comes in...
RAG: The Clever Hack That Makes AI More Knowledgeable
Imagine you've hired a knowledgeable team member who understands a lot, but sometimes needs help with specific details. That is a fair description of today's AI language models: capable, but not all-knowing.
These models are good generalists, but they do not know every detail about everything in the world. They also know nothing about events after their training cutoff date.
The AI Knowledge Challenge
Large foundation language models from companies like Meta, OpenAI, and Anthropic are trained on vast amounts of public internet data, which gives them broad knowledge. However, they have limitations when it comes to very specific information.
Want to know the colour options for a particular company's product? These AI models will likely respond that they don't have that information; they are simply working within the boundaries of their training data. Or perhaps a company needs answers that follow its own customer service policies. Domain knowledge like this does not exist in the model, and, even worse, the model may try to answer using information about other companies that it saw during training.
RAG: A Knowledge Expansion Hack
Retrieval-Augmented Generation (RAG) is the clever workaround that helps AI go that bit further.
1. Chunk Up the Source Info: Take product datasheets, technical manuals, case studies, customer data, and company guidelines, and break them into searchable text chunks, ready to be indexed in step 2 and fetched by relevance later.
2. Contextual Information Gathering: Store these chunks in a vector database, or another document store, that can quickly find similar pieces of information.
3. Real-Time Information Retrieval: When someone asks a question, the RAG system searches this database and grabs the most relevant text snippets. The RAG engine can also choose the right source for the information depending on the nature of the prompt; for example, it could make a web request for real-time data such as the current weather or stock market prices.
4. Enhanced Prompting: The original question, the relevant snippets from the RAG engine, and the history of the chat so far are combined into a single prompt for the AI.
5. Intelligent Response: The AI uses its reasoning skills to craft an informed answer from the retrieved information together with its wider general knowledge.
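The chunking, storage, and retrieval steps can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions: the word-window chunker, the word-count "embedding", and the in-memory `VectorStore` class are hypothetical stand-ins, not any particular library. A production system would call a real embedding model and use a real vector database, and the "Widget Pro" datasheet text is invented for the example.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 40, overlap: int = 10) -> list[str]:
    """Chunking: split text into overlapping windows of at most max_words words."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy 'embedding' (assumption): word counts. A real system would call an embedding model here."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """Storage: a hypothetical in-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add(self, chunk: str) -> None:
        # Store each chunk alongside its vector so it can be compared later.
        self.items.append((chunk, embed(chunk)))

    def search(self, query: str, k: int = 3) -> list[str]:
        """Retrieval: return the k stored chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


# Invented sample "datasheet" for a hypothetical product.
datasheet = (
    "The Widget Pro is available in red, blue and matte black. "
    "It ships with a two year warranty. "
    "Returns are accepted within 30 days of purchase with a receipt."
)
store = VectorStore()
for chunk in chunk_text(datasheet, max_words=12, overlap=3):
    store.add(chunk)

print(store.search("What colours does the Widget Pro come in?", k=1))
```

The overlap between windows reduces the chance that a fact is split across two chunks and missed by retrieval, at the cost of storing a few words twice.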
Why RAG is cool
Keeps Training Costs Down
Training AI models is eye-wateringly expensive. With RAG, you don't need to go through time-consuming fine-tuning of the model; you just update the RAG reference databases, and the updated information shows up in answers straight away.
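To illustrate why updates are instant, here is a hypothetical sketch in which the only thing that changes is the document store; the model itself is never touched. For brevity it ranks documents by simple word overlap rather than vector similarity, and the `KnowledgeBase` class and the returns-policy text are invented for the example.

```python
import re
from typing import Optional


def tokens(text: str) -> set[str]:
    """Lower-case word set; a crude stand-in for an embedding (assumption)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class KnowledgeBase:
    """Hypothetical document store; updating it never touches the model's weights."""

    def __init__(self) -> None:
        self.docs: dict[str, str] = {}

    def upsert(self, doc_id: str, text: str) -> None:
        # Add or replace a document; it is searchable on the very next query.
        self.docs[doc_id] = text

    def best_match(self, question: str) -> Optional[str]:
        # Score each document by how many words it shares with the question.
        q = tokens(question)
        scored = [(len(q & tokens(text)), text) for text in self.docs.values()]
        scored = [pair for pair in scored if pair[0] > 0]
        if not scored:
            return None
        return max(scored, key=lambda pair: pair[0])[1]


kb = KnowledgeBase()
kb.upsert("returns-policy", "Returns are accepted within 30 days of purchase.")
print(kb.best_match("How long is the returns window?"))

# The policy changes: overwrite the document. No retraining or fine-tuning.
kb.upsert("returns-policy", "Returns are accepted within 60 days of purchase.")
print(kb.best_match("How long is the returns window?"))
```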
Maintains Data Security
Sensitive proprietary corporate information and intellectual property remain in the company's own databases and are never absorbed into the AI model.
Enables Real-Time Relevance
RAG can incorporate recent results from wider internet searches, helping keep the AI's knowledge current. Fine-tuning, by contrast, bakes knowledge into the model's weights, so that knowledge starts going out of date as soon as training finishes.
Closed Models
Not all models can be fine-tuned. For closed AI models that don't allow access to their weights, RAG offers an alternative way to widen the usefulness of the model.
Real-World Examples
- Product Support: Answering specific questions about product features and about customer service policies such as returns and warranties.
- Legal Research: Pulling relevant, up-to-date case law for specific scenarios.
- Technical Documentation: Providing precise answers from manuals or user guides.
How RAG Actually Works
RAG works by building on the AI's existing ability to reason, summarize, and present information. Instead of retraining the AI, it adds relevant context from external sources to the prompt, allowing the model to handle queries it would otherwise know nothing about. This extends what the AI can answer without the heavy computational cost of retraining.
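Adding context to the prompt is mostly string assembly. The sketch below shows one hypothetical way to combine retrieved snippets, the chat history, and the new question, in the spirit of the Enhanced Prompting step. The `build_prompt` helper and its instruction wording are assumptions rather than any product's actual template; the "answer only from the context" line is one common way to discourage the model from answering with unrelated data from its training. Sending the prompt to an actual model (Llama or otherwise) is deliberately left out.

```python
def build_prompt(question: str, snippets: list[str], history: list[tuple[str, str]]) -> str:
    """Hypothetical helper: merge retrieved snippets, chat history and the new question."""
    # Number each snippet so the answer can refer back to its source.
    context = "\n".join(f"[{i}] {s}" for i, s in enumerate(snippets, start=1))
    # Earlier turns let follow-up questions like "what about the blue one?" make sense.
    past = "\n".join(f"{role}: {text}" for role, text in history)
    return (
        # Instruction discouraging answers drawn from unrelated training data.
        "Answer using only the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Conversation so far:\n{past}\n\n"
        f"User: {question}\nAssistant:"
    )


prompt = build_prompt(
    question="Does the red one have the same warranty?",
    snippets=[
        "The Widget Pro is available in red, blue and matte black.",
        "It ships with a two year warranty.",
    ],
    history=[
        ("User", "What colours does the Widget Pro come in?"),
        ("Assistant", "Red, blue and matte black."),
    ],
)
print(prompt)  # this string would then be sent to the language model of your choice
```

Because the model only sees this assembled text, swapping in fresher snippets changes the answer without changing the model.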
Beyond Corporate Use
Services like Perplexity and ChatGPT's web search use RAG with internet search results, giving real-time, relevant answers that go beyond the model's original training data.
RAG isn’t just a technical tool; it helps make AI more flexible, accurate, and useful across a wide range of domains. By using external data to provide context, RAG also protects privacy and intellectual property by keeping sensitive information outside the model. As AI develops, approaches like RAG will play an important role in helping these systems adapt to real-world applications.
A future post will look at doing real-world RAG with .NET and Llama.