Created: 17 Jan 2025

Updated: 2 Jun 2026

What is RAG?

In our previous article, we explored RAG in AI. Let’s briefly recap what retrieval-augmented generation is all about. 

Large Language Models (LLMs), such as OpenAI's GPT series, Meta's Llama series, and Google's Gemini, have made remarkable progress in the field of generative AI. However, these models can generate incorrect responses (a phenomenon called hallucination), rely on outdated data, and operate in a way that lacks transparency. The retrieval-augmented generation (RAG) framework addresses these issues by enriching LLMs with additional information retrieved from external sources. RAG is widely used in question-answering applications such as chatbots, but its uses extend well beyond that to document summarization, enterprise search, personalized recommender systems, and more.

Technological progress doesn't stand still, and RAG keeps changing too. Let's explore how RAG systems have evolved over the last few years and look at the differences between the Naive RAG, Advanced RAG, and Modular RAG frameworks.

RAG: naive, advanced & modular

The RAG framework keeps evolving and currently spans three primary architectures: Naive RAG, Advanced RAG, and Modular RAG. Although RAG is more cost-effective than a standalone LLM and outperforms it in several respects, it is not without flaws. Advanced RAG and Modular RAG try to overcome the specific shortcomings of Naive RAG.

Naive RAG is the simplest form: information is retrieved from the knowledge base and passed to the LLM to produce an answer.

Advanced RAG adds steps before and after retrieval that refine the retrieved information and improve the precision of the output.

Modular RAG is the most sophisticated and precise architecture of the three. It builds on the principles of the Naive and Advanced models and adds features of its own, for instance a search module for similarity searches and fine-tuning of the retriever. Several approaches have emerged within this framework, among them restructuring RAG modules and rearranging RAG pipelines.

RAG has become popular because it improves the precision of AI-generated content, particularly when a large amount of domain-specific information has to be incorporated and kept up to date. Advanced RAG and Modular RAG were developed to overcome the limitations of Naive RAG.

Naive RAG

Naive RAG retrieves the required data from a knowledge base and passes it to the LLM to produce a response. This simple mechanism consists of four main steps: indexing, retrieval, augmentation, and generation.

Indexing means that data in different formats, including PDFs, HTML files, Word documents, and Markdown, is processed and converted into plain text.

More precisely, indexing involves the following operations:

  • Data loading: importing all the documents or other information the system needs.
  • Data splitting: dividing the documents into smaller fragments, e.g., 500-character chunks.
  • Data embedding: creating a machine-readable (vector) representation of each chunk with an embedding model.
  • Data storing: saving the vector embeddings in a database for later use.
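The four indexing operations above can be sketched in a few lines. This is a minimal illustration, not a production indexer: `toy_embed` is a hypothetical stand-in for a real embedding model (it only hashes words into buckets), and a plain Python list plays the role of the vector database.

```python
import hashlib
import math

def split_text(text: str, chunk_size: int = 500) -> list[str]:
    """Data splitting: cut the text into fixed-size character chunks."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Hypothetical embedding: hashed bag of words, L2-normalized.
    A real system would call an embedding model here."""
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def build_index(documents: list[str], chunk_size: int = 500) -> list[dict]:
    """Load, split, embed, and store every document."""
    index = []
    for doc_id, doc in enumerate(documents):        # data loading
        for chunk in split_text(doc, chunk_size):   # data splitting
            index.append({                          # data storing
                "doc_id": doc_id,
                "text": chunk,
                "vector": toy_embed(chunk),         # data embedding
            })
    return index

index = build_index(["a" * 1200, "RAG combines retrieval with generation."])
print(len(index))  # 1200 characters give 3 chunks, the short document gives 1
```

Fixed-size splitting is the simplest option and can cut sentences in half, which is one of the things Advanced RAG tries to improve.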

The retrieval process starts when a user submits a query. The query is encoded into a query vector with the same embedding model that was applied to the documents during indexing. The system then calculates a similarity score between the query vector and each chunk and selects the best matches, e.g., the top K.

The augmentation step combines the user's query with the chunks retrieved by the system into a single context (prompt) for the LLM.

In the generation step, the LLM uses this context to produce a response to the user's query.
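Here is a small sketch of top-K retrieval under simple assumptions: the vectors are hand-written three-dimensional toys rather than real embeddings, and cosine similarity is used as the score (one of several possible metrics).

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve_top_k(query_vector: list[float], index: list[dict], k: int = 2):
    """Score every chunk against the query and keep the k best."""
    scored = [(cosine_similarity(query_vector, item["vector"]), item["text"])
              for item in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Toy index: in practice the vectors come from the embedding model.
index = [
    {"text": "RAG retrieves chunks from a knowledge base.", "vector": [0.9, 0.1, 0.0]},
    {"text": "Bread is baked in an oven.", "vector": [0.0, 0.2, 0.9]},
    {"text": "LLMs can hallucinate without grounding.", "vector": [0.7, 0.6, 0.1]},
]
query_vector = [1.0, 0.2, 0.0]

top_chunks = retrieve_top_k(query_vector, index, k=2)
for score, text in top_chunks:
    print(f"{score:.3f}  {text}")
```

A linear scan like this is fine for a handful of chunks; vector databases use approximate nearest-neighbor indexes to do the same job at scale.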
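Augmentation and generation can be sketched as prompt assembly followed by a model call. The prompt wording is only an assumption, and `generate` is a hypothetical placeholder, since the article does not prescribe a specific LLM client.

```python
def build_prompt(query: str, chunks: list[str]) -> str:
    """Augmentation: merge the retrieved chunks and the user query into one prompt."""
    context = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

def generate(prompt: str) -> str:
    """Generation: hypothetical placeholder. A real system would send the
    prompt to an LLM here and return its completion."""
    return f"(model output for a prompt of {len(prompt)} characters)"

prompt = build_prompt(
    "What does RAG add to an LLM?",
    ["RAG retrieves chunks from a knowledge base.",
     "Retrieved chunks are passed to the LLM as context."],
)
print(prompt)
print(generate(prompt))
```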

Advantages of Naive RAG:

  • Naive RAG is straightforward to implement because it simply chains retrieval and generation, so a language model can be improved without extensive changes.
  • There is no need to fine-tune the LLM, which saves both time and money.
  • Drawing on external, up-to-date data improves the accuracy of the results.
  • Grounding responses in retrieved data mitigates the prevalent issue of LLM hallucinations, i.e., the generation of false information.

Challenges:

  • Insufficient processing: the retrieved information is passed on without any modification, which may lead to inaccuracies in the generated outputs.
  • The quality of the generated outputs depends on the ability of the retrieval model to identify the most pertinent information.
  • Naive RAG may fail to understand the context behind a question and provide answers that are precise yet do not fully match the user's intent.

Figure: Naive RAG diagram

Advanced RAG

Advanced RAG builds on the concepts used by Naive RAG and adds a level of sophistication to the technique. Rather than passing raw retrieved data directly to the LLM, Advanced RAG applies additional processing to improve the quality of responses. This processing happens in two stages: pre-retrieval and post-retrieval.

Pre-retrieval

In Advanced RAG, the retrieval process is optimized even before it begins. Let’s explore what kind of optimizations the pre-retrieval phase involves:

1. Improved data chunking

The sliding window method overlaps adjacent chunks, capturing context across chunk boundaries and improving the coherence of retrieved chunks. Another useful method is adaptive chunking, which dynamically adjusts chunk sizes based on content complexity or query requirements and results in a more meaningful division of data.
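A minimal sketch of sliding-window chunking, assuming character-based chunks. The `chunk_size` and `overlap` values are illustrative; real pipelines often count tokens or sentences instead.

```python
def sliding_window_chunks(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into chunks where each chunk repeats the last
    `overlap` characters of the previous one."""
    if chunk_size <= 0 or overlap < 0 or overlap >= chunk_size:
        raise ValueError("need chunk_size > overlap >= 0")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

print(sliding_window_chunks("abcdefghij", chunk_size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some text twice.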

2. Dynamic and specialized embeddings

Fine-tuning embedding models on domain-specific datasets allows the system to capture nuanced information relevant to specialized tasks. Iterative embedding refinement can also improve relevance: embeddings are periodically updated and retrained so that they stay aligned with evolving datasets and queries.

3. Hybrid indexing techniques

Advanced RAG uses a mix of indexing algorithms to combine the strengths of different approaches.

  • Graph indexing represents entities and their relationships as nodes and edges, improving the system's ability to retrieve contextually rich and connected information.
  • Hierarchical indexing organizes data in a layered manner, providing efficient navigation through large datasets for both general and specific queries.
  • Vector indexing uses embeddings to represent data in a high-dimensional space, allowing semantic searches based on similarity.
  • MSTG (Multi-Scale Tree Graph) indexing combines hierarchical tree and graph structures to provide fast and accurate retrieval for both filtered and unfiltered searches, balancing the strengths of the two approaches.
  • Incremental indexing lets the system add new data to the index without a complete rebuild, so it stays up to date with the latest information.

4. Metadata integration

Including metadata in the indexing process can improve filtering and retrieval capabilities. Metadata fields like document type, source, creation date, or relevance scores help refine searches and help the system retrieve contextually appropriate chunks.
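As a rough illustration, metadata can be stored next to each chunk and used to narrow the candidates before or after the similarity search. The field names (`doc_type`, `source`, `created`) and the sample records are hypothetical.

```python
from datetime import date
from typing import Optional

# Hypothetical chunks with metadata attached at indexing time.
chunks = [
    {"text": "Refunds are issued within 14 days.", "doc_type": "policy",
     "source": "help-center", "created": date(2025, 1, 10)},
    {"text": "Old refund rules (superseded).", "doc_type": "policy",
     "source": "archive", "created": date(2021, 6, 1)},
    {"text": "Release notes for version 2.", "doc_type": "changelog",
     "source": "docs", "created": date(2025, 2, 3)},
]

def filter_chunks(chunks: list[dict],
                  doc_type: Optional[str] = None,
                  created_after: Optional[date] = None) -> list[dict]:
    """Keep only chunks whose metadata satisfies every given condition."""
    result = []
    for chunk in chunks:
        if doc_type is not None and chunk["doc_type"] != doc_type:
            continue
        if created_after is not None and chunk["created"] <= created_after:
            continue
        result.append(chunk)
    return result

recent_policies = filter_chunks(chunks, doc_type="policy", created_after=date(2024, 1, 1))
print([c["text"] for c in recent_policies])
```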

5. Query refinement

Before the retrieval phase begins, user queries are enriched to improve their precision. Techniques like query rewriting, expansion, and transformation are applied in this step so the system retrieves the most appropriate information. For instance, a vague query can be refined by adding context or specific keywords, while query expansion introduces synonyms or related terms to capture a broader range of relevant documents.
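Query expansion can be as simple as adding related terms from a lookup table. The sketch below assumes a small hand-made synonym dictionary; production systems more often ask an LLM to rewrite or expand the query.

```python
# Hypothetical synonym table, for illustration only.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "price": ["cost"],
}

def expand_query(query: str, synonyms: dict[str, list[str]] = SYNONYMS) -> str:
    """Append known synonyms of each query term, without duplicates."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        for alternative in synonyms.get(term, []):
            if alternative not in expanded:
                expanded.append(alternative)
    return " ".join(expanded)

print(expand_query("car price"))
# car price automobile vehicle cost
```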

6. Hybrid search

A hybrid search approach improves retrieval by combining several search techniques, such as keyword-based, semantic, and neural search. For example, MyScaleDB supports both filtered vector search and full-text search, and its SQL-friendly syntax allows the use of complex SQL queries. This hybrid strategy helps return relevant results across a wide range of query types and complexity.
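One simple way to combine keyword and semantic search is a weighted sum of the two scores. The sketch below is a generic illustration and not MyScaleDB's API: the semantic scores are assumed to be precomputed values between 0 and 1, and the weight `alpha` is an arbitrary choice.

```python
def keyword_score(query: str, text: str) -> float:
    """Fraction of query terms that appear in the text."""
    query_terms = set(query.lower().split())
    text_terms = set(text.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & text_terms) / len(query_terms)

def hybrid_search(query: str, candidates: list[tuple[str, float]],
                  alpha: float = 0.5, k: int = 2) -> list[tuple[float, str]]:
    """candidates holds (text, semantic_score) pairs; alpha weights the keyword part."""
    scored = []
    for text, semantic_score in candidates:
        combined = alpha * keyword_score(query, text) + (1 - alpha) * semantic_score
        scored.append((combined, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

candidates = [
    ("how to reset your password", 0.60),
    ("account recovery steps", 0.80),
    ("office opening hours", 0.10),
]
results = hybrid_search("reset password", candidates)
print([text for _, text in results])
```

Here the exact keyword match lifts the first document above the one with the highest semantic score, which is the kind of correction hybrid search is meant to provide.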

Retrieval

In this step of Advanced RAG, we can improve our processes by optimizing chunk embedding and retrieval. After determining the chunk size, the next step is embedding these chunks into a semantic space with the help of an embedding model.

Optimizing chunk retrieval

During retrieval, the most relevant chunks are identified by measuring how similar the query and the embedded chunks are. Embedding models can be fine-tuned to improve this process. To give an embedding model the best chance of capturing the nuances of domain-specific information, it should be fine-tuned on customized datasets. These datasets should include queries related to the domain, a body of domain-relevant content, and documents that provide contextually accurate information.

Choosing similarity metrics

Selecting the right similarity metric is critical for optimizing retrieval accuracy. Many vector databases, such as ChromaDB, Pinecone, and Weaviate, support a variety of similarity metrics. Examples include:  

- Cosine similarity calculates the cosine of the angle between two given vectors and is useful for high-dimensional data when it is important to capture semantic similarity;

- Euclidean distance (L2) measures the straight-line distance between two points and is more likely to be used for lower-dimensional data;

- Dot product evaluates the scalar product of two vectors and is used in scenarios where the magnitude of vectors is relevant to the computation;

- L2 squared distance, which involves squaring the Euclidean distance, is most commonly used in scenarios where the magnitude of differences between points is critical, and you want to penalize large deviations more than small ones;

- Manhattan distance measures the distance between points based on a grid-like path and is particularly useful in scenarios where differences between attributes are measured as the absolute sum of their deviations along each dimension.
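To make the differences concrete, here are the five metrics written out in plain Python for two small vectors. In practice the vector database computes them for you; the vectors below are arbitrary examples.

```python
import math

def dot_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)

def l2_squared(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def euclidean(a, b):
    return math.sqrt(l2_squared(a, b))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]  # same direction as a, twice the magnitude

print("cosine    ", cosine_similarity(a, b))
print("dot       ", dot_product(a, b))
print("euclidean ", euclidean(a, b))
print("l2 squared", l2_squared(a, b))
print("manhattan ", manhattan(a, b))
```

Because `b` points in the same direction as `a`, the cosine similarity is approximately 1 even though the distance-based metrics report a clear gap (L2 squared is 14, Manhattan is 6). That is why cosine similarity suits cases where direction matters more than magnitude.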

Fine-tuning embedding models with domain-specific data and optimizing similarity metrics allows RAG to retrieve more accurate information.

Post-retrieval

Once the context data (chunks) is retrieved from a vector database, the next step is combining it with the user’s query to create input for the language model. However, these chunks can sometimes include duplicate, noisy, or irrelevant information, potentially affecting how the LLM interprets and processes the context. Below are some strategies to address these challenges effectively:

1. Re-ranking for prioritization

Advanced RAG introduces re-ranking as an additional step after retrieval to refine the information so that the most relevant and valuable data is given priority.

Initially, the system retrieves multiple chunks related to the query, but not all of them are equally important. Re-ranking reassesses them using factors such as:

- Semantic relevance, which determines how closely the data aligns with the query.

- Contextual fit, which indicates how well the data integrates into the broader context.

By reorganizing the retrieved chunks, re-ranking pushes the most pertinent information to the top.
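A re-ranking step could look like the sketch below. It assumes each candidate already carries a semantic relevance score and a contextual fit score (in a real system these would typically come from a re-ranking model), and it combines them with weights chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    text: str
    semantic_relevance: float  # assumed score in [0, 1]
    contextual_fit: float      # assumed score in [0, 1]

def rerank(candidates: list[Candidate],
           w_semantic: float = 0.7, w_context: float = 0.3) -> list[Candidate]:
    """Order candidates by a weighted sum of the two factors, best first."""
    def combined(c: Candidate) -> float:
        return w_semantic * c.semantic_relevance + w_context * c.contextual_fit
    return sorted(candidates, key=combined, reverse=True)

retrieved = [
    Candidate("chunk A", semantic_relevance=0.9, contextual_fit=0.2),
    Candidate("chunk B", semantic_relevance=0.8, contextual_fit=0.9),
    Candidate("chunk C", semantic_relevance=0.3, contextual_fit=0.4),
]
print([c.text for c in rerank(retrieved)])
# ['chunk B', 'chunk A', 'chunk C']
```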

2. Contextual compression

To further improve the precision and clarity of the generated response, contextual compression keeps only what is crucial for answering the query and eliminates extra noise. While contextual compression makes the context more concise and precise, care must be taken to retain information critical for accurate responses.
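As a very rough approximation of contextual compression, the sketch below keeps only the sentences that share at least one word with the query. Real implementations usually rely on an LLM or a trained extractor; this word-overlap filter is an assumption made to keep the example self-contained, and it does not handle stop words.

```python
import re

def compress_chunk(query: str, chunk: str, min_overlap: int = 1) -> str:
    """Keep the sentences that share at least `min_overlap` words with the query."""
    query_terms = set(re.findall(r"\w+", query.lower()))
    sentences = re.split(r"(?<=[.!?])\s+", chunk.strip())
    kept = []
    for sentence in sentences:
        sentence_terms = set(re.findall(r"\w+", sentence.lower()))
        if len(query_terms & sentence_terms) >= min_overlap:
            kept.append(sentence)
    return " ".join(kept)

chunk = (
    "Our refund policy allows returns within 30 days. "
    "The office is closed on Sundays. "
    "Refund requests are handled by support."
)
compressed = compress_chunk("refund policy", chunk)
print(compressed)
```

The sentence about office hours is dropped, while both refund-related sentences survive.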

3. Query refinement

A feedback loop improves the understanding of the user query. Dynamic query adjustment, based on the retrieved results, refines the query to better target relevant chunks. The refined query can also trigger another round of retrieval to fetch additional or more accurate information, a process referred to as iterative retrieval.
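The feedback loop can be sketched as a function that alternates retrieval and query refinement until nothing new turns up. Both `keyword_retrieve` and `append_terms` are toy stand-ins chosen for this example; a real system would plug in its own retriever and, most likely, an LLM-based query rewriter.

```python
from typing import Callable

def iterative_retrieve(query: str,
                       retrieve: Callable[[str], list[str]],
                       refine: Callable[[str, list[str]], str],
                       max_rounds: int = 3) -> list[str]:
    """Retrieve, refine the query from the new results, and repeat."""
    collected: list[str] = []
    for _ in range(max_rounds):
        new_chunks = [c for c in retrieve(query) if c not in collected]
        if not new_chunks:
            break  # nothing new was found, so stop early
        collected.extend(new_chunks)
        query = refine(query, new_chunks)
    return collected

CORPUS = ["rag uses retrieval", "retrieval needs embeddings", "embeddings are vectors"]

def keyword_retrieve(query: str) -> list[str]:
    """Toy retriever: return documents sharing at least one word with the query."""
    terms = set(query.split())
    return [doc for doc in CORPUS if terms & set(doc.split())]

def append_terms(query: str, new_chunks: list[str]) -> str:
    """Toy refinement: extend the query with the words of the new chunks."""
    return query + " " + " ".join(new_chunks)

one_round = iterative_retrieve("rag", keyword_retrieve, append_terms, max_rounds=1)
three_rounds = iterative_retrieve("rag", keyword_retrieve, append_terms, max_rounds=3)
print(one_round)
print(three_rounds)
```

With a single round only the document that mentions "rag" is found; each further round follows the newly added terms to one more document.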

4. Deduplication and conflict resolution

When multiple chunks contain overlapping or contradictory information, deduplication identifies and removes the duplicate chunks, while conflict resolution prioritizes the most reliable sources or combines the data to resolve inconsistencies.
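A minimal sketch of both steps: exact duplicates are detected after normalizing case and whitespace, and conflicts are resolved by picking the chunk with the highest `reliability` value. The `reliability` field is a hypothetical score; how source reliability is measured depends on the application.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())

def deduplicate(chunks: list[dict]) -> list[dict]:
    """Keep the first occurrence of each normalized text."""
    seen = set()
    unique = []
    for chunk in chunks:
        key = normalize(chunk["text"])
        if key not in seen:
            seen.add(key)
            unique.append(chunk)
    return unique

def resolve_conflict(conflicting: list[dict]) -> dict:
    """Prefer the chunk from the most reliable source."""
    return max(conflicting, key=lambda chunk: chunk["reliability"])

chunks = [
    {"text": "The warranty lasts 2 years.", "reliability": 0.9},
    {"text": "the warranty  lasts 2 years.", "reliability": 0.9},
    {"text": "The warranty lasts 1 year.", "reliability": 0.4},
]
unique_chunks = deduplicate(chunks)
winner = resolve_conflict(unique_chunks)
print(len(unique_chunks), winner["text"])
```

Near-duplicates with different wording need a similarity threshold on embeddings rather than exact matching.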

Advantages of Advanced RAG:

Advanced RAG offers improvements that enhance the quality and effectiveness of language model outputs compared to Naive RAG:

  • Improved relevance with re-ranking that prioritizes the most relevant information, providing more accurate and coherent responses.

  • Task-specific context with dynamic embeddings allows the system to better understand and respond to different queries by tailoring context to specific tasks.

  • Enhanced accuracy through hybrid search, which combines multiple search strategies to retrieve data more effectively.

  • Streamlined responses due to context compression, which removes unnecessary details and speeds up the process, providing more concise, high-quality answers.

  • A deeper understanding of user queries with techniques like query rewriting and expansion.

Advanced RAG represents a significant step forward: it introduces additional refinement stages that address the limitations of Naive RAG.

Figure: Advanced RAG diagram

Modular RAG

Modular RAG evolves from the Advanced RAG framework by incorporating modules and techniques that make the RAG structure more flexible. Its major improvements are as follows:

Enhanced system architecture

Modular RAG introduces restructured components and rearranged pipelines to address specific challenges in traditional RAG setups. While it represents a leap in complexity and capability, it remains grounded in the core principles of Advanced and Naive RAG and presents a refined evolution within the RAG family.

New modules for improved functionality

  • Search module

Designed for targeted searches, this module allows retrieval from diverse data sources, such as databases, search engines, and knowledge graphs. Leveraging LLM-generated code and query languages enhances the retrieval process for specific scenarios.  

  • RAG-fusion

This process overcomes traditional search limitations through multi-query strategies. It broadens user queries into multiple perspectives, performs parallel vector searches, and uses intelligent re-ranking to uncover hidden knowledge.  

  • Memory module 

Uses the LLM’s memory to guide retrieval. By iteratively refining the context, it creates an evolving memory pool, aligning data more closely with text distribution for improved relevance.  

  • Routing module  

Directs queries through the most appropriate pathways based on their requirements, whether it's summarization, database searches, or merging data streams.  

  • Predict module

Generates relevant context directly from the LLM to reduce redundancy and noise, improving the precision and accuracy of results.  

  • Task adapter module  

Customizes the system for specific downstream tasks. It automates prompt generation for zero-shot queries and creates task-specific retrievers for few-shot learning scenarios, improving task-specific adaptability.  
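To illustrate the re-ranking step of the RAG-fusion module described above, here is a sketch of reciprocal rank fusion (RRF), a common way to merge the result lists of several query variants. Treating RRF as the re-ranking method is an assumption, since the description above only speaks of "intelligent re-ranking". The document IDs and the constant `k=60` are illustrative.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge several rankings: each document earns 1 / (k + rank) per list."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Hypothetical results of three query variants generated from one user query.
rankings = [
    ["d1", "d2", "d3"],
    ["d2", "d1", "d4"],
    ["d2", "d3", "d1"],
]
fused = reciprocal_rank_fusion(rankings)
print(fused)
# ['d2', 'd1', 'd3', 'd4']
```

Documents that rank well across several query variants rise to the top, even if no single search placed them first every time.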

New patterns for flexibility and scalability

Modular RAG is highly adaptable: modules can be substituted or reconfigured to address specific challenges. Unlike the fixed "retrieve and read" mechanism of Naive and Advanced RAG, Modular RAG supports flexible interaction flows and integrates new components as needed. It represents a significant advancement in retrieval-augmented generation, addressing the limitations of earlier RAG paradigms while expanding the potential of RAG across diverse applications. This evolution positions Modular RAG as the standard for building cutting-edge RAG systems.
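One way to picture this flexibility is a pipeline in which every module is a function that takes and returns a shared state, so modules can be added, removed, or swapped freely. Everything below is a simplified assumption: the keyword-based `route` function stands in for a routing module, `TOY_SOURCES` for real data sources, and `rerank` for a real re-ranker.

```python
from typing import Callable

Module = Callable[[dict], dict]

def run_pipeline(modules: list[Module], state: dict) -> dict:
    """Pass the state through each module in order."""
    for module in modules:
        state = module(state)
    return state

def route(state: dict) -> dict:
    """Toy routing module: pick a pathway from keywords in the query."""
    query = state["query"].lower()
    if "summarize" in query or "summary" in query:
        path = "summarization"
    elif any(word in query for word in ("how many", "count", "total")):
        path = "database_search"
    else:
        path = "vector_search"
    return {**state, "route": path}

# Hypothetical data sources, one per pathway.
TOY_SOURCES = {
    "vector_search": ["Modular RAG lets teams swap modules.", "RAG grounds answers."],
    "database_search": ["orders_shipped = 42"],
    "summarization": ["Full report text to be summarized."],
}

def retrieve(state: dict) -> dict:
    return {**state, "chunks": list(TOY_SOURCES[state["route"]])}

def rerank(state: dict) -> dict:
    """Placeholder re-ranker: shortest chunk first."""
    return {**state, "chunks": sorted(state["chunks"], key=len)}

basic = run_pipeline([route, retrieve], {"query": "What is Modular RAG?"})
extended = run_pipeline([route, retrieve, rerank], {"query": "What is Modular RAG?"})
print(basic["route"], basic["chunks"])
print(extended["chunks"])
```

Adding the re-ranker changes only the list of modules passed to `run_pipeline`; the other modules stay untouched.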

Choosing the right architecture for projects

| Aspect | Naive RAG | Advanced RAG | Modular RAG |
| --- | --- | --- | --- |
| Complexity | Simple architecture with minimal components and direct integration of retrieval and generation. | Adds layers of processing, such as query refinement, re-ranking, and context compression. | Highly flexible and adaptable with modular components, allowing for advanced techniques like memory modules, search modules, and task-specific customizations. |
| Performance | Limited performance; relies heavily on retrieval quality, with no post-retrieval enhancements. | Improved performance through re-ranking, hybrid search, and contextual refinement. | Best performance due to specialized modules for specific tasks, including multi-query strategies and fusion techniques. |
| Relevance | Basic retrieval; lacks refinement in terms of relevance or coherence. | Significantly better relevance through dynamic embeddings, query rewriting, and re-ranking. | Exceptional relevance, with custom modules for domain adaptation, fine-tuned embeddings, and advanced routing strategies. |
| Flexibility | Rigid structure; not easily adaptable to domain-specific requirements. | More flexible than Naive RAG, with improved adaptability for general tasks. | Highly flexible; the modular design supports easy reconfiguration, module substitution, and addition of new components for diverse use cases. |
| Scalability | Limited scalability due to inefficiencies in retrieval and generation as the dataset grows. | Better scalability with hybrid search and context compression. | Highly scalable; its modular architecture enables efficient handling of large datasets and complex queries through specialized modules. |
| Best use cases | Simple projects requiring minimal setup and straightforward retrieval. | Projects needing more accurate and contextually relevant responses with moderate complexity. | Complex, large-scale projects with domain-specific requirements and a need for high customization, efficiency, and adaptability. |

Naive RAG

Ideal for: Simple projects that need to be completed quickly, such as small-scale experiments where accuracy is not as important.

Examples: Information retrieval systems designed to answer frequently asked questions (FAQs).

Advanced RAG

Ideal for: Projects that require an intermediate level of accuracy and relevance. This model can enhance the quality of information retrieval through query reformulation and improved coherence.

Examples: Support chatbots where the responses must be nuanced, and educational systems where more context is required.

Modular RAG

Ideal for: Complex, domain-specific applications that demand advanced customization and high efficiency, as well as projects requiring scalability, integration of multiple data sources, and tailored task handling.

Examples: Enterprise-level knowledge management systems; legal, healthcare, or financial applications requiring detailed domain adaptation; and research and development tools handling large-scale data with specialized retrieval needs.

The choice of architecture depends on the project's complexity, scale, and domain requirements.

  • For quick and simple solutions, go with Naive RAG;

  • For moderately complex projects, Advanced RAG offers a balanced approach;

  • For sophisticated, large-scale systems, Modular RAG provides unparalleled flexibility and performance.

Organizations should evaluate their use cases, resource availability, and long-term scalability needs to select the most appropriate RAG architecture.
