Retrieval-Augmented Generation (RAG) extends Large Language Model capabilities by connecting them to external knowledge bases. Current implementations show query latency ranging from 100ms to several seconds, with retrieval accuracy varying significantly based on embedding quality and context window limitations. Storage requirements can reach multiple terabytes for comprehensive knowledge bases, while maintaining index freshness presents ongoing operational challenges.

The fundamental challenge lies in balancing retrieval accuracy and response latency while managing computational resources and maintaining data relevance.

This page brings together solutions from recent research—including hybrid vector-semantic search approaches, streaming retrieval architectures, efficient embedding techniques, and context optimization methods. These and other approaches focus on building practical, production-ready RAG systems that deliver reliable and timely responses.

1. Graph-Based Contextual Prompt Generation for Code Completion in Language Models

MICROSOFT TECHNOLOGY LICENSING LLC, 2025

Improving large language models like GPT-3 for software engineering tasks such as code completion by providing context-specific prompts that better reflect user intent. The technique generates code directives by traversing a graph representation of the user's source code, in which the graph captures usage and definition relationships between code elements. When a user queries the model, the graph is searched to find relevant nodes, and directives containing source code snippets and relationship descriptions are generated from those nodes. This customized prompt is then sent to the model for response generation.
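
For intuition, here is a minimal sketch of the idea (the toy graph, node names, and directive format are illustrative, not taken from the patent): a directed graph records usage/definition relationships between code elements, and directives are generated from the neighborhood of the node the user is working on.

```python
# Sketch of graph-based prompt directives; all names are hypothetical.
import networkx as nx

# Build a small code graph: nodes are code elements, edges are typed relations.
g = nx.DiGraph()
g.add_node("parse_config", snippet="def parse_config(path): ...")
g.add_node("load_yaml", snippet="def load_yaml(path): ...")
g.add_node("Config", snippet="class Config: ...")
g.add_edge("parse_config", "load_yaml", relation="calls")
g.add_edge("parse_config", "Config", relation="returns")

def build_directives(graph, focus):
    """Collect snippets and relationship descriptions around a focus node."""
    directives = [graph.nodes[focus]["snippet"]]
    for _, dst, data in graph.out_edges(focus, data=True):
        directives.append(
            f"# {focus} {data['relation']} {dst}:\n{graph.nodes[dst]['snippet']}"
        )
    return "\n".join(directives)

# When the user asks for a completion inside parse_config, prepend directives.
prompt = build_directives(g, "parse_config") + "\n# Complete the function body:"
print(prompt)
```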

2. Microservice Architecture for Language Models with Specialized Functional Segmentation

EDUWORKS CORP, 2025

Microservice architecture for language models that provides performance comparable to competing trillion parameter models on some tasks while using significantly less computational resources. The microservice architecture involves breaking down the language model into smaller, specialized microservices like expansion, retrieval, and data producers. These microservices leverage techniques like recursive retrieval augmentation and API-centric data access to expand and enhance client inputs. This allows leveraging smaller, more specialized models instead of massive monolithic ones, reducing computational burden.

Patent: US2025231934A1

3. Microservice Architecture for Language Model Copilots with Input Expansion, Retrieval, and Core Output Generation Services

EDUWORKS CORP, 2025

Microservice architecture for language model copilots that provides cognitive functionality comparable to large language models like GPT-4 but with lower computational burden and fewer artifacts. The copilot is composed of multiple microservices, including an expansion service to augment client inputs, a retrieval service to fetch relevant documents, and a core service to generate outputs. The expansion and retrieval services expand and enrich the client input before passing it to the core service. This allows the core service to work with improved data for better performance. The microservice architecture enables leveraging specialized services for specific tasks instead of relying on a single large model.

Patent: US2025231973A1

4. Natural Language Query Processing System Utilizing Task Decomposition, Knowledge Graph Entity Retrieval, and Vector-Based Text Chunk Search

INTERNATIONAL BUSINESS MACHINES CORP, 2025

Generating more accurate and efficient natural language responses to user queries using AI techniques like knowledge graphs, vector searches, and large language models (LLMs). The method involves breaking down user queries into tasks, searching a knowledge graph for relevant entities, then searching a vector database for text chunks associated with those entities. This refined search scope improves the accuracy and efficiency of the vector search. The LLM is then used to process the retrieved text chunks and generate a response.

Patent: US12346356B2
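
A minimal sketch of the claimed flow under toy assumptions (the knowledge graph, chunks, and embedding helpers below are all stand-ins): the query is decomposed into entities, and the vector search is then scoped to only the chunks linked to those entities.

```python
# Sketch: decompose query -> KG entity lookup -> entity-scoped vector search.
import numpy as np

np.random.seed(0)

# Toy knowledge graph: entity -> ids of text chunks that mention it.
kg = {"RAG": ["c1", "c3"], "vector search": ["c2", "c3"]}
chunk_text = {"c1": "RAG pipelines ...", "c2": "Vector search ...", "c3": "Scoping ..."}
chunk_vecs = {cid: np.random.rand(8) for cid in chunk_text}  # stand-in embeddings

def decompose(query):
    """Stand-in task decomposition; a real system would prompt an LLM."""
    return [e for e in kg if e.lower() in query.lower()]

def answer(query, embed=lambda s: np.random.rand(8)):
    entities = decompose(query)
    # Restrict the search space to chunks attached to the matched entities.
    candidates = {cid for e in entities for cid in kg[e]} or set(chunk_text)
    q = embed(query)
    ranked = sorted(candidates, key=lambda cid: -float(
        q @ chunk_vecs[cid] / (np.linalg.norm(q) * np.linalg.norm(chunk_vecs[cid]))))
    context = "\n".join(chunk_text[cid] for cid in ranked[:2])
    return f"[an LLM would answer from this context]\n{context}"

print(answer("How does RAG use vector search?"))
```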

5. A Systematic Review of Retrieval-Augmented Generation for Enhancing Domain-Specific Knowledge in Large Language Models

Murtiyoso Murtiyoso, Imam Tahyudin, Berlilana Berlilana - Politeknik Ganesha, 2025

This literature review examines the use of Retrieval-Augmented Generation (RAG) to enhance Large Language Models (LLMs) with domain-specific knowledge. RAG integrates retrieval techniques with generative models to access external knowledge sources, addressing the limitations of LLMs in handling specialized information. By leveraging external data, RAG improves the accuracy and relevance of generated content, making it particularly useful in fields that require detailed, up-to-date information. The review highlights RAG's effectiveness in overcoming challenges such as data sparsity and the dynamic nature of domain knowledge, and discusses its potential to enhance LLM performance, scalability, and the ability to generate contextually accurate responses in knowledge-intensive applications. Key future research directions and implementation considerations are also identified.

6. Enhancing Document-Level Question Answering via Multi-Hop Retrieval-Augmented Generation with LLaMA 3

Xinyue Huang, Ziqi Lin, Fang Sun, 2025

This paper presents a novel Retrieval-Augmented Generation (RAG) framework tailored for complex question answering tasks, addressing challenges in multi-hop reasoning and contextual understanding across lengthy documents. Built upon LLaMA 3, the framework integrates a dense retrieval module with advanced context fusion and reasoning mechanisms, enabling more accurate and coherent response generation. A joint optimization strategy combining retrieval likelihood and generation cross-entropy improves the model's robustness and adaptability. Experimental results show that the proposed system outperforms existing retrieval-augmented and generative baselines, confirming its effectiveness in delivering precise, contextually grounded answers.

7. Multimedia Graph Codes for Fast and Semantic Retrieval-Augmented Generation

Stefan Wagenpfeil - Multidisciplinary Digital Publishing Institute, 2025

Retrieval-Augmented Generation (RAG) has become a central approach to enhance the factual consistency and domain specificity of large language models (LLMs) by incorporating external context at inference time. However, most existing RAG systems rely on dense vector-based similarity, which fails to capture complex semantic structures, relational dependencies, and multimodal content. In this paper, we introduce Graph Codes, a matrix-based encoding of Multimedia Feature Graphs, as an alternative retrieval paradigm. Graph Codes explicitly preserve the topology of entities and their typed relationships from documents, enabling structure-aware and interpretable retrieval. We evaluate our system in two domains: scene understanding (200 annotated image-question pairs) and clinical question answering (150 real-world medical queries with 10,000 structured knowledge snippets). Results show that the method outperforms baselines in precision (+915%), reduces hallucination rates by over 30%, and yields higher expert-rated answer quality. Theoretically, this work demonstrates that symbolic similarity over graphs provides a more faithful alignment mechanism…

8. Conversational Agent for Medical Question-Answering Using RAG and LLM

La Ode Muhammad Yudhy Prayitno, Annisa Nurfadilah, Septiyani Bayu Saudi - Ioinformatic, 2025

This study analyzes the application of the RAG concept alongside an LLM, in the context of PubMed QA data, to augment question-answering capabilities in a medical setting. For answering questions relevant to private healthcare institutions, the Mistral 7B model was utilized. To limit hallucinations, embeddings were used for document indexing, ensuring that answers are based on the provided information. The analysis was conducted using five embedding models: two domain-specialized ones, PubMedBERT-base and BioLORD-2023, and three general-purpose ones, GIST-large-Embedding-v0, b1ade-embed-kd, and all-MiniLM-L6-v2. The results showed that the general-purpose models performed better than the domain-specific ones, especially GIST-large-Embedding-v0 and b1ade-embed-kd, which underscores the strength of general-purpose training datasets for fundamental semantic retrieval, even in specialized domains. The outcome of this research demonstrates that applying these models locally can safeguard privacy while still responding to queries with appropriate precision, thus establishing a foundation for a dependable system.

9. Large Language Model Processing with Iterative Attention Focusing and Context-Optimized Retrieval Techniques

VIJAY MADISETTI, 2025

Enhancing attention span of large language models (LLMs) when processing long documents and improving Retrieval-Augmented Generation (RAG) through context-optimized retrieval techniques. The methods involve iterative attention focusing, context-aware document processing, intelligent information retrieval, adaptive response generation, and dynamic adjustment of LLM's attention focus. It also uses functional abstraction through hierarchical tokens to improve context management and semantic coherence.

10. Combining Retrieval-Augmented Generation and Fine-tuning of Large Language Models to Enhance Port Industry Question-Answering Systems

Xiao Hu, Mideth Abisado - Whioce Publishing Pte Ltd., 2025

In this research, we develop a new hybrid architecture that combines Retrieval-Augmented Generation (RAG) and Large Language Models (LLMs) to address specific gaps in domain question answering systems for the maritime port industry. Our approach mitigates the limitations of generic LLMs on domain-specific queries through a combination of knowledge retrieval and industry-adaptive modelling. The evaluation protocol was both quantitative and qualitative, using expert judgement, and showed marked, justifiable gains over stand-alone approaches in factual correctness, accurate use of terminology, and compliance with relevant policies. The system achieved a 23% gain in nDCG@5 scores alongside over 90% contextually correct terminology use, while maintaining sub-second response times under typical operational loads. The experts consulted in the study were particularly impressed by the balance struck between precision and contextual understanding in complex scenarios, which enables decision-makers in critical environments to place greater trust in the system within their active contexts…

11. Automated Cybersecurity System with Large Language Models and Retrieval-Augmented Generation for Suspicious Activity Report Generation

CITIBANK NA, 2025

Automated system for generating suspicious activity reports (SARs) in cybersecurity using large language models (LLMs) and retrieval-augmented generation (RAG). The system improves SAR generation accuracy and consistency by leveraging LLMs and RAG to incorporate more relevant data into SARs. The system obtains event data, extracts features, queries historical data with similar features, composes the events, prompts the LLM, and generates the SAR. This provides the LLM with more contextualized data to generate accurate SARs without needing extensive training on SARs themselves.
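
A rough sketch of that pipeline under illustrative assumptions (the feature extraction, similarity logic, and prompt format below are stand-ins, not the patent's):

```python
# Sketch of the claimed SAR flow: extract features from an event, pull
# similar historical events, and compose a contextualized LLM prompt.
def extract_features(event):
    return {"account": event["account"], "amount_bucket": event["amount"] // 1000}

def find_similar(features, history):
    return [h for h in history
            if extract_features(h)["amount_bucket"] == features["amount_bucket"]]

def compose_sar_prompt(event, similar):
    lines = [f"Suspicious event: {event}"]
    lines += [f"Similar historical event: {h}" for h in similar]
    lines.append("Draft a suspicious activity report covering the pattern above.")
    return "\n".join(lines)

history = [{"account": "A1", "amount": 9500}, {"account": "B2", "amount": 120}]
event = {"account": "C3", "amount": 9900}
prompt = compose_sar_prompt(event, find_similar(extract_features(event), history))
print(prompt)  # this prompt would then be sent to an LLM to generate the SAR
```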

12. Language Model Output Enhancement via Structured Universal Language Integration

UNLIKELY ARTIFICIAL INTELLIGENCE LTD, 2025

Improving the output of large language models (LLMs) like GPT-3 by using a structured, machine-readable language like Universal Language (UL) to provide new context data for the LLM. This allows the LLM to generate more accurate and improved continuation text output in response to prompts. The UL representation allows more expressive and detailed meaning representation compared to natural language. The LLM uses UL output as prompts, and the UL processing system analyzes and improves the LLM's output. This provides more accurate and reliable LLM output.

Patent: US12321697B2

13. Implementing and Assessing Retrieval Augmented Generation (RAG) for LLM-Based Document Queries

Peter Kaczmarski, Fernand Vandamme - Routledge, 2025

In recent years, the AI-related technology referred to as RAG (Retrieval Augmented Generation) (Lewis, 2020) has gained a lot of attention. In the RAG approach, custom information sources are used to seed the knowledge obtained from an LLM (Large Language Model), thus forming an approach which solves the issue of adapting LLMs to cope with external information. Using a RAG scenario, various processing use cases can be implemented, such as AI-based document management, AI-enhanced web search, and online service support. This paper outlines the main components of the RAG workflow: chunking and embedding of input documents, as well as similarity-based user query processing. The workflow is illustrated via a Python implementation that validates the procedure on a simple multi-topic example document. Experimental results are discussed, showing the feasibility of this approach and illustrating the need for further research and enhancements, for instance via the RAPTOR concept (Sarthi, 2024).
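
A minimal sketch of such a chunk–embed–retrieve workflow (TF-IDF stands in for a neural embedding model so the example stays self-contained; the document and query are invented):

```python
# Minimal RAG workflow sketch: chunk, embed, retrieve by similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

document = ("RAG combines retrieval with generation. "
            "Chunking splits documents into passages. "
            "Embeddings map text to vectors for similarity search.")

# 1) Chunk: naive sentence-level splitting.
chunks = [s.strip() + "." for s in document.split(".") if s.strip()]

# 2) Embed: fit a vectorizer over the chunks.
vectorizer = TfidfVectorizer().fit(chunks)
chunk_matrix = vectorizer.transform(chunks)

# 3) Retrieve: rank chunks by cosine similarity to the user query.
query = "How are documents split for retrieval?"
scores = cosine_similarity(vectorizer.transform([query]), chunk_matrix)[0]
best = max(range(len(chunks)), key=scores.__getitem__)
print(chunks[best])  # context that would be passed to the LLM
```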

14. SG-RAG MOT: SubGraph Retrieval Augmented Generation with Merging and Ordering Triplets for Knowledge Graph Multi-hop Question Answering

Anwar Saleh, Gokhan Tur, Yucel Saygin, 2025

Large Language Models (LLMs) often tend to hallucinate, especially on domain-specific tasks and tasks that require reasoning. Previously, we introduced SubGraph Retrieval Augmented Generation (SG-RAG) as a novel GraphRAG method for multi-hop question answering. SG-RAG leverages Cypher queries to search the given knowledge graph and retrieve the subgraph necessary to answer the question. The results from our previous work showed the higher performance of SG-RAG compared to traditional RAG. In this work, we further enhance SG-RAG by proposing an additional step called Merging and Ordering Triplets (MOT). The MOT step seeks to decrease redundancy in the retrieved triplets by applying hierarchical merging to the subgraphs, and provides an ordering among the triplets using a Breadth-First Search (BFS) traversal algorithm. We conducted experiments on the MetaQA benchmark, a question-answering benchmark in the movies domain. Our results show more accurate answers than Chain-of-Thought and Graph Chain-of-Thought, and we find that (up to some point) merging highly overlapping subgraphs and defining an order among the triplets helps the LLM generate more precise answers.
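
A compact sketch of the MOT idea under simplifying assumptions (a single union pass replaces the paper's hierarchical merging, and the toy triplets are invented):

```python
# Sketch of Merging and Ordering Triplets: merge overlapping subgraphs of
# (head, relation, tail) triplets, then order triplets by BFS traversal.
import networkx as nx

# Toy subgraphs, as might be returned by separate Cypher queries.
subgraphs = [
    {("Inception", "directed_by", "Nolan"), ("Nolan", "born_in", "London")},
    {("Inception", "directed_by", "Nolan"), ("Inception", "released_in", "2010")},
]

# Merge: a set union removes duplicated triplets across overlapping subgraphs.
merged = set().union(*subgraphs)

# Order: BFS from the question entity so related facts appear close together.
g = nx.DiGraph()
g.add_edges_from((h, t, {"rel": r}) for h, r, t in merged)
ordered = [(h, g.edges[h, t]["rel"], t) for h, t in nx.bfs_edges(g, "Inception")]
for triplet in ordered:
    print(triplet)  # linearized, ordered context for the LLM prompt
```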

15. Retrieval-Augmented Generation System with Reconfigurable Ranker Sequence and Self-Rewarding Optimization Techniques

GOLDMAN SACHS & CO LLC, 2025

Optimizing retrieval-augmented generation (RAG) systems with a reconfigurable sequence of rankers in the retriever model to improve the quality of information chunks provided to the generative model. The rankers in the reconfigurable sequence are bi-encoders, cross-encoders, and an LLM-ranker. The rankers identify relevant chunks from documents for user queries. This allows more-relevant information chunks to be provided to the generative model, increasing output quality. The rankers can be reconfigured to optimize chunk selection. Additionally, self-rewarding optimization techniques are provided to train the entire RAG system using rewards based on generated responses.
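
A minimal sketch of a reconfigurable ranker cascade (the scoring functions below are cheap stand-ins; real stages would be a bi-encoder, a cross-encoder, and an LLM ranker, in whatever order the configuration specifies):

```python
# Sketch: each ranker stage reorders and prunes candidates for the next.
def bi_encoder(query, chunks):      # cheap: score chunks independently
    return sorted(chunks, key=lambda c: -len(set(query.split()) & set(c.split())))

def cross_encoder(query, chunks):   # pricier: jointly score query+chunk pairs
    return sorted(chunks, key=lambda c: -sum(c.count(w) for w in query.split()))

def llm_ranker(query, chunks):      # priciest: would prompt an LLM to pick
    return chunks                    # identity stand-in

def retrieve(query, chunks, rankers, keep=(10, 5, 3)):
    for ranker, k in zip(rankers, keep):
        chunks = ranker(query, chunks)[:k]   # narrow the candidate pool
    return chunks

pipeline = [bi_encoder, cross_encoder, llm_ranker]   # reconfigurable order
docs = ["rag systems rank chunks", "llm output quality", "chunk selection"]
print(retrieve("rank chunks", docs, pipeline))
```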

16. Retrieval Augmented Generation System with Multi-Query Expansion and Vector Search Fusion

ELSEVIER INC, 2025

Retrieval augmented generation (RAG) system that improves search results by generating multiple natural language queries from a user's input, performing vector searches on both the user queries and the generated queries, and fusing the results to compile a more comprehensive and accurate search result. The system involves inputting user queries into a large language model to generate distinct associated queries, performing vector searches on all queries, compiling the results into a fused search, and summarizing the fused results. This bridges the gap between explicit and implicit search intent, leveraging the language model to expand queries and the vector search to fuse results.
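
A sketch of the idea, assuming reciprocal rank fusion as the fusion step (the patent only says the per-query results are fused; RRF is one common choice) and stand-in expansion/search helpers:

```python
# Sketch: LLM-style query expansion + per-query vector search + RRF fusion.
from collections import defaultdict

def expand(query):                   # an LLM would generate these variants
    return [query, query + " tutorial", query + " examples"]

def vector_search(query, k=5):       # stand-in for a real vector index
    corpus = ["rag overview", "rag tutorial", "vector db examples", "llm basics"]
    return sorted(corpus,
                  key=lambda d: -len(set(query.split()) & set(d.split())))[:k]

def fused_search(user_query, rrf_k=60):
    scores = defaultdict(float)
    for q in expand(user_query):
        for rank, doc in enumerate(vector_search(q)):
            scores[doc] += 1.0 / (rrf_k + rank + 1)   # RRF accumulation
    return sorted(scores, key=scores.get, reverse=True)

print(fused_search("rag"))  # fused result list, ready for summarization
```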

17. Document GraphRAG: Knowledge Graph Enhanced Retrieval Augmented Generation for Document Question Answering Within the Manufacturing Domain

Simon Knollmeyer, Oguz Caymazer, Daniel Grossmann - Multidisciplinary Digital Publishing Institute, 2025

Retrieval-Augmented Generation (RAG) systems have shown significant potential for domain-specific Question Answering (QA) tasks, although persistent challenges in retrieval precision and context selection continue to hinder their effectiveness. This study introduces Document Graph RAG (GraphRAG), a novel framework that bolsters retrieval robustness and enhances answer generation by incorporating Knowledge Graphs (KGs), built upon documents' intrinsic structure, into the pipeline. Through the application of the Design Science Research methodology, we systematically design, implement, and evaluate GraphRAG, leveraging graph-based document structuring and a keyword-based semantic linking mechanism to improve retrieval quality. The evaluation, conducted on well-established datasets including SQuAD, HotpotQA, and a newly developed manufacturing dataset, demonstrates consistent performance gains over a naive RAG baseline across both metrics. The results indicate that GraphRAG improves Context Relevance metrics, with task-dependent optimizations of chunk size, keyword density, and top-k further enhancing performance. Notably, multi-hop questions benefit most…

18. Integrating pre-trained LLMs with RAG for efficient content retrieval

Tran Trong Kien, Khau Van Bich - Lac Hong University, 2025

Large Language Models (LLMs) are highly effective at replicating human tasks and boosting productivity but face challenges in accurate data extraction due to prioritizing fluency over factual precision. Researchers are addressing these limitations by combining LLMs with Retrieval-Augmented Generation (RAG) models. This approach utilizes chunking, searching, and ranking algorithms to streamline retrieval from unstructured text, improving the precision of LLM processing. The findings provide key insights into optimizing chunking strategies and set the stage for the advancement and broader application of RAG-enhanced systems.

19. A Comprehensive Evaluation of Embedding Models and LLMs for IR and QA Across English and Italian

Ermelinda Oro, Francesco Granata, Massimo Ruffolo - Multidisciplinary Digital Publishing Institute, 2025

This study presents a comprehensive evaluation of embedding techniques and large language models (LLMs) for Information Retrieval (IR) and question answering (QA) across languages, focusing on English and Italian. We address a significant research gap by providing empirical evidence of model performance across linguistic boundaries. We evaluate embedding models on 12 diverse IR datasets, including Italian SQuAD, DICE, SciFact, ArguAna, and NFCorpus, and assess four LLMs (GPT-4o, Llama-3.1 8B, Mistral-Nemo, Gemma-2b) on QA tasks within a retrieval-augmented generation (RAG) pipeline, testing them on SQuAD, CovidQA, and NarrativeQA in cross-lingual scenarios. The results show that multilingual models perform more competitively than language-specific ones: embed-multilingual-v3.0 achieves top nDCG@10 scores of 0.90 and 0.86. In the QA evaluation, Mistral-Nemo demonstrates superior answer relevance (0.91–1.0) while maintaining strong groundedness (0.64–0.78). Our analysis reveals three key findings: (1) multilingual models effectively bridge gaps between English and Italian, though consistency decreases in specialized domains, (2) model size does not consistently predict performance, and (3) all evaluated systems exhibit critical…

20. AI-Based Search System with Intelligent Agents and Dynamic Knowledge Base Construction Using Retrieval Augmented Generation

ARTI ANALYTICS INC, 2025

An AI-powered search system that uses intelligent agents to find curated, directly useful information from disparate sources like websites, documents, and databases. The system uses AI techniques like Retrieval Augmented Generation (RAG) to analyze search queries and dynamically build a knowledge base of indexed information. It enables features like automated form filling, restricted access, and personalization. The system also learns and grows its knowledge base based on user confirmations and AI processing.

Patent: US12306834B1

21. Next Sentence Prediction with BERT as a Dynamic Chunking Mechanism for Retrieval-Augmented Generation Systems

Alexandre Thurow Bender, Gabriel Gomes, Ulisses Brisolara Correa - George A. Smathers Libraries, 2025

Retrieval-Augmented Generation systems enhance the generative capabilities of large language models by grounding their responses in external knowledge bases, addressing some major limitations and improving reliability for tasks requiring factual accuracy or domain-specific information. Chunking is a critical step in these pipelines, where text is divided into smaller segments to facilitate efficient retrieval and optimize use of the model context. This paper introduces a method that uses BERT's Next Sentence Prediction to adaptively merge related sentences into context-aware chunks. We evaluate the approach on the SQuAD v2 dataset, comparing it to standard chunking methods using Recall@k, Precision@k, Contextual-Precision@k, and processing time as metrics. Results indicate the proposed method achieves competitive performance while reducing computational cost by roughly 60%, demonstrating its potential to improve RAG systems.
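
A minimal sketch of NSP-driven chunking using Hugging Face's pretrained BERT NSP head (the threshold and sentence splitting are illustrative choices, not the paper's exact settings):

```python
# Sketch: merge adjacent sentences into one chunk while BERT's next-sentence
# head judges them to be a continuation.
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tok = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased").eval()

def is_continuation(sent_a, sent_b, threshold=0.5):
    enc = tok(sent_a, sent_b, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**enc).logits
    # Label 0 means "sentence B follows sentence A" in the NSP head.
    return torch.softmax(logits, dim=1)[0, 0].item() > threshold

def nsp_chunks(sentences):
    chunks, current = [], sentences[0]
    for nxt in sentences[1:]:
        if is_continuation(current, nxt):
            current += " " + nxt      # merge related sentences
        else:
            chunks.append(current)    # start a new context-aware chunk
            current = nxt
    chunks.append(current)
    return chunks

sents = ["RAG grounds LLM answers.", "It retrieves external passages.",
         "Unrelated: the weather was sunny."]
print(nsp_chunks(sents))
```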

22. Context is Key: Aligning Large Language Models with Human Moral Judgments through Retrieval-Augmented Generation

Matthew Boraske, Richard Burns - George A. Smathers Libraries, 2025

In this paper, we investigate whether pre-trained large language models (LLMs) can align with human moral judgments on a dataset of approximately fifty thousand interpersonal conflicts from the AITA (Am I A******) subreddit, an online forum where users evaluate the morality of others. We introduce a retrieval-augmented generation (RAG) approach that uses LLMs as core components. After collecting conflict posts and embedding them in a vector database, the RAG agent retrieves the most relevant posts for each new query. These are then used sequentially as context to gradually refine the LLM's judgment, providing adaptability without costly fine-tuning. Using OpenAI's GPT-4o, our approach outperforms directly prompting the LLM, achieving 83% accuracy and a Matthews correlation coefficient of 0.469 while reducing the rate of toxic responses from 22.53% to virtually zero. These findings indicate that integrating RAG into agents is an effective method to improve their alignment with human moral judgments while mitigating toxic language.

23. Retrieval Augmented Generation: What Works and Lessons Learned

Peter L. Elkin, Gaurang Mehta, Frank Lehouillier - IOS Press, 2025

Retrieval Augmented Generation has been shown to improve the output of large language models (LLMs) by providing context for the question or scenario posed to the model. We ran a series of experiments to understand how best to improve performance over native models, and we present the results of each of several experiments. These can serve as lessons learned for scientists tackling medical question answering tasks.

24. Enhancing Large Language Models for Specialized Domains: A Two-Stage Framework with Parameter-Sensitive LoRA Fine-Tuning and Chain-of-Thought RAG

Yao He, Xuanbing Zhu, Donghan Li - Multidisciplinary Digital Publishing Institute, 2025

Large language models (LLMs) have shown impressive general-purpose capabilities, but their application in specialized domains such as healthcare and law remains limited due to two major challenges, namely, a lack of deep domain-specific knowledge and the inability to incorporate real-time information updates. This paper addresses these challenges by combining parameter-sensitive low-rank adaptation (LoRA) with retrieval-augmented generation (RAG) in SensiLoRA-RAG, a two-stage framework designed to enhance LLM performance on question-answering tasks. In the first stage, we propose a parameter-sensitive LoRA fine-tuning method that efficiently adapts LLMs using high-quality professional data, enabling rapid and resource-efficient specialization. In the second stage, we develop a chain-of-thought RAG mechanism that dynamically retrieves and integrates up-to-date external knowledge, improving the model's ability to reason with current and complex domain context. We evaluate our framework on tasks in the medical and legal fields, demonstrating that SensiLoRA-RAG significantly improves answer accuracy, relevance, and adaptability compared to baseline methods.

25. Building an LLM Agent for Life Sciences Literature QA and Summarization

Nishanth Joseph Paulraj - GSC Online Press, 2025

This article explores the development of a specialized artificial intelligence agent that combines Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG) techniques to address challenges in biomedical literature search and synthesis. The unprecedented growth of published research in life sciences has created an information crisis that traditional methods cannot effectively manage. Researchers face significant challenges including overwhelming volume, domain-specific terminology barriers, difficulty making cross-study connections, and severe time constraints. The proposed LLM+RAG architecture offers a comprehensive solution featuring document processing for scientific papers, biomedical-specific vector embeddings, advanced retrieval strategies, and sophisticated reasoning capabilities. The system integrates PubMed and other databases while providing natural language interfaces that significantly reduce the cognitive burden on researchers. Domain-specific optimizations such as entity recognition, relationship extraction, and specialized embeddings further enhance performance across diverse scenarios. Evaluation through benchmark testing…

26. Comparison of Large Language Models’ Performance on 600 Nuclear Medicine Technology Board Examination–Style Questions

Michael A. Oumano, Shawn M. Pickett - Society of Nuclear Medicine and Molecular Imaging, 2025

This study investigated the application of large language models (LLMs) with and without retrieval-augmented generation (RAG) in nuclear medicine, particularly their performance across various topics relevant to the field, to evaluate their potential use as reliable tools for professional education and clinical decision-making. Methods: We evaluated LLMs, including the OpenAI GPT-4o series, Google Gemini, Cohere, Anthropic, and Meta Llama3, across 15 nuclear medicine topics. The models' accuracy was assessed using a set of 600 sample questions covering a range of technical domains in nuclear medicine. Overall accuracy was measured by averaging these topic scores, and additional comparisons were conducted between individual models. Results: OpenAI's models, openai_nvidia_gpt-4o_final and openai_mxbai_gpt-4o_final, demonstrated the highest overall accuracy, achieving scores of 0.787 and 0.783, respectively, when RAG was implemented. Anthropic Opus and Gemini 1.5 Pro followed closely, with competitive scores of 0.773 and 0.750 with RAG. Cohere and Llama3 showed more variability in performance, with the ollama_llama3 model (without RAG) having the lowest accuracy. Discrepancies were noted in question interpretation, complex guidelines, and imaging-based queries…

27. Enhancing Large Language Model Performance on ENEM Math Questions Using Retrieval-Augmented Generation

João Superbi, H. Sofia Pinto, Emanoel Santos - Sociedade Brasileira de Computação - SBC, 2024

In this study, we explore the use of Retrieval-Augmented Generation (RAG) to improve the performance of large language models (LLMs), such as GPT-3.5 Turbo and GPT-4o, in solving ENEM mathematics questions. Our experiments demonstrate that RAG potentially provides significant improvements in accuracy by introducing relevant contextual information. With RAG, GPT-4o consistently outperforms GPT-3.5 Turbo, underscoring the potential of this technique to enhance educational AI tools. This research illustrates the potential of RAG-enhanced LLMs to advance educational applications and encourages further exploration in this field.

28. Automating Systematic Literature Reviews with Retrieval-Augmented Generation: A Comprehensive Overview

Binglan Han, Teo Sušnjak, Anuradha Mathrani - MDPI AG, 2024

This study examines Retrieval-Augmented Generation (RAG) in large language models (LLMs) and its significant application for undertaking systematic literature reviews (SLRs). RAG-based LLMs can potentially automate tasks like data extraction, summarization, and trend identification. However, while LLMs are exceptionally proficient in generating human-like text and interpreting complex linguistic nuances, their dependence on static, pre-trained knowledge can result in inaccuracies and hallucinations. RAG mitigates these limitations by integrating LLMs' generative capabilities with the precision of real-time information retrieval. We review in detail the three key processes of the RAG framework: retrieval, augmentation, and generation. We then discuss applications of RAG-based LLMs to SLR automation and highlight future research topics, including the integration of domain-specific LLMs, multimodal data processing and generation, and the utilization of multiple retrieval sources. We propose a framework of RAG-based LLMs for automating SLRs, which covers four stages of the SLR process: literature search…

29. Domain-Driven LLM Development: Insights into RAG and Fine-Tuning Practices

J.C. Santos, Rachel Hu, Richard Song - ACM, 2024

To improve Large Language Model (LLM) performance on domain-specific applications, ML developers often leverage Retrieval Augmented Generation (RAG) and LLM Fine-Tuning. RAG extends the capabilities of LLMs to specific domains or an organization's internal knowledge base, without the need to retrain the model. The Fine-Tuning approach, on the other hand, updates LLM weights with domain-specific data to improve performance on specific tasks. A fine-tuned model is particularly effective at systematically learning new, comprehensive knowledge in a specific domain that is not covered by the LLM pre-training. This tutorial walks through the RAG and Fine-Tuning techniques, discusses the insights of their advantages and limitations, and provides best practices for adopting the methodologies for LLM tasks and use cases. The hands-on labs demonstrate advanced techniques to optimize the RAG and fine-tuned LLM architecture for domain-specific LLM tasks. The labs in the tutorial are designed using a set of open-source Python libraries to implement the RAG and fine-tuned LLM architecture…

30. A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models

Wenqi Fan, Yujuan Ding, Liangbo Ning - ACM, 2024

As one of the most advanced techniques in AI, Retrieval-Augmented Generation (RAG) can offer reliable and up-to-date external knowledge, providing huge convenience for numerous tasks. Particularly in the era of AI-Generated Content (AIGC), the powerful capacity of retrieval in providing additional knowledge enables RAG to assist existing generative AI in producing high-quality outputs. Recently, Large Language Models (LLMs) have demonstrated revolutionary abilities in language understanding and generation, while still facing inherent limitations such as hallucinations and out-of-date internal knowledge. Given the powerful abilities of RAG in providing the latest and helpful auxiliary information, Retrieval-Augmented Large Language Models (RA-LLMs) have emerged to harness external and authoritative knowledge bases, rather than relying solely on the model's internal knowledge, to augment the quality of the generated content of LLMs. In this survey, we comprehensively review existing research studies in RA-LLMs, covering three primary technical perspectives. Furthermore, to deliver deeper…

31. Pencils Down! Automatic Rubric-based Evaluation of Retrieve/Generate Systems

Naghmeh Farzi, Laura Dietz - ACM, 2024

Current IR evaluation paradigms are challenged by large language models (LLMs) and retrieval-augmented generation (RAG) methods. Furthermore, evaluation either resorts to expensive human judgments or leads to an over-reliance on LLMs.

32. In-Context Learning for Scalable and Online Hallucination Detection in RAGs

Nicolò Cosimo Albanese - Academy & Industry Research Collaboration Center, 2024

Ensuring fidelity to source documents is crucial for the responsible use of Large Language Models (LLMs) in Retrieval Augmented Generation (RAG) systems. We propose a lightweight method for real-time hallucination detection, with potential to be deployed as a model-agnostic microservice to bolster reliability. Using in-context learning, our approach evaluates response factuality at the sentence level without annotated data, promoting transparency and user trust. Compared to other prompt-based and semantic similarity baselines from recent literature, our method improves hallucination detection F1 scores by at least 11%, with consistent performance across different models. This research offers a practical solution for real-time validation of response accuracy in RAG systems, fostering responsible adoption, especially in critical domains where document fidelity is paramount.
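
A skeletal version of sentence-level, few-shot factuality checking in this spirit (`call_llm` is a placeholder for any chat-completion API, and the few-shot examples are invented):

```python
# Sketch: judge each response sentence against the retrieved context with a
# few-shot prompt; no annotated training data is required.
FEW_SHOT = """Context: The Eiffel Tower is in Paris.
Sentence: The Eiffel Tower is in Paris. Supported: yes
Sentence: The Eiffel Tower is in Rome. Supported: no
"""

def call_llm(prompt):
    # Placeholder; wire up a real model here. This toy version just checks
    # whether the sentence mentions the grounding fact.
    return "yes" if "Paris" in prompt.split("Sentence:")[-1] else "no"

def flag_hallucinations(context, response):
    flags = {}
    for sentence in filter(None, (s.strip() for s in response.split("."))):
        prompt = f"{FEW_SHOT}Context: {context}\nSentence: {sentence}. Supported:"
        flags[sentence] = call_llm(prompt).strip().lower().startswith("no")
    return flags  # True = likely hallucinated

ctx = "The Eiffel Tower is in Paris."
print(flag_hallucinations(ctx, "The Eiffel Tower is in Paris. It is in Rome."))
```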

33. Extending Context Window in Large Language Models with Segmented Base Adjustment for Rotary Position Embeddings

Rongsheng Li, Jin Xu, Zhixiong Cao - MDPI AG, 2024

In the realm of large language models (LLMs), extending the context window for long text processing is crucial for enhancing performance. This paper introduces SBA-RoPE (Segmented Base Adjustment for Rotary Position Embeddings), a novel approach designed to efficiently extend the context window by segmentally adjusting the base of rotary position embeddings (RoPE). Unlike existing methods, such as Position Interpolation (PI), NTK, and YaRN, SBA-RoPE modifies the base of RoPE across different dimensions, optimizing the encoding of positional information for extended sequences. Through experiments on the Pythia model, we demonstrate the effectiveness of SBA-RoPE in extending context windows, particularly for texts exceeding the original training lengths. We fine-tuned the Pythia-2.8B model on the PG-19 dataset and conducted passkey retrieval and perplexity (PPL) experiments on the Proof-pile dataset to evaluate model performance. Results show that SBA-RoPE maintains or improves model performance when extending the context window, especially on longer text sequences. Compared to other methods…
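
A toy rendering of the segmented-base idea (the segment boundaries and base values below are illustrative, not the paper's tuned settings): instead of one global RoPE base, different dimension segments get different bases, changing how fast positional phases rotate per segment.

```python
# Sketch of segmented base adjustment for rotary position embeddings.
import torch

def sba_rope_inv_freq(dim, segment_bases=((0, 10000.0), (32, 20000.0), (48, 40000.0))):
    """Inverse frequencies with a per-segment base.
    segment_bases: (start_pair_index, base) over the dim//2 rotation pairs."""
    idx = torch.arange(0, dim, 2, dtype=torch.float32)   # one entry per pair
    bases = torch.empty_like(idx)
    for start, base in segment_bases:
        bases[start:] = base                             # later segments override
    return 1.0 / bases ** (idx / dim)

def rotate(x, positions, inv_freq):
    """Apply rotary embedding to x of shape (seq, dim)."""
    angles = positions[:, None] * inv_freq[None, :]      # (seq, dim/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = torch.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

x = torch.randn(16, 128)                                 # toy activations
pos = torch.arange(16, dtype=torch.float32)
print(rotate(x, pos, sba_rope_inv_freq(128)).shape)
```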

34. Benchmarking Large Language Models in Retrieval-Augmented Generation

Jiawei Chen, Hongyu Lin, Xianpei Han - Association for the Advancement of Artificial Intelligence (AAAI), 2024

Retrieval-Augmented Generation (RAG) is a promising approach for mitigating the hallucination of large language models (LLMs). However, existing research lacks rigorous evaluation of the impact of retrieval-augmented generation on different large language models, which makes it challenging to identify the potential bottlenecks in the capabilities of RAG for different LLMs. In this paper, we systematically investigate the impact of Retrieval-Augmented Generation on large language models. We analyze the performance of different large language models in 4 fundamental abilities required for RAG, including noise robustness, negative rejection, information integration, and counterfactual robustness. To this end, we establish the Retrieval-Augmented Generation Benchmark (RGB), a new corpus for RAG evaluation in both English and Chinese. RGB divides the instances within the benchmark into 4 separate testbeds based on the aforementioned fundamental abilities required to resolve the case. Then we evaluate 6 representative LLMs on RGB to diagnose the challenges of current LLMs when applying RAG…

35. InspectorRAGet: An Introspection Platform for RAG Evaluation

Kshitij P. Fadnis, Siva Sankalp Patel, Odellia Boni, 2024

Large Language Models (LLM) have become a popular approach for implementing Retrieval Augmented Generation (RAG) systems, and a significant amount of effort has been spent on building good models and metrics. In spite of increased recognition of the need for rigorous evaluation of RAG systems, few tools exist that go beyond the creation of model output and automatic calculation. We present InspectorRAGet, an introspection platform for RAG evaluation. InspectorRAGet allows the user to analyze aggregate and instance-level performance of RAG systems, using both human and algorithmic metrics as well as annotator quality. InspectorRAGet is suitable for multiple use cases and is available publicly to the community. The demo video is available at https://youtu.be/MJhe8QIXcEc

36. BERGEN: A Benchmarking Library for Retrieval-Augmented Generation

David Rau, Hervé Déjean, Nadezhda Chirkova, 2024

Retrieval-Augmented Generation allows enhancing Large Language Models with external knowledge. In response to the recent popularity of generative LLMs, many RAG approaches have been proposed, which involve an intricate number of different configurations such as evaluation datasets, collections, metrics, retrievers, and LLMs. Inconsistent benchmarking poses a major challenge in comparing approaches and understanding the impact of each component in the pipeline. In this work, we study best practices that lay the groundwork for a systematic evaluation of RAG and present BERGEN, an end-to-end library for reproducible research standardizing RAG experiments. In an extensive study focusing on QA, we benchmark different state-of-the-art retrievers, rerankers, and LLMs. Additionally, we analyze existing RAG metrics and datasets. Our open-source library BERGEN is available at https://github.com/naver/bergen.

37. Retrieval-Augmented Generation for Natural Language Processing: A Survey

Shangyu Wu, Ying Xiong, Yufei Cui, 2024

Large language models (LLMs) have demonstrated great success in various fields, benefiting from the huge number of parameters that store knowledge. However, LLMs still suffer from several key issues, such as hallucination problems, knowledge update issues, and a lack of domain-specific expertise. The appearance of retrieval-augmented generation (RAG), which leverages an external knowledge database to augment LLMs, makes up for those drawbacks. This paper reviews all significant techniques of RAG, especially in the retriever and the retrieval fusions. Besides, tutorial codes are provided for implementing the representative techniques in RAG. This paper further discusses RAG training, including RAG with and without datastore update. Then, we introduce the application of RAG in representative natural language processing tasks and industrial scenarios. Finally, this paper discusses the future directions and challenges of RAG for promoting its development.

38. Retrieval-Augmented Generation in Large Language Models through Selective Augmentation

Joao Quintela, Marquinhos Sapateiro - Springer Science and Business Media LLC, 2024

The increasing complexity and demands of natural language processing tasks have driven the need for more advanced and contextually aware language models. The integration of selective augmentation within Retrieval-Augmented Generation (RAG) frameworks represents a significant advancement, enhancing the relevance and accuracy of generated responses by dynamically incorporating pertinent information during inference. This research carefully developed and implemented a selective augmentation algorithm tailored to GPT-Neo, demonstrating substantial improvements in performance metrics such as BLEU, ROUGE, and F1 scores. Data preprocessing and model fine-tuning were conducted rigorously, ensuring a robust foundation for the selective augmentation mechanism. Experimental results confirmed that the enhanced RAG model not only provided more accurate and contextually relevant responses but also exhibited superior coherence compared to the baseline model. The implications of these findings are profound, suggesting that selective augmentation can significantly elevate the…

39. Investigating the performance of Retrieval-Augmented Generation and fine-tuning for the development of AI-driven knowledge-based systems

Róbert Lakatos, Péter Pollner, András Hajdú, 2024

The development of generative large language models (G-LLMs) opened up new opportunities for the development of new types of knowledge-based systems similar to ChatGPT, Bing, or Gemini. Fine-tuning (FN) and Retrieval-Augmented Generation (RAG) are the techniques that can be used to implement domain adaptation for the development of G-LLM-based knowledge systems. In our study, using ROUGE, BLEU, and METEOR scores and cosine similarity, we compare and examine the performance of RAG and FN for the GPT-J-6B, OPT-6.7B, LlaMA, and LlaMA-2 language models. Based on measurements shown on different datasets, we demonstrate that RAG-based constructions are more efficient than models produced with FN. We point out that connecting RAG and FN is not trivial, because connecting FN models with RAG can cause a decrease in performance. Furthermore, we outline a simple RAG-based architecture which, on average, outperforms the FN models by 16% in terms of the ROUGE score, 15% in the case of the BLEU score, and 53% based on cosine similarity. This shows the significant advantage of RAG over FN in terms of…

40. FlashRAG: A Modular Toolkit for Efficient Retrieval-Augmented Generation Research

Jiajie Jin, Yutao Zhu, Xinyu Yang, 2024

With the advent of Large Language Models (LLMs), the potential of Retrieval Augmented Generation (RAG) techniques has garnered considerable research attention. Numerous novel algorithms and models have been introduced to enhance various aspects of RAG systems. However, the absence of a standardized framework for implementation, coupled with the inherently intricate RAG process, makes it challenging and time-consuming for researchers to compare and evaluate these approaches in a consistent environment. Existing RAG toolkits like LangChain and LlamaIndex, while available, are often heavy and unwieldy, failing to meet the personalized needs of researchers. In response to this challenge, we propose FlashRAG, an efficient and modular open-source toolkit designed to assist researchers in reproducing existing RAG methods and in developing their own RAG algorithms within a unified framework. Our toolkit implements 12 advanced RAG methods and has gathered and organized 32 benchmark datasets. It has various features, including a customizable modular framework and a rich collection of pre-implemented…

41. Evaluating the Efficacy of Open-Source LLMs in Enterprise-Specific RAG Systems: A Comparative Study of Performance and Scalability

B Gautam, Anupam Purwar, 2024

This paper presents an analysis of open-source large language models (LLMs) and their application in Retrieval-Augmented Generation (RAG) tasks, specifically for enterprise-specific datasets scraped from company websites. With the increasing reliance on LLMs in natural language processing, it is crucial to evaluate their performance, accessibility, and integration within specific organizational contexts. This study examines various open-source LLMs, explores their integration into RAG frameworks using enterprise-specific data, and assesses the performance of different open-source embeddings in enhancing the retrieval and generation process. Our findings indicate that open-source LLMs, combined with effective embedding techniques, can significantly improve the accuracy and efficiency of RAG systems, offering a viable alternative to proprietary solutions for enterprises.

42. Lighter And Better: Towards Flexible Context Adaptation For Retrieval Augmented Generation

Zheng Liu, Changxu Wu, Ninglu Shao, 2024

The existing Retrieval-Augmented Generation (RAG) systems face significant challenges in terms of cost and effectiveness. On one hand, they need to encode the lengthy retrieved contexts before responding to the input tasks, which imposes substantial computational overhead. On the other hand, directly using generic Large Language Models (LLMs) often leads to sub-optimal answers, while task-specific fine-tuning may compromise the LLMs' general capabilities. To address these challenges, we introduce a novel approach called FlexRAG (Flexible Context Adaptation for RAG). In this approach, the retrieved contexts are compressed into compact embeddings before being encoded by the LLMs. Simultaneously, these compressed embeddings are optimized to enhance downstream RAG performance. A key feature of FlexRAG is its flexibility, which enables effective support for diverse compression ratios and selective preservation of important contexts. Thanks to these technical designs, FlexRAG achieves superior generation quality while significantly reducing running costs. Comprehensive experiments on various…

43. A Survey on RAG Meets LLMs: Towards Retrieval-Augmented Large Language Models

Yujuan Ding, Wenqi Fan, Liangbo Ning, 2024

As one of the most advanced techniques in AI, Retrieval-Augmented Generation (RAG) techniques can offer reliable and up-to-date external knowledge, providing huge convenience for numerous tasks. Particularly in the era of AI-generated content (AIGC), the powerful capacity of retrieval in RAG in providing additional knowledge enables retrieval-augmented generation to assist existing generative AI in producing high-quality outputs. Recently, Large Language Models (LLMs) have demonstrated revolutionary abilities in language understanding and generation, while still facing inherent limitations, such as hallucinations and out-of-date internal knowledge. Given the powerful abilities of RAG in providing the latest and helpful auxiliary information, retrieval-augmented large language models have emerged to harness external and authoritative knowledge bases, rather than relying solely on the model's internal knowledge, to augment the generation quality of LLMs. In this survey, we comprehensively review existing research studies in retrieval-augmented large language models (RA-LLMs), covering…

44. One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models

Yutao Zhu, Zhaoheng Huang, Zhicheng Dou, 2024

Retrieval-augmented generation (RAG) is a promising way to improve large language models (LLMs) for generating more factual, accurate, and up-to-date content. Existing methods either optimize prompts to guide LLMs in leveraging retrieved information or directly fine-tune the LLMs to adapt to RAG scenarios. Although fine-tuning can yield better performance, it often compromises the LLMs' general generation capabilities by modifying their parameters. This limitation poses challenges in practical applications, especially when LLMs are already deployed, as parameter adjustments may affect their original functionality. To address this, we propose a novel method that involves learning scalable and pluggable virtual tokens for RAG. By maintaining the LLMs' original parameters and fine-tuning only the embeddings of these pluggable tokens, our approach not only enhances LLMs' performance but also preserves their general generation capacities. Furthermore, we design several training strategies to improve the scalability, flexibility, and generalizability of our method. Comprehensive experiments…
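
A minimal sketch of the freeze-the-model, train-only-the-virtual-tokens recipe (the tiny encoder and training objective below are stand-ins for a real LLM and a real RAG loss):

```python
# Sketch: the base model is frozen; only the embeddings of a few added
# virtual tokens are trainable, so original behavior is preserved when
# the tokens are unplugged.
import torch
import torch.nn as nn

class VirtualTokens(nn.Module):
    def __init__(self, n_tokens, hidden):
        super().__init__()
        self.emb = nn.Parameter(torch.randn(n_tokens, hidden) * 0.02)

    def prepend(self, input_embeds):                 # (batch, seq, hidden)
        batch = input_embeds.size(0)
        virt = self.emb.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([virt, input_embeds], dim=1)

hidden = 64
base_model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True), 2)
for p in base_model.parameters():
    p.requires_grad_(False)                          # base model stays frozen

virtual = VirtualTokens(n_tokens=4, hidden=hidden)
optimizer = torch.optim.AdamW(virtual.parameters(), lr=1e-3)  # tokens only

x = torch.randn(2, 10, hidden)                       # stand-in token embeddings
out = base_model(virtual.prepend(x))                 # RAG-conditioned forward
loss = out.pow(2).mean()                             # placeholder objective
loss.backward()
optimizer.step()
print(sum(p.numel() for p in virtual.parameters()), "trainable parameters")
```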

45. RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation

Daniel Fleischer, Moshe Berchansky, Moshe Wasserblat, 2024

Implementing Retrieval-Augmented Generation (RAG) systems is inherently complex, requiring deep understanding of data, use cases, and intricate design decisions. Additionally, evaluating these systems presents significant challenges, necessitating assessment of both retrieval accuracy and generative quality through a multi-faceted approach. We introduce RAG Foundry, an open-source framework for augmenting large language models for RAG use cases. RAG Foundry integrates data creation, training, inference, and evaluation into a single workflow, facilitating the creation of data-augmented datasets for training and evaluating large language models in RAG settings. This integration enables rapid prototyping and experimentation with various RAG techniques, allowing users to easily generate datasets and train RAG models using internal or specialized knowledge sources. We demonstrate the framework's effectiveness by augmenting and fine-tuning Llama-3 and Phi-3 models with diverse RAG configurations, showcasing consistent improvements across three knowledge-intensive datasets. Code is released as open source.

46. Wiping out the limitations of Large Language Models -- A Taxonomy for Retrieval Augmented Generation

Mahei Manhai Li, Irina Nikishina, Özge Sevgili, 2024

Current research on RAGs is distributed across various disciplines, and since the technology is evolving very quickly, its unit of analysis is mostly technological innovations rather than applications in business contexts. Thus, in this research, we aim to create a taxonomy to conceptualize a comprehensive overview of the constituting characteristics that define RAG applications, facilitating the adoption of this technology in the IS community. To the best of our knowledge, no RAG application taxonomies have been developed so far. We describe our methodology for developing the taxonomy, which includes the criteria for selecting papers, an explanation of our rationale for employing a Large Language Model (LLM)-supported approach to extract and identify initial characteristics, and a concise overview of our systematic process for conceptualizing the taxonomy. Our systematic taxonomy development process includes four iterative phases designed to refine and enhance our understanding and presentation of RAG's core dimensions. We have developed a total of five meta-dimensions and sixteen…

47. Navigating the Present: Exploring Practical Horizons of Retrieval-Augmented Generation (RAG)

Amir Aryani - Front Matter, 2024

Authors: Hui Yin, Amir Aryani. As we discussed in our previous article, A Brief Introduction to Retrieval Augmented Generation (RAG), RAG is an artificial intelligence framework that incorporates the latest reliable external knowledge and aims to improve the quality of responses generated by pre-trained language models (PLMs). Initially, it was designed to improve the performance of knowledge-intensive NLP tasks (Lewis et al., 2020). As more…

48. Introducing a new hyper-parameter for RAG: Context Window Utilization

Kush Juvekar, Anupam Purwar, 2024

This paper introduces a new hyper-parameter for Retrieval-Augmented Generation (RAG) systems called Context Window Utilization. RAG systems enhance generative models by incorporating relevant information retrieved from external knowledge bases, improving the factual accuracy and contextual relevance of generated responses. The size of the text chunks retrieved and processed is a critical factor influencing RAG performance. This study aims to identify the optimal chunk size that maximizes answer generation quality. Through systematic experimentation, we analyze the effects of varying chunk sizes on the efficiency and effectiveness of RAG frameworks. Our findings reveal that an optimal chunk size balances the trade-off between providing sufficient context and minimizing irrelevant information. These insights are crucial for enhancing the design and implementation of RAG systems, underscoring the importance of selecting an appropriate chunk size to achieve superior performance.
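
A sketch of such a chunk-size sweep under toy assumptions (the corpus, retrieval, and quality metric below are stand-ins for real RAG answer evaluation):

```python
# Sketch: chunk the corpus at several sizes, retrieve, and score each setting.
def make_chunks(text, size):
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def retrieve(query, chunks):
    return max(chunks, key=lambda c: len(set(query.split()) & set(c.split())))

def score(chunk, gold_answer):      # stand-in for answer-quality evaluation
    return gold_answer in chunk

corpus = ("RAG systems retrieve chunks of text ... "
          "the optimal chunk size balances context and noise")
query, gold = "optimal chunk size", "optimal chunk size"

for size in (4, 8, 16, 32):
    hit = score(retrieve(query, make_chunks(corpus, size)), gold)
    print(f"chunk_size={size:>2}  answerable={hit}")
```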

49. A Tale of Trust and Accuracy: Base vs. Instruct LLMs in RAG Systems

Florin Cuconasu, Giovanni Trappolini, Nicola Tonellotto, 2024

Retrieval Augmented Generation (RAG) represents a significant advancement in artificial intelligence combining a retrieval phase with a generative phase, with the latter typically being powered by large language models (LLMs). The current common practices in RAG involve using "instructed" LLMs, which are fine-tuned with supervised training to enhance their ability to follow instructions and are aligned with human preferences using state-of-the-art techniques. Contrary to popular belief, our study demonstrates that base models outperform their instructed counterparts in RAG tasks by 20% on average under our experimental settings. This finding challenges the prevailing assumptions about the superiority of instructed LLMs in RAG applications. Further investigations reveal a more nuanced situation, questioning fundamental aspects of RAG and suggesting the need for broader discussions on the topic; or, as Fromm would have it, "Seldom is a glance at the statistics enough to understand the meaning of the figures".

50. Efficient Usage of RAG Systems in the World of LLMs

Priyank Rathod - Institute of Electrical and Electronics Engineers (IEEE), 2024

The integration of Retrieval-Augmented Generation (RAG) systems with Large Language Models (LLMs) has revolutionized the field of Natural Language Processing (NLP). By leveraging RAG techniques, LLMs can access a broader range of information, improve coherence, and enhance the relevance of generated text. This paper explores the efficient usage of RAG systems in LLMs, highlighting their benefits, applications, and future implications.

51. Modular RAG: Transforming RAG Systems into LEGO-like Reconfigurable Frameworks

52. Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models

53. FIT-RAG: Black-Box RAG with Factual Information and Token Reduction

54. Enhanced document retrieval with topic embeddings

55. Tabular Embedding Model (TEM): Finetuning Embedding Models For Tabular RAG Applications
