Retrieval-Augmented Generation (RAG) extends Large Language Model capabilities by connecting them to external knowledge bases. Current implementations show query latency ranging from 100ms to several seconds, with retrieval accuracy varying significantly based on embedding quality and context window limitations. Storage requirements can reach multiple terabytes for comprehensive knowledge bases, while maintaining index freshness presents ongoing operational challenges.

The fundamental challenge lies in balancing retrieval accuracy and response latency while managing computational resources and maintaining data relevance.

This page brings together solutions from recent research—including hybrid vector-semantic search approaches, streaming retrieval architectures, efficient embedding techniques, and context optimization methods. These and other approaches focus on building practical, production-ready RAG systems that deliver reliable and timely responses.

1. SG-RAG MOT: SubGraph Retrieval Augmented Generation with Merging and Ordering Triplets for Knowledge Graph Multi-hop Question Answering

Anwar Saleh, Gokhan Tur, Yucel Saygin, 2025

Large Language Models (LLMs) often tend to hallucinate, especially on domain-specific tasks and tasks that require reasoning. Previously, we introduced SubGraph Retrieval Augmented Generation (SG-RAG), a novel GraphRAG method for multi-hop question answering. SG-RAG leverages Cypher queries to search the given knowledge graph and retrieve the subgraph necessary to answer the question. The results from our previous work showed the higher performance of SG-RAG compared to traditional Retrieval Augmented Generation (RAG). In this work, we further enhance SG-RAG by proposing an additional step called Merging and Ordering Triplets (MOT). The MOT step seeks to decrease redundancy in the retrieved triplets by applying hierarchical merging to the retrieved subgraphs. Moreover, it provides an ordering among the triplets using a Breadth-First Search (BFS) traversal algorithm. We conducted experiments on the MetaQA benchmark, a question-answering benchmark proposed for the movies domain. Our results show more accurate answers than Chain-of-Thought and Graph Chain-of-Thought. We also find that (up to some point) merging highly overlapping subgraphs and defining an order among the triplets helps the LLM generate more precise answers.
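
The MOT ordering step lends itself to a compact illustration. Below is a minimal sketch, assuming the retrieved subgraph arrives as (head, relation, tail) triplets and the question entity is known; the hierarchical merging of overlapping subgraphs is omitted, and all names are illustrative rather than the authors' code.

```python
# Illustrative sketch of the MOT ordering step (not the authors' code):
# order the triplets of a retrieved subgraph by BFS traversal from the
# question entity. The merging of overlapping subgraphs is omitted.
from collections import defaultdict, deque

def bfs_order_triplets(triplets, seed_entity):
    """Order (head, relation, tail) triplets by BFS distance from seed_entity."""
    adjacency = defaultdict(list)
    for head, rel, tail in triplets:
        adjacency[head].append((head, rel, tail))
        adjacency[tail].append((head, rel, tail))  # walk edges in both directions

    ordered, seen_edges, seen_nodes = [], set(), {seed_entity}
    queue = deque([seed_entity])
    while queue:
        node = queue.popleft()
        for edge in adjacency[node]:
            if edge in seen_edges:
                continue
            seen_edges.add(edge)
            ordered.append(edge)
            for endpoint in (edge[0], edge[2]):
                if endpoint not in seen_nodes:
                    seen_nodes.add(endpoint)
                    queue.append(endpoint)
    return ordered

triplets = [
    ("Inception", "directed_by", "Christopher Nolan"),
    ("Christopher Nolan", "directed", "Interstellar"),
    ("Interstellar", "starred", "Matthew McConaughey"),
]
print(bfs_order_triplets(triplets, "Inception"))
```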

2. Retrieval-Augmented Generation System with Reconfigurable Ranker Sequence and Self-Rewarding Optimization Techniques

GOLDMAN SACHS & CO LLC, 2025

Optimizing retrieval-augmented generation (RAG) systems with a reconfigurable sequence of rankers in the retriever model to improve the quality of information chunks provided to the generative model. The rankers in the reconfigurable sequence are bi-encoders, cross-encoders, and an LLM-ranker. The rankers identify relevant chunks from documents for user queries. This allows more-relevant information chunks to be provided to the generative model, increasing output quality. The rankers can be reconfigured to optimize chunk selection. Additionally, self-rewarding optimization techniques are provided to train the entire RAG system using rewards based on generated responses.
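
As a rough illustration of the reconfigurable-sequence idea, the sketch below chains progressively more expensive ranking stages, each narrowing the candidate set for the next. The stage interface and the lexical-overlap scorer are stand-ins of my own; a real system would back the stages with a bi-encoder, a cross-encoder, and an LLM-ranker respectively.

```python
# Hypothetical ranker cascade: each stage keeps fewer chunks for the next,
# and the list of stages can be reordered or swapped out. The lexical
# overlap scorer stands in for real bi-encoder/cross-encoder/LLM scores.
from typing import Callable

Scorer = Callable[[str, str], float]  # (query, chunk) -> relevance score

def make_stage(score: Scorer, keep: int):
    """Build a ranking stage that keeps the top-`keep` chunks for a query."""
    def stage(query: str, chunks: list[str]) -> list[str]:
        return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:keep]
    return stage

def run_pipeline(query: str, chunks: list[str], stages) -> list[str]:
    for stage in stages:  # the "reconfigurable sequence": reorder at will
        chunks = stage(query, chunks)
    return chunks

def overlap(query: str, chunk: str) -> float:
    return len(set(query.lower().split()) & set(chunk.lower().split()))

pipeline = [
    make_stage(overlap, keep=50),  # bi-encoder stage: fast, coarse recall
    make_stage(overlap, keep=10),  # cross-encoder stage: slower, more precise
    make_stage(overlap, keep=3),   # LLM-ranker stage: most expensive, final cut
]
print(run_pipeline("ranking chunks", ["ranking chunks well", "unrelated text"], pipeline))
```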

3. Retrieval Augmented Generation System with Multi-Query Expansion and Vector Search Fusion

ELSEVIER INC, 2025

Retrieval augmented generation (RAG) system that improves search results by generating multiple natural language queries from a user's input, performing vector searches on both the user queries and the generated queries, and fusing the results to compile a more comprehensive and accurate search result. The system involves inputting user queries into a large language model to generate distinct associated queries, performing vector searches on all queries, compiling the results into a fused search, and summarizing the fused results. This bridges the gap between explicit and implicit search intent, leveraging the language model to expand queries and the vector search to fuse results.
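
The described flow is easy to sketch end to end. In the toy below the LLM expansion step is stubbed out, the vector search is a word-overlap ranker, and reciprocal rank fusion (RRF) stands in for the fusion step; the abstract does not pin the system down to a specific fusion formula.

```python
# End-to-end toy of the flow above: expand the query (LLM stubbed), run a
# vector search per query (word overlap as a stand-in), then fuse with
# reciprocal rank fusion -- one common choice, not necessarily the patent's.
from collections import defaultdict

def expand_query(user_query: str) -> list[str]:
    """Stand-in for the LLM step that generates distinct associated queries."""
    return [user_query,
            f"what is {user_query}",
            f"{user_query} best practices"]

def vector_search(query: str, corpus: dict[str, str], k: int = 5) -> list[str]:
    """Toy search: rank documents by word overlap with the query."""
    words = set(query.lower().split())
    ranked = sorted(corpus,
                    key=lambda d: len(words & set(corpus[d].lower().split())),
                    reverse=True)
    return ranked[:k]

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """RRF: score(d) = sum over result lists of 1 / (k + rank of d)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

corpus = {
    "d1": "retrieval augmented generation pipelines",
    "d2": "vector search with embeddings",
    "d3": "query expansion best practices",
}
queries = expand_query("query expansion")
print(rrf_fuse([vector_search(q, corpus) for q in queries]))
```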

4. Document GraphRAG: Knowledge Graph Enhanced Retrieval Augmented Generation for Document Question Answering Within the Manufacturing Domain

Simon Knollmeyer, Oguz Caymazer, Daniel Grossmann - Multidisciplinary Digital Publishing Institute, 2025

Retrieval-Augmented Generation (RAG) systems have shown significant potential for domain-specific Question Answering (QA) tasks, although persistent challenges in retrieval precision and context selection continue to hinder their effectiveness. This study introduces Document GraphRAG, a novel framework that bolsters robustness and enhances answer generation by incorporating Knowledge Graphs (KGs), built upon documents' intrinsic structure, into the RAG pipeline. Applying the Design Science Research methodology, we systematically design, implement, and evaluate GraphRAG, leveraging graph-based document structuring and a keyword-based semantic linking mechanism to improve retrieval quality. The evaluation, conducted on well-established datasets including SQuAD and HotpotQA as well as a newly developed manufacturing dataset, demonstrates consistent performance gains over a naive RAG baseline across both metrics. The results indicate that GraphRAG improves Context Relevance metrics, with task-dependent optimizations of chunk size, keyword density, and top-k further enhancing performance. Notably, multi-hop questions benefit most fro…
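
A toy version of the keyword-based linking mechanism might look like the following: chunks that share enough keywords get an edge, and retrieval results are expanded along graph neighbors. The keyword extraction, threshold, and names here are simplistic placeholders, not the paper's implementation.

```python
# Toy keyword-based linking: connect chunks sharing >= min_shared keywords,
# then let retrieval expand along graph neighbors for extra context.
from itertools import combinations

STOPWORDS = {"the", "a", "of", "and", "to", "in", "is", "for"}

def keywords(text: str) -> set[str]:
    return {w for w in text.lower().split() if w not in STOPWORDS}

def build_chunk_graph(chunks: dict[str, str], min_shared: int = 2) -> dict[str, set[str]]:
    graph = {cid: set() for cid in chunks}
    for a, b in combinations(chunks, 2):
        if len(keywords(chunks[a]) & keywords(chunks[b])) >= min_shared:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def expand_with_neighbors(hit_ids: list[str], graph: dict[str, set[str]]) -> list[str]:
    """Augment retrieved chunk ids with their graph neighbors."""
    expanded = list(hit_ids)
    for cid in hit_ids:
        expanded.extend(n for n in graph[cid] if n not in expanded)
    return expanded

chunks = {
    "c1": "graph based retrieval augmented generation",
    "c2": "retrieval augmented generation for manufacturing documents",
    "c3": "unrelated sentence about weather",
}
graph = build_chunk_graph(chunks)
print(expand_with_neighbors(["c1"], graph))
```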

5. Integrating pre-trained LLMs with RAG for efficient content retrieval

Tran Trong Kien, Khau Van Bich - Lac Hong University, 2025

Large Language Models (LLMs) are highly effective at replicating human tasks and boosting productivity but face challenges in accurate data extraction due to prioritizing fluency over factual precision. Researchers are addressing these limitations by combining LLMs with Retrieval-Augmented Generation (RAG). This approach uses chunking, searching, and ranking algorithms to streamline retrieval from unstructured text, improving the precision of LLM processing. The findings provide key insights into optimizing chunking strategies and set the stage for the advancement and broader application of RAG-enhanced systems.
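
As a point of reference for the chunking strategies such studies tune, here is a minimal fixed-size chunker with overlap; the word-based splitting and parameter values are illustrative defaults only.

```python
# Minimal fixed-size chunker with overlap; the word-based splitting and
# the default parameter values are illustrative only.
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into chunks of `chunk_size` words, overlapping by `overlap`."""
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, max(len(words) - overlap, 1), step)]
```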

6. A Comprehensive Evaluation of Embedding Models and LLMs for IR and QA Across English and Italian

Ermelinda Oro, Francesco Granata, Massimo Ruffolo - Multidisciplinary Digital Publishing Institute, 2025

This study presents a comprehensive evaluation of embedding techniques and large language models (LLMs) for Information Retrieval (IR) and question answering (QA) across languages, focusing on English and Italian. We address a significant research gap by providing empirical evidence of model performance across linguistic boundaries. We evaluate embedding models on 12 diverse IR datasets, including Italian SQuAD, DICE, SciFact, ArguAna, and NFCorpus. We assess four LLMs (GPT-4o, Llama-3.1 8B, Mistral-Nemo, Gemma-2b) on QA tasks within a retrieval-augmented generation (RAG) pipeline, evaluating them on SQuAD, CovidQA, and NarrativeQA in cross-lingual scenarios. The results show that multilingual models perform more competitively than language-specific ones. embed-multilingual-v3.0 achieves top nDCG@10 scores of 0.90 and 0.86. In the QA evaluation, Mistral-Nemo demonstrates superior answer relevance (0.91-1.0) while maintaining strong groundedness (0.64-0.78). Our analysis reveals three key findings: (1) multilingual embeddings effectively bridge gaps between English and Italian, though consistency decreases in specialized domains; (2) model size does not consistently predict performance; and (3) all evaluated systems exhibit critic…
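
For readers unfamiliar with the headline retrieval metric, nDCG@10 can be computed as below; this is the standard log2-discounted formulation, applied to toy relevance judgments and, for simplicity, normalized over the returned list.

```python
# Standard nDCG@10 computation on toy relevance judgments. For brevity the
# ideal ranking is taken over the returned list rather than the full pool.
import math

def dcg(relevances: list[float]) -> float:
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))

def ndcg_at_k(ranked_rels: list[float], k: int = 10) -> float:
    ideal_dcg = dcg(sorted(ranked_rels, reverse=True)[:k])
    return dcg(ranked_rels[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0

# Relevance of the documents a system returned, in returned order:
print(round(ndcg_at_k([3, 2, 0, 1], k=10), 3))
```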

7. AI-Based Search System with Intelligent Agents and Dynamic Knowledge Base Construction Using Retrieval Augmented Generation

ARTI ANALYTICS INC, 2025

An AI-powered search system that uses intelligent agents to find curated, directly useful information from disparate sources like websites, documents, and databases. The system uses AI techniques like Retrieval Augmented Generation (RAG) to analyze search queries and dynamically build a knowledge base of indexed information. It enables features like automated form filling, restricted access, and personalization. The system also learns and grows its knowledge base based on user confirmations and AI processing.

8. Next Sentence Prediction with BERT as a Dynamic Chunking Mechanism for Retrieval-Augmented Generation Systems

Alexandre Thurow Bender, Gabriel Gomes, Ulisses Brisolara Correa - George A. Smathers Libraries, 2025

Retrieval-Augmented Generation systems enhance the generative capabilities of large language models by grounding their responses in external knowledge bases, addressing some major limitations and improving reliability for tasks requiring factual accuracy or domain-specific information. Chunking is a critical step in these pipelines, where text is divided into smaller segments to facilitate efficient retrieval and optimize use of the model's context. This paper introduces a method that uses BERT's Next Sentence Prediction to adaptively merge related sentences into context-aware chunks. We evaluate the approach on the SQuAD v2 dataset, comparing it to standard chunking methods using Recall@k, Precision@k, Contextual-Precision@k, and processing time as metrics. Results indicate the proposed method achieves competitive performance while reducing computational cost by roughly 60%, demonstrating its potential to improve RAG systems.
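
A plausible rendering of the described mechanism (not the authors' code) using the Hugging Face transformers library: consecutive sentences are merged while BERT's NSP head judges the next sentence a likely continuation, and a new chunk starts when the probability drops below a threshold.

```python
# Plausible sketch of NSP-driven chunking: merge consecutive sentences
# while BERT's NSP head rates the next sentence a continuation; start a
# new chunk when the probability drops. Threshold is an assumed default.
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")
model.eval()

def is_continuation(sent_a: str, sent_b: str, threshold: float = 0.5) -> bool:
    inputs = tokenizer(sent_a, sent_b, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    prob_next = torch.softmax(logits, dim=-1)[0, 0].item()  # index 0 = "is next"
    return prob_next >= threshold

def nsp_chunks(sentences: list[str], threshold: float = 0.5) -> list[str]:
    chunks, current = [], [sentences[0]]
    for sent in sentences[1:]:
        if is_continuation(current[-1], sent, threshold):
            current.append(sent)          # same topic: extend the chunk
        else:
            chunks.append(" ".join(current))
            current = [sent]              # topic shift: start a new chunk
    chunks.append(" ".join(current))
    return chunks
```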

9. Context is Key: Aligning Large Language Models with Human Moral Judgments through Retrieval-Augmented Generation

Matthew Boraske, Richard Burns - George A. Smathers Libraries, 2025

In this paper, we investigate whether pre-trained large language models (LLMs) can align with human moral judgments on a dataset of approximately fifty thousand interpersonal conflicts from the AITA (Am I A******) subreddit, an online forum where users evaluate the morality of others. We introduce a retrieval-augmented generation (RAG) approach that uses LLMs as core components. After collecting conflict posts and embedding them in a vector database, a RAG agent retrieves the most relevant posts for each new query. These are then used sequentially as context to gradually refine the LLM's judgment, providing adaptability without costly fine-tuning. Using OpenAI's GPT-4o, our approach outperforms directly prompting the LLM, achieving 83% accuracy and a Matthews correlation coefficient of 0.469 while also reducing the rate of toxic responses from 22.53% to virtually zero. These findings indicate that integrating RAG into LLM agents is an effective method to improve their alignment with human moral judgments while mitigating toxic language.
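
The sequential refinement loop can be sketched as below, with the vector-store retrieval and the LLM call stubbed out; the prompts and helper names are hypothetical, not the paper's.

```python
# Sketch of the sequential-refinement loop described above: retrieved
# precedents are fed one at a time, each call refining the prior judgment.
# `retrieve` and `llm` are stubs for a vector store and a chat model.
def retrieve(query: str, k: int = 3) -> list[str]:
    return ["precedent A ...", "precedent B ...", "precedent C ..."][:k]

def llm(prompt: str) -> str:
    return "judgment: ..."  # stand-in for an API call such as GPT-4o

def judge_conflict(post: str) -> str:
    judgment = llm(f"Give an initial moral judgment of:\n{post}")
    for precedent in retrieve(post):
        judgment = llm(
            f"Conflict: {post}\nSimilar past case: {precedent}\n"
            f"Current judgment: {judgment}\nRefine the judgment if warranted."
        )
    return judgment

print(judge_conflict("AITA for ..."))
```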

10. Retrieval Augmented Generation: What Works and Lessons Learned

Peter L. Elkin, Gaurang Mehta, Frank Lehouillier - IOS Press, 2025

Retrieval Augmented Generation has been shown to improve the output of large language models (LLMs) by providing context for the question or scenario posed to the model. We have run a series of experiments to understand how best to improve on the performance of native models, and we present the results of each of several experiments. These can serve as lessons learned for scientists looking to apply retrieval augmented generation to medical question answering tasks.

11. Enhancing Large Language Models for Specialized Domains: A Two-Stage Framework with Parameter-Sensitive LoRA Fine-Tuning and Chain-of-Thought RAG

Yao He, Xuanbing Zhu, Donghan Li - Multidisciplinary Digital Publishing Institute, 2025

Large language models (LLMs) have shown impressive general-purpose capabilities, but their application in specialized domains such as healthcare and law remains limited due to two major challenges: a lack of deep domain-specific knowledge and the inability to incorporate real-time information updates. This paper addresses these challenges by introducing parameter-sensitive low-rank adaptation (LoRA) with retrieval-augmented generation (RAG), named SensiLoRA-RAG, a two-stage framework designed to enhance LLM performance on question-answering tasks. In the first stage, we propose a parameter-sensitive LoRA fine-tuning method that efficiently adapts LLMs using high-quality professional data, enabling rapid and resource-efficient specialization. In the second stage, we develop a chain-of-thought RAG mechanism that dynamically retrieves and integrates up-to-date external knowledge, improving the model's ability to reason with current and complex domain context. We evaluate our framework on tasks in the medical and legal fields, demonstrating that SensiLoRA-RAG significantly improves answer accuracy, relevance, and adaptability compared to baseline methods.
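
The first stage builds on standard LoRA machinery. A minimal sketch using the Hugging Face peft library follows, with a naive stand-in for the paper's parameter-sensitive selection (here simply naming which projection matrices to adapt); the base model checkpoint is an example choice, not the paper's.

```python
# Standard PEFT LoRA wiring; selecting target_modules by a sensitivity
# score is the paper's contribution and is only gestured at here by
# hand-picking the projection matrices to adapt.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")  # example base model
lora_cfg = LoraConfig(
    r=8,                                   # low-rank dimension
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # modules deemed most "sensitive"
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only the LoRA adapters train
```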

12. Building an LLM Agent for Life Sciences Literature QA and Summarization

Nishanth Joseph Paulraj - GSC Online Press, 2025

This article explores the development of a specialized artificial intelligence agent that combines Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG) techniques to address challenges in biomedical literature search and synthesis. The unprecedented growth of published research in life sciences has created an information crisis that traditional methods cannot effectively manage. Researchers face significant challenges, including overwhelming volume, domain-specific terminology barriers, difficulty making cross-study connections, and severe time constraints. The proposed LLM+RAG architecture offers a comprehensive solution featuring document processing for scientific papers, biomedical-specific vector embeddings, advanced retrieval strategies, and sophisticated reasoning capabilities. The system integrates PubMed and other databases while providing natural language interfaces that significantly reduce the cognitive burden on researchers. Domain-specific optimizations such as entity recognition, relationship extraction, and embeddings further enhance performance across diverse scenarios. Evaluation through benchmark testing, ex…

13. Comparison of Large Language Models’ Performance on 600 Nuclear Medicine Technology Board Examination–Style Questions

Michael A. Oumano, Shawn M. Pickett - Society of Nuclear Medicine and Molecular Imaging, 2025

This study investigated the application of large language models (LLMs) with and without retrieval-augmented generation (RAG) in nuclear medicine, particularly their performance across various topics relevant to the field, to evaluate their potential use as reliable tools for professional education and clinical decision-making. Methods: We evaluated LLMs, including the OpenAI GPT-4o series, Google Gemini, Cohere, Anthropic, and Meta Llama3, across 15 nuclear medicine topics. The models' accuracy was assessed using a set of 600 sample questions covering a range of technical domains in nuclear medicine. Overall accuracy was measured by averaging scores across these topics, and additional comparisons were conducted among individual models. Results: OpenAI's models, openai_nvidia_gpt-4o_final and openai_mxbai_gpt-4o_final, demonstrated the highest overall accuracy, achieving scores of 0.787 and 0.783, respectively, when RAG was implemented. Anthropic Opus and Gemini 1.5 Pro followed closely, with competitive scores of 0.773 and 0.750 with RAG. Cohere and Llama3 showed more variability in performance, with the ollama_llama3 model (without RAG) having the lowest accuracy. Discrepancies were noted in question interpretation, complex guidelines, and imaging-based queries. Concl…

14. Enhancing Large Language Model Performance on ENEM Math Questions Using Retrieval-Augmented Generation

João Superbi, H. Sofia Pinto, Emanoel Santos - Sociedade Brasileira de Computação - SBC, 2024

In this study, we explore the use of Retrieval-Augmented Generation (RAG) to improve the performance of large language models (LLMs), such as GPT-3.5 Turbo and GPT-4o, in solving ENEM mathematics questions. Our experiments demonstrate that RAG potentially provides significant improvements in accuracy by introducing relevant contextual information. With RAG, GPT-4o consistently outperforms GPT-3.5 Turbo, underscoring the potential of this technique to enhance educational AI tools. This research illustrates the potential of RAG-enhanced LLMs to advance educational applications and encourages further exploration in this field.

15. Automating Systematic Literature Reviews with Retrieval-Augmented Generation: A Comprehensive Overview

Binglan Han, Teo Sušnjak, Anuradha Mathrani - MDPI AG, 2024

This study examines Retrieval-Augmented Generation (RAG) in large language models (LLMs) and their significant application for undertaking systematic literature reviews (SLRs). RAG-based LLMs can potentially automate tasks like data extraction, summarization, and trend identification. However, while LLMs are exceptionally proficient in generating human-like text and interpreting complex linguistic nuances, their dependence on static, pre-trained knowledge can result in inaccuracies and hallucinations. RAG mitigates these limitations by integrating LLMs' generative capabilities with the precision of real-time information retrieval. We review in detail the three key processes of the RAG framework: retrieval, augmentation, and generation. We then discuss applications of RAG-based LLMs to SLR automation and highlight future research topics, including integration of domain-specific LLMs, multimodal data processing and generation, and utilization of multiple retrieval sources. We propose a framework of RAG-based LLMs for automating SLRs, which covers four stages of the SLR process: literature s…

16. Domain-Driven LLM Development: Insights into RAG and Fine-Tuning Practices

J.C. Santos, Rachel Hu, Richard Song - ACM, 2024

To improve Large Language Model (LLM) performance on domain-specific applications, ML developers often leverage Retrieval Augmented Generation (RAG) and LLM Fine-Tuning. RAG extends the capabilities of LLMs to specific domains or an organization's internal knowledge base, without the need to retrain the model. The Fine-Tuning approach, on the other hand, updates LLM weights with domain-specific data to improve performance on specific tasks. A fine-tuned model is particularly effective at systematically learning new comprehensive knowledge in a specific domain that is not covered by LLM pre-training. This tutorial walks through the RAG and Fine-Tuning techniques, discusses the insights of their advantages and limitations, and provides best practices for adopting the methodologies for LLM tasks and use cases. The hands-on labs demonstrate advanced techniques to optimize the RAG and fine-tuned LLM architecture for domain-specific LLM tasks. The labs in the tutorial are designed using a set of open-source Python libraries to implement the RAG and fine-tuned LLM architect…

17. A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models

Wenqi Fan, Yujuan Ding, Liangbo Ning - ACM, 2024

As one of the most advanced techniques in AI, Retrieval-Augmented Generation (RAG) can offer reliable and up-to-date external knowledge, providing huge convenience for numerous tasks. Particularly in the era of AI-Generated Content (AIGC), the powerful capacity of retrieval in providing additional knowledge enables RAG to assist existing generative AI in producing high-quality outputs. Recently, Large Language Models (LLMs) have demonstrated revolutionary abilities in language understanding and generation, while still facing inherent limitations such as hallucinations and out-of-date internal knowledge. Given the powerful abilities of RAG in providing the latest and helpful auxiliary information, Retrieval-Augmented Large Language Models (RA-LLMs) have emerged to harness external and authoritative knowledge bases, rather than solely relying on the model's internal knowledge, to augment the quality of the generated content of LLMs. In this survey, we comprehensively review existing research studies in RA-LLMs, covering three primary technical perspectives. Furthermore, to deliver deep…

18. Pencils Down! Automatic Rubric-based Evaluation of Retrieve/Generate Systems

Naghmeh Farzi, Laura Dietz - ACM, 2024

Current IR evaluation paradigms are challenged by large language models (LLMs) and retrieval-augmented generation (RAG) methods. Furthermore, evaluation either resorts to expensive human judgments or leads to an over-reliance on LLMs.

19. In-Context Learning for Scalable and Online Hallucination Detection in RAGs

Nicolò Cosimo Albanese - Academy & Industry Research Collaboration Center, 2024

Ensuring fidelity to source documents is crucial for the responsible use of Large Language Models (LLMs) in Retrieval Augmented Generation (RAG) systems. We propose a lightweight method for real-time hallucination detection, with potential to be deployed as a model-agnostic microservice to bolster reliability. Using in-context learning, our approach evaluates response factuality at the sentence level without annotated data, promoting transparency and user trust. Compared to other prompt-based and semantic similarity baselines from recent literature, our method improves hallucination detection F1 scores by at least 11%, with consistent performance across different models. This research offers a practical solution for real-time validation of response accuracy in RAG systems, fostering responsible adoption, especially in critical domains where document fidelity is paramount.
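
A minimal sketch of sentence-level, in-context factuality checking in this spirit: each answer sentence is judged against the retrieved context through a few-shot prompt, with no annotated training data. The prompt wording and the llm stub are illustrative, not the paper's.

```python
# Sentence-level, in-context hallucination checking: each answer sentence
# is judged against the retrieved context via a few-shot prompt. The
# prompt and `llm` stub are illustrative assumptions.
FEW_SHOT = """Context: The Eiffel Tower is in Paris.
Sentence: The Eiffel Tower is in Berlin.
Supported: no

Context: Water boils at 100 C at sea level.
Sentence: Water boils at 100 C at sea level.
Supported: yes
"""

def llm(prompt: str) -> str:
    return "yes"  # stand-in for a model call returning "yes"/"no"

def flag_hallucinations(context: str, answer_sentences: list[str]) -> list[str]:
    flagged = []
    for sent in answer_sentences:
        prompt = f"{FEW_SHOT}\nContext: {context}\nSentence: {sent}\nSupported:"
        if llm(prompt).strip().lower().startswith("no"):
            flagged.append(sent)  # sentence not grounded in the context
    return flagged
```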

20. Extending Context Window in Large Language Models with Segmented Base Adjustment for Rotary Position Embeddings

Rongsheng Li, Jin Xu, Zhixiong Cao - MDPI AG, 2024

In the realm of large language models (LLMs), extending the context window for long text processing is crucial for enhancing performance. This paper introduces SBA-RoPE (Segmented Base Adjustment for Rotary Position Embeddings), a novel approach designed to efficiently extend the context window by segmentally adjusting the base of rotary position embeddings (RoPE). Unlike existing methods, such as Position Interpolation (PI), NTK, and YaRN, SBA-RoPE modifies the base of RoPE across different dimensions, optimizing the encoding of positional information for extended sequences. Through experiments on the Pythia model, we demonstrate the effectiveness of SBA-RoPE in extending context windows, particularly for texts exceeding the original training lengths. We fine-tuned the Pythia-2.8B model on the PG-19 dataset and conducted passkey retrieval and perplexity (PPL) experiments on the Proof-pile dataset to evaluate model performance. Results show that SBA-RoPE maintains or improves model performance when extending the context window, especially on longer text sequences. Compared to other m…
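
To make the mechanics concrete: standard RoPE derives all rotation frequencies from a single base (theta_i = base^(-2i/d)), whereas SBA-RoPE adjusts the base per dimension segment. The sketch below shows only that mechanic; the segment boundaries and base values are made-up numbers, not the paper's settings.

```python
# Standard RoPE uses one base for every dimension pair; a segmented-base
# variant assigns a different base per segment of dimension pairs. The
# segment split and bases here are invented to illustrate the idea.
import numpy as np

def rope_frequencies(dim: int, base: float = 10000.0) -> np.ndarray:
    """Standard RoPE: theta_i = base ** (-2i / dim) for each dimension pair i."""
    i = np.arange(dim // 2)
    return base ** (-2.0 * i / dim)

def sba_rope_frequencies(dim: int, segments: list[tuple[int, int, float]]) -> np.ndarray:
    """segments: (start_pair, end_pair, base) tuples with segment-specific bases."""
    i = np.arange(dim // 2, dtype=float)
    freqs = np.empty_like(i)
    for start, end, base in segments:
        freqs[start:end] = base ** (-2.0 * i[start:end] / dim)
    return freqs

# e.g. keep the default base for low dimension pairs, enlarge it for high
# ones so long-range positions rotate more slowly:
freqs = sba_rope_frequencies(64, [(0, 16, 10000.0), (16, 32, 40000.0)])
print(freqs[:4], freqs[-4:])
```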

21. Benchmarking Large Language Models in Retrieval-Augmented Generation

22. InspectorRAGet: An Introspection Platform for RAG Evaluation

23. BERGEN: A Benchmarking Library for Retrieval-Augmented Generation

24. Retrieval-Augmented Generation for Natural Language Processing: A Survey

25. Retrieval-Augmented Generation in Large Language Models through Selective Augmentation
