Augmenting LLMs with RAG
Retrieval-Augmented Generation (RAG) extends Large Language Model capabilities by connecting them to external knowledge bases. Current implementations show query latency ranging from 100ms to several seconds, with retrieval accuracy varying significantly based on embedding quality and context window limitations. Storage requirements can reach multiple terabytes for comprehensive knowledge bases, while maintaining index freshness presents ongoing operational challenges.
The fundamental challenge lies in balancing retrieval accuracy and response latency while managing computational resources and maintaining data relevance.
This page brings together solutions from recent research—including hybrid vector-semantic search approaches, streaming retrieval architectures, efficient embedding techniques, and context optimization methods. These and other approaches focus on building practical, production-ready RAG systems that deliver reliable and timely responses.
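To make the moving parts concrete before the paper list, here is a minimal sketch of the embed-retrieve-prompt loop that the systems below refine. The model name, toy corpus, and `build_prompt` helper are illustrative assumptions, not drawn from any specific paper.

```python
# Minimal retrieve-then-generate sketch (model name and corpus are illustrative).
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose embedder

documents = [
    "RAG grounds LLM answers in retrieved passages.",
    "Index freshness is an operational concern for RAG systems.",
    "Hybrid retrieval combines lexical and vector search.",
]
doc_vecs = encoder.encode(documents, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k passages most similar to the query (cosine on unit vectors)."""
    q = encoder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q
    return [documents[i] for i in np.argsort(-scores)[:k]]

def build_prompt(query: str) -> str:
    """Assemble the grounded prompt that would be sent to the generator LLM."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("Why do RAG indexes go stale?"))
```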
1. A Systematic Review of Retrieval-Augmented Generation for Enhancing Domain-Specific Knowledge in Large Language Models
Murtiyoso Murtiyoso, Imam Tahyudin, Berlilana Berlilana - Politeknik Ganesha, 2025
This literature review examines the use of Retrieval-Augmented Generation (RAG) in enhancing Large Language Models (LLMs) with domain-specific knowledge. RAG integrates retrieval techniques with generative models to access external knowledge sources, addressing the limitations of LLMs in handling specialized information. By leveraging external data, RAG improves the accuracy and relevance of generated content, making it particularly useful in fields that require detailed, up-to-date information. The review highlights RAG's effectiveness in overcoming challenges such as data sparsity and the dynamic nature of domain knowledge. Furthermore, it discusses RAG's potential to enhance LLM performance, scalability, and ability to generate contextually accurate responses in knowledge-intensive applications. Key future research directions and implementation considerations are also identified.
2. Enhancing Document-Level Question Answering via Multi-Hop Retrieval-Augmented Generation with LLaMA 3
Xinyue Huang, Ziqi Lin, Fang Sun, 2025
This paper presents a novel Retrieval-Augmented Generation (RAG) framework tailored for complex question answering tasks, addressing challenges in multi-hop reasoning and contextual understanding across lengthy documents. Built upon LLaMA 3, the framework integrates a dense retrieval module with advanced context fusion and reasoning mechanisms, enabling more accurate and coherent response generation. A joint optimization strategy combining retrieval likelihood with generation cross-entropy improves the model's robustness and adaptability. Experimental results show that the proposed system outperforms existing retrieval-augmented and generative baselines, confirming its effectiveness in delivering precise, contextually grounded answers.
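The joint optimization can be pictured as a weighted sum of a retrieval negative log-likelihood and a generation cross-entropy. A hedged PyTorch sketch, where `alpha` and all tensor shapes are assumptions rather than the paper's exact formulation:

```python
# Sketch of a joint retrieval + generation loss (weighting and shapes assumed).
import torch
import torch.nn.functional as F

def joint_loss(query_emb, passage_embs, gold_passage_idx, gen_logits, target_ids, alpha=0.5):
    # Retrieval likelihood: softmax over query-passage similarities,
    # negative log-likelihood of the gold passage.
    sims = passage_embs @ query_emb                      # (num_passages,)
    retrieval_nll = F.cross_entropy(sims.unsqueeze(0),
                                    torch.tensor([gold_passage_idx]))
    # Generation loss: token-level cross-entropy against the reference answer.
    gen_ce = F.cross_entropy(gen_logits.view(-1, gen_logits.size(-1)),
                             target_ids.view(-1))
    return alpha * retrieval_nll + (1 - alpha) * gen_ce

# Toy usage with random tensors standing in for model outputs.
q, P = torch.randn(64), torch.randn(8, 64)
logits, tgt = torch.randn(10, 32000), torch.randint(0, 32000, (10,))
print(joint_loss(q, P, gold_passage_idx=2, gen_logits=logits, target_ids=tgt))
```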
3. Multimedia Graph Codes for Fast and Semantic Retrieval-Augmented Generation
Stefan Wagenpfeil - Multidisciplinary Digital Publishing Institute, 2025
Retrieval-Augmented Generation (RAG) has become a central approach to enhance the factual consistency and domain specificity of large language models (LLMs) by incorporating external context at inference time. However, most existing RAG systems rely on dense vector-based similarity, which fails to capture complex semantic structures, relational dependencies, and multimodal content. In this paper, we introduce Graph Codes, a matrix-based encoding of Multimedia Feature Graphs, as an alternative retrieval paradigm. Graph Codes explicitly preserve the topology of entities and their typed relationships from documents, enabling structure-aware and interpretable retrieval. We evaluate our system in two domains: scene understanding (200 annotated image-question pairs) and clinical question answering (150 real-world medical queries with 10,000 structured knowledge snippets). Results show that our method outperforms baselines in precision (+9-15%), reduces hallucination rates by over 30%, and yields higher expert-rated answer quality. Theoretically, this work demonstrates that symbolic similarity over graphs provides a more faithful alignment mechanism t…
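A rough illustration of the matrix-encoding idea: entities index rows and columns, cells hold typed-relation ids, and structural overlap becomes a cell-wise comparison. The relation vocabulary and similarity function below are invented stand-ins, not the paper's definitions.

```python
# Sketch: encode a typed relation graph as a matrix ("graph code") and compare structure.
import numpy as np

RELATIONS = {"none": 0, "holds": 1, "near": 2, "part_of": 3}  # illustrative type ids

def graph_code(entities: list[str], triplets: list[tuple[str, str, str]]) -> np.ndarray:
    """Matrix whose (i, j) entry is the relation type linking entity i to entity j."""
    idx = {e: i for i, e in enumerate(entities)}
    code = np.zeros((len(entities), len(entities)), dtype=int)
    for head, rel, tail in triplets:
        code[idx[head], idx[tail]] = RELATIONS[rel]
    return code

def code_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Fraction of matching cells; a stand-in for the paper's structural metric."""
    return float((a == b).mean())

ents = ["person", "cup", "table"]
scene1 = graph_code(ents, [("person", "holds", "cup"), ("cup", "near", "table")])
scene2 = graph_code(ents, [("person", "holds", "cup")])
print(code_similarity(scene1, scene2))
```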
4. Conversational Agent for Medical Question-Answering Using RAG and LLM
La Ode Muhammad Yudhy Prayitno, Annisa Nurfadilah, Septiyani Bayu Saudi - Ioinformatic, 2025
This study analyzes the application of the RAG concept alongside an LLM, in the context of PubMed QA data, to augment question-answering capabilities in a medical context. For answering questions relevant to private healthcare institutions, the Mistral 7B model was utilized. To limit hallucinations, embeddings were used for document indexing, ensuring that answers are based on the provided information. The analysis was conducted using five embedding models, two of which are specialized (PubMedBERT-base and BioLORD-2023) and three general-purpose (GIST-large-Embedding-v0, b1ade-embed-kd, all-MiniLM-L6-v2). As the results showed, the general-purpose models performed better than the domain-specific ones, especially GIST-large-Embedding-v0 and b1ade-embed-kd, which underscores the dominance of general-purpose training datasets in terms of fundamental semantic retrieval, even in specialized domains. The outcome of this research demonstrates that applying RAG locally can safeguard privacy while still responding to queries with appropriate precision, thus establishing a foundation for a dependable system.
5. Large Language Model Processing with Iterative Attention Focusing and Context-Optimized Retrieval Techniques
VIJAY MADISETTI, 2025
Enhancing the attention span of large language models (LLMs) when processing long documents and improving Retrieval-Augmented Generation (RAG) through context-optimized retrieval techniques. The methods involve iterative attention focusing, context-aware document processing, intelligent information retrieval, adaptive response generation, and dynamic adjustment of the LLM's attention focus. The approach also uses functional abstraction through hierarchical tokens to improve context management and semantic coherence.
6. Combining Retrieval-Augmented Generation and Fine-tuning of Large Language Models to Enhance Port Industry Question-Answering Systems
Xiao Hu, Mideth Abisado - Whioce Publishing Pte Ltd., 2025
In this research, we develop a new hybrid architecture that combines Retrieval-Augmented Generation (RAG) and Large Language Models (LLMs) in order to address specific gaps in domain question answering systems for the maritime port industry. Our approach mitigates the limitations of generic LLMs on domain-specific queries through a combination of knowledge retrieval and industry-adaptive modelling with implemented parameters. The overarching evaluation protocol was both quantitative and qualitative, using expert judgement, and showed marked, justifiable gains over stand-alone approaches across multiple dimensions: factual correctness, accurate use of terminology, and compliance with relevant policies. The system achieved 23% higher nDCG@5 scores alongside exceeding 90% correct terminology use in context, while maintaining sub-second response times under typical operational loads. The experts consulted in the study were particularly impressed by the balance struck between precision and contextual understanding in complex scenarios. This enables decision-makers in critical environments to place greater trust in the system within their active contexts…
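The nDCG@5 figure quoted above is the standard normalized discounted cumulative gain at rank 5, computable as follows (the graded relevance labels are illustrative):

```python
# Standard nDCG@k metric, as quoted in the evaluation above.
import math

def ndcg_at_k(relevances: list[float], k: int = 5) -> float:
    """relevances: graded relevance of the returned results, in ranked order."""
    def dcg(rels):
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

print(ndcg_at_k([3, 2, 0, 1, 0]))  # a ranked list with graded relevance labels
```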
7. Automated Cybersecurity System with Large Language Models and Retrieval-Augmented Generation for Suspicious Activity Report Generation
CITIBANK NA, 2025
Automated system for generating suspicious activity reports (SARs) in cybersecurity using large language models (LLMs) and retrieval-augmented generation (RAG). The system improves SAR generation accuracy and consistency by leveraging LLMs and RAG to incorporate more relevant data into SARs. The system obtains event data, extracts features, queries historical data with similar features, composes the events, prompts the LLM, and generates the SAR. This provides the LLM with more contextualized data to generate accurate SARs without needing extensive training on SARs themselves.
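The claim's pipeline (obtain events, extract features, query similar history, compose, prompt) can be sketched end to end; every function name and the overlap-based similarity below are hypothetical stand-ins for the patented components.

```python
# Hypothetical sketch of the SAR pipeline steps enumerated above.
def extract_features(event: dict) -> set[str]:
    """Trivial featurizer stand-in: one feature per key=value pair."""
    return {f"{k}={v}" for k, v in event.items()}

def find_similar(features: set[str], history: list[dict], k: int = 3) -> list[dict]:
    """Rank past events by feature overlap; a stand-in for the historical query step."""
    return sorted(history, key=lambda h: -len(features & extract_features(h)))[:k]

def compose_prompt(event: dict, similar: list[dict]) -> str:
    """Compose the event plus retrieved history into the LLM prompt."""
    context = "\n".join(str(s) for s in similar)
    return f"Past similar events:\n{context}\n\nDraft a SAR for: {event}"

event = {"type": "login", "geo": "unusual", "amount": "high"}
past = [{"type": "login", "geo": "unusual"}, {"type": "wire", "amount": "high"}]
print(compose_prompt(event, find_similar(extract_features(event), past)))
```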
8. Language Model Output Enhancement via Structured Universal Language Integration
UNLIKELY ARTIFICIAL INTELLIGENCE LTD, 2025
Improving the output of large language models (LLMs) like GPT-3 by using a structured, machine-readable language like Universal Language (UL) to provide new context data for the LLM. This allows the LLM to generate more accurate and improved continuation text output in response to prompts. The UL representation allows more expressive and detailed meaning representation compared to natural language. The LLM uses UL output as prompts, and the UL processing system analyzes and improves the LLM's output. This provides more accurate and reliable LLM output.
9. IMPLEMENTING AND ASSESSING RETRIEVAL AUGMENTED GENERATION (RAG) FOR LLM-BASED DOCUMENTS QUERIES
Peter Kaczmarski, Fernand Vandamme - Routledge, 2025
In recent years, the AI-related technology referred to as RAG (Retrieval Augmented Generation) (Lewis, 2020) has gained a lot of attention. In the RAG approach, custom information sources are used to seed the knowledge obtained from an LLM (Large Language Model), thus forming an approach which solves the issue of adapting an LLM to cope with external information. Using the RAG scenario, various processing use cases can be implemented, such as AI-based document management, AI-enhanced web search, online service support, etc. This paper outlines the main components of the RAG workflow: chunking and embedding of input documents, as well as similarity-based user query processing. The workflow is illustrated via a Python implementation that validates the procedure on a simple example with a multi-topic document. Experimental results are discussed, showing the feasibility of this approach and illustrating the need for further research enhancements, for example via the RAPTOR concept (Sarthi, 2024).
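A minimal version of the chunking step the paper describes, using fixed-size character windows with overlap (window sizes are illustrative; the paper's own Python implementation is not reproduced here):

```python
# Fixed-size chunking with overlap, the first stage of the RAG workflow above.
def chunk(text: str, size: int = 400, overlap: int = 80) -> list[str]:
    """Split text into overlapping character windows (sizes are illustrative)."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

pieces = chunk("word " * 500)
print(len(pieces), len(pieces[0]))  # each chunk shares `overlap` chars with the next
```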
10. SG-RAG MOT: SubGraph Retrieval Augmented Generation with Merging and Ordering Triplets for Knowledge Graph Multi-hop Question Answering
Anwar Saleh, Gokhan Tur, Yucel Saygin, 2025
Large Language Models (LLMs) often tend to hallucinate, especially on domain-specific tasks and tasks that require reasoning. Previously, we introduced SubGraph Retrieval Augmented Generation (SG-RAG) as a novel GraphRAG method for multi-hop question answering. SG-RAG leverages Cypher queries to search the given knowledge graph and retrieve the subgraph necessary to answer the question. The results from our previous work showed the higher performance of SG-RAG compared to traditional RAG. In this work, we further enhance SG-RAG by proposing an additional step called Merging and Ordering Triplets (MOT). The new MOT step seeks to decrease redundancy in the retrieved triplets by applying hierarchical merging to the retrieved subgraphs. Moreover, it provides an ordering among the triplets using a Breadth-First Search (BFS) traversal algorithm. We conducted experiments on the MetaQA benchmark, a question-answering benchmark in the movies domain. Our results show more accurate answers than Chain-of-Thought and Graph Chain-of-Thought. We also find that (up to some point) merging highly overlapping subgraphs and defining an order among the triplets helps the LLM generate more precise answers.
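The MOT step can be approximated as deduplication plus a BFS walk that linearizes triplets outward from the question entity. The sketch below assumes triplets arrive as (head, relation, tail) tuples; the paper's hierarchical merging is reduced to set-based deduplication here.

```python
# Sketch of MOT-style ordering: merge duplicate triplets, then BFS from the question entity.
from collections import defaultdict, deque

def order_triplets(triplets: set[tuple[str, str, str]], start: str) -> list[tuple[str, str, str]]:
    """Deduplicate (merge) triplets, then emit them in BFS order from `start`."""
    adj = defaultdict(list)
    for h, r, t in sorted(triplets):          # the set() already merged duplicates
        adj[h].append((h, r, t))
    ordered, seen, queue = [], {start}, deque([start])
    while queue:
        node = queue.popleft()
        for h, r, t in adj[node]:
            ordered.append((h, r, t))
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return ordered

triples = {("Movie", "directed_by", "Director"), ("Director", "born_in", "City"),
           ("Movie", "directed_by", "Director")}  # the duplicate merges away in the set
print(order_triplets(triples, "Movie"))
```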
11. Retrieval-Augmented Generation System with Reconfigurable Ranker Sequence and Self-Rewarding Optimization Techniques
GOLDMAN SACHS & CO LLC, 2025
Optimizing retrieval-augmented generation (RAG) systems with a reconfigurable sequence of rankers in the retriever model to improve the quality of information chunks provided to the generative model. The rankers in the reconfigurable sequence are bi-encoders, cross-encoders, and an LLM-ranker. The rankers identify relevant chunks from documents for user queries. This allows more-relevant information chunks to be provided to the generative model, increasing output quality. The rankers can be reconfigured to optimize chunk selection. Additionally, self-rewarding optimization techniques are provided to train the entire RAG system using rewards based on generated responses.
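The reconfigurable ranker sequence amounts to a cascade where each stage re-scores the survivors of the previous one. The two toy scorers below are stand-ins for the bi-encoder, cross-encoder, and LLM-ranker stages named in the patent:

```python
# Sketch of a reconfigurable ranker cascade: each stage re-scores candidates
# and keeps the top-k before handing off to the next (more expensive) stage.
from typing import Callable

Ranker = Callable[[str, list[str]], list[float]]  # returns one score per chunk

def cascade(query: str, chunks: list[str], stages: list[tuple[Ranker, int]]) -> list[str]:
    for score_fn, keep in stages:
        ranked = sorted(zip(score_fn(query, chunks), chunks), key=lambda p: -p[0])
        chunks = [c for _, c in ranked[:keep]]
    return chunks

# Toy stand-ins; a real sequence would be bi-encoder -> cross-encoder -> LLM-ranker.
def lexical_overlap(q: str, cs: list[str]) -> list[float]:
    qset = set(q.lower().split())
    return [len(qset & set(c.lower().split())) for c in cs]

def shorter_is_better(q: str, cs: list[str]) -> list[float]:
    return [-len(c) for c in cs]

chunks = ["port tariff rules apply to cargo", "weather report",
          "tariff schedule for port fees"]
print(cascade("port tariff rules", chunks, [(lexical_overlap, 2), (shorter_is_better, 1)]))
```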
12. Retrieval Augmented Generation System with Multi-Query Expansion and Vector Search Fusion
ELSEVIER INC, 2025
Retrieval augmented generation (RAG) system that improves search results by generating multiple natural language queries from a user's input, performing vector searches on both the user queries and the generated queries, and fusing the results to compile a more comprehensive and accurate search result. The system involves inputting user queries into a large language model to generate distinct associated queries, performing vector searches on all queries, compiling the results into a fused search, and summarizing the fused results. This bridges the gap between explicit and implicit search intent, leveraging the language model to expand queries and the vector search to fuse results.
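One common way to fuse result lists from the original and generated queries is reciprocal rank fusion; whether the patent uses exactly this formula is not stated, so treat it as an illustrative choice:

```python
# Reciprocal rank fusion: score each document by its ranks across all query variants.
from collections import defaultdict

def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Sum 1/(k + rank) for each document over every query's result list."""
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc in enumerate(results):
            scores[doc] += 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

fused = reciprocal_rank_fusion([["d1", "d2", "d3"], ["d2", "d4"], ["d2", "d1"]])
print(fused)  # d2 surfaces first: it ranks well across several query variants
```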
13. Document GraphRAG: Knowledge Graph Enhanced Retrieval Augmented Generation for Document Question Answering Within the Manufacturing Domain
Simon Knollmeyer, Oguz Caymazer, Daniel Grossmann - Multidisciplinary Digital Publishing Institute, 2025
Retrieval-Augmented Generation (RAG) systems have shown significant potential for domain-specific Question Answering (QA) tasks, although persistent challenges in retrieval precision and context selection continue to hinder their effectiveness. This study introduces Document Graph RAG (GraphRAG), a novel framework that bolsters retrieval robustness and enhances answer generation by incorporating Knowledge Graphs (KGs), built upon documents' intrinsic structure, into the pipeline. Through the application of the Design Science Research methodology, we systematically design, implement, and evaluate GraphRAG, leveraging graph-based document structuring and a keyword-based semantic linking mechanism to improve retrieval quality. The evaluation, conducted on well-established datasets including SQuAD and HotpotQA as well as a newly developed manufacturing dataset, demonstrates consistent performance gains over a naive baseline across both metrics. The results indicate that GraphRAG improves Context Relevance metrics, with task-dependent optimizations of chunk size, keyword density, and top-k further enhancing performance. Notably, multi-hop questions benefit most fro…
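A hedged sketch of the core idea: chunks become graph nodes linked both by the document's section hierarchy and by shared keywords, so retrieval can expand a hit to its neighborhood. The toy sections and the `networkx` representation are assumptions.

```python
# Sketch: a document KG whose nodes are sections, linked by structure and shared keywords.
import networkx as nx

sections = {
    "1 Intro": "robot arm calibration overview",
    "1.1 Setup": "calibration fixture and robot arm mounting",
    "2 Maintenance": "fixture cleaning schedule",
}
g = nx.Graph()
g.add_nodes_from(sections)
g.add_edge("1.1 Setup", "1 Intro")  # structural edge: subsection -> parent section

# Keyword edges: sections sharing a content word get a semantic link.
words = {s: set(t.split()) for s, t in sections.items()}
for a in sections:
    for b in sections:
        if a < b and words[a] & words[b]:
            g.add_edge(a, b, shared=sorted(words[a] & words[b]))

# Retrieval can now expand a hit to its graph neighborhood for extra context.
print(list(g.neighbors("1.1 Setup")))
```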
14. Integrating pre-trained LLMs with RAG for efficient content retrieval
Tran Trong Kien, Khau Van Bich - Lac Hong University, 2025
Large Language Models (LLMs) are highly effective at replicating human tasks and boosting productivity but face challenges in accurate data extraction due to prioritizing fluency over factual precision. Researchers are addressing these limitations by combining LLMs with Retrieval-Augmented Generation (RAG) models. This approach utilizes chunking, searching, and ranking algorithms to streamline retrieval from unstructured text, improving the precision of LLM processing. The findings provide key insights into optimizing chunking strategies and set the stage for the advancement and broader application of RAG-enhanced systems.
15. A Comprehensive Evaluation of Embedding Models and LLMs for IR and QA Across English and Italian
Ermelinda Oro, Francesco Granata, Massimo Ruffolo - Multidisciplinary Digital Publishing Institute, 2025
This study presents a comprehensive evaluation of embedding techniques and large language models (LLMs) for Information Retrieval (IR) and question answering (QA) across languages, focusing on English and Italian. We address a significant research gap by providing empirical evidence of model performance across linguistic boundaries. We evaluate embedding models on 12 diverse IR datasets, including Italian SQuAD, DICE, SciFact, ArguAna, and NFCorpus, and assess four LLMs (GPT4o, LLama-3.1 8B, Mistral-Nemo, Gemma-2b) on QA tasks within a retrieval-augmented generation (RAG) pipeline, testing them on SQuAD, CovidQA, and NarrativeQA in cross-lingual scenarios. The results show that multilingual models perform more competitively than language-specific ones; embed-multilingual-v3.0 achieves top nDCG@10 scores of 0.90 and 0.86. In the QA evaluation, Mistral-Nemo demonstrates superior answer relevance (0.91-1.0) while maintaining strong groundedness (0.64-0.78). Our analysis reveals three key findings: (1) multilingual embeddings effectively bridge gaps between English and Italian, though consistency decreases in specialized domains, (2) model size does not consistently predict performance, and (3) all evaluated systems exhibit critic…
16. AI-Based Search System with Intelligent Agents and Dynamic Knowledge Base Construction Using Retrieval Augmented Generation
ARTI ANALYTICS INC, 2025
An AI-powered search system that uses intelligent agents to find curated, directly useful information from disparate sources like websites, documents, and databases. The system uses AI techniques like Retrieval Augmented Generation (RAG) to analyze search queries and dynamically build a knowledge base of indexed information. It enables features like automated form filling, restricted access, and personalization. The system also learns and grows its knowledge base based on user confirmations and AI processing.
17. Next Sentence Prediction with BERT as a Dynamic Chunking Mechanism for Retrieval-Augmented Generation Systems
Alexandre Thurow Bender, Gabriel Gomes, Ulisses Brisolara Correa - George A. Smathers Libraries, 2025
Retrieval-Augmented Generation systems enhance the generative capabilities of large language models by grounding their responses in external knowledge bases, addressing some major limitations and improving reliability for tasks requiring factual accuracy or domain-specific information. Chunking is a critical step in these pipelines, where text is divided into smaller segments to facilitate efficient retrieval and optimize use of the model context. This paper introduces a method that uses BERT's Next Sentence Prediction to adaptively merge related sentences into context-aware chunks. We evaluate the approach on the SQuAD v2 dataset, comparing it to standard chunking methods using Recall@k, Precision@k, Contextual-Precision@k, and processing time as metrics. Results indicate the proposed method achieves competitive performance while reducing computational cost by roughly 60%, demonstrating its potential to improve retrieval-augmented generation systems.
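The method reduces to a merge rule driven by BERT's NSP head: keep appending sentences while the model predicts continuation. A sketch using Hugging Face `transformers`, where the 0.5 threshold and the base checkpoint are assumptions:

```python
# NSP-driven chunking sketch: merge adjacent sentences while BERT predicts "next".
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tok = BertTokenizer.from_pretrained("bert-base-uncased")
nsp = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

def is_continuation(a: str, b: str, threshold: float = 0.5) -> bool:
    """True when NSP says sentence b plausibly follows sentence a."""
    inputs = tok(a, b, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = nsp(**inputs).logits          # class 0 = "b is the next sentence"
    return torch.softmax(logits, dim=-1)[0, 0].item() > threshold

def nsp_chunks(sentences: list[str]) -> list[str]:
    """Grow the current chunk while consecutive sentences look like continuations."""
    chunks, prev = [sentences[0]], sentences[0]
    for sent in sentences[1:]:
        if is_continuation(prev, sent):
            chunks[-1] += " " + sent           # extend the current chunk
        else:
            chunks.append(sent)                # start a new chunk
        prev = sent
    return chunks
```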
18. Context is Key: Aligning Large Language Models with Human Moral Judgments through Retrieval-Augmented Generation
Matthew Boraske, Richard Burns - George A. Smathers Libraries, 2025
In this paper, we investigate whether pre-trained large language models (LLMs) can align with human moral judgments on a dataset of approximately fifty thousand interpersonal conflicts from the AITA (Am I A******) subreddit, an online forum where users evaluate the morality of others. We introduce a retrieval-augmented generation (RAG) approach that uses LLMs as core components. After collecting conflict posts and embedding them in a vector database, the RAG agent retrieves the most relevant posts for each new query. Then, these are used sequentially as context to gradually refine the LLM's judgment, providing adaptability without having to undergo costly fine-tuning. Using OpenAI's GPT-4o, our approach outperforms directly prompting the LLM, achieving 83% accuracy and a Matthews correlation coefficient of 0.469 while also reducing the rate of toxic responses from 22.53% to virtually zero. These findings indicate that integrating RAG into agents is an effective method to improve their alignment with human moral judgments while mitigating toxic language.
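The reported accuracy and Matthews correlation coefficient are standard classification metrics over the agent's verdicts versus the community's; with scikit-learn they are one call each (the labels below are illustrative):

```python
# Standard alignment metrics for verdict classification (labels are illustrative).
from sklearn.metrics import accuracy_score, matthews_corrcoef

y_true = [1, 0, 1, 1, 0]   # human (community) verdicts
y_pred = [1, 0, 1, 0, 0]   # RAG-agent verdicts
print(accuracy_score(y_true, y_pred), matthews_corrcoef(y_true, y_pred))
```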
19. Retrieval Augmented Generation: What Works and Lessons Learned
Peter L Elkin, Gaurang Mehta, Frank Lehouillier - IOS Press, 2025
Retrieval Augmented Generation has been shown to improve the output of large language models (LLMs) by providing context relevant to the question or scenario posed to the model. We have run a series of experiments to understand how best to improve the performance of native models, and we present the results of each of several experiments. These can serve as lessons learned for scientists looking to apply RAG to medical question answering tasks.
20. Enhancing Large Language Models for Specialized Domains: A Two-Stage Framework with Parameter-Sensitive LoRA Fine-Tuning and Chain-of-Thought RAG
Yao He, Xuanbing Zhu, Donghan Li - Multidisciplinary Digital Publishing Institute, 2025
Large language models (LLMs) have shown impressive general-purpose capabilities, but their application in specialized domains such as healthcare and law remains limited due to two major challenges, namely, a lack of deep domain-specific knowledge and the inability to incorporate real-time information updates. This paper addresses these challenges by introducing parameter-sensitive low-rank adaptation (LoRA) with retrieval-augmented generation (RAG), named SensiLoRA-RAG, a two-stage framework designed to enhance LLM performance on question-answering tasks. In the first stage, we propose a parameter-sensitive LoRA fine-tuning method that efficiently adapts LLMs using high-quality professional data, enabling rapid and resource-efficient specialization. In the second stage, we develop a chain-of-thought RAG mechanism that dynamically retrieves and integrates up-to-date external knowledge, improving the model's ability to reason with current and complex domain context. We evaluate our framework on tasks in the medical and legal fields, demonstrating that SensiLoRA-RAG significantly improves answer accuracy, relevance, and adaptability compared to baseline methods.
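Stage one, the LoRA adaptation, can be sketched with the `peft` library; the target modules, rank, and base checkpoint below are illustrative defaults rather than the paper's sensitivity-selected values.

```python
# Sketch of stage one (LoRA fine-tuning setup); hyperparameters are illustrative.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # any causal LM
config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # the paper selects modules by sensitivity
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()         # only the low-rank adapters train
```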