Research on Multi-hop Question Answering
Multi-hop question answering requires systems to connect multiple pieces of information across documents to arrive at correct answers. Current benchmarks show that while single-hop accuracy reaches 85-90%, performance drops to 45-60% when questions require 2-3 reasoning steps. This gap highlights the challenge of maintaining contextual understanding across multiple information retrievals and inference steps.
The fundamental challenge lies in balancing the breadth of information retrieval against the precision needed for each reasoning step while maintaining semantic coherence throughout the chain.
This page brings together solutions from recent research—including graph-based reasoning architectures, iterative retrieval-then-reasoning approaches, and methods for decomposing complex questions into simpler sub-queries. These and other approaches focus on building more robust multi-hop reasoning systems that can handle increasingly complex queries while maintaining accuracy.
1. SG-RAG MOT: SubGraph Retrieval Augmented Generation with Merging and Ordering Triplets for Knowledge Graph Multi-hop Question Answering
Anwar Saleh, Gokhan Tur, Yucel Saygin, 2025
Large Language Models (LLMs) often tend to hallucinate, especially on domain-specific tasks and tasks that require reasoning. Previously, we introduced SubGraph Retrieval Augmented Generation (SG-RAG) as a novel GraphRAG method for multi-hop question answering. SG-RAG leverages Cypher queries to search the given knowledge graph and retrieve the subgraph necessary to answer the question. The results from our previous work showed the higher performance of SG-RAG compared to traditional Retrieval Augmented Generation (RAG). In this work, we further enhance SG-RAG by proposing an additional step called Merging and Ordering Triplets (MOT). The new MOT step seeks to decrease redundancy in the retrieved triplets by applying hierarchical merging of the subgraphs. Moreover, it provides an ordering among the triplets using the Breadth-First Search (BFS) traversal algorithm. We conducted experiments on the MetaQA benchmark, a question-answering benchmark for the movies domain. Our results show more accurate answers than Chain-of-Thought and Graph Chain-of-Thought. We also find that merging highly overlapping subgraphs (up to some point) and defining an order among the triplets helps the LLM generate more precise answers.
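For illustration, here is a minimal sketch of the merge-and-order idea, assuming triplets are plain (head, relation, tail) tuples and that subgraphs sharing a triplet are merged; the helper names are hypothetical and this is not the authors' code.

```python
from collections import deque

def merge_subgraphs(subgraphs):
    """Merge triplet subgraphs that overlap, to reduce redundancy.

    Each subgraph is an iterable of (head, relation, tail) triplets; any two
    subgraphs sharing at least one triplet are collapsed into one set. This is
    a simplification of the paper's hierarchical merging.
    """
    merged = []
    for sg in subgraphs:
        sg = set(sg)
        overlapping = [m for m in merged if m & sg]
        for m in overlapping:
            sg |= m
            merged.remove(m)
        merged.append(sg)
    return merged

def order_triplets_bfs(triplets, topic_entity):
    """Order triplets by BFS traversal starting from the question's topic entity.

    Triplets whose head is never reached from the topic entity are dropped in
    this simplified version.
    """
    by_head = {}
    for h, r, t in triplets:
        by_head.setdefault(h, []).append((h, r, t))
    ordered, visited, queue = [], {topic_entity}, deque([topic_entity])
    while queue:
        node = queue.popleft()
        for h, r, t in by_head.get(node, []):
            ordered.append((h, r, t))
            if t not in visited:
                visited.add(t)
                queue.append(t)
    return ordered
```

The BFS ordering gives the LLM triplets in hop order from the question's topic entity, which is the intuition the abstract describes for generating more precise answers.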
2. Document GraphRAG: Knowledge Graph Enhanced Retrieval Augmented Generation for Document Question Answering Within the Manufacturing Domain
Simon Knollmeyer, Oguz Caymazer, Daniel Grossmann - Multidisciplinary Digital Publishing Institute, 2025
Retrieval-Augmented Generation (RAG) systems have shown significant potential for domain-specific Question Answering (QA) tasks, although persistent challenges in retrieval precision and context selection continue to hinder their effectiveness. This study introduces Document Graph RAG (GraphRAG), a novel framework that bolsters robustness and enhances answer generation by incorporating Knowledge Graphs (KGs), built upon documents' intrinsic structure, into the RAG pipeline. Through application of the Design Science Research methodology, we systematically design, implement, and evaluate GraphRAG, leveraging graph-based document structuring and a keyword-based semantic linking mechanism to improve retrieval quality. The evaluation, conducted on well-established datasets including SQuAD and HotpotQA as well as a newly developed manufacturing dataset, demonstrates consistent performance gains over a naive RAG baseline across both metrics. The results indicate that GraphRAG improves Context Relevance metrics, with task-dependent optimizations of chunk size, keyword density, and top-k further enhancing performance. Notably, multi-hop questions benefit most fro... Read More
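As a rough sketch of the keyword-based linking mechanism described above, the following assumes chunks arrive pre-segmented and uses a naive token filter in place of a real keyword extractor; it is illustrative only.

```python
import re
from collections import defaultdict
from itertools import combinations

def build_keyword_graph(chunks, min_shared=2):
    """Link document chunks that share at least `min_shared` keywords.

    Keywords are approximated here as lowercase alphabetic tokens of five or
    more characters; a production system would use a proper extractor.
    """
    keywords = [set(re.findall(r"[a-z]{5,}", chunk.lower())) for chunk in chunks]
    edges = defaultdict(set)
    for i, j in combinations(range(len(chunks)), 2):
        if len(keywords[i] & keywords[j]) >= min_shared:
            edges[i].add(j)
            edges[j].add(i)
    return edges  # adjacency: chunk index -> linked chunk indices

def expand_retrieval(seed_indices, edges, hops=1):
    """Add graph neighbours of the initially retrieved chunks.

    This is where multi-hop questions can benefit: evidence connected to the
    seed chunks by keyword links is pulled into the context as well.
    """
    selected = set(seed_indices)
    frontier = set(seed_indices)
    for _ in range(hops):
        frontier = {n for i in frontier for n in edges.get(i, ())} - selected
        selected |= frontier
    return selected
```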
3. Generative AI System for Querying Heterogeneous Data Sources with Sub-Question Decomposition and Privacy Preservation
MICROSOFT TECHNOLOGY LICENSING LLC, 2025
A system that uses generative AI to intelligently query heterogeneous data sources to answer natural language questions. The system breaks down complex questions into simpler sub-questions and uses AI to identify the appropriate data source for each sub-question. It then generates custom queries for each data source and executes them to gather the necessary information. The AI summarizes the results and returns a coherent response to the original question. The AI avoids sharing or training on the actual data, preserving privacy while leveraging AI's query generation and data source selection abilities.
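A minimal sketch of this decomposition-and-routing flow follows; the `llm` callable, the `DataSource` wrapper, and the prompt wording are assumptions for illustration, not the patented implementation. Note that only query results, never raw source data, are handed back to the model for summarization.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class DataSource:
    name: str
    description: str
    run_query: Callable[[str], str]   # executes a source-specific query string

def answer_heterogeneous(question: str, sources: Dict[str, DataSource], llm) -> str:
    """Decompose a question, route each sub-question to a data source, and summarize."""
    sub_questions = llm(
        f"Break this question into independent sub-questions, one per line:\n{question}"
    ).splitlines()

    findings: List[str] = []
    for sub_q in filter(None, map(str.strip, sub_questions)):
        catalog = "\n".join(f"- {s.name}: {s.description}" for s in sources.values())
        source_name = llm(
            f"Which data source answers '{sub_q}'? Reply with one name from:\n{catalog}"
        ).strip()
        source = sources[source_name]
        query = llm(f"Write a query for source '{source.name}' answering: {sub_q}")
        findings.append(f"{sub_q}: {source.run_query(query)}")

    return llm(
        "Summarize these findings into one coherent answer to the question.\n"
        f"Question: {question}\nFindings:\n" + "\n".join(findings)
    )
```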
4. Retrieval-Augmented Query System with Document-Enhanced Prompting for Large Language Models
DATUM POINT LABS INC, 2025
Using a retrieval system in conjunction with large language models (LLMs) to improve accuracy and reduce the need for re-training when underlying information changes. The system retrieves relevant documents based on a user's query using a retriever. These documents are then combined with the original query to form a prompt for the LLM. The LLM generates an output using this augmented prompt that incorporates the retrieved documents' information. This provides a more accurate response compared to just using the LLM alone, especially for complex domains where the LLM may lack specific knowledge.
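The core retrieve-then-prompt pattern is small enough to sketch directly; the `retriever` callable below is a placeholder for whatever retrieval backend is used.

```python
def build_augmented_prompt(query, retriever, top_k=4):
    """Retrieve documents and prepend them to the user query as LLM context.

    `retriever` is any callable (query, top_k) -> list of text passages.
    """
    passages = retriever(query, top_k)
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using only the documents below.\n\n"
        f"Documents:\n{context}\n\nQuestion: {query}\nAnswer:"
    )
```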
5. Dynamic Workflow Data Structure Utilizing Large Language Model for Context-Based Query Segmentation and Response Generation
A&E ENGINEERING INC, 2025
Creating a dynamic workflow data structure that can tailor responses to specific user queries and contexts. The method involves using a large language model (LLM) to process a corpus of documents, classify user queries based on context, construct a workflow structure from the classified segments, and generate personalized responses. The LLM segments the documents, analyzes the segments' semantics, and organizes them into a workflow structure. This structure is then used to generate customized responses for user queries based on the segment classification and query context.
6. Method for Enhancing AI Question Answering Systems via Knowledge Graph and Vector Search Integration
INTERNATIONAL BUSINESS MACHINES CORP, 2025
Using knowledge graphs and vector searches to improve the efficiency and accuracy of AI question answering systems. The method involves decomposing user queries into tasks, searching a knowledge graph to identify relevant entities, searching a vector database using the entities to find text chunks, and using an LLM to generate answers based on the identified text. This refined search scope improves the LLM's efficiency and accuracy compared to searching large text databases directly.
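A hedged sketch of that pipeline is shown below; `kg_lookup` and `vector_search` are hypothetical helpers standing in for the knowledge-graph and vector-database interfaces the patent describes.

```python
def kg_guided_retrieval(question, kg_lookup, vector_search, llm):
    """Narrow the vector search using entities found in a knowledge graph.

    `kg_lookup(task)` returns entity names from a knowledge graph;
    `vector_search(task, entities)` returns text chunks restricted to those
    entities; `llm` is any prompt -> text callable.
    """
    tasks = llm(f"List the lookup tasks needed to answer: {question}").splitlines()
    evidence = []
    for task in filter(None, map(str.strip, tasks)):
        entities = kg_lookup(task)              # scope the search to these entities
        evidence.extend(vector_search(task, entities))
    context = "\n".join(evidence)
    return llm(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
```

Restricting the vector search to KG-derived entities is what shrinks the search scope and, per the claim, improves efficiency over searching the full text database directly.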
7. System for Generating Query Language Responses from Natural Language Inputs Using Knowledge Graphs and AI Models
DELL PRODUCTS LP, 2025
Automatically generating context-based responses to natural language queries using knowledge graphs in combination with artificial intelligence techniques. The system generates queries in a predetermined query language from natural language inputs, processes them using enterprise knowledge graphs and AI models, and combines the results to provide more accurate and comprehensive responses. This allows users without technical knowledge to search and query knowledge graphs using natural language and make decisions based on the responses.
8. Multi-Hop Question Answering over Knowledge Graphs
Janadhi Uyanhewage, Viraj Welgama, Ruvan Weerasinghe - Sri Lanka Journals Online, 2024
Multi-Hop Question Answering over Knowledge Graphs (MHQA-KG) plays a pivotal role in various applications, including but not limited to Question Answering, Recommendation Systems, and Semantic Search. Nevertheless, current models for MHQA have limitations in their ability to grasp all the information included in the question, resulting in a reduction in accuracy when producing answers. To mitigate this limitation, this paper proposes a novel Multi-Hop Question Answering over Knowledge Graphs approach. It mainly utilizes question and path embeddings to answer multi-hop questions, significantly improving accuracy. This approach effectively captures auxiliary information that may be present in the question. The experimental findings provide evidence that the suggested methodology outperforms current state-of-the-art models, achieving highly accurate outcomes.
9. Tree-of-Reasoning Question Decomposition for Complex Question Answering with Large Language Models
Kun Zhang, Jiali Zeng, Fandong Meng - Association for the Advancement of Artificial Intelligence (AAAI), 2024
Large language models (LLMs) have recently demonstrated remarkable performance across various Natural Language Processing tasks. In the field of multi-hop reasoning, the Chain-of-Thought (CoT) prompting method has emerged as a paradigm, using curated stepwise reasoning demonstrations to enhance LLMs' ability to reason and produce coherent rational pathways. To ensure the accuracy, reliability, and traceability of the generated answers, many studies have incorporated information retrieval (IR) to provide LLMs with external knowledge. However, existing CoT-with-IR methods decompose questions into sub-questions based on a single compositionality type, which limits their effectiveness for questions involving multiple compositionality types. Additionally, these methods suffer from inefficient retrieval, as complex questions often contain abundant information, leading to the retrieval of irrelevant information inconsistent with the query's intent. In this work, we propose a novel question decomposition framework called TRQA for multi-hop question answering, which addresses these limitations. ... Read More
10. A Relation Embedding Assistance Networks for Multi-hop Question Answering
Songlin Jiao, Zhenfang Zhu, Jiangtao Qi - Association for Computing Machinery (ACM), 2024
Multi-hop Knowledge Graph Question Answering aims at finding an entity to answer natural language questions from knowledge graphs. When humans perform multi-hop reasoning, people tend to focus on specific relations across different hops and confirm the next entity. However, most algorithms choose the wrong specific relation, which makes the system deviate from the correct reasoning path. The specific relation at each hop plays an important role in multi-hop question answering. Existing work mainly relies on the question representation as relation information, which cannot accurately calculate the specific relation distribution. In this article, we propose an interpretable assistance framework that fully utilizes the relation embeddings to assist in calculating relation distributions at each hop. Moreover, we employ the fusion attention mechanism to ensure the integrity of relation information and hence to enrich the relation embeddings. The experimental results on three English datasets and one Chinese dataset demonstrate that our method significantly outperforms all baselines. The... Read More
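To make the idea of a per-hop relation distribution concrete, here is a worked toy example that scores candidate relations with an attention-style softmax over relation embeddings; the fusion with question semantics is reduced to a dot product, which only illustrates the general pattern and is not the paper's exact model.

```python
import numpy as np

def relation_distribution(question_vec, relation_embs):
    """Softmax attention of the question vector over candidate relation embeddings.

    question_vec: (d,) array; relation_embs: (num_relations, d) array.
    Returns a probability distribution over relations for the current hop.
    """
    scores = relation_embs @ question_vec / np.sqrt(question_vec.shape[0])
    scores -= scores.max()                      # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()

# Example: three candidate relations in a 4-dimensional embedding space.
q = np.array([0.2, 0.1, 0.9, 0.0])
R = np.array([[0.1, 0.0, 1.0, 0.0],    # e.g. "directed_by"
              [0.9, 0.1, 0.0, 0.0],    # e.g. "starred_in"
              [0.0, 1.0, 0.0, 0.1]])   # e.g. "released_in"
print(relation_distribution(q, R))     # highest mass on the first relation
```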
11. Generate-then-Ground in Retrieval-Augmented Generation for Multi-hop Question Answering
Zhengliang Shi, Shuo Zhang, Weiwei Sun, 2024
Multi-Hop Question Answering (MHQA) tasks present a significant challenge for large language models (LLMs) due to the intensive knowledge required. Current solutions, like Retrieval-Augmented Generation, typically retrieve potential documents from an external corpus to read an answer. However, the performance of this retrieve-then-read paradigm is constrained by the retriever and the inevitable noise in the retrieved documents. To mitigate these challenges, we introduce a novel generate-then-ground (GenGround) framework, synergizing the parametric knowledge of LLMs and external documents to solve a multi-hop question. GenGround empowers LLMs to alternate two phases until the final answer is derived: (1) formulate a simpler, single-hop question and directly generate the answer; (2) ground the question-answer pair in retrieved documents, amending any wrong predictions in the answer. We also propose an instructional grounding distillation method to generalize our method into smaller models. Extensive experiments conducted on four datasets illustrate the superiority of our method.
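A minimal sketch of the alternation the abstract describes follows; the prompts, the "FINAL:" stopping convention, and the `retrieve` helper are assumptions, not the authors' implementation.

```python
def generate_then_ground(question, retrieve, llm, max_hops=4):
    """Alternate between generating a single-hop Q/A pair and grounding it.

    `retrieve(text)` returns supporting passages; `llm` is any prompt -> text
    callable.
    """
    trace = []
    for _ in range(max_hops):
        step = llm(
            "Given the question and progress so far, either pose the next "
            "single-hop sub-question with your best-guess answer, or reply "
            "'FINAL: <answer>' if the question is resolved.\n"
            f"Question: {question}\nProgress: {trace}"
        )
        if step.startswith("FINAL:"):
            return step[len("FINAL:"):].strip()
        passages = retrieve(step)               # ground the generated pair
        grounded = llm(
            "Check this sub-question/answer pair against the documents and "
            f"correct the answer if needed.\nPair: {step}\nDocuments: {passages}"
        )
        trace.append(grounded)
    return llm(f"Answer '{question}' using: {trace}")
```

The key contrast with retrieve-then-read is that the model first answers from its parametric knowledge and retrieval is used to verify and amend, rather than to supply the answer from scratch.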
12. Relation Cross-fusion Attention Assistance Networks for Multi-hop Question Answering over Knowledge Graphs
Yana Lv, Ho Nguyen Phu Bao, Xiuli Du - Research Square Platform LLC, 2024
Multi-hop knowledge graph question answering aims to find answer entities from the knowledge graph based on natural language questions. This is a challenging task as it requires precise reasoning about entity relationships at each step. When humans perform multi-hop reasoning, they usually focus on specific relations between different hops and determine the next entity. However, most algorithms often choose the wrong specific relations, causing the system to deviate from the correct reasoning path. In multi-hop question answering, the specific relation between each hop is crucial. The existing TransferNet model mainly relies on question representation for relational reasoning, but cannot accurately calculate the specific relational distribution, which profoundly affects question answering performance. On this basis, this paper proposes an interpretable assistance framework, which makes full use of relation embeddings and question semantics, and uses an attention mechanism to cross-fuse their relevant information to assist in calculating the relation ... Read More
13. GenDec: A robust generative Question-decomposition method for Multi-hop reasoning
Jian Wu, Linyi Yang, Yuliang Ji, 2024
Multi-hop QA (MHQA) involves step-by-step reasoning to answer complex questions and find multiple relevant supporting facts. However, existing large language models' (LLMs) reasoning ability in multi-hop question answering remains under-explored and is often inadequate for answering multi-hop questions. Moreover, it is unclear whether LLMs follow a desired reasoning chain to reach the right final answer. In this paper, we propose a generative question decomposition method (GenDec) from the perspective of explainable QA, which generates independent and complete sub-questions based on additional extracted evidence to enhance LLMs' reasoning ability in RAG. To demonstrate the impact, generalization, and robustness of GenDec, we conduct two experiments: the first combines GenDec with small QA systems on paragraph retrieval and QA tasks; the second examines the reasoning capabilities of various state-of-the-art LLMs, including GPT-4 and GPT-3.5, combined with GenDec. We experiment on the HotpotQA, 2WikihopMultiHopQA, MuSiQue, and PokeMQA datasets.
14. RConE: Rough Cone Embedding for Multi-Hop Logical Query Answering on Multi-Modal Knowledge Graphs
Mayank Kharbanda, Rajiv Ratn Shah, Raghava Mutharaju, 2024
Multi-hop query answering over a Knowledge Graph (KG) involves traversing one or more hops from the start node to answer a query. Path-based and logic-based methods are state-of-the-art for multi-hop question answering. The former is used in link prediction tasks. The latter is for answering complex logical queries. The logical multi-hop querying technique embeds the KG and queries in the same embedding space. The existing work incorporates First Order Logic (FOL) operators, such as conjunction ($\wedge$), disjunction ($\vee$), and negation ($\neg$), in queries. Though current models have most of the building blocks to execute the FOL queries, they cannot use the dense information of multi-modal entities in the case of Multi-Modal Knowledge Graphs (MMKGs). We propose RConE, an embedding method to capture the multi-modal information needed to answer a query. The model first shortlists candidate (multi-modal) entities containing the answer. It then finds the solution (sub-entities) within those entities. Several existing works tackle path-based question-answering in MMKGs. However, to ... Read More
15. Retrieval-Enhanced Knowledge Editing for Multi-Hop Question Answering in Language Models
Yucheng Shi, Qiaoyu Tan, Xuansheng Wu, 2024
Large Language Models (LLMs) have shown proficiency in question-answering tasks but often struggle to integrate real-time knowledge updates, leading to potentially outdated or inaccurate responses. This problem becomes even more challenging when dealing with multi-hop questions since they require LLMs to update and integrate multiple knowledge pieces relevant to the questions. To tackle the problem, we propose the Retrieval-Augmented model Editing (RAE) framework tailored for multi-hop question answering. RAE first retrieves edited facts and then refines the language model through in-context learning. Specifically, our retrieval approach, based on mutual information maximization, leverages the reasoning abilities of LLMs to identify chain facts that naïve similarity-based searches might miss. Additionally, our framework incorporates a pruning strategy to eliminate redundant information from the retrieved facts, which enhances the editing accuracy and mitigates the hallucination problem. Our framework is supported by theoretical justification for its fact retrieval efficacy. Finally... Read More
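The retrieve-prune-prompt flow can be sketched as below; a naive token-overlap filter stands in for the paper's mutual-information-based retrieval and pruning, and the helper names are hypothetical.

```python
def edit_aware_answer(question, edited_facts, retrieve_facts, llm, max_facts=5):
    """Answer with edited facts supplied in-context, after pruning near-duplicates.

    `edited_facts` is a list of natural-language fact strings stored in an
    external memory; `retrieve_facts(question, facts)` returns them ranked by
    relevance (the paper's MI-based retrieval is not reproduced here).
    """
    candidates = retrieve_facts(question, edited_facts)
    pruned, seen_tokens = [], set()
    for fact in candidates:
        tokens = frozenset(fact.lower().split())
        if not tokens:
            continue
        # drop facts whose token set is mostly covered by a fact we already kept
        if any(len(tokens & s) / len(tokens) > 0.8 for s in seen_tokens):
            continue
        pruned.append(fact)
        seen_tokens.add(tokens)
        if len(pruned) == max_facts:
            break
    facts_block = "\n".join(f"- {f}" for f in pruned)
    return llm(
        "Some facts below override your prior knowledge. Use them.\n"
        f"Facts:\n{facts_block}\n\nQuestion: {question}\nAnswer:"
    )
```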
16. STOC-TOT: Stochastic Tree-of-Thought with Constrained Decoding for Complex Reasoning in Multi-Hop Question Answering
Zhenyu Bi, Daniel Hajialigol, Zhongkai Sun, 2024
Multi-hop question answering (MHQA) requires a model to retrieve and integrate information from multiple passages to answer a complex question. Recent systems leverage the power of large language models and integrate evidence retrieval with reasoning prompts (e.g., chain-of-thought reasoning) for the MHQA task. However, the complexities in the question types (bridge vs. comparison questions) and the reasoning types (sequential vs. parallel reasoning) require more novel and fine-grained prompting methods to enhance the performance of MHQA under the zero-shot setting. In this paper, we propose STOC-TOT, a stochastic tree-of-thought reasoning prompting method with constrained decoding for MHQA and conduct a detailed comparison with other reasoning prompts on different question types and reasoning types. Specifically, we construct a tree-like reasoning structure by prompting the model to break down the original question into smaller sub-questions to form different reasoning paths. In addition, we prompt the model to provide a probability estimation for each reasoning path at each reas... Read More
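A simplified sketch of sampling multiple reasoning paths and weighting answers by a self-reported probability follows; asking the model for a confidence in-prompt and the answer format are assumptions, and the paper's constrained decoding is not reproduced.

```python
def stoc_tot_answer(question, llm, num_paths=5):
    """Sample several reasoning paths and return the highest-scoring answer.

    `llm(prompt, temperature)` is assumed to return text; each sampled path
    contributes its self-reported confidence as a vote for its final answer.
    """
    votes = {}
    for _ in range(num_paths):
        reply = llm(
            "Break the question into sub-questions, answer them step by step, "
            "then finish with 'ANSWER: <answer> | CONFIDENCE: <0-1>'.\n"
            f"Question: {question}",
            temperature=0.8,
        )
        try:
            tail = reply.rsplit("ANSWER:", 1)[1]
            answer, conf = tail.split("| CONFIDENCE:")
            votes[answer.strip()] = votes.get(answer.strip(), 0.0) + float(conf)
        except (IndexError, ValueError):
            continue  # malformed reasoning path; ignore it
    return max(votes, key=votes.get) if votes else None
```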
17. Text Reasoning Chain Extraction for Multi-Hop Question Answering
Pengming Wang, Zijiang Zhu, Qing Chen - Tsinghua University Press, 2024
With the advent of the information age, it has become increasingly troublesome to search through large amounts of related knowledge to find the information you need. Text reasoning is a basic and important part of multi-hop question answering tasks. This paper studies the integrity, uniformity, and speed of computational intelligence inference over data. Multi-hop reasoning emerged for this purpose, but it is still in its infancy: it falls far short on multi-hop question answering in terms of search breadth, process complexity, response speed, comprehensiveness of information, and so on. This paper compares traditional information retrieval with computational intelligence through corpus relevancy and other computing methods. The study finds that, when faced with multi-hop question-answer reasoning, traditional retrieval methods lag behind the intelligent approach by about 35%. It shows that computational intelligence would be more complete, unified, and faster than traditional retrieval methods. This paper also i... Read More
18. MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries
Yixuan Tang, Yi Yang, 2024
Retrieval-augmented generation (RAG) augments large language models (LLM) by retrieving relevant knowledge, showing promising potential in mitigating LLM hallucinations and enhancing response quality, thereby facilitating the great adoption of LLMs in practice. However, we find that existing RAG systems are inadequate in answering multi-hop queries, which require retrieving and reasoning over multiple pieces of supporting evidence. Furthermore, to our knowledge, no existing RAG benchmarking dataset focuses on multi-hop queries. In this paper, we develop a novel dataset, MultiHop-RAG, which consists of a knowledge base, a large collection of multi-hop queries, their ground-truth answers, and the associated supporting evidence. We detail the procedure of building the dataset, utilizing an English news article dataset as the underlying RAG knowledge base. We demonstrate the benchmarking utility of MultiHop-RAG in two experiments. The first experiment compares different embedding models for retrieving evidence for multi-hop queries. In the second experiment, we examine the capabilities o... Read More
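For a benchmark like this, the first experiment (comparing embedding models) boils down to a retrieval metric over queries with gold evidence. The sketch below computes recall@k; the field names `text` and `evidence_ids` and the `embed` callable are hypothetical stand-ins, not the dataset's actual schema.

```python
import numpy as np

def recall_at_k(queries, embed, corpus, k=10):
    """Fraction of gold evidence chunks recovered among the top-k retrieved chunks.

    `queries` is a list of dicts with 'text' and 'evidence_ids' keys;
    `embed(texts)` returns one vector per text; `corpus` maps chunk id -> text.
    """
    ids = list(corpus)
    doc_vecs = np.asarray(embed([corpus[i] for i in ids]), dtype=float)
    doc_vecs /= np.linalg.norm(doc_vecs, axis=1, keepdims=True)

    hits = total = 0
    for q in queries:
        q_vec = np.asarray(embed([q["text"]]), dtype=float)[0]
        q_vec /= np.linalg.norm(q_vec)
        top = np.argsort(doc_vecs @ q_vec)[::-1][:k]   # cosine similarity ranking
        retrieved = {ids[i] for i in top}
        hits += len(retrieved & set(q["evidence_ids"]))
        total += len(q["evidence_ids"])
    return hits / total
```

Running this with different `embed` functions is one way to reproduce the style of comparison the benchmark's first experiment performs.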
19. MQA-KEAL: Multi-hop Question Answering under Knowledge Editing for Arabic Language
Muhammad Ali, Nawal Daftardar, M.A Waheed, 2024
Large Language Models (LLMs) have demonstrated significant capabilities across numerous application domains. A key challenge is keeping these models updated with the latest available information; failing to do so limits their true potential for end-applications. Although there have been numerous attempts at LLM Knowledge Editing (KE), i.e., editing the LLM's prior knowledge and in turn testing it via Multi-hop Question Answering (MQA), so far these studies have primarily focused on the English language. To bridge this gap, in this paper we propose Multi-hop Question Answering under Knowledge Editing for Arabic Language (MQA-KEAL). MQA-KEAL stores knowledge edits as structured knowledge units in an external memory. In order to solve a multi-hop question, it first uses task decomposition to decompose the question into smaller sub-problems. Later, for each sub-problem, it iteratively queries the external memory and/or the target LLM in order to generate the final response. In addition, we also contribute MQUAKE-AR (an Arabic translation of the English benchmark MQUAKE), as well as a new benchma... Read More
20. A Content-based Reasoning Method for Multi-hop Question Answering using Graph Neural Networks
Arash Ghafouri, Hasan Naderi, Behrouz Minaei‐Bidgoli - Research Square Platform LLC, 2024
Question-answering systems require retrieving evidence from multiple documents or paragraphs and reasoning over them to meet users' information needs and answer their complex questions. On the other hand, the explainability and comprehensibility of the predictions made by question-answering systems pose a challenge. In this paper, a content-based reasoning approach based on graph-based machine reading comprehension methods is proposed to answer multi-hop complex questions. In this approach, relevant paragraphs are selected in a two-step process after receiving the input multi-hop complex question. Then, to facilitate content-based reasoning and utilize the evidence related to the multi-hop complex question in the retrieved paragraphs, an incoherent graph infrastructure is constructed. Subsequently, a graph neural network and a transformer are employed as an encoder to extract the content-based answer relevant to the question from the graph infrastructure. Finally, to overcome the challenge of interpretability in the question-answering system, a transformer and the predi... Read More