Prior Art Search Acceleration

Prior art searching demands systematic analysis of millions of documents across multiple databases, languages, and jurisdictions. A typical comprehensive search evaluates 50-100 potentially relevant documents, each containing detailed technical disclosures that must be mapped to specific claim elements. The challenge is magnified when searching non-patent literature, which lacks the structured format of patent documents but often contains critical technical disclosures.

The fundamental challenge lies in balancing search breadth and precision while managing the cognitive load of analyzing complex technical relationships across large document sets.

This page brings together solutions from recent research, including AI-assisted claim element mapping, iterative classification-based refinement, concept-driven triage, and automated search string optimization. These and other approaches aim to improve search accuracy while reducing the time needed to identify and analyze relevant prior art.

1. Server-Based Data Extraction System with Iterative Vector Search and Labeling Mechanism

INVENTEC CORP, 2025

A system and method for extracting accurate data from large corpora using vector search and labeling. The system comprises a server and a company knowledge base containing vectorized patent data. The server receives keywords, vectorizes them, searches the knowledge base, labels the results it finds, vectorizes the labels, and stores them to improve future searches. This iterative labeling and vectorization trains the knowledge base to better match similar data.
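The search-label-store loop can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Inventec's implementation: `vectorize` is a toy bag-of-words stand-in for a real embedding model, and `KnowledgeBase`, `search`, and `label` are hypothetical names. It shows the effect the summary describes: once a label is vectorized and stored against a result, later queries phrased like the label can find that result even when the document text shares no words with the query.

```python
import math
from collections import Counter

def vectorize(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real embedding model (assumption)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class KnowledgeBase:
    """Hypothetical in-memory knowledge base of vectorized documents plus stored label vectors."""

    def __init__(self, docs: dict):
        self.doc_vecs = {doc_id: vectorize(text) for doc_id, text in docs.items()}
        self.label_vecs = {doc_id: [] for doc_id in docs}  # grows as results are labeled

    def search(self, keywords: str, top_k: int = 3) -> list:
        query = vectorize(keywords)
        scored = []
        for doc_id, vec in self.doc_vecs.items():
            # A document scores by its best match: its own text or any label stored for it.
            candidates = [vec] + self.label_vecs[doc_id]
            scored.append((doc_id, max(cosine(query, c) for c in candidates)))
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]

    def label(self, doc_id: str, label_text: str) -> None:
        """Vectorize a label for a found result and store it to improve future searches."""
        self.label_vecs[doc_id].append(vectorize(label_text))

kb = KnowledgeBase({"P1": "battery cell cooling plate", "P2": "wireless charging coil alignment"})
before = dict(kb.search("thermal management"))
kb.label("P1", "thermal management")  # how labels are produced is not specified; assumed given
after = dict(kb.search("thermal management"))
```

Before labeling, neither document matches the query; afterwards P1 matches through its stored label vector.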

2. Search Result Filtering Method with Primary and Secondary Term Extraction for Document Review Efficiency

ESI LABORATORY LLC, 2025

Filtering search results to improve efficiency and accuracy in document review processes such as legal discovery. The method first shows the user short result snippets containing the primary search terms, so irrelevant results can be filtered out quickly without reviewing entire documents. The user marks each result as relevant or to be removed. The method also extracts secondary search terms from names and contact information, emails, and external sources, which makes it possible to find further relevant documents without reviewing every one.
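A rough sketch of the two ideas (snippet-first triage on a primary term, then harvesting secondary terms from documents marked relevant), assuming plain-text documents and using email addresses as the only kind of secondary term. The function names and the simple email regex are illustrative assumptions, not ESI Laboratory's method; the regex will, for instance, keep a trailing period if an address ends a sentence.

```python
import re

# Rough email pattern (assumption); it keeps a trailing "." if an address ends a sentence.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def snippets(docs: dict, term: str, width: int = 30) -> dict:
    """Show only a short window around the first hit of a primary term in each document."""
    out = {}
    for doc_id, text in docs.items():
        i = text.lower().find(term.lower())
        if i >= 0:
            out[doc_id] = text[max(0, i - width): i + len(term) + width]
    return out

def secondary_terms(docs: dict, relevant_ids: set) -> set:
    """Harvest email addresses from documents the reviewer marked relevant."""
    terms = set()
    for doc_id in relevant_ids:
        terms.update(EMAIL_RE.findall(docs[doc_id]))
    return terms

def expand(docs: dict, terms: set, exclude: set) -> set:
    """Find further documents that mention any secondary term."""
    return {d for d, text in docs.items() if d not in exclude and any(t in text for t in terms)}

docs = {
    "D1": "From alice@example.com: the widget patent draft is attached.",
    "D2": "Lunch order for Friday.",
    "D3": "Reply to alice@example.com about filing dates.",
}
hits = snippets(docs, "widget")             # reviewer sees only D1's snippet
terms = secondary_terms(docs, {"D1"})       # reviewer marked D1 relevant
more = expand(docs, terms, exclude={"D1"})  # D3 surfaces via the shared address
```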

3. Sparse Matrix-Based Document Scoring Algorithm for Query-Relevance Ranking

CAPITAL ONE SERVICES LLC, 2025

Efficient and accurate search through large document corpora using a sparse matrix-based scoring algorithm that enables fast, relevant, and size-aware search. The algorithm scores documents for relevance to a query by multiplying a sparse document matrix (containing term contribution values), a query vector (containing query term values), and an optional word coverage factor vector (containing document coverage values). The documents are then sorted by score and the best ones are returned in response to the query. This finds the most relevant documents for a query quickly and accurately, and also provides sizing information on how many documents match each query term.
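The scoring step can be sketched with dictionaries standing in for a sparse matrix, so only non-zero term contributions are stored and visited. This is a hedged illustration, not Capital One's algorithm: the summary does not say how the coverage factor is indexed, so here it is assumed to be an optional per-document multiplier applied after the matrix-vector product, and `score_documents` and `term_document_counts` are hypothetical names. A production system would more likely use a compressed sparse row matrix (for example `scipy.sparse.csr_matrix`).

```python
def score_documents(doc_matrix: dict, query_vec: dict, coverage: dict = None, top_k: int = 10) -> list:
    """doc_matrix: {doc_id: {term: contribution}} with only non-zero entries stored.
    query_vec: {term: weight}. coverage: optional {doc_id: factor} (assumed per-document)."""
    scores = {}
    for doc_id, row in doc_matrix.items():
        # Sparse dot product: iterate over the smaller of the two maps.
        small, large = (row, query_vec) if len(row) < len(query_vec) else (query_vec, row)
        s = sum(v * large[t] for t, v in small.items() if t in large)
        if s:
            if coverage is not None:
                s *= coverage.get(doc_id, 1.0)
            scores[doc_id] = s
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

def term_document_counts(doc_matrix: dict, query_vec: dict) -> dict:
    """Sizing information: how many documents have a non-zero entry for each query term."""
    counts = {t: 0 for t in query_vec}
    for row in doc_matrix.values():
        for t in query_vec:
            if row.get(t):
                counts[t] += 1
    return counts

D = {"A": {"laser": 0.8, "diode": 0.5}, "B": {"laser": 0.2}, "C": {"antenna": 0.9}}
q = {"laser": 1.0, "diode": 1.0}
ranked = score_documents(D, q)
reweighted = score_documents(D, q, coverage={"A": 0.1})
counts = term_document_counts(D, q)
```

Document C never appears because its score is zero, and the reweighted run shows how such a factor could reorder results (B now outranks A).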

4. Implementing and Assessing Retrieval Augmented Generation (RAG) for LLM-Based Documents Queries

Peter Kaczmarski, Fernand Vandamme - Routledge, 2025

In recent years, an AI-related technology referred to as RAG (Retrieval Augmented Generation) (Lewis, 2020) has gained a lot of attention. In the RAG approach, information from custom sources is used to seed the knowledge obtained from an LLM (Large Language Model), forming an approach that solves the issue of adapting the model to cope with external information. Using the RAG scenario, various processing use cases can be implemented, such as AI-based document management, AI-enhanced web search, and online service support. This paper outlines the main components of the RAG workflow: chunking and embedding the input documents, as well as similarity-based user query processing. The approach is illustrated via a Python implementation that validates the procedure on a simple example of a multi-topic document. Experimental results are discussed, showing the feasibility of this approach and illustrating the need for further research and enhancements, for example by the RAPTOR concept (Sarthi, 2024).
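The three stages the paper names (chunking, embedding, and similarity-based query processing) can be sketched without any external services. This is not the authors' Python implementation: the bag-of-words `embed` function is a stand-in for a real embedding model, the chunk size and overlap are arbitrary, and the final LLM call is left out; `build_prompt` only shows how retrieved chunks would seed the model's context.

```python
import math
import re
from collections import Counter

def chunk(text: str, size: int = 8, overlap: int = 2) -> list:
    """Split text into overlapping windows of `size` words."""
    assert 0 <= overlap < size
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]

def embed(text: str) -> Counter:
    """Toy embedding: lower-cased word counts (stand-in for a neural encoder)."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_index(chunks: list) -> list:
    """Embed every chunk once; queries reuse these vectors."""
    return [(c, embed(c)) for c in chunks]

def retrieve(query: str, index: list, k: int = 2) -> list:
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [c for c, _ in ranked[:k]]

def build_prompt(query: str, contexts: list) -> str:
    context = "\n".join(f"- {c}" for c in contexts)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

document = ("Solar panels convert sunlight into electricity. Photovoltaic cells use silicon. "
            "Sourdough bread needs a starter culture and long fermentation. Bakers control hydration carefully.")
question = "how does sourdough fermentation work"
chunks = chunk(document)
top = retrieve(question, build_index(chunks))
prompt = build_prompt(question, top)
```

On this two-topic document, the solar-panel chunk scores zero for the baking question and is never passed to the model.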

5. Artificial Intelligence and Large Language Model Powered Literature Review Services

Ayman Musleh, Saif Aldeen Alryalat - High Yield Medicine, 2025

Large language model (LLM) tools are transforming the way evidence is retrieved by converting natural-language prompts into quick, synthesized outputs. These platforms significantly reduce the time required for literature searches, making them more accessible to users unfamiliar with formal search strategies. A close evaluation of four prominent platforms (Undermind.ai, Scite.ai, Consensus.app, and OpenEvidence) highlights both notable advantages and ongoing limitations. Undermind and Consensus use the extensive Semantic Scholar database of over 200 million records; Scite enhances results with Smart Citations that indicate supporting or opposing references; and OpenEvidence applies a medically focused LLM trained on licensed sources, including the complete NEJM archive. Despite their benefits, key limitations persist: opaque algorithms, inconsistent responses to identical queries, paywalls and sign-up barriers, and incomplete recall that may compromise systematic reviews. To support critical appraisal, we outline essential information-retrieval metrics (including recall, precision, F1-score, mean average specificity) and prov...
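The concern about incomplete recall is easier to weigh with the standard definitions in hand. The sketch below computes precision, recall, F1, and specificity for one query from sets of retrieved and relevant document IDs, plus average precision over a ranked list (averaging that over queries gives mean average precision). These are textbook formulas rather than the authors' evaluation code, and the function names are illustrative.

```python
def retrieval_metrics(retrieved: list, relevant: set, corpus_size: int) -> dict:
    """Set-based metrics for one query; corpus_size is needed to count true negatives."""
    found = set(retrieved)
    tp = len(found & relevant)
    fp = len(found - relevant)
    fn = len(relevant - found)
    tn = corpus_size - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "specificity": specificity}

def average_precision(ranked: list, relevant: set) -> float:
    """Sum of precision@k at each rank k holding a relevant document, divided by the
    number of relevant documents (relevant documents never retrieved contribute 0)."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs: list) -> float:
    """runs: one (ranked list, relevant set) pair per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs) if runs else 0.0

metrics = retrieval_metrics(["a", "b", "c", "d"], {"a", "c", "e"}, corpus_size=10)
ap = average_precision(["a", "b", "c", "d"], {"a", "c", "e"})
```

Missing the single relevant document "e" caps recall at 2/3 no matter how precise the ranking is, which is exactly why incomplete recall matters for systematic reviews.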

6. The principles, methods and algorithms for bibliographic search intelligence system design

A. A. Boryaev - State Public Scientific-Technical Library, 2025

7. Patent Portfolio Management System with Claim Mapping, Concept Organization, and Similarity Indexing Tools

BLACK HILLS IP HOLDINGS LLC, 2025

Patent management system with tools to help analyze, organize, and search patent portfolios. It supports quick, relevant patent analysis through claim mapping, concept organization, similarity indexing, and expanded search results, and it applies automated techniques such as mining, mapping, and indexing to extract insights from patent claims and texts.

8. Multi-Dimensional Index Sharding System Using Record Type, Year, and Surname with Greedy Balancing Algorithm

ANCESTRY.COM OPERATIONS INC, 2025

Efficiently searching large databases by sharding the index along multiple dimensions to reduce search times and costs. The index is sharded on three dimensions: record type, year, and surname. Rather than being distributed randomly across shards, records are organized by surname within each shard, so a search for a specific name queries only a subset of shards instead of all of them, as random sharding would require. A greedy algorithm balances shard allocation across the dimensions.
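One way to picture the approach: group records by the three dimensions, balance the groups across shards greedily, then route a name query only to the shards that hold matching groups. In this sketch the grouping key (record type, decade, first letter of surname) is a hypothetical choice, and the balancing rule is the classic largest-group-first heuristic (place each group on the currently lightest shard); the summary does not specify Ancestry's exact greedy algorithm.

```python
import heapq
from collections import Counter
from typing import Optional

def group_key(record: dict) -> tuple:
    # Hypothetical grouping: record type, decade, first letter of surname.
    return (record["type"], record["year"] // 10 * 10, record["surname"][0].upper())

def assign_shards(records: list, num_shards: int) -> dict:
    """Greedily place groups (largest first) on the currently lightest shard."""
    sizes = Counter(group_key(r) for r in records)
    heap = [(0, shard) for shard in range(num_shards)]  # (current load, shard id)
    heapq.heapify(heap)
    placement = {}
    for key, size in sorted(sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, shard = heapq.heappop(heap)
        placement[key] = shard
        heapq.heappush(heap, (load + size, shard))
    return placement

def shards_for_query(placement: dict, surname: str, record_type: Optional[str] = None) -> set:
    """Only shards holding groups for this surname initial (and record type) are queried."""
    initial = surname[0].upper()
    return {shard for (rtype, _decade, init), shard in placement.items()
            if init == initial and (record_type is None or rtype == record_type)}

records = [
    {"type": "census", "year": 1901, "surname": "Smith"},
    {"type": "census", "year": 1905, "surname": "Smyth"},
    {"type": "census", "year": 1911, "surname": "Jones"},
    {"type": "birth", "year": 1920, "surname": "Smith"},
    {"type": "birth", "year": 1923, "surname": "Brown"},
]
placement = assign_shards(records, num_shards=2)
```

A query for "Brown" then touches one shard instead of both, which is the cost saving the summary attributes to dimension-aware sharding.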

9. Intelligent Semantic Search for Academic Journals Using AI and NLP Techniques

Shireen Fathi Malo - Lectito Journals, 2025

The exponential growth of academic literature has rendered traditional keyword-based search engines increasingly inadequate for scholars seeking contextually relevant research. This study presents the design and implementation of an intelligent semantic search engine tailored to academic journals, integrating state-of-the-art Artificial Intelligence (AI) and Natural Language Processing (NLP) techniques. The proposed system leverages sentence-transformer embeddings (all-mpnet-base-v2), enabling vector-based similarity searches, alongside spaCy tokenization and entity recognition to enhance syntactic understanding. An ontology-based matching mechanism further aligns user queries with domain-specific research topics, while fuzzy regular expressions improve error tolerance and numeric filtering (e.g., CiteScore, Impact Factor). The architecture combines these NLP layers with Elasticsearch's hybrid search capabilities to process and rank peer-reviewed journal metadata sourced from Scopus and DOAJ. A modular FastAPI-based backend ensures scalability and responsiveness, while a lightweight frontend interface facilitates interactive input. The study contributes novel ...
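Two of the smaller pieces, fuzzy correction of misspelled query terms and numeric filtering on metrics such as CiteScore, can be sketched with the standard library alone. This is not the study's system: the sentence-transformer embeddings, spaCy, ontology matching, and Elasticsearch layers are all omitted; `difflib.get_close_matches` stands in for the fuzzy matching; and the `citescore > 5` filter syntax and the journal records are hypothetical.

```python
import difflib
import operator
import re

OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}
# Hypothetical filter syntax such as "citescore > 5" or "impact factor >= 2.5".
FILTER_RE = re.compile(r"(citescore|impact factor)\s*(>=|<=|>|<)\s*(\d+(?:\.\d+)?)", re.IGNORECASE)

def words(text: str) -> list:
    return re.findall(r"[a-z]+", text.lower())

def parse_query(query: str, vocabulary: list):
    """Split a query into fuzzy-corrected terms and numeric filters."""
    filters = [(m.group(1).lower(), m.group(2), float(m.group(3))) for m in FILTER_RE.finditer(query)]
    terms = []
    for tok in words(FILTER_RE.sub(" ", query)):
        # Snap a misspelled token to the closest known word, if one is similar enough.
        match = difflib.get_close_matches(tok, vocabulary, n=1, cutoff=0.8)
        terms.append(match[0] if match else tok)
    return terms, filters

def search(journals: list, query: str) -> list:
    vocab = sorted({w for j in journals for w in words(j["title"])})
    terms, filters = parse_query(query, vocab)
    results = []
    for j in journals:
        if not all(OPS[op](j[field], value) for field, op, value in filters):
            continue  # fails a numeric filter
        title_words = set(words(j["title"]))
        score = sum(t in title_words for t in terms)
        if score:
            results.append((score, j["title"]))
    return [title for _, title in sorted(results, key=lambda r: -r[0])]

journals = [
    {"title": "Journal of Photonics Research", "citescore": 6.1, "impact factor": 3.2},
    {"title": "Photonics Letters", "citescore": 3.4, "impact factor": 1.9},
    {"title": "Marine Biology Review", "citescore": 7.0, "impact factor": 4.0},
]
hits = search(journals, "photonicss citescore > 5")
```

The misspelled "photonicss" still matches both photonics journals, and the CiteScore filter then removes the one below the threshold.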

10. Artificial Intelligence System for Semantic and Relationship Analysis of Patent Documents with Automated Metadata Consolidation and Visualization

IP.COM I LLC, 2025

Artificial intelligence (AI) system for analyzing patent documents to provide insights into technology development trends, competitive intelligence, and patent analysis. The system uses AI techniques like semantic analysis and relationship analysis to automatically identify the most critical patents and documents in a collection, determine what they reveal, and provide summaries and visualizations. It consolidates metadata like citations, litigation, and expiration dates to provide statistics like citation indices and influence factors. The AI also generates alerts, flags, and indicators based on patent quality, relevance, and expiration. The system aims to automate tasks like competitive analysis, prior art search, patentability assessment, and freedom to operate analysis using AI instead of manual review.

11. Legal Research System with AI-Driven Natural Language Query Processing and Document Relevance Analysis

THOMSON REUTERS ENTERPRISE CENTRE GMBH, 2025

AI-assisted legal research system that makes legal research more effective and efficient by using AI techniques such as large language models (LLMs) to generate summaries and analyze legal documents based on user inputs. The system accepts natural language search queries instead of keywords and synthesizes responses using LLMs. It also ranks documents and quantifies their relevance to the query. This yields more accurate, conversational search results than traditional keyword-based search engines.

12. AI-Based System for Analyzing Claim Characteristics and Flagging Potential Issues Using Comparative Analysis

INTERNATIONAL BUSINESS MACHINES CORP, 2025

Cognitively identifying potential issues with claims using AI to analyze claim characteristics and compare them to similar claims to determine if an alert is warranted. The AI model identifies similarities and differences between a new claim and past claims to flag potential issues. If the new claim has similar characteristics to ones with issues, it generates an alert notification to the user.

13. Intellectual Property Portfolio Analysis Platform with Similarity Identification and Technical Aspect Clustering

MOAT METRICS INC, 2025

A platform for analyzing intellectual property portfolios of entities by identifying similarities between portfolios and clustering IP assets based on technical aspects. The platform allows users to seed searches based on technical fields, competitor portfolios, etc. to find IP assets similar to their own. It then clusters the assets at varying levels of granularity and generates visual representations of the clusters. This helps users efficiently analyze and compare portfolios, identify gaps and saturation, and assess exposure.

14. Document Search and Analysis System with Search Engine-Based Indexing and Complex Query Interface

PALANTIR TECHNOLOGIES INC, 2025

Searching and analyzing extremely large numbers of documents efficiently, with tools for building complex queries, visualizing results, and publishing them. Instead of a database, the system uses a search engine that indexes all fields of the documents, which makes searching faster and lets the system scale to millions of documents. Documents are organized into collections based on format, and queries can span collections. The interface supports building complex queries, viewing results, flagging documents, and accessing them directly. It also generates visualizations and lets users share and publish results.
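The core idea of indexing every field in a search-engine-style inverted index, rather than querying a database, can be sketched as a map from tokens to document references. This is a toy, assumed design, not Palantir's: each document is keyed by `(collection, doc_id)` so one query can span collections, every field is indexed both as a bare token and as a hypothetical `field:token` key, and queries are simple AND intersections.

```python
import re
from collections import defaultdict

class FieldIndex:
    """Toy inverted index over every field of every document."""

    def __init__(self):
        self.postings = defaultdict(set)  # token -> set of (collection, doc_id)

    def add(self, collection: str, doc_id: str, fields: dict) -> None:
        ref = (collection, doc_id)
        for name, value in fields.items():
            for tok in re.findall(r"\w+", str(value).lower()):
                self.postings[tok].add(ref)                     # bare token
                self.postings[f"{name.lower()}:{tok}"].add(ref)  # field-scoped token

    def query(self, *terms: str, collections=None) -> set:
        """AND query across all collections (or only the ones named)."""
        if not terms:
            return set()
        hits = set.intersection(*(self.postings.get(t.lower(), set()) for t in terms))
        if collections is not None:
            hits = {ref for ref in hits if ref[0] in collections}
        return hits

idx = FieldIndex()
idx.add("emails", "e1", {"from": "Smith", "body": "laser diode prototype"})
idx.add("pdfs", "p1", {"title": "Laser cutting", "author": "Jones"})
```

A query such as `idx.query("laser", "from:smith")` narrows a cross-collection hit list to the email alone.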

15. System for Automated Patent Claim Analysis Using Stemming and Normalization Techniques

MOAT METRICS INC, 2025

Automated analysis of patent claims to help evaluate relative breadth, identify corresponding products, and find related patents. The system analyzes claims using techniques like stemming and normalization to extract features like unique word counts. It generates claim profiles based on these features and compares them to determine relative breadth. It also searches for products and related patents based on identified elements in the claims.
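A minimal sketch of such a claim profile: normalize the text, drop stop words, stem, and count unique words. The stop-word list and the crude suffix-stripping stemmer are assumptions for illustration (a real system would more likely use an established stemmer such as Porter's), and treating fewer unique words as a proxy for a broader claim is a common heuristic, not necessarily the scoring Moat Metrics uses.

```python
import re

# Illustrative stop words, including common claim boilerplate (assumption).
STOPWORDS = {"a", "an", "the", "of", "and", "or", "to", "in", "on", "for", "with",
             "wherein", "said", "comprising"}
SUFFIXES = ("ing", "ed", "es", "s")

def stem(word: str) -> str:
    """Crude suffix stripping that keeps at least three characters of the stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def claim_profile(claim: str) -> dict:
    tokens = re.findall(r"[a-z]+", claim.lower())  # normalize: lowercase, letters only
    stems = [stem(t) for t in tokens if t not in STOPWORDS]
    return {"word_count": len(stems), "unique_words": len(set(stems))}

def rank_by_breadth(claims: dict) -> list:
    """Heuristic: fewer unique normalized words suggests fewer limitations, i.e. a broader claim."""
    return sorted(claims, key=lambda cid: claim_profile(claims[cid])["unique_words"])

claims = {
    "1": "A fastener comprising a shaft and a head.",
    "2": "A fastener comprising a threaded shaft, a hexagonal head, and a washer coupled to the head.",
}
order = rank_by_breadth(claims)
```

Because "head" appears twice in claim 2 but counts once, the unique-word count tracks distinct limitations rather than raw length.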

16. PAI-NET: Retrieval-Augmented Generation Patent Network Using Prior Art Information

Kyung Yul Lee, Juho Bai - Multidisciplinary Digital Publishing Institute, 2025

Similar patent document retrieval is an essential task that narrows the scope of claimants' searches, and numerous studies have attempted to provide automated patent search services. Recently, Retrieval-Augmented Generation (RAG) based on generative language models has emerged as an excellent method for accessing and utilizing knowledge in patent environments. RAG-based services can enhance the ranking performance of AI search by providing similar documents for queries. However, achieving optimal similarity-based ranking in patent search remains challenging, as existing similarity methods do not adequately address the characteristics of patent documents. Unlike general document retrieval, patent retrieval must take prior art relationships into account. To address this issue, we propose PAI-NET, a deep neural network for computing patent similarities by incorporating expert prior art information. We demonstrate that the proposed model outperforms current state-of-the-art methods on patent classification tasks through semantic distance evaluation on the USPD and KPRIS datasets. PAI-NET presents similar patent candidates, demonstrating an improvement of 15% over existing methods.

17. Document Comparison System with Grouping and Shared Difference Highlighting Mechanism

EVERLAW INC, 2025

Efficiently comparing and viewing differences in large sets of documents to speed up document review processes. The method involves identifying shared text with variations across multiple documents, sorting documents into groups based on those variations, and generating a single "shared difference" document that highlights areas of variation between sections of shared text. This allows quickly comparing and navigating through many documents at once instead of reading them one by one.
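The grouping and shared-difference view can be sketched under a simplifying assumption: every document has already been split into the same number of aligned sections (for example, clauses of a templated contract). Documents with identical sections form a group, sections that agree across all groups are printed once, and sections that vary are rendered as a bracketed list of alternatives with the documents using each one. The `[[ ... | ... ]]` markup and the function name are hypothetical; Everlaw's alignment of free-form text is not described in the summary.

```python
from collections import defaultdict

def shared_difference(docs: dict):
    """Group identical documents and render one view with shared and varying sections.
    Assumes every document is already split into the same number of aligned sections."""
    if not docs:
        return [], []
    lengths = {len(sections) for sections in docs.values()}
    if len(lengths) != 1:
        raise ValueError("documents must have the same number of aligned sections")
    groups = defaultdict(list)  # identical section tuple -> doc ids
    for doc_id, sections in docs.items():
        groups[tuple(sections)].append(doc_id)
    view = []
    for i in range(lengths.pop()):
        variants = defaultdict(list)  # section text -> doc ids using it
        for sections, ids in groups.items():
            variants[sections[i]].extend(ids)
        if len(variants) == 1:
            view.append(next(iter(variants)))  # shared text, shown once
        else:
            alts = " | ".join(f"{text} ({', '.join(sorted(ids))})" for text, ids in sorted(variants.items()))
            view.append(f"[[ {alts} ]]")  # highlighted area of variation
    return list(groups.values()), view

docs = {
    "d1": ["Agreement between A and B.", "Term: 12 months.", "Governing law: NY."],
    "d2": ["Agreement between A and B.", "Term: 24 months.", "Governing law: NY."],
    "d3": ["Agreement between A and B.", "Term: 12 months.", "Governing law: NY."],
}
groups, view = shared_difference(docs)
```

A reviewer reads three lines instead of three documents, and the middle line shows which documents carry which term.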

18. Information Retrieval System Utilizing Voronoi Cell-Based Vector Embedding Indices for Efficient Query Processing

Intuit, Inc., 2025

Large language model (LLM)-based information retrieval for large datasets, using indices to improve query response times and scalability. The method creates an index of the input text files containing vector embeddings of the text, organized into Voronoi cells. When a query arrives, the query embedding is compared with the vectors in the closest-matching Voronoi cell to generate a response without searching the full text, so only a subset of the embeddings is used for each query. The indices can also be merged and partitioned to further reduce the search space.
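The Voronoi-cell idea corresponds to what vector-search libraries call an inverted-file (IVF) index: cluster the embeddings, assign each one to its nearest centroid's cell, and at query time scan only the cell(s) nearest the query. The sketch below assumes embeddings are already computed, seeds k-means with the first `k` vectors for determinism, and borrows the `nprobe` parameter name from IVF terminology; it is an illustration, not Intuit's method. Libraries such as FAISS (`IndexIVFFlat`) implement the same idea efficiently.

```python
def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(vectors, centroids):
    """Put each vector in the Voronoi cell of its nearest centroid."""
    cells = [[] for _ in centroids]
    for idx, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: dist2(v, centroids[c]))
        cells[nearest].append(idx)
    return cells

def build_ivf(vectors, k, iters=5):
    """A few rounds of k-means, seeded with the first k vectors for determinism."""
    centroids = [list(v) for v in vectors[:k]]
    dim = len(vectors[0])
    for _ in range(iters):
        cells = assign(vectors, centroids)
        for c, members in enumerate(cells):
            if members:  # an empty cell keeps its previous centroid
                centroids[c] = [sum(vectors[i][d] for i in members) / len(members) for d in range(dim)]
    return centroids, assign(vectors, centroids)

def search(query, vectors, centroids, cells, top_k=1, nprobe=1):
    """Scan only the nprobe cells whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda c: dist2(query, centroids[c]))[:nprobe]
    candidates = [i for c in probe for i in cells[c]]
    return sorted(candidates, key=lambda i: dist2(query, vectors[i]))[:top_k]

vectors = [[0, 0], [0.1, 0], [0, 0.2], [5, 5], [5.1, 4.9], [4.8, 5.2]]
centroids, cells = build_ivf(vectors, k=2)
nearest = search([4.9, 5.0], vectors, centroids, cells)
```

With `nprobe=1` the query compares against three of the six embeddings; raising `nprobe` trades speed for recall when the true neighbor sits near a cell boundary.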

19. Overlay Graph-Based Method for Cross-Ontology Knowledge Graph Search

INTERNATIONAL BUSINESS MACHINES CORPORATION, 2025

Searching across multiple ontology-based knowledge graphs using an overlay graph to improve efficiency and accuracy. The method involves generating overlay graphs that map entities and relations from multiple source graphs. When a search request comes in, an appropriate overlay graph is selected based on the entity and relation. The search is then executed on the overlay graph, which translates the request into queries for the source graphs. Results are received from the sources and used to respond to the search.
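A small sketch of how an overlay could translate one request into per-source queries, assuming each source graph is a set of `(subject, relation, object)` triples in its own vocabulary. The overlay records, for each of its relations, which source relations it maps to, and for each entity, its ID in each source; `select_overlay` picks the first overlay that covers the requested entity and relation. All class and function names, and the toy data, are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class SourceGraph:
    name: str
    triples: set  # (subject, relation, object) in this graph's own vocabulary

    def query(self, subject, relation):
        return {o for s, r, o in self.triples if s == subject and r == relation}

@dataclass
class OverlayGraph:
    # overlay relation -> [(source graph, that graph's relation name), ...]
    relations: dict = field(default_factory=dict)
    # overlay entity -> {source graph name: that graph's entity id}
    entities: dict = field(default_factory=dict)

    def covers(self, entity, relation):
        return entity in self.entities and relation in self.relations

    def search(self, entity, relation):
        """Translate one overlay request into per-source queries and merge the results."""
        results = set()
        for graph, local_relation in self.relations.get(relation, []):
            local_entity = self.entities.get(entity, {}).get(graph.name)
            if local_entity is not None:
                results |= graph.query(local_entity, local_relation)
        return results

def select_overlay(overlays, entity, relation):
    """Pick the first overlay that maps both the entity and the relation."""
    return next((ov for ov in overlays if ov.covers(entity, relation)), None)

# Toy data: the same entity appears under different IDs and relation names in two sources.
patents = SourceGraph("patents", {("US123", "cites", "US456")})
papers = SourceGraph("papers", {("paper:7", "references", "paper:42")})
overlay = OverlayGraph(
    relations={"cites": [(patents, "cites"), (papers, "references")]},
    entities={"US123": {"patents": "US123", "papers": "paper:7"}},
)
chosen = select_overlay([overlay], "US123", "cites")
results = chosen.search("US123", "cites") if chosen else set()
```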

20. Dynamic Search Space Partitioning with Asynchronous Secondary Query Execution

PRODIGO SOLUTIONS INC, 2025

Dynamic indexing and searching technique to improve search efficiency and reduce search times. The search space is split into a primary subset and a secondary subset. When a query is received, only the primary subset is initially searched. If the results are insufficient, a supplemental search is performed on the secondary subset using the same query. This asynchronous secondary search occurs after the primary search and presents additional results. Items from the secondary search can also be moved to the primary subset for future queries. This reduces search times by prioritizing the most relevant subset for initial searches.
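The primary/secondary split can be sketched with a background worker: the primary tier is searched synchronously, and if it yields fewer hits than a threshold, the secondary tier is searched asynchronously and any hits are promoted into the primary tier. The substring matcher, the `min_results` threshold, and the `TieredIndex` name are assumptions for illustration, not Prodigo's implementation; a lock guards the tiers because promotion happens on the worker thread.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class TieredIndex:
    def __init__(self, primary, secondary, min_results=3):
        self.primary = list(primary)
        self.secondary = list(secondary)
        self.min_results = min_results
        self._lock = threading.Lock()  # tiers are mutated on the worker thread
        self._pool = ThreadPoolExecutor(max_workers=1)

    @staticmethod
    def _match(items, query):
        q = query.lower()
        return [item for item in items if q in item.lower()]

    def search(self, query):
        """Return primary hits now, plus a Future for supplemental hits (None if not needed)."""
        with self._lock:
            hits = self._match(self.primary, query)
        if len(hits) >= self.min_results:
            return hits, None
        return hits, self._pool.submit(self._secondary_search, query)

    def _secondary_search(self, query):
        with self._lock:
            extra = self._match(self.secondary, query)
            # Promote secondary hits so future queries find them in the primary tier.
            for item in extra:
                self.secondary.remove(item)
                self.primary.append(item)
        return extra

    def close(self):
        self._pool.shutdown(wait=True)

tiers = TieredIndex(["laser diode", "laser cutter"], ["laser radar", "antenna"], min_results=3)
hits, pending = tiers.search("laser")
extra = pending.result() if pending else []
hits_again, pending_again = tiers.search("laser")
tiers.close()
```

The caller can display `hits` immediately and append `extra` when the future completes; the second search is served entirely from the primary tier.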

21. Patent Document Retrieval Method Utilizing Machine Learning for Claim Element Parsing and Synonym Identification

22. Patent Document Retrieval System Utilizing Metadata and Semantic Conversion with Multi-Method Filtering

23. Patent Management System with Claim Scope Determination and Prior Art Analysis Using Mapping, Mining, and Analytics Techniques

24. Iterative Patent Search Method Using User-Guided Classification and Keyword Refinement

25. Hybrid AI System for Patent Claims Analysis with Machine Learning-Driven Embedding and User-Guided Search Mechanisms
