99 patents in this list

Digital assistants process thousands of natural language interactions daily, with response times expected under 200 ms and accuracy requirements above 95%. Current systems struggle with context switching: they often fail to maintain coherent conversations across multiple domains or to interpret non-verbal cues that humans process naturally.

The fundamental challenge lies in balancing real-time performance with the deep contextual understanding needed for natural human-AI interactions.

This page brings together solutions from recent research, including multi-context word prediction systems, eye-behavior responsive agents, domain-specific response modes, and sensor-enhanced natural language processing. These and other approaches focus on creating more intuitive and efficient interactions while maintaining processing speed and reliability.

1. Machine Learning Models for Generating Conversational Dialogue with Structured Data Training

NVIDIA Corporation, 2024

Generating more conversational and varied dialogue responses for AI systems using machine learning models trained on structured data. The models are trained on queries and corresponding field values from multiple domains, rather than fixed templates. This allows the models to generate flexible and varied responses that incorporate the queried fields. The models are trained using variational sample responses with differing syntax to improve conversational flow.

US2024184991A1-patent-drawing

2. Method for Query Response Generation Using Combined Structured and Unstructured Data in Conversational AI Systems

NVIDIA CORP, 2024

Generating query responses using combined structured and unstructured data for conversational AI systems and applications. The method involves generating context data using both structured data (converted to unstructured form) and unstructured data from a knowledge base. This context data is then used by neural networks to extract requested information from requests and generate responses. By combining structured and unstructured data, it enables more accurate and contextualized responses compared to using just structured data.
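
The context-building step described above can be sketched as follows. This is a minimal illustration, not the patented method: the function names `structured_to_text` and `build_context`, the "field: value" rendering, and the sample store data are all assumptions made for the example.

```python
from typing import Mapping, Sequence


def structured_to_text(record: Mapping[str, object]) -> str:
    # Render each field/value pair as a short clause so a structured row reads like prose.
    return "; ".join(f"{field}: {value}" for field, value in record.items()) + "."


def build_context(records: Sequence[Mapping[str, object]], passages: Sequence[str]) -> str:
    # Convert structured rows first, then append the non-empty free-text passages.
    lines = [structured_to_text(r) for r in records]
    lines += [p.strip() for p in passages if p.strip()]
    return "\n".join(lines)


# Hypothetical knowledge-base content used only for illustration.
context = build_context(
    [{"store": "Main Street", "opens": "09:00"}],
    ["  The Main Street store offers curbside pickup. "],
)
print(context)
```

The resulting string is what a downstream model would receive as context alongside the user's request; how the neural networks consume it is outside the scope of this sketch.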

3. Neural Network-Based Contextual Data Generation Using Combined Structured and Unstructured Data Inputs

NVIDIA CORP, 2024

Generating contextual data using structured and unstructured data for conversational AI systems and applications. The method involves combining structured and unstructured data to generate contextual information for AI applications like chatbots. It uses neural networks to generate context data that contains a mix of text generated from both structured and unstructured sources. This allows the AI to better interpret and respond to requests that involve both types of data. The structured data is converted to unstructured form for consistency in the generated context.

4. Context-Adaptive Voice Assistant System with Role-Specific Model Integration

SAMSUNG ELECTRONICS CO LTD, 2024

Role-specific voice assistants that can adapt their responses based on the context of a request. The voice assistant program receives user input containing a request and a role indicator. It uses a role-specific model associated with that role to generate a tailored response to the request. The role models are provided to the assistant by external devices. This allows customization of the assistant's responses for different contexts, like a doctor role, lawyer role, etc.

CN110930994B-patent-drawing
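
One way to picture the role-indicator dispatch is a registry that maps each role to its model. The sketch below is an assumption-laden illustration: `RoleAwareAssistant`, `register_role`, and `respond` are hypothetical names, and the role models are plain callables standing in for models supplied by external devices.

```python
from typing import Callable, Dict

RoleModel = Callable[[str], str]


class RoleAwareAssistant:
    def __init__(self) -> None:
        self._models: Dict[str, RoleModel] = {}

    def register_role(self, role: str, model: RoleModel) -> None:
        # Stand-in for receiving a role-specific model from an external device.
        self._models[role] = model

    def respond(self, request: str, role: str) -> str:
        # Look up the model tied to the role indicator and let it answer the request.
        model = self._models.get(role)
        if model is None:
            return f"No model registered for role '{role}'."
        return model(request)


assistant = RoleAwareAssistant()
assistant.register_role("doctor", lambda text: f"[doctor] {text}")
reply = assistant.respond("I have a headache", "doctor")
print(reply)
```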

5. Dialogue System Utilizing Language Generation and Response Selection Neural Networks with Rule Violation Detection

DEEPMIND TECHNOLOGIES LTD, 2024

A method for enabling users to obtain information through dialogue with an agent using language generation neural networks and search. The dialogue system uses a trained language generation neural network, like GPT-3, to understand and respond to user requests. It can also use a trained response selection neural network to choose better-aligned responses, and a trained rule violation detection neural network to enforce rules during the dialogue. The system processes user requests with the language generation network, optionally incorporating external search results, and generates responses using the trained networks.
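
The summary does not say how the three networks are combined, so the following sketch assumes one plausible arrangement: drop candidates flagged by the rule-violation detector, then pick the highest-scoring remaining candidate. The function `choose_response` and the toy scoring and rule callables are hypothetical stand-ins for trained networks.

```python
from typing import Callable, Optional, Sequence


def choose_response(
    candidates: Sequence[str],
    preference_score: Callable[[str], float],
    violates_rules: Callable[[str], bool],
) -> Optional[str]:
    # Step 1: discard candidates that the rule-violation detector flags.
    allowed = [c for c in candidates if not violates_rules(c)]
    if not allowed:
        return None
    # Step 2: let the response-selection scorer pick among what remains.
    return max(allowed, key=preference_score)


# Toy stand-ins: length as the preference score, a keyword check as the rule detector.
candidates = ["Sure, here is the answer.", "I am a human.", "Answer."]
best = choose_response(
    candidates,
    preference_score=len,
    violates_rules=lambda c: "human" in c,
)
print(best)
```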

6. Voice-Activated Question Answering System with Transformer-Based Speech-to-Text and Semantic Mapping for Knowledge Graph Integration

NORTH CHINA UNIVERSITY OF TECHNOLOGY, 2024

A voice-based intelligent question answering system that leverages big data analysis to provide accurate and relevant answers to user questions. The system converts user speech to text using a transformer model, then maps the user's question and the answer knowledge graph to a vector space using a semantic mapping module. This allows the system to better understand and match user intents with appropriate answers drawn from the rich and diverse knowledge graph. The knowledge graph is constructed by collecting structured and unstructured data from various sources, processing it with neural networks, and generating a comprehensive and interconnected answer knowledge base.

7. Neural Network System for Context-Aware Text Prediction with Multi-Network Architecture

Apple Inc., 2023

Efficiently generating word and phrase predictions for intelligent automated assistants like digital assistants. The method involves using a neural network system with three separate neural networks to generate context-relevant text predictions. The first neural network extracts the context, the second determines text predictions, and the third checks relevance. When confidence scores exceed a threshold, the predictions are provided. This allows context-aware text completion with words and phrases instead of just words.

US20230376690A1-patent-drawing
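
The three-stage flow (context extraction, prediction, relevance check with a confidence threshold) can be sketched as a small pipeline. Everything named here is an assumption for illustration: `predict_text`, the 0.5 default threshold, and the lambda stand-ins that replace the three neural networks.

```python
from typing import Callable, List, Sequence, Tuple


def predict_text(
    typed: str,
    extract_context: Callable[[str], str],
    propose: Callable[[str], Sequence[str]],
    relevance: Callable[[str, str], float],
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    # Stage 1: derive a context representation from what the user has typed.
    context = extract_context(typed)
    # Stage 2: propose candidate words or phrases for that context.
    # Stage 3: score each candidate's relevance to the context.
    scored = [(cand, relevance(context, cand)) for cand in propose(context)]
    # Only candidates whose confidence exceeds the threshold are surfaced.
    return [(cand, score) for cand, score in scored if score > threshold]


suggestions = predict_text(
    "See you",
    extract_context=str.lower,
    propose=lambda ctx: ["tomorrow", "at the office", "banana"],
    relevance=lambda ctx, cand: 0.1 if cand == "banana" else 0.9,
)
print(suggestions)
```

Note that the candidates include a multi-word phrase, matching the summary's point that predictions are not limited to single words.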

8. Virtual Agent Action Control System Utilizing Eye Tracking and Scene Information Integration

APPLE INC., 2023

Controlling a virtual agent's actions based on eye behavior of the user and scene information to provide a more intuitive and natural way to interact with virtual agents. The system uses eye tracking to monitor eye movements and updates the virtual agent's appearance or actions accordingly. For example, if the user's gaze shifts to an object, the virtual agent may move towards it or focus its gaze on it. The scene information is also considered to determine appropriate responses. This allows the virtual agent to respond intuitively to the user's eye behavior without requiring explicit user input or hand movements.

US11822716B2-patent-drawing

9. Speech Input Intent Classification Using Multimodal Analysis for Intelligent Digital Assistants

Apple Inc., 2023

Determining whether a speech input is intended for an intelligent digital assistant. The method analyzes factors such as the textual representation, acoustic features, and user gaze to score the likelihood that the speech input is directed at the assistant. If the scores indicate high confidence, the assistant processes the input; otherwise, it is treated as regular user input. This prevents spontaneous speech from being misinterpreted as assistant commands.
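
As a rough illustration of combining several signals into one decision, the sketch below uses a weighted average of per-modality scores. The weighting scheme, the 0.6 threshold, and the name `is_directed_at_assistant` are assumptions; the summary does not specify how the scores are fused.

```python
from typing import Mapping


def is_directed_at_assistant(
    scores: Mapping[str, float],
    weights: Mapping[str, float],
    threshold: float = 0.6,
) -> bool:
    # Weighted average over whichever modalities produced a score for this utterance.
    total_weight = sum(weights[name] for name in scores)
    combined = sum(scores[name] * weights[name] for name in scores) / total_weight
    return combined >= threshold


# Hypothetical weights and scores in the range 0..1.
WEIGHTS = {"text": 0.5, "acoustic": 0.3, "gaze": 0.2}
command = is_directed_at_assistant({"text": 0.9, "acoustic": 0.8, "gaze": 1.0}, WEIGHTS)
chatter = is_directed_at_assistant({"text": 0.2, "acoustic": 0.3, "gaze": 0.0}, WEIGHTS)
print(command, chatter)
```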

10. Voice-Activated Shortcut Registration System for Application Installation and Data Processing

Apple Inc., 2023

Automatic registration of voice-activated shortcuts for application features during installation or after receiving certain types of data. When installing an app, the installation file is scanned for voice shortcuts. These shortcuts are registered with the device's vocabulary engine. When the user says a shortcut, the app executes the associated action. This enables immediate access to app features through voice commands without manual setup.
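
The install-time registration flow might look like the sketch below. The manifest layout (a `voice_shortcuts` list of phrase/action pairs), the `VocabularyEngine` class, and the sample coffee app are hypothetical; no real installation file format is implied.

```python
from typing import Callable, Dict, Optional


class VocabularyEngine:
    def __init__(self) -> None:
        self._shortcuts: Dict[str, Callable[[], str]] = {}

    def register_from_manifest(self, manifest: dict, actions: Dict[str, Callable[[], str]]) -> int:
        # Scan the (hypothetical) install manifest and register each declared phrase.
        count = 0
        for entry in manifest.get("voice_shortcuts", []):
            action = actions.get(entry["action"])
            if action is not None:
                self._shortcuts[entry["phrase"].lower()] = action
                count += 1
        return count

    def handle_utterance(self, utterance: str) -> Optional[str]:
        # Run the app action tied to the spoken phrase, if one was registered.
        action = self._shortcuts.get(utterance.strip().lower())
        return action() if action else None


engine = VocabularyEngine()
manifest = {"voice_shortcuts": [{"phrase": "Order my usual", "action": "reorder"}]}
registered = engine.register_from_manifest(manifest, {"reorder": lambda: "order placed"})
result = engine.handle_utterance("order my usual")
print(registered, result)
```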

11. Context-Aware User Input Suggestion System for Digital Assistant Task Execution

Apple Inc., 2023

Providing suggested user inputs for triggering digital assistant tasks. The method involves receiving a user input requesting tasks, analyzing the context of the request, and generating a textual representation of an utterance for performing a task. This textual representation is then displayed as an affordance over the user interface to suggest what the user could say to the digital assistant to execute the task. This helps users discover and remember tasks they can request from the digital assistant.

US20230359334A1-patent-drawing

12. Context-Dependent Non-Verbal Audio Response System for Digital Assistants

Apple Inc., 2023

Non-verbal audio responses from digital assistants to natural language inputs, where the response is adjusted based on task context. If criteria are not satisfied, the response includes both an audio indication of the task and a verbal response. But if criteria are satisfied, only the audio indication is provided without the verbal response. This allows silent confirmation when appropriate, like when a long-running task completes, to avoid repetitive verbal feedback. The criteria could be task-specific, like completion status, or user preference.
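
The branching described above is small enough to sketch directly. The `AssistantResponse` type, the `.chime` naming, and the spoken sentence are illustrative assumptions; the criteria check itself is passed in as a boolean because the summary leaves it open.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AssistantResponse:
    audio_cue: str
    spoken_text: Optional[str]  # None means no verbal response is given


def build_response(task: str, criteria_satisfied: bool) -> AssistantResponse:
    cue = f"{task}.chime"  # hypothetical identifier for the non-verbal audio indication
    if criteria_satisfied:
        # Criteria met: audio indication only, no verbal response.
        return AssistantResponse(audio_cue=cue, spoken_text=None)
    # Criteria not met: audio indication plus a verbal response.
    return AssistantResponse(audio_cue=cue, spoken_text=f"Your {task} is done.")


quiet = build_response("timer", criteria_satisfied=True)
verbose = build_response("timer", criteria_satisfied=False)
print(quiet, verbose)
```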

13. Automated Assistant System with Domain-Specific Response Mode Selection

Apple Inc., 2023

Intelligent automated assistant that provides different response modes to user requests based on the type of domain in the request. The assistant selects a response mode from a set of options, each corresponding to a set of domains. For example, when a request is in a finance domain, the assistant might provide detailed text responses with financial data. But for a request in a weather domain, the assistant might just say "It's going to rain today" and not provide text. This allows tailoring the response type (e.g., audio vs text) and affordances (e.g., buttons vs lists) to match the user's intent and expected response format.

US20230352014A1-patent-drawing
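
A lookup table is one simple way to picture domain-to-mode selection. In this sketch the `ResponseMode` fields, the fallback mode, and the specific settings per domain are assumptions; only the finance (detailed text) and weather (spoken, no text) contrast comes from the summary.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ResponseMode:
    speak: bool
    show_text: bool
    affordance: str  # e.g. "list", "buttons", or "none"


# Assumed fallback for domains without a dedicated mode.
DEFAULT_MODE = ResponseMode(speak=True, show_text=True, affordance="none")

MODES: Dict[str, ResponseMode] = {
    # Finance: detailed on-screen text; whether it is also spoken is an assumption here.
    "finance": ResponseMode(speak=False, show_text=True, affordance="list"),
    # Weather: a short spoken answer with no text.
    "weather": ResponseMode(speak=True, show_text=False, affordance="none"),
}


def select_mode(domain: str) -> ResponseMode:
    return MODES.get(domain, DEFAULT_MODE)


print(select_mode("weather"))
```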

14. Multicontextual Word Prediction System with Weighted Probability Integration for Enhanced Accuracy

Apple Inc., 2023

Word prediction for devices like smartphones that improves accuracy by integrating multiple contexts. The system receives a plurality of words and obtains two contexts: one based on the received words and another based on the received words plus an additional context. It calculates separate word probabilities using language models for each context. Then it combines these probabilities using weights to generate a final prediction. This allows better handling of context-specific outliers and multilingual code switching compared to simple language models.
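
The weighted combination can be illustrated with linear interpolation between the two per-context distributions. Linear interpolation is an assumption chosen for simplicity (the summary only says the probabilities are combined using weights), and `combine_predictions` and the sample distributions are hypothetical.

```python
from typing import Dict


def combine_predictions(
    p_words: Dict[str, float],
    p_extended: Dict[str, float],
    weight_extended: float = 0.5,
) -> Dict[str, float]:
    # p_words: next-word probabilities from the received words alone.
    # p_extended: probabilities from the received words plus the additional context.
    vocab = set(p_words) | set(p_extended)
    w = weight_extended
    return {
        token: (1.0 - w) * p_words.get(token, 0.0) + w * p_extended.get(token, 0.0)
        for token in vocab
    }


combined = combine_predictions({"the": 0.5, "a": 0.5}, {"the": 1.0}, weight_extended=0.5)
best_word = max(combined, key=combined.get)
print(combined, best_word)
```

With equal weights, "the" receives 0.5 × 0.5 + 0.5 × 1.0 = 0.75 and "a" receives 0.25, so the additional context breaks the tie that the words-only model left open.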

15. Method for Parsing and Analyzing Combined Natural Language and Sensor Data Inputs Using Node-Based Structure

Apple Inc., 2023

Processing natural language requests using natural language input and input from sensors to improve understanding of user intent. The method involves parsing both the natural language input and sensor data into nodes of a structure. The nodes are analyzed together to determine user intent. By combining natural language understanding with sensor data analysis, it allows a digital assistant to better understand user requests that cannot be fully determined from natural language alone.

US11783815B2-patent-drawing
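
To make the node-based idea concrete, the sketch below parses an utterance and a sensor reading into a shared list of nodes and resolves an ambiguous reference from the sensor side. The `Node` fields, the keyword parsing, and the `gaze_target` sensor are all assumptions for illustration, not the structure used in the patent.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Node:
    source: str  # "language" or "sensor"
    kind: str
    value: str


def parse_language(text: str) -> List[Node]:
    # Keyword matching stands in for a real natural language parser.
    words = text.lower().strip(" .!?").split()
    nodes: List[Node] = []
    if "turn" in words and "on" in words:
        nodes.append(Node("language", "action", "turn_on"))
    for word in words:
        if word in {"this", "that", "it"}:
            nodes.append(Node("language", "reference", word))
    return nodes


def parse_sensors(readings: Dict[str, str]) -> List[Node]:
    return [Node("sensor", kind, value) for kind, value in readings.items()]


def resolve_intent(nodes: List[Node]) -> Tuple[Optional[str], Optional[str]]:
    # Analyze language and sensor nodes together: an unresolved reference such as
    # "that" is filled in from the (hypothetical) gaze_target sensor node.
    action = next((n.value for n in nodes if n.kind == "action"), None)
    has_reference = any(n.kind == "reference" for n in nodes)
    target = None
    if has_reference:
        target = next((n.value for n in nodes if n.kind == "gaze_target"), None)
    return action, target


nodes = parse_language("Turn that on") + parse_sensors({"gaze_target": "lamp"})
intent = resolve_intent(nodes)
print(intent)
```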

16. System for Predictive Suggestion of Subsequent User Actions in Digital Assistant Interactions

Apple Inc., 2023

Providing suggested subsequent user actions during a conversation between a user and a digital assistant to increase efficiency by anticipating and presenting options. The digital assistant determines possible next actions based on the current request, selects the most likely one, and suggests it to the user. This is done by analyzing the request domain and determining parameters for multiple potential subsequent actions. If one action's score is higher than another's, it's selected as the suggested action.

17. Virtual Assistant System with Machine Learning-Based Entity, Intent, and Context Extraction

Carvana, LLC, 2023

An AI-powered virtual assistant that can handle complex requests and provide timely and relevant responses in automated conversations. The assistant uses machine learning models to extract entities, intent, and context from user messages. It generates responses based on the extracted information and conversation state. The assistant can also change the conversation state based on rules. This allows it to handle variations from a script and provide personalized and coherent conversations.

US11777874B1-patent-drawing
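
The interplay of extracted intent, conversation state, and rule-driven state changes can be sketched as a small state machine. The keyword-based `extract` function stands in for the machine learning models, and the states, intents, and rules shown are hypothetical examples rather than anything taken from the patent.

```python
from typing import Dict, Tuple

# Hypothetical rules: (current state, intent) -> next state.
RULES: Dict[Tuple[str, str], str] = {
    ("greeting", "ask_financing"): "financing",
    ("financing", "schedule_delivery"): "delivery",
}


def extract(message: str) -> Dict[str, str]:
    # Keyword stand-in for the entity, intent, and context extraction models.
    text = message.lower()
    if "loan" in text or "financing" in text:
        return {"intent": "ask_financing"}
    if "deliver" in text:
        return {"intent": "schedule_delivery"}
    return {"intent": "unknown"}


def step(state: str, message: str) -> Tuple[str, str]:
    intent = extract(message)["intent"]
    # Rules may move the conversation to a new state; otherwise the state is kept.
    next_state = RULES.get((state, intent), state)
    reply = f"[{next_state}] handling intent '{intent}'"
    return next_state, reply


state, reply = step("greeting", "Can I get a loan?")
print(state, reply)
```

Keeping the current state when no rule matches is what lets the conversation absorb off-script messages without losing its place.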

18. Digital Assistant Interaction Modulation Based on Social Engagement Detection

Apple Inc., 2023

Real-time social intelligence for accommodating social engagements of a user without interruption from a digital assistant. The system determines if the user is engaged in social interactions based on factors like detecting nearby people, user gaze, and speech inputs. If so, it foregoes providing outputs from the digital assistant during that time. This allows users to socially engage without having to manually pause the assistant.
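
A minimal sketch of the gating decision, assuming a simple vote over the three factors named above. The `SocialSignals` fields, the two-signal minimum, and the function name `should_hold_output` are assumptions; the summary does not say how the factors are weighed.

```python
from dataclasses import dataclass


@dataclass
class SocialSignals:
    people_nearby: int
    gaze_on_person: bool
    speaking_to_someone: bool


def should_hold_output(signals: SocialSignals, min_signals: int = 2) -> bool:
    # Count how many indicators of social engagement are currently active.
    active = [
        signals.people_nearby > 0,
        signals.gaze_on_person,
        signals.speaking_to_someone,
    ]
    # Forgo assistant output while enough indicators agree.
    return sum(active) >= min_signals


in_conversation = should_hold_output(SocialSignals(2, True, True))
alone = should_hold_output(SocialSignals(0, False, False))
print(in_conversation, alone)
```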

19. Modular Computer System with Layered Architecture for Contextual Input Processing

NOS INOVACAO, S.A., 2023

A computer system for handling complex tasks provided by users via natural input interfaces, like voice commands or gestures, using a modular, loosely coupled architecture with layers dedicated to interpreting and managing contextual information from user input. The layers include device, conversational, ambient awareness, core conversational management, multi-skill, cognitive enhancement, knowledge, event dispatch, data seeding, and health monitoring. This architecture supports increasingly complex interactions with users while improving decision-making and response times through contextual enrichment.

20. Machine Assistant Response Generation Based on User Interaction Style Analysis

APPLE INC., 2023

Generating machine assistant responses that engender greater user confidence and reduce unnecessary user inputs by basing the response style on the user's interaction style. The machine assistant analyzes user input to determine the interaction style, such as word choice and speech characteristics, and generates responses that match it. Matching the user's style improves trust in the assistant compared with responses in a dissimilar style, and it reduces the need for users to verify responses, saving resources and improving privacy/safety.

US11769016B2-patent-drawing

21. System for Context Aggregation and Task Delegation via Centralized Context Collector in Distributed Device Network

22. Media Item Description Length Adjustment Based on Confidence Level in Intelligent Assistants

23. Token Sequence Expansion and Metadata-Based Selection for Disambiguating User Inputs in Application Interfaces

24. Digital Assistant Interface with Multi-Stage Speech Analysis and Dynamic Interruption-Correction Mechanism

25. Natural Language Processing-Based Task Management System with Adaptive Endpoint Integration
