LLM-Based Semantic Alignment for Human-Robot Interaction: A Neural-Symbolic AI Approach

Primary Investigator (PI) Name

Yan Fang

Department

SPCEET - Electrical and Computer Engineering

Abstract

Human-robot collaboration requires a robot to understand human language and ground it in visual perception. This work explores a neural-symbolic AI approach that enables a humanoid robot to interpret spoken commands and align them with visual input. Using DeepSeek R1, a lightweight large language model (LLM), deployed on a Raspberry Pi 5 within the Tony Pi humanoid, we integrate speech recognition, symbolic AI, and computer vision to achieve semantic alignment between human instructions and perceived objects. Our current design allows the robot to process human commands, recognize objects through their visual features, and reason symbolically to determine the correct action. This research contributes to multimodal human-robot interaction on resource-limited edge computing platforms, enabling more natural and efficient communication with intelligent robotic systems.
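
To make the idea of semantic alignment concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes the LLM has already turned a spoken command into a small structured record and that the vision module reports each object as a label and a color. All names here (`Command`, `DetectedObject`, `align`, `decide`, and the action strings) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DetectedObject:
    """Symbolic description of one object from the vision module (assumed format)."""
    label: str
    color: str
    position: Tuple[int, int]  # pixel centroid (x, y)


@dataclass(frozen=True)
class Command:
    """Structured form of a spoken command, assumed to be produced by the LLM."""
    action: str
    label: str
    color: Optional[str] = None  # None means the speaker gave no color


def align(command: Command, objects: List[DetectedObject]) -> List[DetectedObject]:
    """Return the detected objects that satisfy every constraint in the command."""
    return [
        obj for obj in objects
        if obj.label == command.label
        and (command.color is None or obj.color == command.color)
    ]


def decide(command: Command, objects: List[DetectedObject]):
    """Pick an action symbolically: act on a unique match, otherwise fall back."""
    matches = align(command, objects)
    if len(matches) == 1:
        return (command.action, matches[0])
    if not matches:
        return ("report_not_found", None)
    # More than one candidate: the command is ambiguous for this scene.
    return ("ask_clarification", None)


# Hypothetical scene and command, hard-coded in place of real sensor and LLM output.
scene = [
    DetectedObject("ball", "red", (120, 80)),
    DetectedObject("ball", "blue", (200, 90)),
    DetectedObject("block", "red", (60, 150)),
]
print(decide(Command("pick_up", "ball", "red"), scene))
```

In this sketch the neural components (speech recognition, the LLM, and the object detector) only produce symbols, and the final decision is made by explicit rules, which keeps the action choice inspectable on a resource-limited device.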

Disciplines

Computer Engineering
