LLM-Based Semantic Alignment for Human-Robot Interaction: A Neural-Symbolic AI Approach

Disciplines

Computer Engineering

Abstract (300 words maximum)

Human-robot collaboration requires seamless understanding of both human language and visual perception. This work explores a neural-symbolic AI approach that enables a humanoid robot to interpret spoken commands and align them with visual input. Using a lightweight large language model (LLM), DeepSeek R1, deployed on a Raspberry Pi 5 inside the Tony Pi humanoid, we integrate speech recognition, symbolic AI, and computer vision to achieve semantic alignment between human instructions and perceived objects. Our current design allows the robot to process human commands, recognize objects through visual features, and reason symbolically to determine the correct action. This research contributes to advancing multimodal human-robot interaction on resource-constrained edge computing platforms, enabling more natural and efficient communication with intelligent robotic systems.
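
The sketch below illustrates the kind of command-to-object alignment loop the abstract describes (speech in, vision in, LLM reasoning, symbolic grounding, action out). It is a minimal illustration only, not the project's actual code: the helper functions are stubs standing in for the real speech-recognition, object-detection, and on-device LLM components, and all names are hypothetical.

```python
# Minimal sketch of an LLM-based semantic-alignment loop for a humanoid robot.
# The three helpers are placeholders for the real speech, vision, and LLM stages.

def transcribe_audio() -> str:
    # Placeholder for the speech-recognition stage (spoken command -> text).
    return "pick up the red block"

def detect_objects() -> list[dict]:
    # Placeholder for the vision stage: detected labels plus simple attributes.
    return [{"label": "block", "color": "red"},
            {"label": "ball", "color": "blue"}]

def query_local_llm(prompt: str) -> str:
    # Placeholder for the on-device LLM call (e.g. a locally served DeepSeek R1).
    return "grasp 0"

def align_command_to_action() -> tuple[str, dict]:
    command = transcribe_audio()
    objects = detect_objects()
    prompt = (
        f"Command: {command}\n"
        f"Visible objects: {objects}\n"
        "Reply with '<action> <object_index>'."
    )
    # Symbolic grounding: the LLM's reply is mapped onto a known action schema
    # and a concrete detected object before any motion is executed.
    action, index = query_local_llm(prompt).split()
    return action, objects[int(index)]

if __name__ == "__main__":
    print(align_command_to_action())
    # ('grasp', {'label': 'block', 'color': 'red'})
```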

Academic department under which the project should be listed

SPCEET - Electrical and Computer Engineering

Primary Investigator (PI) Name

Yan Fang
