LLM-Based Semantic Alignment for Human-Robot Interaction: A Neural-Symbolic AI Approach
Disciplines
Computer Engineering
Abstract (300 words maximum)
Human-robot collaboration requires seamless integration of language understanding and visual perception. This work explores a neural-symbolic AI approach that enables a humanoid robot to interpret spoken commands and align them with visual input. Using DeepSeek R1, a lightweight large language model (LLM), deployed on a Raspberry Pi 5 within the Tony Pi humanoid, we integrate speech recognition, symbolic AI, and computer vision to achieve semantic alignment between human instructions and perceived objects. Our current design allows the robot to process human commands, recognize objects through their visual features, and reason symbolically to determine the correct action. This research advances multimodal human-robot interaction on resource-constrained edge computing platforms, enabling more natural and efficient communication with intelligent robotic systems.
Academic department under which the project should be listed
SPCEET - Electrical and Computer Engineering
Primary Investigator (PI) Name
Yan Fang