DigitalCommons@Kennesaw State University - C-Day Computing Showcase: GRM-131 XR Agent (A MLLM powered XR system)

 

Presenter Information

Yukang Shen

Location

https://www.kennesaw.edu/ccse/events/computing-showcase/sp25-cday-program.php

Event Website

https://team-portal-plum.vercel.app/home

Document Type

Event

Start Date

15-4-2025 4:00 PM

Description

This project proposes “XR Agent”, a decoupled and efficient framework for developing AI-powered extended reality (XR) applications on head-mounted displays (HMDs). The framework leverages multimodal artificial intelligence, including MediaPipe (Google's open-source CV framework) for computer vision tasks (object segmentation, recognition, pose estimation); multimodal large language models (MLLMs) such as Gemini; and Unity’s cross-platform XR development ecosystem. It aims to provide an extensible base system that enables rapid prototyping and deployment of intelligent XR applications. Currently deployed on the Meta Quest 3 platform, XR Agent explores novel human-computer interaction (HCI) paradigms, combining real-time sensor data processing, immersive visualization, and adaptive AI-driven logic. This work addresses the challenges of modular integration across different kinds of devices and AI models. The framework is also expected to be valuable in use cases such as collaborative remote control, immersive training scenarios, and data collection for embodied AI.

https://digitalcommons.kennesaw.edu/cday/Spring_2025/Masters_Research/20