Date of Award
Spring 5-10-2024
Degree Type
Thesis
Degree Name
Master of Science in Information Technology
Department
Department of Information Technology
Committee Chair/First Advisor
Shaoen Wu
Second Advisor
Jian Zhang
Third Advisor
Ying Wang
Abstract
Rapid advances in Deep Learning (DL), especially in Reinforcement Learning (RL) and Imitation Learning (IL), have positioned it as a promising approach for a wide range of autonomous robotic systems. However, current methods are largely confined to a single setup and require substantial data and long training periods. Moreover, they perform poorly on long-horizon tasks such as Radio Frequency Identification (RFID) inventory, where a robot needs thousands of steps to complete the task.
In this thesis, we address these challenges by presenting the Cross-modal Reasoning Model (CMRM), a novel zero-shot Imitation Learning policy for long-horizon robotic tasks. RFID inventory is a typical long-horizon robotic task that can be formulated as a Partially Observable Markov Decision Process (POMDP): the robot must recall its previous actions and reason from current environmental observations to optimize its strategy. To this end, CMRM is designed with a two-stream flow structure that extracts abstract information concealed in environmental observations and then generates robot actions by reasoning over structural and temporal features from historical and current observations. We evaluate CMRM through extensive experiments on a virtual platform and in a mock-up of a real store. The results show that CMRM can perform RFID inventory tasks in unstructured environments with complex layouts and achieves accuracy that surpasses both previous methods and manual inventory. To facilitate the training and assessment of CMRM, we constructed a Unity3D-based virtual platform that can be configured into various environments, such as an apparel store. The platform offers photo-realistic objects and precise physical properties (gravity, appearance, and more), providing close-to-real environments for training and testing robots. Once trained, the robot was deployed in an actual retail environment to perform RFID inventory tasks. This approach effectively bridges the "reality gap", enabling the robot to perform the RFID inventory task seamlessly in both virtual and real-world settings and thereby demonstrating zero-shot generalization.
Included in
Other Computer Engineering Commons, Robotics Commons, Systems and Communications Commons