Date of Award
Fall 5-8-2025
Degree Type
Dissertation/Thesis
Degree Name
MASTER OF SCIENCE IN COMPUTER SCIENCE
Department
COLLEGE OF COMPUTING AND SOFTWARE ENGINEERING
Committee Chair/First Advisor
Md Abdullah Al Hafiz Khan
Second Advisor
Kazi Aminul Islam
Third Advisor
Sanghoon Lee
Abstract
Multimodal intent recognition, a key research topic in human-computer interaction, aims to build precise models of human intent by fusing heterogeneous data streams such as speech, text, gestures, and facial expressions. Existing multimodal methods, however, depend on complex feature extraction and fusion strategies, and they struggle in complex scenarios: feature extraction is computationally expensive, and the semantic gap between modalities is difficult to bridge. Real-world multimodal datasets also tend to be class-imbalanced with long-tailed distributions, which makes learning harder still. To address these challenges, we introduce MMIU, a unified framework for in-distribution (ID) classification and out-of-distribution (OOD) detection that synthesizes pseudo-OOD examples by convexly mixing in-distribution data and then learns multimodal representations at two levels. At the coarse level, it enforces a binary separation between ID and OOD samples; at the fine level, it refines ID-class boundaries by assigning confidence scores that reflect each sample’s difficulty and by applying instance-level contrastive learning to pull similar examples together and push dissimilar ones apart. A human-in-the-loop active-learning module further allows experts to label challenging unlabeled samples during training, triggering iterative retraining and yielding more accurate and robust models. This research evaluates the feasibility and effectiveness of the proposed framework, identifies the technical challenges and theoretical significance of this research area, and offers new research paradigms and methodological insights for the academic community.
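As an illustration of the pseudo-OOD synthesis step mentioned in the abstract, the following is a minimal sketch of convex mixing, not the thesis implementation. The abstract states only that in-distribution data are mixed convexly; everything else here is an assumption made for illustration: mixing at the feature-vector level, pairing samples from different classes, drawing the mixing weight from the range [0.3, 0.7], and the function name `synthesize_pseudo_ood` with its parameters.

```python
import numpy as np


def synthesize_pseudo_ood(features, labels, num_samples, rng, low=0.3, high=0.7):
    """Build pseudo-OOD vectors as convex mixes of ID vectors from two classes.

    Hypothetical helper: the cross-class pairing and the [low, high] weight
    range are illustrative assumptions, not details taken from the abstract.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    if np.unique(labels).size < 2:
        raise ValueError("need at least two classes to mix across classes")

    mixed = np.empty((num_samples, features.shape[1]))
    for k in range(num_samples):
        # Resample until the pair comes from two different classes.
        while True:
            i, j = rng.integers(0, len(features), size=2)
            if labels[i] != labels[j]:
                break
        # Convex weight, kept away from 0 and 1 so the mix resembles neither parent.
        lam = rng.uniform(low, high)
        mixed[k] = lam * features[i] + (1.0 - lam) * features[j]
    return mixed


# Toy ID data: two well-separated classes in a 4-dimensional feature space.
rng = np.random.default_rng(0)
id_features = np.vstack([
    rng.normal(-2.0, 0.5, size=(20, 4)),
    rng.normal(2.0, 0.5, size=(20, 4)),
])
id_labels = np.array([0] * 20 + [1] * 20)

pseudo_ood = synthesize_pseudo_ood(id_features, id_labels, 10, rng)
```

In a coarse-level setup like the one the abstract describes, vectors such as `pseudo_ood` would carry the OOD label for the binary ID-versus-OOD objective, while the original samples keep their class labels for fine-level training.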