Semester of Graduation

Spring 2026

Degree Type

Thesis

Degree Name

Master of Science in Information Technology

Department

Information Technology - College of Computing and Software Engineering

Committee Chair/First Advisor

Honghui Xu

Abstract

Multimodal large language models (MLLMs) increasingly process screenshots, scanned documents, diagrams, and other visually grounded inputs. This capability creates a safety risk because many multimodal jailbreaks are not harmful in the prompt or image alone. Harm can emerge only after the model binds a benign-looking operation, such as summarizing or translating, to a localized visual target. This thesis studies this reference-dependent failure mode and argues that the security-relevant unit is the grounded operation–target pair rather than the whole prompt–image pair. To address this problem, it proposes COMIC, a reference-aware pre-generation safety gate for MLLMs. COMIC infers the requested operation and reference type, constructs candidate targets from OCR and open-vocabulary visual proposals, grounds plausible referents, and evaluates safety before generation. Its routing rule combines conservative maximum-risk aggregation with evidence-quality checks, so ambiguous or weakly grounded requests are not treated as automatically safe. Evaluation across representative open-source MLLMs, localized jailbreak benchmarks, broader multimodal attacks, and benign reference-sensitive settings shows that COMIC substantially reduces attack success while preserving practical benign utility and runtime efficiency. These findings show that multimodal safety mechanisms should intervene at the point where user intent becomes grounded action, before unsafe generation can occur.
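The routing rule described in the abstract, conservative maximum-risk aggregation combined with evidence-quality checks, can be illustrated with a minimal sketch. This is not the thesis implementation: the `Candidate` fields, the `route` function, both thresholds, and the three outcome labels are hypothetical names and values chosen only to show how the two checks might interact.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """One candidate target (hypothetical structure, not from the thesis)."""
    label: str        # e.g. an OCR span or a visual region description
    risk: float       # assumed safety-risk score in [0, 1]
    grounding: float  # assumed grounding confidence in [0, 1]


def route(candidates: List[Candidate],
          risk_threshold: float = 0.5,       # assumed value
          evidence_threshold: float = 0.6    # assumed value
          ) -> str:
    """Return 'block', 'review', or 'allow' before any generation happens."""
    # No grounded target at all: do not treat the request as automatically safe.
    if not candidates:
        return "review"

    # Conservative aggregation: the riskiest plausible referent decides.
    max_risk = max(c.risk for c in candidates)
    if max_risk >= risk_threshold:
        return "block"

    # Evidence-quality check: a low risk score backed by weak grounding
    # is not trusted enough to allow the request outright.
    best_grounding = max(c.grounding for c in candidates)
    if best_grounding < evidence_threshold:
        return "review"

    return "allow"
```

Under these assumptions, a single high-risk referent blocks the request even when other candidates look benign, and a low-risk but weakly grounded request is escalated rather than allowed.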

Master's Theses

COMIC: Reference-Aware Safety Gating for Multimodal Large Language Models
