Presenter Information

William A Stigall


Start Date

25-4-2024 4:00 PM


Affective computing is a field of growing importance as human society becomes more integrated with machines. Human feelings are both complex and multimodal, expressed through varied behaviors and subtle nuances. In this work we introduce EmoHydra, a multimodal model created by fusing three top-level models fine-tuned on text, vision, and speech, respectively. Although the heterogeneous heads perform well on unseen data and generalize well to other benchmarks, logit concatenation proves ineffective at predicting multimodal data; we therefore implement multi-head attention as our fusion mechanism.
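The fusion idea in the abstract can be illustrated with a minimal sketch. This is not the EmoHydra implementation: the abstract does not specify what each head passes to the fusion layer, so the sketch assumes each modality yields one feature vector of equal size, treats the three vectors as tokens, and applies scaled dot-product self-attention split across heads. All names, dimensions, and the random weights are hypothetical placeholders for trained parameters.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Random stand-in weights; a trained model would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def multi_head_attention_fusion(tokens, Wq, Wk, Wv, num_heads):
    """Self-attention over modality tokens.

    Returns the fused tokens and the attention weights of every head.
    """
    n = len(tokens)
    head_dim = len(tokens[0]) // num_heads
    Q = [matvec(Wq, t) for t in tokens]
    K = [matvec(Wk, t) for t in tokens]
    V = [matvec(Wv, t) for t in tokens]
    fused = [[] for _ in range(n)]
    weights = []
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        head_weights = []
        for i in range(n):
            # Scaled dot product of token i's query with every token's key.
            scores = [
                sum(q * k for q, k in zip(Q[i][lo:hi], K[j][lo:hi]))
                / math.sqrt(head_dim)
                for j in range(n)
            ]
            attn = softmax(scores)
            head_weights.append(attn)
            # Weighted sum of this head's slice of the value vectors.
            fused[i].extend(
                sum(attn[j] * V[j][lo + c] for j in range(n))
                for c in range(head_dim)
            )
        weights.append(head_weights)
    return fused, weights


def classify(fused, Wout):
    """Mean-pool the fused tokens, then map to class probabilities."""
    pooled = [sum(col) / len(fused) for col in zip(*fused)]
    return softmax(matvec(Wout, pooled))


# Hypothetical sizes; the abstract does not state them.
EMBED_DIM, NUM_HEADS, NUM_CLASSES = 8, 2, 4
rng = random.Random(0)

# Stand-ins for the text, vision, and speech head outputs.
tokens = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(3)]

Wq = random_matrix(EMBED_DIM, EMBED_DIM, rng)
Wk = random_matrix(EMBED_DIM, EMBED_DIM, rng)
Wv = random_matrix(EMBED_DIM, EMBED_DIM, rng)
Wout = random_matrix(NUM_CLASSES, EMBED_DIM, rng)

fused, weights = multi_head_attention_fusion(tokens, Wq, Wk, Wv, NUM_HEADS)
probs = classify(fused, Wout)
```

Unlike plain concatenation, which hands the classifier a fixed stacking of the three outputs, the attention weights here are recomputed for every input, so the contribution of each modality can vary from sample to sample.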


Apr 25th, 4:00 PM

UR-94 EmoHydra: Multimodal Emotion Classification using Heterogenous Modality Fusion
