Presentation Type
Article
Location
Kennesaw, Georgia
Start Date
1-4-2026 3:00 PM
End Date
1-4-2026 4:15 PM
Description
Early detection of dementia is essential for timely clinical intervention, yet existing diagnostic procedures remain costly and resource-intensive. Speech-based analysis has emerged as a promising non-invasive alternative due to its sensitivity to cognitive decline. In this work, we propose a multimodal framework that integrates self-supervised speech representations from wav2vec2 with demographic metadata, including age, gender, and ethnicity. We evaluate our approach against a strong audio-only baseline under a controlled experimental setup. Results demonstrate a +12.5% improvement in validation accuracy and consistent gains in macro F1-score. These findings indicate that demographic information provides complementary predictive signals beyond acoustic features alone, highlighting the effectiveness of multimodal approaches for scalable dementia screening.
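The fusion step described above can be illustrated with a minimal sketch. The abstract does not specify the pooling strategy, the demographic encoding, or the fusion mechanism, so everything below is an assumption: frame-level embeddings are mean-pooled over time, age is scaled by an assumed divisor of 100, gender and ethnicity are one-hot encoded over hypothetical vocabularies, and the two modalities are joined by simple concatenation. Plain Python lists stand in for the wav2vec2 hidden states, which in practice would come from a pretrained model. A small macro F1 helper is included because that is one of the reported metrics.

```python
from typing import List, Sequence

# Hypothetical category vocabularies; the source does not list the actual values.
GENDERS = ["female", "male", "other"]
ETHNICITIES = ["group_a", "group_b", "group_c"]


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average frame-level embeddings over time into one utterance vector."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]


def one_hot(value: str, vocab: Sequence[str]) -> List[float]:
    """Encode a categorical value; values outside the vocabulary map to all zeros."""
    return [1.0 if value == item else 0.0 for item in vocab]


def encode_demographics(age: float, gender: str, ethnicity: str) -> List[float]:
    """Scale age (assumed divisor of 100) and one-hot encode the categorical fields."""
    return [age / 100.0] + one_hot(gender, GENDERS) + one_hot(ethnicity, ETHNICITIES)


def fuse(frames: Sequence[Sequence[float]], age: float,
         gender: str, ethnicity: str) -> List[float]:
    """Concatenate the pooled audio vector with the demographic vector."""
    return mean_pool(frames) + encode_demographics(age, gender, ethnicity)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Two toy 2-dimensional frames stand in for real wav2vec2 hidden states.
    toy_frames = [[0.0, 2.0], [2.0, 4.0]]
    features = fuse(toy_frames, age=70, gender="female", ethnicity="group_b")
    print(len(features), features)
    print(macro_f1([1, 1, 0, 0], [1, 0, 0, 0]))
```

The fused vector would then feed a classifier, and the audio-only baseline corresponds to passing only the output of `mean_pool` to the same classifier. Concatenation is only one fusion option; the framework in this work may use a different mechanism.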
Multimodal Speech-Based Dementia Detection Using wav2vec2 and Demographic Features