Presentation Type

Article

Location

Kennesaw, Georgia

Start Date

1-4-2026 3:00 PM

End Date

1-4-2026 4:15 PM

Description

Early detection of dementia is essential for timely clinical intervention, yet existing diagnostic procedures remain costly and resource-intensive. Speech-based analysis has emerged as a promising non-invasive alternative because speech is sensitive to cognitive decline. In this work, we propose a multimodal framework that integrates self-supervised speech representations from wav2vec2 with demographic metadata (age, gender, and ethnicity). We evaluate the approach against a strong audio-only baseline under a controlled experimental setup. Results show a 12.5% improvement in validation accuracy over that baseline, along with consistent gains in macro F1-score. These findings indicate that demographic information provides predictive signal complementary to acoustic features, supporting multimodal approaches for scalable dementia screening.
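The abstract does not say how the speech and demographic inputs are combined. As an illustration only, the sketch below shows one common option, early fusion by concatenation: frame-level speech embeddings are mean-pooled into a single vector, the demographic fields are encoded numerically, and the two are joined into one feature vector for a downstream classifier. The function names, the category vocabularies, and the age scaling are all hypothetical, and the toy two-dimensional frames stand in for real wav2vec2 outputs.

```python
from typing import List, Sequence

# Hypothetical category vocabularies; the abstract does not list the real ones.
GENDERS = ["female", "male", "other"]
ETHNICITIES = ["group_a", "group_b", "group_c"]


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average frame-level embeddings (T x D) into one utterance vector (D)."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]


def one_hot(value: str, vocab: Sequence[str]) -> List[float]:
    """One-hot encode a category; unknown values map to all zeros."""
    return [1.0 if value == item else 0.0 for item in vocab]


def fuse(
    frames: Sequence[Sequence[float]],
    age: float,
    gender: str,
    ethnicity: str,
    max_age: float = 100.0,  # assumed scaling constant, not from the source
) -> List[float]:
    """Concatenate pooled speech features with encoded demographics."""
    speech = mean_pool(frames)
    demographics = (
        [min(age / max_age, 1.0)]          # age scaled to [0, 1]
        + one_hot(gender, GENDERS)         # 3 values
        + one_hot(ethnicity, ETHNICITIES)  # 3 values
    )
    return speech + demographics


# Toy stand-in for wav2vec2 output: 2 frames of dimension 2.
toy_frames = [[0.0, 2.0], [2.0, 4.0]]
features = fuse(toy_frames, age=75, gender="female", ethnicity="group_b")
print(len(features))  # 2 speech dims + 7 demographic dims
```

In a real pipeline the pooled vector would come from a pretrained wav2vec2 encoder, and the audio-only baseline would correspond to feeding the classifier the `speech` part alone.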


Multimodal Speech-Based Dementia Detection Using wav2vec2 and Demographic Features
