DigitalCommons@Kennesaw State University - C-Day Computing Showcase: GRM-043 Performance Assessment of DeepSeek versus Bard and ChatGPT in Detecting Alzheimer’s Dementia


Presenter Information

Muhammad Awais Arshad

Location

https://www.kennesaw.edu/ccse/events/computing-showcase/sp25-cday-program.php

Document Type

Event

Start Date

15-4-2025 4:00 PM

Description

Alzheimer’s disease is a growing public health issue due to its progressive nature and increasing prevalence. Large language models (LLMs) offer promising avenues for non-invasive cognitive assessment through natural language understanding. In this study, we evaluate two DeepSeek models, the general-purpose V3 and the reasoning-enhanced R1 variant, for distinguishing individuals with Alzheimer’s dementia (AD) from cognitively normal (CN) individuals using transcripts of spontaneous speech. Two baseline prompting strategies (zero-shot and chain-of-thought) were applied to both models, and a third strategy (self-consistency prompting) was applied to test whether it improved predictions. Accuracy was the primary performance metric. In identifying AD, the general-purpose DeepSeek V3 model achieved the highest true-positive rate (88%) but tended to misclassify CN individuals as AD. In contrast, the DeepSeek-R1 model achieved the highest true-negative rate (90%) for CN classification. Overall, the DeepSeek models perform above chance level, but further refinement is needed before their clinical applicability can be established.
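
The abstract names self-consistency prompting and reports accuracy alongside true-positive and true-negative figures. The sketch below illustrates those two ideas under stated assumptions; it is not the authors’ code. It assumes binary AD/CN labels, treats self-consistency as it is commonly implemented (a majority vote over several independently sampled chain-of-thought answers), and uses a hypothetical `sample_label` callable in place of any real DeepSeek API call. The function names `self_consistency_vote` and `binary_metrics` are illustrative only.

```python
from collections import Counter
from typing import Callable, Dict, Sequence

AD, CN = "AD", "CN"  # the two classes named in the abstract


def self_consistency_vote(
    sample_label: Callable[[str], str], transcript: str, n_samples: int = 5
) -> str:
    """Majority vote over several independently sampled model answers.

    `sample_label` is a hypothetical stand-in for one sampled
    (temperature > 0) chain-of-thought call that returns "AD" or "CN".
    """
    votes = Counter(sample_label(transcript) for _ in range(n_samples))
    # most_common(1) -> [(label, count)]; ties go to the label seen first.
    return votes.most_common(1)[0][0]


def binary_metrics(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str = AD
) -> Dict[str, float]:
    """Accuracy, true-positive rate and true-negative rate, with AD as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else nan,
        # Sensitivity: share of true AD cases labelled AD.
        "tpr": tp / (tp + fn) if tp + fn else nan,
        # Specificity: share of true CN cases labelled CN.
        "tnr": tn / (tn + fp) if tn + fp else nan,
    }


if __name__ == "__main__":
    # Toy labels for illustration only, not data from the study.
    answers = iter([AD, CN, AD, AD, CN])
    print(self_consistency_vote(lambda _t: next(answers), "transcript text"))
    print(binary_metrics([AD, AD, AD, CN, CN], [AD, AD, CN, AD, CN]))
```

With the toy labels, three of the five sampled answers say AD, so the vote returns AD; the metric call gives an accuracy of 0.6, a true-positive rate of 2/3 and a true-negative rate of 0.5. Reporting the two rates separately is what exposes the trade-off described above, where one model leans toward AD predictions and the other toward CN.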
