Location
https://www.kennesaw.edu/ccse/events/computing-showcase/fa24-cday-program.php
Document Type
Event
Start Date
19-11-2024 4:00 PM
Description
This study uses large language models (LLMs), particularly GPT-4, to address the data limitations common in Alzheimer’s detection. We use GPT-4 for data augmentation, generating synthetic speech transcripts to expand the training data for machine learning models. Our approach combines fine-tuned BERT embeddings with CLAN-derived linguistic features, as well as sentence-level embeddings, to improve classification performance on the ADReSS2020 dataset. BERT and CLAN features capture fine-grained linguistic variation, while sentence embeddings provide robust semantic representations; together they improve the accuracy and generalization of the models. Among the classifiers tested, the Random Forest model performs best, achieving an accuracy of 88% with sentence embeddings and surpassing the other models in detecting Alzheimer’s from speech patterns. Integrating LLM-augmented data with multilevel embeddings is a promising response to data scarcity in medical research and may support more accurate and reliable Alzheimer’s diagnoses.
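The feature-combination step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the three hand-computed measures (type-token ratio, mean utterance length, filler rate) are hypothetical stand-ins for CLAN-derived features, the `FILLERS` list is an assumption, and the embedding is assumed to be precomputed elsewhere (for example by a BERT or sentence-embedding model). The sketch only shows how one vector per transcript might be assembled and how an accuracy figure is computed from predictions.

```python
import re
from typing import List, Sequence

# Hypothetical filler list for illustration; not taken from the study.
FILLERS = {"uh", "um", "er"}


def linguistic_features(transcript: str) -> List[float]:
    """Toy stand-ins for CLAN-style measures:
    type-token ratio, mean utterance length (words), filler rate."""
    utterances = [u for u in re.split(r"[.?!]+", transcript) if u.strip()]
    words = re.findall(r"[a-z']+", transcript.lower())
    if not words:
        return [0.0, 0.0, 0.0]
    type_token_ratio = len(set(words)) / len(words)
    mean_utterance_len = len(words) / max(len(utterances), 1)
    filler_rate = sum(w in FILLERS for w in words) / len(words)
    return [type_token_ratio, mean_utterance_len, filler_rate]


def fuse(embedding: Sequence[float], transcript: str) -> List[float]:
    """Concatenate a precomputed embedding with the linguistic features."""
    return list(embedding) + linguistic_features(transcript)


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the reference labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label sequences must be non-empty and equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

A fused vector from `fuse(embedding, transcript)` could then be passed to any classifier, such as a Random Forest; the classifier itself is omitted here to keep the sketch dependency-free.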
GMR-7179 Improving Alzheimer’s Detection via Synthetic Data Generation Using GPT-4 and Multi-Level Embeddings