Location

https://www.kennesaw.edu/ccse/events/computing-showcase/fa24-cday-program.php

Document Type

Event

Start Date

19-11-2024 4:00 PM

Description

This study uses large language models (LLMs), particularly GPT-4, to address the data scarcity often encountered in Alzheimer’s detection. We use GPT-4 for data augmentation, generating synthetic speech transcripts to supplement the training data for machine learning models. Our approach combines fine-tuned BERT embeddings with CLAN-derived linguistic features and sentence-level embeddings to improve classification performance on the ADReSS2020 dataset. BERT and CLAN features capture fine-grained linguistic variation, while sentence embeddings provide robust semantic representations; together they improve the accuracy and generalization of the models. Among the classifiers tested, the Random Forest model performs best, reaching 88% accuracy with sentence embeddings and outperforming the other models at detecting Alzheimer’s from speech patterns. Combining LLM-augmented data with multilevel embeddings is a promising response to data scarcity in medical research and may support more accurate and reliable Alzheimer’s diagnoses.


GMR-7179 Improving Alzheimer’s Detection via Synthetic Data Generation Using GPT-4 and Multi-Level Embeddings
