DigitalCommons@Kennesaw State University - C-Day Computing Showcase: UR-001 Large language model enabled mental health app recommendations using structured datasets

 

Presenter Information

Kris Prasad

Location

https://www.kennesaw.edu/ccse/events/computing-showcase/sp25-cday-program.php

Document Type

Event

Start Date

April 15, 2025, 4:00 PM

Description

The increasing use of large language models (LLMs) in mental health support necessitates detailed evaluation of their recommendation capabilities. This study compares four modern LLMs—GPT-4o, Claude 3.5 Sonnet, dataset-enhanced Gemma 2, and dataset-enhanced GPT-3.5-Turbo—in recommending mental health applications. We constructed a structured dataset of 55 mental health apps using RoBERTa-based sentiment analysis and keyword similarity scoring, focusing on depression, anxiety, ADHD, and insomnia. Standard LLMs demonstrated inconsistent accuracy and often relied on outdated or generic information. In contrast, our retrieval-augmented generation (RAG) pipeline enabled lower-cost models to achieve 100% accuracy, compared to baseline models (GPT-4o at 45% and Claude at 70%), while maintaining good diversity and recommending apps with significantly better user ratings. These findings demonstrate that dataset-enhanced, cost-effective LLMs can outperform expensive proprietary models in domain-specific applications like mental health resource recommendations, potentially improving accessibility to quality mental health support tools.
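To make the retrieval step of such a RAG pipeline concrete, the following is a minimal sketch in Python. All app names, keyword sets, sentiment scores, and field names here are hypothetical placeholders — the abstract does not specify the dataset schema, and in the study the sentiment scores come from a RoBERTa model rather than being hand-assigned. The sketch shows only the idea: rank apps by keyword similarity to the query, weighted by review sentiment, and hand the top matches to the LLM as grounding context.

```python
def jaccard(a, b):
    """Keyword-set similarity between a query and an app's keywords."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Toy stand-in for the structured dataset of 55 apps: each entry carries
# condition keywords and a precomputed review-sentiment score (a placeholder
# here; the study derives it with RoBERTa-based sentiment analysis).
apps = [
    {"name": "AppA", "keywords": ["anxiety", "breathing", "meditation"], "sentiment": 0.82},
    {"name": "AppB", "keywords": ["adhd", "focus", "timer"],             "sentiment": 0.74},
    {"name": "AppC", "keywords": ["insomnia", "sleep", "sounds"],        "sentiment": 0.91},
]

def retrieve(query_keywords, k=2):
    """Rank apps by keyword similarity weighted by sentiment and return
    the top-k names, which would be inserted into the LLM prompt."""
    scored = [(jaccard(query_keywords, app["keywords"]) * app["sentiment"], app)
              for app in apps]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [app["name"] for score, app in scored[:k] if score > 0]

print(retrieve(["sleep", "insomnia"]))  # → ['AppC']
```

Grounding the model this way is what lets a lower-cost LLM answer from current, curated app data instead of whatever (possibly outdated) information is in its training set.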
