Location
https://www.kennesaw.edu/ccse/events/computing-showcase/sp25-cday-program.php
Document Type
Event
Start Date
April 15, 2025, 4:00 PM
Description
The increasing use of large language models (LLMs) in mental health support necessitates detailed evaluation of their recommendation capabilities. This study compares four modern LLMs—GPT-4o, Claude 3.5 Sonnet, dataset-enhanced Gemma 2, and dataset-enhanced GPT-3.5-Turbo—on recommending mental health applications. We constructed a structured dataset of 55 mental health apps using RoBERTa-based sentiment analysis and keyword similarity scoring, focusing on depression, anxiety, ADHD, and insomnia. Standard LLMs demonstrated inconsistent accuracy and often relied on outdated or generic information. In contrast, our retrieval-augmented generation (RAG) pipeline enabled lower-cost models to achieve 100% accuracy, compared to baseline models (GPT-4o at 45% and Claude at 70%), while maintaining good diversity and recommending apps with significantly better user ratings. These findings demonstrate that dataset-enhanced, cost-effective LLMs can outperform expensive proprietary models in domain-specific applications such as mental health resource recommendations, potentially improving access to quality mental health support tools.
UR-001 Large language model-enabled mental health app recommendations using structured datasets
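The abstract's dataset-construction step pairs RoBERTa-based sentiment analysis with keyword similarity scoring. The sketch below shows one plausible way to implement that step; the sentiment checkpoint (cardiffnlp/twitter-roberta-base-sentiment-latest), the TF-IDF cosine-similarity scorer, the field names, and the example app records are illustrative assumptions, not the authors' actual pipeline.

```python
# Sketch of the dataset-construction step: score app reviews with a
# RoBERTa sentiment model and rank each app against the four target
# conditions via TF-IDF keyword similarity. Model choice, field names,
# and records are assumptions for illustration.
from transformers import pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# A publicly available RoBERTa sentiment checkpoint (an assumption; the
# abstract only says "RoBERTa-based sentiment analysis").
sentiment = pipeline(
    "sentiment-analysis",
    model="cardiffnlp/twitter-roberta-base-sentiment-latest",
)

apps = [  # hypothetical stand-ins for the 55-app dataset
    {"name": "CalmTrack", "description": "guided breathing for anxiety and sleep"},
    {"name": "FocusMate", "description": "task timers and reminders for ADHD"},
]

def positive_share(reviews):
    """Mean positive-sentiment score across reviews; non-positive reviews count as 0."""
    results = sentiment(reviews)
    return sum(r["score"] for r in results if r["label"] == "positive") / len(results)

print(positive_share(["Really helped my anxiety.", "Too many ads."]))

# Keyword similarity: cosine similarity between each app description and
# each condition keyword in a shared TF-IDF space.
conditions = ["depression", "anxiety", "ADHD", "insomnia"]
tfidf = TfidfVectorizer().fit_transform(
    [a["description"] for a in apps] + conditions
)
scores = cosine_similarity(tfidf[: len(apps)], tfidf[len(apps):])  # apps x conditions

for app, row in zip(apps, scores):
    app["condition_scores"] = {c: round(float(s), 3) for c, s in zip(conditions, row)}
    print(app["name"], app["condition_scores"])
```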
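The RAG result in the abstract hinges on grounding a lower-cost model in the curated app records rather than its parametric knowledge. Below is a minimal sketch of such a pipeline, assuming TF-IDF retrieval and the OpenAI chat completions API for GPT-3.5-Turbo; the retrieval scheme, prompt wording, and records are assumptions, not the study's exact implementation.

```python
# Minimal RAG sketch: retrieve the best-matching app records for a user
# query and inject them into the prompt of a lower-cost model. Retrieval
# method, prompt, and records are illustrative assumptions.
from openai import OpenAI
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def retrieve(query, apps, k=3):
    """Rank app records against the query with TF-IDF cosine similarity."""
    docs = [f'{a["name"]}: {a["description"]}' for a in apps]
    matrix = TfidfVectorizer().fit_transform(docs + [query])
    scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    return [docs[i] for i in scores.argsort()[::-1][:k]]

def recommend(query, apps):
    # Constrain the model to the retrieved records so recommendations
    # come from the curated dataset, not stale parametric knowledge.
    context = "\n".join(retrieve(query, apps))
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "system",
                "content": "Recommend mental health apps using ONLY these records:\n"
                + context,
            },
            {"role": "user", "content": query},
        ],
    )
    return response.choices[0].message.content

apps = [  # hypothetical records standing in for the structured dataset
    {"name": "CalmTrack", "description": "guided breathing for anxiety and sleep"},
    {"name": "FocusMate", "description": "task timers and reminders for ADHD"},
]
print(recommend("I can't fall asleep at night", apps))
```

Restricting the prompt to retrieved records is what lets a cheaper model like GPT-3.5-Turbo stay consistent with the curated dataset, which is the mechanism the abstract credits for the accuracy gap over the unaugmented baselines.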