Large language model enabled mental health app recommendations using structured datasets

Disciplines

Computer Sciences

Abstract

The increasing use of large language models (LLMs) in mental health support necessitates detailed evaluation of their recommendation capabilities. This study compares four modern LLMs (GPT-4o, Claude 3.5 Sonnet, and dataset-enhanced versions of Gemma 2 and GPT-3.5-Turbo) on the task of recommending mental health applications. We constructed a structured dataset of 55 mental health apps using RoBERTa-based sentiment analysis and keyword similarity scoring, focusing on depression, anxiety, ADHD, and insomnia. The standard models (GPT-4o and Claude 3.5 Sonnet) demonstrated inconsistent accuracy and often relied on outdated or generic information. In contrast, our retrieval-augmented generation (RAG) pipeline enabled the lower-cost models to achieve up to 55% higher accuracy than the baselines while recommending apps with significantly better user ratings. The dataset-enhanced models maintained perfect accuracy while preserving recommendation diversity and quality. These findings demonstrate that strategically enhanced, cost-effective LLMs can outperform expensive proprietary models in domain-specific applications such as mental health resource recommendation, potentially improving access to quality mental health support tools.
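The abstract compresses two technical steps. First, dataset construction: each app's user reviews are scored with a RoBERTa sentiment model, and its description is matched against condition keywords. The sketch below is a minimal illustration of that idea, assuming a publicly available RoBERTa sentiment checkpoint, TF-IDF cosine similarity for the keyword score, and illustrative field names; the study's exact checkpoint and scoring formula are not specified in the abstract.

```python
# Sketch: scoring one app for the structured dataset.
# Assumptions: the RoBERTa checkpoint, the keyword lists, and the way
# the two scores are computed are illustrative, not the study's method.
from transformers import pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

sentiment = pipeline(
    "sentiment-analysis",
    model="cardiffnlp/twitter-roberta-base-sentiment-latest",
)

CONDITION_KEYWORDS = {
    "depression": "depression mood sadness cbt journaling",
    "anxiety": "anxiety panic worry breathing calm relaxation",
    "adhd": "adhd focus attention task timer organization",
    "insomnia": "insomnia sleep bedtime wind-down rest",
}

def score_app(description: str, reviews: list[str]) -> dict:
    """Return mean positive-review sentiment plus per-condition keyword similarity."""
    # Mean positive-sentiment probability over reviews; reviews the model
    # labels neutral or negative contribute zero.
    results = sentiment(reviews, truncation=True)
    pos = [r["score"] for r in results if r["label"] == "positive"]
    sentiment_score = sum(pos) / len(results) if results else 0.0

    # TF-IDF cosine similarity between the app description and each
    # condition's keyword list.
    docs = [description] + list(CONDITION_KEYWORDS.values())
    tfidf = TfidfVectorizer().fit_transform(docs)
    sims = cosine_similarity(tfidf[0:1], tfidf[1:]).ravel()
    keyword_scores = dict(zip(CONDITION_KEYWORDS, sims))

    return {"sentiment": sentiment_score, "keyword_similarity": keyword_scores}
```

Second, the RAG step: at query time the structured dataset is retrieved and injected into the prompt, so a lower-cost model answers from curated, current information rather than its training data. A minimal sketch, assuming retrieval by the precomputed keyword-similarity score and the OpenAI chat completions client; the ranking rule, prompt wording, and model choice are placeholders, and the `name` and `rating` fields stand in for app-store metadata:

```python
# Sketch: a minimal retrieval-augmented recommendation step.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def recommend(condition: str, dataset: list[dict], k: int = 5) -> str:
    # Rank dataset entries by their similarity to the requested condition
    # and keep the top k as the retrieval context.
    top = sorted(
        dataset,
        key=lambda app: app["keyword_similarity"].get(condition, 0.0),
        reverse=True,
    )[:k]
    context = "\n".join(
        f"- {app['name']} (rating {app['rating']}, sentiment {app['sentiment']:.2f})"
        for app in top
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Recommend mental health apps using only the list provided."},
            {"role": "user",
             "content": f"Condition: {condition}\nCandidate apps:\n{context}"},
        ],
    )
    return response.choices[0].message.content
```

Constraining the model to the retrieved candidates is the mechanism behind the gains the abstract reports: the model cannot fall back on outdated or generic app knowledge, so its recommendations stay within the curated dataset.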

Academic department under which the project should be listed

CCSE - Computer Science

Primary Investigator (PI) Name

Md Abdullah Al Hafiz Khan
