Location
https://www.kennesaw.edu/ccse/events/computing-showcase/fa24-cday-program.php
Document Type
Event
Start Date
November 19, 2024, 4:00 PM
Description
This study evaluates the effectiveness of LLMs in supporting mental health applications by analyzing their performance in understanding and categorizing mental health-related user inputs. We collected data from various mental health apps on the Google Play Store, including user reviews and app descriptions, and filtered the content using a targeted mental health keyword bank. Sentiment analysis and keyword similarity scores were generated for each review using RoBERTa-based models; these scores showed how closely each review aligned with the mental health keywords advertised by the app and how users felt about it. We then prompted four modern LLMs: GPT-4o, Claude 3.5 Sonnet, Gemma 2, and GPT-3.5-Turbo, providing Gemma 2 and GPT-3.5-Turbo with our dataset for more informed outputs. Our prompts covered five common mental health conditions (depression, anxiety, ADHD, PTSD, and insomnia), and we asked each model to provide up to five app recommendations. The results showed that our data-enhanced LLMs noticeably outperformed the other state-of-the-art LLMs in the accuracy, quality, and variety of their outputs while being far more cost-effective. This suggests that data-enhanced, low-cost LLMs can serve as an effective alternative to newer, more powerful, and more expensive models, achieving notably better results when interpreting nuanced text for mental health applications.
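A minimal sketch of the review-scoring step described above. The abstract names only "RoBERTa-based models," so the specific checkpoints here (cardiffnlp/twitter-roberta-base-sentiment-latest for sentiment, the RoBERTa-based all-roberta-large-v1 for embeddings) and the example keyword bank and review are assumptions, not the study's actual configuration.

```python
# Sketch: sentiment and keyword-similarity scoring for one review.
# Checkpoints and data below are assumed; the abstract does not name them.
from transformers import pipeline
from sentence_transformers import SentenceTransformer, util

sentiment = pipeline(
    "sentiment-analysis",
    model="cardiffnlp/twitter-roberta-base-sentiment-latest",
)
embedder = SentenceTransformer("all-roberta-large-v1")

# Hypothetical stand-ins for the scraped Play Store data.
keyword_bank = ["depression", "anxiety", "ADHD", "PTSD", "insomnia"]
review = "This app really helped me manage my anxiety and sleep better."

# Sentiment label and confidence for the review.
sent = sentiment(review)[0]

# Cosine similarity between the review and each mental health keyword.
review_emb = embedder.encode(review, convert_to_tensor=True)
keyword_embs = embedder.encode(keyword_bank, convert_to_tensor=True)
scores = util.cos_sim(review_emb, keyword_embs)[0]

for kw, score in zip(keyword_bank, scores):
    print(f"{kw}: {float(score):.3f}  "
          f"(sentiment: {sent['label']}, {sent['score']:.2f})")
```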
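The abstract also says the dataset was provided to Gemma 2 and GPT-3.5-Turbo for more informed outputs, without specifying how. One plausible reading, sketched below for GPT-3.5-Turbo with the OpenAI Python SDK, is injecting scored dataset rows into the prompt context; the rows, prompt wording, and delivery mechanism are all assumptions.

```python
# Sketch: "data-enhanced" prompting of GPT-3.5-Turbo. Injecting scored
# dataset rows into the prompt is an assumed mechanism, not the paper's
# confirmed method.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical scored rows standing in for the collected app dataset.
dataset_rows = [
    "Calm Mind | keywords: anxiety, insomnia | avg sentiment: positive (0.87)",
    "FocusTrack | keywords: ADHD | avg sentiment: positive (0.74)",
]
condition = "anxiety"  # one of the five conditions used in the study

prompt = (
    "Using the app data below, recommend up to five mental health apps "
    f"for a user dealing with {condition}.\n\n" + "\n".join(dataset_rows)
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```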
Included in
UR-172 A Comparative Study of LLM Effectiveness in Mental Health Assistance
https://www.kennesaw.edu/ccse/events/computing-showcase/fa24-cday-program.php