Overview
Key Objectives
- Bias Detection: Systematically identify and measure bias in LLM responses related to mental health topics
- Fairness Assessment: Evaluate fairness across various demographic groups including age, gender, ethnicity, and socioeconomic status
- Clinical Accuracy: Assess the accuracy and appropriateness of mental health-related responses
- Safety Evaluation: Examine potential risks and safety concerns in mental health applications
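The bias-detection and fairness objectives above can be sketched as a simple group-wise metric. The sketch below computes a demographic parity gap (the largest difference in positive-outcome rates between groups); the group names and outcome labels are hypothetical, not the project's actual data.

```python
from collections import defaultdict

def demographic_parity_gap(responses):
    """Largest difference in positive-outcome rate between any two
    demographic groups. `responses` is a list of (group, outcome)
    pairs, where outcome is 1 for an appropriate/helpful reply
    and 0 otherwise."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, outcome in responses:
        totals[group] += 1
        positives[group] += outcome
    rates = {g: positives[g] / totals[g] for g in totals}
    return max(rates.values()) - min(rates.values()), rates

# Hypothetical evaluation results for two demographic groups
results = [("group_a", 1), ("group_a", 1), ("group_a", 0), ("group_a", 1),
           ("group_b", 1), ("group_b", 0), ("group_b", 0), ("group_b", 1)]
gap, rates = demographic_parity_gap(results)
print(f"parity gap: {gap:.2f}")  # 0.75 - 0.50 = 0.25
```

A gap near zero suggests the model treats the groups similarly on this outcome; larger gaps flag responses for closer review.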
Methodology
- Dataset Creation: Curated comprehensive test sets covering various mental health scenarios and demographic contexts
- Multi-Model Evaluation: Tested multiple state-of-the-art LLMs including GPT-4, Claude, and specialized mental health models
- Bias Metrics: Applied established bias detection metrics and developed custom evaluation criteria for mental health contexts
- Expert Review: Collaborated with mental health professionals for clinical validation
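The dataset-creation step can be illustrated with templated prompts crossed against demographic descriptors. The scenario templates and attribute lists below are invented for illustration; the project's real test sets were curated with clinician input.

```python
from itertools import product

# Illustrative scenario templates and demographic descriptors (hypothetical)
SCENARIOS = [
    "A {age} {gender} patient reports persistent low mood. What should they do?",
    "A {age} {gender} patient asks how to manage anxiety before work.",
]
AGES = ["teenage", "middle-aged", "elderly"]
GENDERS = ["male", "female", "nonbinary"]

def build_test_set(scenarios=SCENARIOS, ages=AGES, genders=GENDERS):
    """Cross every scenario template with every demographic combination,
    tagging each prompt with its demographic context for later analysis."""
    return [
        {"prompt": tpl.format(age=age, gender=gender),
         "age": age, "gender": gender}
        for tpl, age, gender in product(scenarios, ages, genders)
    ]

prompts = build_test_set()
print(len(prompts))  # 2 templates x 3 ages x 3 genders = 18
```

Tagging each prompt with its demographic attributes is what makes the later per-group comparison (and significance testing) possible.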
Technologies Used
- Python: Primary programming language for data analysis and model evaluation
- Machine Learning Libraries: PyTorch, Hugging Face Transformers, and scikit-learn for model interaction and analysis
- Statistical Analysis: Significance testing and related statistical methods for quantifying measured bias across groups
- Visualization: Custom dashboards for presenting bias patterns and evaluation results
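One significance test of the kind mentioned above can be sketched with a two-proportion z-test: given counts of appropriate responses for two demographic groups, it asks whether the difference in rates is larger than chance would explain. The counts are hypothetical, and the normal CDF is built from the standard library's `math.erf`.

```python
import math

def two_proportion_ztest(pos_a, n_a, pos_b, n_b):
    """Two-sided z-test for whether two groups' positive-response
    rates differ; returns (z statistic, p value)."""
    p_a, p_b = pos_a / n_a, pos_b / n_b
    pooled = (pos_a + pos_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Normal CDF via erf; two-sided p value
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical counts: 80/100 appropriate responses for one group
# versus 60/100 for another
z, p = two_proportion_ztest(80, 100, 60, 100)
print(f"z = {z:.2f}, p = {p:.4f}")
```

In practice a statistics library (e.g. SciPy or statsmodels) would be used instead of hand-rolling the test, but the stdlib version keeps the sketch self-contained.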