December 1, 2024

Mental Health LLM Evaluation

This research project evaluates Large Language Models (LLMs) for mental health applications, focusing on bias and fairness across demographic groups and clinical scenarios.

Objectives

  • Bias Detection: Systematically identify and measure bias in LLM responses on mental health topics
  • Fairness Assessment: Evaluate fairness across demographic groups, including age, gender, ethnicity, and socioeconomic status
  • Clinical Accuracy: Assess the accuracy and clinical appropriateness of responses to mental health queries
  • Safety Evaluation: Examine potential risks and safety concerns in mental health applications

Methodology

  • Dataset Creation: Curated test sets covering a range of mental health scenarios and demographic contexts
  • Multi-Model Evaluation: Tested multiple LLMs, including GPT-4, Claude, and specialized mental health models
  • Bias Metrics: Applied established bias detection metrics and developed custom evaluation criteria for mental health contexts
  • Expert Review: Collaborated with mental health professionals for clinical validation

Technologies

  • Python: Primary language for data analysis and model evaluation
  • Machine Learning Libraries: PyTorch, Hugging Face Transformers, and scikit-learn for model interaction and analysis
  • Statistical Analysis: Statistical methods for bias measurement and significance testing
  • Visualization: Custom dashboards presenting bias patterns and evaluation results
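
The project's exact bias metrics and statistical tests are not described on this page, so the following is only a minimal sketch of the kind of group-level measurement and significance testing listed above. It assumes each model response has already been scored (here, hypothetically, 1 if the response refers the user to professional help and 0 otherwise) and tagged with the demographic group named in the prompt. The demographic-parity-style gap and the permutation test are illustrative stand-ins, and the function names (`group_rates`, `max_parity_gap`, `permutation_p_value`) are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean


def group_rates(records):
    """Return the mean score for each demographic group.

    records: iterable of (group, score) pairs. The score is a hypothetical
    per-response rating, e.g. 1 if the model referred the user to
    professional help and 0 otherwise.
    """
    by_group = {}
    for group, score in records:
        by_group.setdefault(group, []).append(score)
    return {group: mean(scores) for group, scores in by_group.items()}


def max_parity_gap(records):
    """Largest absolute difference in mean score between any two groups.

    A demographic-parity-style gap: 0.0 means every group receives the same
    average score. Requires at least two groups.
    """
    rates = group_rates(records)
    return max(abs(rates[a] - rates[b]) for a, b in combinations(rates, 2))


def permutation_p_value(scores_a, scores_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in mean score.

    Randomly reassigns group labels and counts how often the shuffled gap is
    at least as large as the observed gap.
    """
    rng = random.Random(seed)
    observed = abs(mean(scores_a) - mean(scores_b))
    pooled = list(scores_a) + list(scores_b)
    n_a = len(scores_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        gap = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if gap >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Toy, made-up scores for illustration only (not project data).
    records = [("group_a", 1), ("group_a", 1), ("group_a", 0), ("group_a", 1),
               ("group_b", 0), ("group_b", 1), ("group_b", 0), ("group_b", 0)]
    print(group_rates(records))     # {'group_a': 0.75, 'group_b': 0.25}
    print(max_parity_gap(records))  # 0.5
    print(permutation_p_value([1, 1, 0, 1], [0, 1, 0, 0]))
```

A permutation test is used here because it makes no normality assumption about per-response scores, which are often binary or ordinal; with many groups, the pairwise p-values would also need a multiple-comparison correction.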
The research revealed significant bias patterns across demographic groups and highlighted the need for careful consideration before deploying LLMs in mental health contexts. This work contributes to the growing body of research on responsible AI in healthcare, providing frameworks and metrics that other researchers and practitioners can use when building mental health AI applications.

Related projects

F1 AI Race Predictor

Built an end-to-end race prediction platform using historical race data, weather, driver performance, and qualifying results, achieving 68.5% accuracy.

Portfolio Optimization Dashboard

Designed a full-stack investment optimization system supporting strategies such as Markowitz, Black-Litterman, and Risk Parity, with real-time analytics dashboards.

Nashville Airbnb Data Analysis

Enhancing InsideAirbnb.com with Predictive Analytics on Nashville Listings: A Data-Driven Approach to Price and Rating Predictions

Building Once UI, a Customizable Design System

Development of a flexible, highly customizable design system using Next.js for the front end and Figma for design collaboration.

Once UI: Open-source design system

Automating Design Handovers with a Figma to Code Pipeline

Explore the enduring debate between using spaces and tabs for code indentation, and why this choice matters more than you might think.