Gender bias detection for AI developers
Bias infiltrates your AI from two directions — the foundation model you build on, and the data you train with. MARTHA scans both, scores both, and shows you exactly where the problems are. Before a single training run begins.
No setup required — free during beta — built for developers
The problem
Bias enters from two directions
The foundation model you build on carries its own bias baseline. So does your training data. Most teams don’t measure either before they start building.
Existing tools work downstream
Hugging Face evaluate and IBM AI Fairness 360 measure bias in models you’ve already trained. They tell you what went wrong after the fact — not how to prevent it.
MARTHA works upstream
Detection before training, not measurement after deployment. Scan your data and your foundation model — with the same scoring framework — before a single training run begins.
The tools
DataScan and ModelScan use the same 8-dimension scoring framework, so the results are directly comparable. Understand what you’re starting with, and what you’re building on.
Training data
Know what bias your training data carries before training starts. Upload a CSV: DataScan scores it across 8 gender bias dimensions and flags exactly what needs attention.
Upload a CSV →
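MARTHA's client API isn't documented on this page, so the snippet below is only a rough sketch of the kind of upstream check DataScan automates: tallying pronoun groups in one text column of a CSV. The file name and the "text" column are placeholders, not part of any real interface.

```python
# Illustrative sketch only -- not MARTHA's implementation.
# Tallies gendered pronoun groups across one text column of a CSV,
# the raw signal behind a check like Pronoun Distribution (see below).
import csv
import re
from collections import Counter

PRONOUN_GROUPS = {
    "she": "she/her", "her": "she/her", "hers": "she/her",
    "he": "he/him", "him": "he/him", "his": "he/him",
    "they": "they/them", "them": "they/them", "theirs": "they/them",
}

def pronoun_counts(path, column="text"):
    """Count pronoun groups over every row of the given CSV column."""
    counts = Counter()
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            for token in re.findall(r"[a-z]+", row[column].lower()):
                group = PRONOUN_GROUPS.get(token)
                if group:
                    counts[group] += 1
    return counts

if __name__ == "__main__":
    counts = pronoun_counts("training_data.csv")  # placeholder file name
    total = sum(counts.values()) or 1
    for group in ("she/her", "he/him", "they/them"):
        print(f"{group}: {counts[group]} ({counts[group] / total:.0%})")
```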
Foundation models
Know the bias baseline of any foundation model before you build on it. ModelScan fires structured probes and returns a full gender bias scorecard — no guesswork about what you’re inheriting.
Pick a model →
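MARTHA doesn't publish its probe set here, so the following shows just one common way to implement a structured probe, using the Hugging Face transformers library: mask the pronoun after an occupation and compare how strongly the model prefers he, she, or they for each role. The template and role list are examples, not MARTHA's actual probes.

```python
# Illustrative sketch only -- MARTHA's actual probes are not published here.
# A masked-token probe: for each role, compare the model's probability of
# filling the blank with "he", "she", or "they".
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

TEMPLATE = "The {role} said that [MASK] would finish the report."
ROLES = ["engineer", "nurse", "ceo", "teacher"]  # example roles only

for role in ROLES:
    scores = {"he": 0.0, "she": 0.0, "they": 0.0}
    for candidate in fill(TEMPLATE.format(role=role), top_k=50):
        token = candidate["token_str"].strip().lower()
        if token in scores:
            scores[token] = candidate["score"]
    print(role, scores)
```

A generative model can be probed the same way by sampling completions and counting which pronoun appears; the masked version just reads the probabilities directly instead of sampling.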
In Development
User prompts
Intercept bias at the user prompt. People bring their own biases to prompting; recognizing them and intervening up front vastly improves outcomes. PromptScan will proactively block user bias from carrying through to the output.
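PromptScan isn't released yet, so purely as a sketch of the interception pattern described above: inspect the user's prompt before it reaches the model, and block or flag it when it hard-codes a gendered assumption. The regex detector below is a trivial stand-in, not PromptScan's detection logic.

```python
# Hypothetical sketch -- PromptScan is unreleased; this only shows the
# check-before-forward pattern with a trivial stand-in detector.
import re

# Stand-in detector: flags prompts that hard-code stereotyped gender roles.
STEREOTYPE_PATTERNS = [
    r"\bfemale (nurse|teacher|assistant)\b",
    r"\bmale (engineer|ceo|scientist)\b",
]

def intercept(prompt):
    """Return (allowed, reason); block the prompt before the model sees it."""
    for pattern in STEREOTYPE_PATTERNS:
        if re.search(pattern, prompt.lower()):
            return False, f"gendered assumption matched: {pattern}"
    return True, "ok"

print(intercept("Write a story about a male engineer and his assistant."))
```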
8 dimensions
The same framework runs across both tools, so your training data score and your foundation model score are directly comparable.
Pronoun Distribution
Flags datasets or model outputs where pronoun distribution is uneven beyond a measurable threshold across she/her/hers, he/him/his, and they/them/theirs.
Gendered Adjectives
Detects when adjectives such as emotional, fragile, or timid skew female while adjectives such as assertive, logical, or dominant skew male (a simplified sketch of this check follows the list of dimensions below).
Leadership Bias
Measures how often leadership roles such as CEO, founder, director, and manager are attributed to women versus men versus non-binary people.
Occupational Stereotyping
Catches systematic underrepresentation of female engineers, male nurses, and other counter-stereotype pairings.
STEM / Caregiver Roles
Tracks the frequency of stereotyped gender roles: STEM roles such as scientist and programmer defaulting to male, and caregiver roles such as teacher and parent defaulting to female.
Competence Framing
Finds phrases that treat competence as the exception — “surprisingly capable for a woman”, “despite being female”.
Parenting Roles
Measures how often domestic tasks and childcare are attributed to mothers versus fathers.
Emotional Attribution
Detects when emotional language such as anxious or distressed disproportionately clusters around female subjects.
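As promised under Gendered Adjectives above, here is a simplified sketch of that check: count how often a small set of stereotyped adjectives lands in the same sentence as a gendered pronoun. The word lists and naive sentence splitting are stand-ins; a production scorer would be far more careful.

```python
# Illustrative sketch only -- a simplified Gendered Adjectives check:
# count stereotyped adjectives co-occurring with gendered pronouns
# in the same sentence.
import re
from collections import Counter

FEMALE_SKEW = {"emotional", "fragile", "timid"}
MALE_SKEW = {"assertive", "logical", "dominant"}
FEMALE_PRONOUNS = {"she", "her", "hers"}
MALE_PRONOUNS = {"he", "him", "his"}

def adjective_cooccurrence(text):
    """Count (adjective set, subject gender) pairs per sentence."""
    counts = Counter()
    for sentence in re.split(r"[.!?]+", text.lower()):
        tokens = set(re.findall(r"[a-z]+", sentence))
        for adjs, adj_label in ((FEMALE_SKEW, "female-coded adjective"),
                                (MALE_SKEW, "male-coded adjective")):
            for prons, subj_label in ((FEMALE_PRONOUNS, "female subject"),
                                      (MALE_PRONOUNS, "male subject")):
                if tokens & adjs and tokens & prons:
                    counts[(adj_label, subj_label)] += 1
    return counts

sample = ("She was emotional about the result. "
          "He gave a logical, dominant presentation.")
print(adjective_cooccurrence(sample))
```

A score for this dimension falls out of the counts: if female-coded adjectives attach to female subjects far more often than to male ones (and vice versa), the dimension gets flagged.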
Why MARTHA
Existing tools measure what your model already learned. MARTHA measures what it will learn.
HF evaluate (Hugging Face): measures bias in models you’ve already trained
AI Fairness 360 (IBM / Linux Foundation): measures bias in models you’ve already trained
MARTHA (DataScan + ModelScan): scans your training data and your foundation model before training begins
Share feedback
This is an early release. Your feedback shapes what MARTHA will become.