Gender bias detection for AI developers

Your model learns what you teach it.

Bias infiltrates your AI from two directions — the foundation model you build on, and the data you train with. MARTHA scans both, scores both, and shows you exactly where the problems are. Before a single training run begins.

No setup required · Free during beta · Built for developers

MARTHA scan preview (example scores by dimension):

Pronoun Distribution: 38
Leadership Bias: 61
Occupational Stereotyping: 44
STEM / Caregiver Roles: 67
Competence Framing: 55
Gendered Adjectives: 82
Parenting Roles: 72
Emotional Attribution: 49

Most bias tools score your model after the damage is done.

Bias enters from two directions

The foundation model you build on carries its own bias baseline. So does your training data. Most teams don’t measure either before they start building.

Existing tools work downstream

Hugging Face evaluate and IBM AI Fairness 360 measure bias in models you’ve already trained. They tell you what went wrong after the fact — not how to prevent it.

MARTHA works upstream

Detection before training, not measurement after deployment. Scan your data and your foundation model — with the same scoring framework — before a single training run begins.
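
MARTHA's public API isn't documented on this page, so the snippet below is only a hypothetical sketch of the workflow just described: the martha package, the scan_data and scan_model helpers, and the report shape are all invented names for illustration, not MARTHA's actual interface.

    # Hypothetical sketch only: `martha`, `scan_data`, `scan_model`, and
    # the report shape are assumptions, not MARTHA's documented API.
    from martha import scan_data, scan_model

    data_report = scan_data("train.jsonl")       # score the training data
    model_report = scan_model("my-base-model")   # probe the foundation model

    # Same 8-dimension framework on both sides, so scores line up 1:1.
    for dim in data_report.scores:
        print(f"{dim}: data={data_report.scores[dim]} "
              f"model={model_report.scores[dim]}")

The point of the sketch is the shape of the workflow: both scans run before training, and both return scores on the same dimensions so they can be read side by side.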

Two bias infiltration points.
Two tools to stop it.

DataScan and ModelScan use the same 8-dimension scoring framework, so the results are directly comparable. Understand what you’re starting with, and what you’re building on.

What MARTHA measures

The same framework runs across both tools, so your training data score and your foundation model score are directly comparable.

01

Pronoun Distribution

Flags datasets or model outputs where the pronoun distribution across she/her/hers, he/him/his, and they/them/theirs is uneven beyond a measurable threshold. A minimal sketch of this check follows the list.

02

Gendered Adjectives

Detects when adjectives such as emotional, fragile, or timid skew female while adjectives such as assertive, logical, or dominant skew male. A co-occurrence sketch of this check follows the list.

03

Leadership Bias

Measures how often leadership roles such as CEO, founder, director, and manager are attributed to women versus men versus non-binary people.

04

Occupational Stereotyping

Catches systematic underrepresentation of female engineers, male nurses, and other counter-stereotype pairings.

05

STEM / Caregiver Roles

Tracks the frequency of stereotyped gender roles, where STEM roles such as scientist and programmer default to male and caregiver roles such as teacher and parent default to female.

06

Competence Framing

Finds phrases that treat competence as the exception: "surprisingly capable for a woman," "despite being female." A pattern-matching sketch follows the list.

07

Parenting Roles

Measures how often domestic tasks and childcare are attributed to mothers versus fathers.

08

Emotional Attribution

Detects when emotional language such as anxious or distressed disproportionately clusters around female subjects.
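
MARTHA's actual scoring isn't published here. As a rough illustration of the pronoun-distribution check in dimension 01, here is a minimal self-contained sketch: the token lists, the share-gap metric, and the 0.4 threshold are all assumptions, not MARTHA's implementation.

    # Illustrative only: token lists, metric, and threshold are assumptions.
    import re
    from collections import Counter

    PRONOUN_GROUPS = {
        "she": "she/her", "her": "she/her", "hers": "she/her",
        "he": "he/him", "him": "he/him", "his": "he/him",
        "they": "they/them", "them": "they/them", "theirs": "they/them",
    }

    def pronoun_shares(text):
        # Count pronoun tokens per group, then normalize to shares.
        counts = Counter()
        for token in re.findall(r"[a-z]+", text.lower()):
            if token in PRONOUN_GROUPS:
                counts[PRONOUN_GROUPS[token]] += 1
        total = sum(counts.values()) or 1
        return {g: counts[g] / total
                for g in ("she/her", "he/him", "they/them")}

    shares = pronoun_shares("He said his team won. She thanked her mentor.")
    if max(shares.values()) - min(shares.values()) > 0.4:  # assumed threshold
        print("flag: uneven pronoun distribution", shares)

A real scanner would also need to disambiguate singular from plural "they," which this toy ignores.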
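
Dimension 02 (and, with swapped term lists, dimensions 03, 04, 05, 07, and 08) comes down to co-occurrence skew. The sketch below is a toy under stated assumptions: the adjective lists come from the copy above, while the subject lists and the row-level co-occurrence window are invented for illustration.

    # Illustrative only: adjective lists are from the copy above; subject
    # lists and row-level windowing are assumptions.
    import re

    FEMALE_CODED = {"emotional", "fragile", "timid"}
    MALE_CODED = {"assertive", "logical", "dominant"}
    FEMALE_SUBJECTS = {"she", "her", "woman", "women"}
    MALE_SUBJECTS = {"he", "him", "his", "man", "men"}

    def adjective_skew(rows):
        # counts[adjective_class][subject_gender] = number of rows
        counts = {"female_coded": {"f": 0, "m": 0},
                  "male_coded": {"f": 0, "m": 0}}
        for row in rows:
            tokens = set(re.findall(r"[a-z]+", row.lower()))
            subject = ("f" if tokens & FEMALE_SUBJECTS
                       else "m" if tokens & MALE_SUBJECTS else None)
            if subject is None:
                continue
            if tokens & FEMALE_CODED:
                counts["female_coded"][subject] += 1
            if tokens & MALE_CODED:
                counts["male_coded"][subject] += 1
        return counts

    rows = ["She seemed fragile and emotional.",
            "He was logical and dominant in the meeting."]
    print(adjective_skew(rows))
    # {'female_coded': {'f': 1, 'm': 0}, 'male_coded': {'f': 0, 'm': 1}}

A skew score would then compare the diagonal (female-coded adjectives near female subjects, male-coded near male) against the off-diagonal counts.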
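
Dimension 06 is a phrase-pattern problem rather than a counting one. A minimal sketch, built only from the two example phrases quoted above: a production pattern set would be far larger, and these regexes are assumptions.

    # Illustrative only: patterns built from the two example phrases above.
    import re

    COMPETENCE_EXCEPTION_PATTERNS = [
        re.compile(r"\bsurprisingly \w+ for a (woman|girl)\b", re.IGNORECASE),
        re.compile(r"\bdespite being (female|a woman)\b", re.IGNORECASE),
    ]

    def flag_competence_framing(row):
        # Return the patterns a row matches, so each flag carries a reason.
        return [p.pattern for p in COMPETENCE_EXCEPTION_PATTERNS
                if p.search(row)]

    print(flag_competence_framing(
        "Surprisingly capable for a woman, she led the team."))
    # ['\\bsurprisingly \\w+ for a (woman|girl)\\b']

Returning the matched pattern instead of a bare boolean mirrors the "row-level flagging with reasons" idea in the comparison below.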

Built to block bias before it begins

Existing tools measure what your model already learned. MARTHA measures what it will learn.

HF evaluate (Hugging Face)

✗ Scans training data before training
✗ Probes foundation models
✓ Post-hoc model evaluation
✗ Comparable data + model scores
✗ Row-level flagging with reasons

AI Fairness 360 (IBM / Linux Foundation)

✗ Scans training data before training
✗ Probes foundation models
✓ Post-hoc model evaluation
✗ Comparable data + model scores
✓ Structured ML / tabular data focus

MARTHA (DataScan + ModelScan)

✓ Scans training data before training
✓ Probes foundation models
✓ Same framework across both tools
✓ Comparable data + model scores
✓ Row-level flagging with reasons