Senior Technical Program Manager (TPM) at mpathic
Remote / Seattle, WA, US

Our Story

mpathic is building the future of empathetic, trustworthy AI. Grounded in behavioral science and human-centered design, our technology delivers AI systems that are safe, aligned, and emotionally intelligent. As enterprises race to adopt AI, we believe the companies that win will be those that build trust first.

We are building a high-quality AI Safety Team to evaluate and strengthen advanced AI systems. Our work focuses on making models reliable, auditable, and scalable—so safety work can move fast without relying on heroics or sacrificing quality.


Position Overview

We’re looking for a Senior Technical Program Manager (TPM) who is also a licensed or clinically trained expert, with experience in AI safety, evaluation, or behavioral science research. The role is roughly 50% delivery leadership and 50% clinical evaluation & quality. It does not involve providing therapy, crisis intervention, or on-call clinical care.


In this role, you will lead end-to-end delivery of psychologically grounded AI safety evaluation projects, ranging from data generation to safety rubric development to red teaming—ensuring that advanced models respond appropriately when users discuss stressful life events, crises, trauma, daily hassles, and other sensitive experiences.

You’ll operate at the intersection of:

  • Clinical judgment
  • AI evaluation rigor
  • Annotation and rubric design
  • Qualitative data analysis
  • Operational excellence
  • Cross-functional technical delivery

This is a full-time, remote role (US preferred), reporting to the Head of AI Safety or the Evaluation Programs GM.

What You’ll Accomplish

In your first 60–90 days you’ll…

  • Develop and maintain a reliable delivery engine for clinically grounded AI evaluation projects
  • Build workflows for rubric development, labeling, review, and escalation
  • Establish quality and agreement standards across expert raters
  • Ship 1–2 concrete improvements to evaluation ops, tooling, or QA systems
  • Successfully deliver an initial high-signal pilot evaluation (e.g., stressful life events conversations)
  • Demonstrate calibrated judgment aligned with mpathic’s safety philosophy

In your first year, you’ll…

  • Own multiple concurrent AI safety evaluation programs across customers and models
  • Document learnings from ongoing projects to continuously improve the team’s capacity to deliver the highest level of AI safety possible
  • Develop scalable clinical evaluation playbooks for sensitive user contexts
  • Improve throughput, rigor, and auditability with strong metrics
  • Become a connective leader across Clinical Experts, QA, Product, and Engineering
  • Help shape mpathic’s long-term approach to emotionally grounded AI safety
  • Balance rigor, speed, and customer needs in real-world delivery


You’ll Thrive in This Role If You…

Have 5+ years of experience in one or more of:

  • Technical project/program management and team leadership
  • Clinical research and/or report writing
  • Organizing and analyzing qualitative data and synthesizing key findings
  • Training and coaching clinicians in high-level data analysis, with the end goal of developing safety rubrics and evaluation systems
  • Navigating evolving rubrics, edge cases, and judgment calls
  • AI evaluation or annotation programs
  • Trust & safety, human-centered AI, or behavioral science workflows

And you bring clinical expertise such as:

  • Licensed clinician (LCSW, LMFT, PsyD, PhD, MD) or equivalent applied experience
  • Expertise in qualitative data, annotation and/or evaluating clinical conversations for safety and mental health content
  • Deep familiarity with psychological stress, trauma, crisis response, or mental health frameworks
  • Ability to evaluate model behavior in emotionally complex, high-stakes conversations

You are skilled at:

  • Designing rubrics that capture nuanced human and clinical judgment
  • Running expert labeling efforts with calibration and reliability
  • Translating ambiguous safety questions into structured evaluation plans
  • Building scalable systems instead of relying on heroics
  • Communicating clearly through specs, dashboards, and structured reporting
  • Helping other clinicians apply shared standards

What You’ll Do

Own Clinically Grounded AI Evaluation Delivery (Core)

  • Lead evaluation projects from kickoff → rubric → execution → QA → final reporting
  • Define milestones, timelines, and delivery SLAs
  • Manage delivery risk in sensitive, high-stakes domains
  • Ensure outputs are rigorous, clinically appropriate, and customer-ready
  • Evaluate safety, appropriateness, and response quality—not clinical outcomes
  • Maintain clear scope and limits, and handle sensitive content responsibly

Design Rubrics & Annotation Protocols

  • Partner with mpathic behavioral experts to define rating criteria
  • Build labeling guides for psychological state, intent, appropriateness, and harm risk
  • Ensure rubrics balance clinical nuance with operational usability

Drive Quality & Calibration Systems

  • Implement review tiers: peer review → QA → escalation
  • Run inter-rater reliability and disagreement-reduction workflows (see the sketch after this list)
  • Maintain gold sets and drift monitoring over time
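For a concrete sense of the agreement checks this involves, the sketch below computes Cohen's kappa between two expert raters in Python. The label set, sample data, and the 0.8 calibration threshold are illustrative assumptions rather than mpathic's actual rubric or standards.

    # Minimal sketch: Cohen's kappa between two expert raters.
    # Labels, data, and the 0.8 threshold are illustrative assumptions only.
    from collections import Counter

    def cohens_kappa(rater_a, rater_b):
        """Chance-corrected agreement for two raters labeling the same items."""
        assert len(rater_a) == len(rater_b)
        n = len(rater_a)
        observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
        counts_a, counts_b = Counter(rater_a), Counter(rater_b)
        expected = sum(
            (counts_a[c] / n) * (counts_b[c] / n)
            for c in counts_a.keys() | counts_b.keys()
        )
        return 1.0 if expected == 1 else (observed - expected) / (1 - expected)

    # Hypothetical severity labels from two clinicians on the same ten model responses.
    a = ["safe", "safe", "escalate", "safe", "review", "safe", "escalate", "safe", "safe", "review"]
    b = ["safe", "review", "escalate", "safe", "review", "safe", "escalate", "safe", "safe", "safe"]

    kappa = cohens_kappa(a, b)
    print(f"Cohen's kappa: {kappa:.2f}")
    if kappa < 0.8:  # example calibration threshold (assumption)
        print("Agreement below target; route disagreements to calibration review.")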

Tooling & Workflow Ownership

Translate evaluation needs into requirements for internal tooling:

  • Work queues and labeling interfaces
  • Audit trails and version control
  • Taxonomies for stress, crisis, and safety response categories
  • Reporting automation and reproducibility pipelines

Metrics & Operational Excellence

Own evaluation program dashboards including the following (a short computation sketch appears after this list):

  • Throughput per expert per day
  • Agreement and defect rates
  • Cycle time and on-time delivery
  • Safety actionability metrics (e.g., customer acceptance, mitigation impact)
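As a rough illustration of how two of these metrics might be computed from raw labeling records, here is a short Python sketch. The record fields (expert, day, label, gold_label) and the sample data are hypothetical placeholders, not an actual mpathic schema or pipeline.

    # Minimal sketch: throughput per expert per day and defect rate vs. a gold set.
    # The record schema and data are hypothetical, for illustration only.
    from collections import defaultdict

    records = [
        {"expert": "A", "day": "2024-06-03", "label": "safe", "gold_label": "safe"},
        {"expert": "A", "day": "2024-06-03", "label": "review", "gold_label": "escalate"},
        {"expert": "B", "day": "2024-06-03", "label": "escalate", "gold_label": "escalate"},
        {"expert": "B", "day": "2024-06-04", "label": "safe", "gold_label": "safe"},
    ]

    # Throughput: items labeled per expert per day.
    throughput = defaultdict(int)
    for r in records:
        throughput[(r["expert"], r["day"])] += 1

    # Defect rate: share of labels that disagree with the gold label.
    defect_rate = sum(r["label"] != r["gold_label"] for r in records) / len(records)

    for (expert, day), n in sorted(throughput.items()):
        print(f"{expert} on {day}: {n} items")
    print(f"Defect rate vs. gold set: {defect_rate:.0%}")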

Customer & Stakeholder Execution Support

  • Help scope clinically sensitive evaluation engagements
  • Translate customer priorities into clear execution plans
  • Present findings with clinical and operational clarity

About the Team

You’ll collaborate closely with:

  • Clinical & Behavioral Science Experts — rubric design, psychological grounding
  • QA / Evaluation Leadership — calibration, review systems, drift monitoring
  • Engineering / Product — tooling, automation, evaluation pipelines
  • Customer Delivery — scoping, results, renewals


We value calm execution, clinical rigor, operational excellence, and scalable systems over fragile heroics.


Compensation & Benefits

  • Base Salary (US): $140,000–$200,000 (band depends on seniority, scope, and number of customer programs or pods owned)
  • Equity: Yes
  • Benefits: We offer 100% company-funded health, dental, and vision insurance for full-time employees. Additionally, we offer a 401(k), well-being programs, and flexible paid time off.
  • Remote-first
  • Mission-driven work focused on AI safety, trust, and operational rigor


Apply Even If You Don’t Check Every Box

If you’re excited about building systems that make AI safety scalable, reliable, and real—and want to operate at the center of AI Safety, QA, Engineering, and Delivery—we’d love to hear from you.