Senior Technical Program Manager (TPM) at mpathic
Seattle, WA, US / San Francisco, CA, US / Remote

Our Story

mpathic is building the future of empathetic, trustworthy AI. Grounded in behavioral science and human-centered design, our technology delivers AI systems that are safe, aligned, and emotionally intelligent. As enterprises race to adopt AI, we believe the companies that win will be those that build trust first.

We are building a high-quality AI Safety Team to evaluate and strengthen advanced AI systems. Our work focuses on making models reliable, auditable, and scalable—so safety work can move fast without relying on heroics or sacrificing quality.


Position Overview

We’re looking for a Senior Technical Program Manager (TPM) to lead end-to-end AI Safety Human Data Programs.

This role sits at the intersection of:

  • Human data operations
  • Trust & Safety policy development
  • Rubric and taxonomy development
  • AI evaluation and benchmarking
  • Red teaming and edge-case discovery


You will own programs that design, generate, evaluate, and scale high-quality human data — ensuring outputs are reliable, auditable, and actionable for Trust & Safety and ML teams.


This is not an infrastructure ML role. It is an ops and systems-building role focused on human signal, policy operationalization, and scalable evaluation. You will build the systems, workflows, and quality controls that allow clinicians, policy experts, and ML teams to collaborate efficiently at scale.


This is a full-time role (Seattle or Bay Area preferred, remote eligible), reporting to the Head of AI Safety or Evaluation Programs / GM.


What You’ll Accomplish

In your first 60–90 days you’ll…

  • Take ownership of one or more active AI safety human data programs, with a sharp focus on execution and quality
  • Lead the team on timelines, prioritization, and risk management
  • Establish clear program milestones, throughput targets, and quality benchmarks
  • Audit and improve existing annotation and QA workflows
  • Deliver measurable improvements in quality, scalability, or cycle time
  • Align program outputs with Trust & Safety and ML partner needs


In your first year, you’ll…

  • Own multiple concurrent human data programs across safety domains, ensuring consistent quality, prioritization, and delivery standards across initiatives
  • Establish durable and scalable systems for data generation, benchmarking, red teaming, and evaluation
  • Partner with our clinical leads to build reusable policy, rubric, and taxonomy frameworks that scale across customers and use cases
  • Reduce cost and lead time through smarter task design, workflow optimization, and capacity planning
  • Launch reporting dashboards linking human data outputs to policy insights, model improvements, and measurable safety gains
  • Implement governance standards that ensure auditability and reproducibility across programs
  • Serve as the internal point of accountability for human data program execution, ensuring that strategic accounts are delivered on time, at quality, and aligned with executive sponsor expectations


You’ll Thrive in This Role If You…

Have 6+ years of experience in:

  • Leading complex, cross-functional technical programs in fast-moving or ambiguous environments
  • Managing expert, contractor, or vendor-based review programs, with a working understanding of throughput, calibration, and QA tradeoffs
  • Owning timelines, prioritization, and delivery accountability across multiple parallel workstreams
  • Building systems and teams that scale
  • Managing technical programs and human data pipelines
  • Operationalizing human data workflows, including expert and vendor-based review


And have experience:

  • Building and scaling human data, Trust & Safety, or evaluation operations that require structured workflows, quality controls, and governance
  • Navigating the realities of AI evaluation, red teaming, model benchmarking, or human-in-the-loop systems
  • Operating independently with strong judgment while escalating risks early and clearly
  • Scaling annotation, evaluation, or red teaming programs
  • Leading cross-functional programs involving policy, product, and engineering
  • Working with researchers and QA leads to implement quality systems
  • Working with LLM evaluation, alignment, or model benchmarking
  • Managing fast-paced, high-demand delivery environments

You are especially strong at:

  • Turning ambiguous safety goals into structured execution plans with clear milestones and risk management
  • Building repeatable systems, templates, and playbooks that scale across teams and use cases
  • Balancing quality, speed, and cost
  • Setting expectations and communicating clearly with both technical and non-technical stakeholders
  • Maintaining calm, clarity, and decisiveness in high-pressure or high-visibility environments


What You’ll Do

Own End-to-End Human Data Programs

This role operates in a fast-moving, high-demand environment with overlapping campaigns and tight delivery timelines. The ideal candidate thrives under pressure and maintains quality while moving quickly.

  • Lead safety data programs end to end, from rubric development → data generation → annotation → QA → reporting
  • Define milestones, SLAs, staffing plans, and delivery timelines
  • Own prioritization, risk management, and cross-functional alignment
  • Manage parallel workstreams across internal teams and expert contributors
  • Identify and mitigate execution risk early

Drive Human Data Quality & Reliability

  • Design data workflows that balance nuance, speed, and consistency
  • Implement QA tiers and sampling strategies
  • Manage drift, quality metrics, and throughput performance
  • Ensure auditability, reproducibility, and scalable program governance

Cross-Functional Alignment

  • Partner with clinical QA, reviewers, and trainers to ensure successful execution of human data projects
  • Support ML teams with structured evaluation signal for fine-tuning and benchmarking
  • Collaborate with Engineering to improve tooling and workflow automation
  • Deliver executive-ready reporting on program performance, risks, and impact

Cross-Functional Collaboration

Work closely with:

  • TPMs and Evaluation Leads — delivery execution, workflows, escalation systems
  • Clinical & Behavioral Science Experts — rubric grounding, psychological frameworks
  • QA Leadership — agreement metrics, gold sets, drift monitoring
  • Engineering / Product — tooling support for review, audit trails, and escalation queues
  • Customer Delivery — ensuring findings are interpretable and trustworthy


We value calm execution, clinical rigor, operational excellence, and scalable systems that make high-quality work sustainable.


Compensation & Benefits

  • Base Salary (US): $140,000–$200,000 (band depends on seniority, scope, and number of customer programs or pods owned)
  • Equity: Yes
  • Benefits: We offer 100% company-funded health, dental, and vision insurance for full-time employees. We also offer a 401(k), well-being programs, and flexible paid time off.
  • Remote-first
  • Mission-driven work focused on AI safety, trust, and operational rigor


Apply Even If You Don’t Check Every Box

If you’re excited about bringing clinical judgment, training excellence, and quality systems into AI safety evaluation work—and want to help ensure emotionally grounded AI systems are safe and trustworthy—we’d love to hear from you.