Senior Product Manager, AI Agents Testing

Zendesk

Posted 23 Days Ago

Remote

Hiring Remotely in United Kingdom

Senior level

Lead the product strategy for AI agent testing and observability, ensuring tools for testing agent behavior and quality scoring are user-friendly for non-technical users in B2B environments. Coordinate across multiple teams to deliver integrated solutions.

The summary above was generated by AI

Job Description

Role

Zendesk AI Agents are fully autonomous agents that resolve customer issues end-to-end — reasoning over knowledge bases, executing multi-step procedures, taking actions via APIs, and handing off to humans when needed. They operate across messaging, email, and voice channels, handling millions of conversations for brands like Liberty London, Unity, and Motel Rocks. As these agents grow more capable and more autonomous, the stakes of every deployment decision increase: a misconfigured procedure, a hallucinated response, or a broken escalation path can erode customer trust at scale.

Today, the admins who configure and manage these agents — CX managers, bot builders, operations leads — lack the tools to confidently test agent behavior before going live, measure quality in production, or experiment with changes safely. You'll own the end-to-end product strategy for our Testing & Observability suite — the layer that lets admins simulate conversations against their real knowledge and procedures, score agent quality across accuracy, tone, and policy adherence, run A/B experiments on agent behavior, and catch regressions before they reach end users. This is a strategic opportunity that directly determines whether enterprises can trust and scale agentic AI in their customer service operations.

Key Responsibilities

Own product strategy and roadmap for AI agent testing — simulation, quality scoring, experimentation, regression detection, and conversation tracing
Ship testing as an integrated experience embedded in the builder and deployment flow
Define how simulation works end-to-end: scenario generation from real conversation patterns, automated pass/fail evaluation, and results that point admins to exactly what broke and where
Build the experimentation layer — A/B testing of agent behavior, staged rollouts with statistical rigor, safe iteration on tone and resolution strategies
Design a pre-publish readiness gate that gives admins a quantified view of risk before every deployment — specific issues, coverage gaps, comparison to current production behavior
Partner with ML, QA, and platform teams on scoring methodology, simulation infrastructure, and tracing architecture
Make all of this usable by non-technical admins — CX managers, bot builders, operations leads who need answers without writing code or filing engineering tickets

Required Qualifications

Several years of product management experience, with 2+ years building for non-technical users in complex technical domains (QA tooling, no-code platforms, admin consoles, workflow builders) in B2B SaaS
Experience shipping AI/ML products where evaluation and reliability were real concerns, not afterthoughts
You understand why traditional testing doesn't work for LLM-based systems and have opinions about what does
Ability to ship platform capabilities through user-facing product surfaces — you don't just build infrastructure, you make it usable
Experience integrating acquired or adjacent products into a unified experience — combining capabilities from different teams, codebases, or organizations into something that feels like one product
Track record coordinating across 3+ engineering teams and multiple departments to deliver one coherent product experience

Bonus Qualifications

Experience building simulation, synthetic data, or automated testing products
Background in conversational AI, chatbot platforms, or customer service technology
Familiarity with LLM evaluation approaches — human-in-the-loop scoring, automated rubrics, AI-as-judge
Experience with experimentation infrastructure — A/B testing, staged rollouts, feature flagging at scale
Experience turning internal prototypes into customer-facing products

Success in the Role

Testing becomes part of how customers build and deploy agents — not something they do separately, but part of the flow
Customers can quantify whether their agent is ready to go live, and catch regressions before end users hit them
Automated resolution rates improve because customers can actually diagnose and fix quality issues instead of guessing
The testing platform becomes a shared capability used beyond AI Agents — consumed by other product teams that need to validate AI-powered experiences

Interview Process

1. Initial Call with Talent Team — 15 mins

2. Hiring Manager Interview with Mirza, Director of Product, AI Agents — 45 mins

3. Case Study / Workshop — 75 mins

4. Final Interview with Ryan McGrew, VP Product, AI Agents — 30 mins

The intelligent heart of customer experience

Zendesk software was built to bring a sense of calm to the chaotic world of customer service. Today we power billions of conversations with brands you know and love.

Zendesk believes in offering our people a fulfilling and inclusive experience. Our hybrid way of working enables us to purposefully come together in person, at one of our many Zendesk offices around the world, to connect, collaborate, and learn, whilst also giving our people the flexibility to work remotely for part of the week.

As part of our commitment to fairness and transparency, we inform all applicants that artificial intelligence (AI) or automated decision systems may be used to screen or evaluate applications for this position, in accordance with Company guidelines and applicable law.

Zendesk is an equal opportunity employer, and we’re proud of our ongoing efforts to foster global diversity, equity, & inclusion in the workplace. Individuals seeking employment and employees at Zendesk are considered without regard to race, color, religion, national origin, age, sex, gender, gender identity, gender expression, sexual orientation, marital status, medical condition, ancestry, disability, military or veteran status, or any other characteristic protected by applicable law. We are an AA/EEO/Veterans/Disabled employer. If you are based in the United States and would like more information about your EEO rights under the law, please click here.

Zendesk endeavors to make reasonable accommodations for applicants with disabilities and disabled veterans pursuant to applicable federal and state law. If you are an individual with a disability and require a reasonable accommodation to submit this application, complete any pre-employment testing, or otherwise participate in the employee selection process, please send an e-mail to [email protected] with your specific accommodation request.
