Extreme Reach

Lead MLOps Engineer

Posted 2 Days Ago

London, Greater London, England

Senior level

The Lead MLOps Engineer is responsible for integrating, deploying, and monitoring machine learning models in production, ensuring reliability and scalability. This role includes designing the AI/ML models platform, managing infrastructure, implementing CI/CD pipelines, and providing technical leadership. Collaboration with data scientists, DevOps teams, and product managers is essential for successful integration into business workflows.

The summary above was generated by AI

Description

XR is a global technology platform powering the creative economy. Its unified platform moves creative and productions forward, simplifying the fragmentation and delivering global insights that drive increased business value. XR operates in 130 countries and 45 languages, serving the top global advertisers and enabling $150 billion in video ad spend around the world. More than half a billion creative brand assets are managed in XR’s enterprise platform.

Above all, we are a supportive and collaborative culture dedicated to DEI. We are caring, dedicated, positive, genuine, trustworthy, experienced, passionate and fun people with loyalty to our customers and our fellow teammates. It is our belief that the better we work together to help our clients achieve their goals, the more successful XR will be.

The Opportunity

The Lead MLOps Engineer plays a critical role in ensuring the seamless integration, deployment, monitoring, and scaling of machine learning models into production. The role blends the expertise of DevOps and machine learning to bridge the gap between data science and operational systems, ensuring that ML models perform reliably and at scale in real-world environments. As the Lead MLOps Engineer, you'll drive best practices for model lifecycle management and create the infrastructure to automate and streamline workflows.

Job Responsibilities

Design and architect the AI/ML models platform to support scalable, efficient, and high-performance machine learning workflows.

Build and manage infrastructure that supports the deployment of machine learning models. This includes leveraging cloud services (AWS), CDK, and containerization tools like Docker.

Architect and develop MLOps systems with tools such as AWS SageMaker, MLflow, Step Functions, and Lambda.

Lead the design and implementation of CI/CD pipelines that automate model deployment and rollback, so that models reach production seamlessly, with less manual intervention and greater system reliability.

Ensure scalability and efficiency of the models to handle real-time predictions and batch processing.

Set up monitoring and logging solutions for tracking the performance of models in production (Datadog, CloudWatch).

Define and promote best practices in MLOps.

Provide technical leadership and mentorship to MLOps engineers on technologies and standard processes.

Partner with the global engineering team to drive cross-functional alignment and ensure seamless integration of AI/ML models into the wider data ecosystem.

Work closely with Data Scientists, DevOps teams, and Product Managers to ensure that machine learning models are integrated into business workflows and deployed effectively.

Stay up to date with the latest trends and technologies in MLOps and machine learning deployment, and identify opportunities to incorporate new tools or practices to improve efficiency.

Requirements

MS/BS in Computer Science or related background preferred;

5+ years of experience in MLOps or related roles, with at least 2 years in a leadership or senior engineering capacity;

Proven experience leading and mentoring teams, managing multiple stakeholders, and delivering projects on time;

Proficiency in Python is essential;

Experience with shell scripting and with system diagnostics and automation tooling;

Proficiency in, and professional experience with, ML and computer vision;

Have built and deployed ML, computer vision or GenAI solutions (PyTorch, TensorFlow);

Experience working with databases to manage the flow of data through the machine learning lifecycle;

Experience with cloud-native services for machine learning, such as AWS SageMaker, MLflow, Step Functions, and Lambda, is essential;

Deep expertise in Docker for containerization of machine learning models and tools is essential;

Experience delivering environments using infrastructure-as-code techniques (AWS CDK, CloudFormation);

Experience setting up and managing CI/CD pipelines for ML workflows using tools like Jenkins and GitLab;

Experience in fast-paced, innovative, Agile SDLC;

Strong problem-solving, organizational, and analytical skills;

Experience with Databricks is beneficial;

Experience in building and managing training, evaluation, and testing datasets is beneficial;

Knowledge of security best practices in the context of machine learning.

Top Skills

Python
