Orgvue

Principal Site Reliability Engineer

Posted 25 Days Ago
Remote
Hiring Remotely in London, Greater London, England
Senior level
The Principal Site Reliability Engineer will lead SRE transformations, scale AWS-based infrastructure, and mentor engineers on reliability practices, ensuring operational excellence and resilience at scale.
The summary above was generated by AI
Description

Orgvue is an organisational design and planning platform that empowers your business to transform its workforce by understanding the work people do and the skills they have. Our platform connects strategy to structure, providing clarity of vision, so you can build a more adaptable, better performing organisation that thrives in a constantly changing world of work.

The world’s largest and best-known enterprises and consulting firms use Orgvue to visualise and model current and future states of the organisation and make faster, more informed decisions. The company is headquartered in London, with offices in Philadelphia, The Hague, Toronto, and Sydney.

As a Principal Site Reliability Engineer, you will be a senior technical leader focused on scaling and hardening our AWS- and Kubernetes-based infrastructure. You will work across product, platform, and operations teams to ensure our systems are reliable, observable, and resilient — even at scale.

This role combines hands-on technical capability with strategic vision, helping us build a world-class reliability culture and a robust engineering foundation for growth. We're looking for someone who has technical expertise, is a great communicator and enjoys collaborating across multiple teams.

Responsibilities

  • Define and enforce SLOs, SLIs, and error budgets across critical services
  • Craft and implement a cloud infrastructure and tooling strategy
  • Work across the org to level up SRE practices
  • Help implement robust observability (metrics, logs, and traces) using our observability tool
  • Guide the team in building automated, self-healing systems
  • Own and evolve our incident response processes, including on-call practices and post-mortem culture
  • Mentor engineers across the org on best practices in reliability, operational readiness, and scalable infrastructure
  • Drive Infrastructure as Code (IaC) using Terraform, Kubernetes, CloudFormation and GitOps practices
  • Collaborate closely with security, DevOps, and software teams to ensure compliance, scalability, and operational excellence
  • Evaluate and introduce tools, patterns, and practices that improve the performance and reliability of our SaaS platform

Requirements

Desired Skills & Experience:

  • Demonstrable experience leading SRE transformations
  • Deep hands-on expertise with Kubernetes (EKS preferred) in production environments
  • Strong experience with AWS core services (EC2, EKS, RDS, S3, ALB/NLB, IAM, CloudWatch, etc.)
  • Expertise in Infrastructure as Code using tools such as Terraform, with knowledge of GitOps workflows
  • Strong background in observability: metrics, visualisation, logging, and tracing
  • Understanding of automation, SDLC, CI/CD pipelines, deployment automation, and blue/green or canary releases
  • Proven experience with incident management, disaster recovery planning, root cause analysis, and post-incident reviews

Benefits

  • Hybrid working - 1+ days a week in the London office
  • Wellbeing: Sanctus Coaching, Virtual fitness sessions, Wellbeing webinars, Annual Wellbeing day
  • Subsidised Gym Membership
  • Private Medical Insurance (including Dental and Vision) and Life Assurance
  • 25 days holiday (increasing to 30 days at a rate of 1 extra day per year)
  • Summer Fridays (half-day Fridays for the months of July and August)
  • Employer pension contribution of 5% of your gross salary, if you contribute a minimum of 3%
  • Season Ticket Loan
  • Cycle to Work Scheme
  • Annual Discretionary Bonus

'Here at Orgvue we promote individualism and a diverse workforce to build on our future success'

Top Skills

AWS
CloudFormation
GitOps
Kubernetes
Terraform
