Tricentis

Senior Site Reliability Engineer

Posted 9 Days Ago

In-Office

2 Locations

Senior level

As a Senior Site Reliability Engineer, you will enhance SaaS products' reliability and availability by maintaining cloud infrastructure, monitoring systems, and collaborating with engineers for scalable solutions.

The summary above was generated by AI

Job Description

The Site Reliability Engineer is a pivotal role in our SaaS strategy. You will work closely with our engineering team to ensure unrivaled observability, availability, and performance of Tricentis SaaS Products.

As a Site Reliability Engineer (SRE), you'll be the driving force of our user-facing services and production systems. We're seeking individuals with pragmatic operational skills and software craftsmanship who apply engineering principles and operational discipline to elevate our operating environments and codebase to new heights.

At the core of your responsibilities, you'll specialize in systems such as operating systems, storage subsystems, observability, and networking, while implementing best practices for availability, reliability, and scalability. But that's just the beginning of your thrilling journey with us!

Your Impact as an SRE 🚀

Design, build, and maintain the product cloud infrastructure that enables seamless scaling to support hundreds of thousands of concurrent users.

Develop advanced monitoring systems that proactively alert on symptoms, ensuring rapid response to potential issues.

Leverage tools like Terraform, GitHub Actions, and Kubernetes to efficiently manage our AWS or Azure infrastructure.

Continuously enhance operational processes, making deployments, upgrades, and other tasks as boring and automated as possible.

Collaborate with product engineers on a daily basis and influence product architecture designs.

Be part of an on-call (PagerDuty) rotation to respond swiftly to incidents affecting availability, offering support to product engineers during customer incidents.

As a valuable member of our SRE team, you'll have the opportunity to 💪

Act as a reliability champion for stable counterpart assignments, ensuring a robust and resilient infrastructure.

Propose innovative ideas and solutions within the SRE organization and engineering.

Plan, design, and execute solutions to achieve goals agreed upon by the team.

Lead by example with a positive and inclusive attitude, fostering constructive discussions between SRE and engineering.

Proactively identify opportunities to enhance system availability and performance by applying insights gained from monitoring and observation.

Share your learnings with the wider community.

Be the first responder during emergencies and on-call duties, promptly addressing symptoms and conducting root cause analysis to implement corrective actions and prevent recurring issues.

Our Tech Stack 🌐

Azure, AWS, Terraform, GitHub Actions, Kubernetes, Datadog, Prometheus, Grafana, Betterstack, incident.io, Jira

Our Culture 🦄

We don't just preach our values; we embody them in everything we do. We are committed to creating an environment that empowers, supports, and includes individuals, where trust, transparency, creativity, curiosity, and continuous improvement thrive on a daily basis.

About You 🎯

Proficiency in Terraform syntax and GitHub Actions configuration, including pipelines and job management using GitOps.

Working knowledge of SaaS architecture concepts and designs.

Understanding of Kubernetes, including CLI usage and service re-provisioning.

Ability to provision and set up metrics along with managing alerts and silences.

Identify Service Level Indicators (SLIs) that align the team with availability and latency objectives.

Experience with Linux operating system configuration, package management, and troubleshooting.

Working experience with cloud environments like Azure or AWS and provisioning infrastructure there.

Good cultural fit: clear communication, empathy, curiosity and continuous learning, and a supportive, no-blame attitude.

If you're ready to make a lasting impact as a Site Reliability Engineer and be at the forefront of revolutionising Tricentis SaaS Products, don't miss this opportunity.

Tricentis Core Values:

Knowing what we need to achieve and how to achieve it is important. Tricentis core values define our ways of working and the behaviours we model that create an enjoyable and successful Tricentis life.

Demonstrate Self-Awareness: Own your strengths and limitations.

Finish What We Start: Do what we say we are going to do.

Move Fast: Create momentum and efficiency.

Run Towards Change: Challenge the status quo.

Serve Our Customers & Communities: Create a positive experience with each interaction.

Solve Problems Together: We win or lose as one team.

Think Big & Believe: Set extraordinary goals and believe you can achieve them.

About Tricentis:

Tricentis is a software company officially founded in 2007, with a primary focus on software quality assurance. Whether the testing is exploratory or automated, functional or performance, API or UI, and whether it targets mainframes, custom applications, packaged applications, or cloud-native applications, our comprehensive suite of specialized Continuous Testing tools makes DevOps real by giving our clients the confidence to release on demand.

Tricentis has more than 1500 employees working across over 20 global offices in the US, EMEA, and APAC, serving over 2100 customers.

Top Skills

AWS

Azure

Datadog

GitHub Actions

Grafana

Jira

Kubernetes

Prometheus

Terraform
