Site Reliability Engineer

Luupli

Reposted 10 Days Ago

Remote

Hiring Remotely in United Kingdom

Mid level

The Site Reliability Engineer will design, build, and maintain AWS cloud infrastructure, ensure performance and reliability, automate tasks, and participate in incident management.

The summary above was generated by AI

About Luupli

Luupli is a social media app that has equity, diversity, and equality at its heart. We believe that social media can be a force for good, and we are committed to creating a platform that maximizes the value that creators and businesses can gain from it, while making a positive impact on society and the planet. Luupli has been in internal testing since June 2024 and is preparing for commercial beta testing from December 2024, with the hope of launching fully in summer 2025.

Job Title: Site Reliability Platform Engineer

About Luupli:

Our team is made up of passionate and dedicated individuals who are committed to making Luupli a success.

Role Description:

We are seeking a talented and experienced Site Reliability Engineer (SRE) to join our team. As an SRE, you will play a crucial role in ensuring the reliability, scalability, and performance of our cloud-based infrastructure and services, primarily hosted on AWS. If you have a passion for problem-solving, a deep understanding of AWS services, hands-on experience with Terraform, and proficiency in scripting with Python or Bash, we invite you to apply for this exciting opportunity.

Role and Responsibilities:

1. Infrastructure Design and Automation:

- Collaborate with software engineering and operations teams to design, build, and maintain cloud-based infrastructure using AWS and Terraform.

- Implement and enhance infrastructure-as-code (IaC) practices using Terraform to ensure reproducibility and scalability of infrastructure components.

2. Monitoring and Incident Management:

- Develop and maintain monitoring solutions to proactively identify performance bottlenecks, system outages, and other potential issues.

- Participate in incident response and root cause analysis efforts to drive continuous improvement and prevent future incidents.

3. Reliability and Performance Optimization:

- Optimise system performance, reliability, and cost efficiency through continuous monitoring, performance tuning, and capacity planning.

- Identify opportunities to automate manual processes and improve system resilience.

4. Scripting and Automation:

- Utilise Python or Bash scripting to create and maintain automation tools for various operational tasks and deployments.

- Implement and improve continuous integration and continuous deployment (CI/CD) pipelines.

5. Security and Compliance:

- Collaborate with security teams to implement best practices for securing cloud infrastructure and services.

- Ensure compliance with relevant industry standards and regulations.

6. Deployment and Release Management:

- Support CI/CD pipelines for application deployments and updates.

- Contribute to the design and implementation of deployment strategies that promote zero-downtime releases.

7. Documentation and Knowledge Sharing:

- Maintain clear and up-to-date documentation for infrastructure configurations, processes, and incident resolution procedures.

- Participate in knowledge sharing with team members to enhance overall expertise and skill sets.

Requirements:

1. Education and Experience:

- Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent practical experience).

- Proven experience as a Site Reliability Engineer or similar role.

2. Technical Skills:

- Extensive experience with Amazon Web Services (AWS) and its core services (EC2, S3, RDS, IAM, etc.).

- Strong proficiency in infrastructure-as-code (IaC) tools, with a focus on Terraform.

- Proficient in scripting with Python or Bash for automation and operational tasks.

- Solid understanding of networking principles and protocols.

- Knowledge of CI/CD pipelines and related tools.

3. Problem-Solving and Analytical Abilities:

- Ability to diagnose and resolve complex technical issues in a fast-paced environment.

- Analytical mindset to proactively identify potential system weaknesses and performance bottlenecks.

4. Collaboration and Communication:

- Strong teamwork and collaboration skills to work effectively with cross-functional teams.

- Excellent verbal and written communication skills.

Compensation

This is an equity-only position, offering a unique opportunity to gain a stake in a rapidly growing company and contribute directly to its success.
