Senior Site Reliability Engineer / Kubernetes (Remote)

Pragmatike

Posted 24 Days Ago

In-Office or Remote

Hiring Remotely in Italy

Senior level

Operate and scale production Kubernetes clusters across bare-metal, virtualized and on-prem environments. Manage Linux infrastructure (Debian/Ubuntu), networking (VLANs, L2/L3, VPNs), automation (Ansible, Bash/Python, GitOps), observability (Prometheus/Grafana, ELK/Loki/Graylog), virtualization (OpenStack/Proxmox/VMware), bare-metal provisioning (MAAS/PXE), incident response, SLO/SLI definition, on-call rotations, SOPs, and cross-team architecture and maintenance coordination.

The summary above was generated by AI

Job Description

Location: Fully remote, EU timezone (CET ±2h)
Start date: ASAP
Languages: Fluent English is mandatory
Industry: Cloud Computing

We are hiring at Pragmatike to expand our team and drive the growth of our internal projects.

Our focus is on developing cutting-edge solutions in Cloud Computing while fostering a culture of collaboration and innovation. Joining us means being part of a passionate team where your ideas and skills directly contribute to shaping tomorrow's technologies.

If you're excited about working on ambitious projects in a dynamic and flexible environment, we'd love to hear from you!

Responsibilities

Operate and maintain Linux-based infrastructure (Debian/Ubuntu).
Deploy, manage, and scale Kubernetes clusters across bare-metal, virtualized, and on-prem environments.
Oversee full cluster lifecycle: upgrades, node pools, networking, storage, and security hardening.
Implement automation for provisioning and operations using Ansible, Bash/Python, and GitOps workflows.
Design and maintain networking architecture including VLANs, L2/L3 routing, VPNs, and multi-site connectivity.
Build automated deployment workflows (PXE boot, Preseed, cloud-init).
Deploy and maintain observability stacks (Prometheus/Grafana, Loki, ELK, Graylog).
Lead incident response and escalation activities across the platform.
Improve system availability and reduce latency at all levels.
Define and implement SLOs/SLIs at multiple infrastructure levels (physical network/hardware, platform virtualization, software services).
Optimize alerting and monitoring pipelines to provide actionable insights.
Establish and maintain on-call schedules to ensure coverage across timezones.
Develop Standard Operating Procedures (SOPs) for repeatable operations and maintenance tasks.
Coordinate physical maintenance for Policlouds (periodic maintenance, hardware issues, DC-Ops).
Manage virtualization and orchestration layers (OpenStack, Proxmox, VMware).
Help develop and maintain overall architecture across all products.
Plan resources for future initiatives, accounting for demand and growth projections.
Work with development teams to improve overall quality and optimize resource utilization.
Collaborate with cross-functional stakeholders (Hivenet, Policloud, Customer Success teams).

Requirements

Expert-level, hands-on experience operating Kubernetes in production environments.
Strong network engineering skills (VLANs, L2/L3 routing, VPNs, multi-site connectivity); this is essential for the role.
Strong proficiency with Linux systems administration (Debian/Ubuntu).
Solid understanding of networking fundamentals and ability to design complex network architectures.
Experience building and maintaining automation workflows (Ansible, Bash/Python, Git-based).
Experience with observability stacks such as Prometheus, Grafana, ELK, Loki, or Graylog.
Background with virtualization technologies (OpenStack, Proxmox, VMware).
Experience with bare-metal provisioning and MAAS (Metal as a Service).
Strong understanding of distributed systems and container orchestration.
Process-oriented mindset with ability to develop SOPs and operational procedures from scratch.
Experience with incident response, escalation procedures, and on-call rotations.
Ability to work autonomously in a fast-paced, engineering-driven environment.
Strong technical skills combined with alignment to team values.

Nice To Have

Experience with service mesh (Istio, Linkerd) or advanced CNI implementations.
Knowledge of Cloudflare APIs, DNS automation, or tunnel configurations.
Experience with GPU infrastructure, node preparation, or resource scheduling.
Familiarity with security best practices (RBAC, firewalls, network policies).
Exposure to IT asset management or license tracking workflows.
Experience working in multi-timezone environments and coordinating across distributed teams.
Background establishing reliability practices and SRE frameworks in growing organizations.

Why Join Us

100% remote work with flexible hours
High-impact role with autonomy and ownership
Collaborative and international engineering team
Cutting-edge tech stack with a strong focus on reliability and automation
