Ocient

Site Reliability Engineer

Posted 2 Days Ago
Remote
Hiring Remotely in Greece
Mid level
Maintain and expand Ocient's hosted data warehouse services with a focus on high availability, performance, observability, automation, security, and incident management. Build monitoring, logging, alerting, CI/CD, and automate Linux server deployments while supporting backup, DR, and test infrastructure.
The summary above was generated by AI
About Ocient:
Ocient is building OcientAIQ™ – a complete ecosystem for delivering trusted agentic AI solutions at petabyte scale, for the organizations that can't afford to get AI wrong. Our customers protect networks, secure nations, and power the global economy. The problems we solve are genuinely hard, and the work matters.
 
Founded in 2016 by the team that built Cleversafe (acquired by IBM in 2015), Ocient is headquartered in Chicago with a remote-first global team. We are a carbon-neutral company backed by leading investors including Greycroft, OCA Ventures, In-Q-Tel, and Buoyant Ventures.

Do not contact Ocient directly to apply for a role. For security purposes, any applications received via email will be deleted.

Job Title: Site Reliability Engineer
Location: Remote (United Kingdom)
Hiring Manager: Service Delivery Engineering Manager
Estimated salary range: £74,000 to £90,000
• The salary offered for this position will be based on a candidate’s experience and skill demonstrated during interviews and other evaluations.

Position Overview
Ocient is searching for an experienced Site Reliability Engineer with strong problem-solving skills and a passion for solving hard problems to help maintain and expand Ocient's "as a service" offering of its cutting-edge data warehouse.

Responsibilities
  • Support the design and operations of Ocient's hosted database and related services — including message queues and storage systems — ensuring high availability, performance, and efficiency.
  • Design and maintain monitoring, log centralization, and alerting for all services to facilitate observability and incident management.
  • Automate deployment and configuration of Linux-based servers, including the OS and the numerous applications that compose our hosted offerings.
  • Develop and maintain rigorous security practices to protect our applications and customer data.
  • Assist with automation of testing pipelines for the Ocient DB and monitoring of test infrastructure.

Ideal Qualifications
  • 3+ years of experience in system administration in production environments.
  • Scripting experience with Bash, Python, or other languages.
  • Experience with system and software monitoring and alerting tools, such as the ELK stack, Graylog, InfluxDB, Prometheus, Zabbix, Grafana, Dynatrace, or others.
  • Experience with configuration management software such as Ansible, Puppet, or Chef.
  • Experience with data archiving, backup, and disaster recovery.
  • Continuous Integration / Continuous Deployment experience with Jenkins, GitLab CI, or others.
  • Experience with source control tools like Git.
  • Ability to work flexible hours and serve in on-call rotations.

An Exceptional Candidate Will Have:
  • Knowledge of OWASP principles for application security.
  • Experience with server / system virtualization and containerization technologies (e.g., Proxmox, KVM, VMware).
  • Experience with SQL and Database Administration.
  • Experience managing and operating cloud infrastructure (e.g., AWS, GCP, Azure).
  • Experience with SSAE 18 / SOC 2 compliance.
  • Experience with networking administration, including VPN, proxy, DNS, and firewall configuration.

Interview Requirements: All interviews are conducted via video and require candidates to have their camera on for the duration of the session. The use of video filters, face-altering effects, or virtual backgrounds is not permitted for security and verification purposes.

We are not open to using an agency or staffing company at this time. We do not accept unsolicited agency or staffing resumes and we are not responsible for any fees related to unsolicited resumes. 

Ocient is an equal employment opportunity employer. All qualified applicants will receive consideration for employment without regard to race, creed, color, religion, sex (including pregnancy status), sexual orientation, gender identity, national origin or ancestry, ethnicity, citizenship status, age, physical or mental disability, veteran status, marital status, parental status, genetic information, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, please contact [email protected] for more information.

All official Ocient job postings and recruiting communications will come directly from our team via our Careers page, LinkedIn, or from an ocient.com email address. If you receive communication about a role from any other source, please treat it with caution and direct questions to [email protected].
