DeepL

Staff Linux Systems Engineer

In-Office

London, England

Senior level

Design and operate High-Performance Computing infrastructure, automate processes, optimize performance, and troubleshoot GPU compute clusters in collaboration with development teams.

The summary above was generated by AI

Meet DeepL

DeepL is a global communications platform powered by Language AI. Since 2017, we’ve been on a mission to break down language barriers. Our human-sounding translations and intelligent writing suggestions are designed with enterprise security in mind. Today, they enable over 100,000 businesses to transform communications, reach new markets, and improve productivity, and they empower millions of individuals worldwide to make sense of the world and express their ideas.

Our goal is to become the global leader in Language AI, building products that drive better communication, foster connections, and make a real-life impact. To achieve this, we need talented individuals like you to join our exciting journey. If you're ready to work with a dynamic team and build your career in the fast-moving AI space, DeepL is your next destination.

What sets us apart

What sets us apart is our blend of modern technology, competitive benefits, and an open, welcoming work culture that enables our people to thrive. When we share what it's like to work at DeepL, the reactions are overwhelmingly positive. This may be because of our products that have helped countless people worldwide or our shared mission to improve communication for individuals and businesses, bringing cultures closer together. What we know for sure is this: being part of DeepL means joining a team dedicated to innovation and employee well-being. Discover what our teams have to say about life at DeepL on LinkedIn, Instagram and our Blog.

Meet the team behind this journey

Within the Infrastructure Operations and Security (IOPS) department, our Data Center Unit manages all infrastructure systems across our remote sites. As a key member of the Research Infrastructure Operations (RIO) team, you will architect and design systems to help us operate our research GPU infrastructure, support the Research department and make fundamental contributions to our AI development.

You will be one of the first in Europe to work hands-on with NVIDIA’s latest AI systems, the GB200 NVL72. Given the scale and complexity of our infrastructure, the work is not just about maintaining our systems; it's about advancing them. You will use your expertise in tooling and automation to improve the efficiency, reliability and performance of our infrastructure, taking our operations to the next level.

In this role, you will also coordinate with on-site personnel and work closely with various teams within our organization. Joining our team means becoming part of a skilled group of engineers ready to support and kick-start your journey with us.

Your responsibilities

Co-own the architecture and roadmap for the model‑training infrastructure with the Engineering Manager.
Lead cross-team project implementations end to end—align stakeholders, define scope and milestones, manage dependencies, and drive on-time delivery.
Provide technical mentorship through design reviews, documentation and hands-on coaching, without managing direct reports.
Build and own automation tooling for provisioning, maintenance and troubleshooting of our GPU infrastructure while continuously improving team tooling.
Plan and execute fleet upgrades (kernels, NVIDIA drivers, BIOS/NIC/HBA firmware) with minimal disruption; keep sites consistent.
Establish observability across the whole GPU cluster including storage and network by extending and optimizing our monitoring systems.
Lead cross-team incident response and drive root-cause analysis.
Benchmark and optimize cluster performance.
Partner with the network team to design and tune the fabric for high-performance workloads.
Participate in our on-call rotation: you’ll help ensure the reliability and availability of our services by joining the team's shared on-call rotation as needed.

About you

Staff-level individual contributor with a proven track record of setting and implementing technical strategy and leading cross-team technical projects
Extensive experience managing and troubleshooting GPU compute clusters, with the ability to architect solutions that scale
Proficiency in containerization and container orchestration technologies such as Docker and Kubernetes
Software engineering expertise and fluency in at least one programming language, preferably Go
Expertise in patch and OS management at scale
Experienced in Linux performance benchmarking, tuning and troubleshooting
Familiarity with distributed storage solutions like Lustre and Ceph
Knowledgeable in networking technologies and protocols, including Ethernet and ideally InfiniBand
Proactive and solution-oriented mindset
Excellent problem-solving skills
Initiative-driven and able to take ownership

What we offer

Diverse and internationally distributed team: joining our team means becoming part of a large, global community with people of more than 90 nationalities. We're more than just colleagues; we're a group of professionals with a shared mission to connect diverse cultures. Our global presence is growing–we've doubled in size nearly every year, with our employees based in the UK, Germany, the Netherlands, Poland, the US, and Japan, and we continue to expand our network.
Open communication, regular feedback: as a language-focused company, we understand the importance of clear, honest communication. We value smooth collaboration and direct, actionable feedback, and believe that leading with empathy and a growth mindset makes us better together.
Hybrid work, flexible hours: we offer a hybrid work schedule, with team members coming into the office twice a week. This allows you to engage directly with your team and experience the unique energy of our workspace, while still enjoying the flexibility and comfort of working from home. Working hours are flexible and we trust in your productivity; we only ask that you stay in sync with your team’s general locations and time zones to foster effective and seamless collaboration.
Regular in-person team events: we bond over vibrant events that are as unique as our team, from local team and business unit gatherings, to new-joiner onboardings, to company-wide events that bring us all together–literally.
Monthly full-day hacking sessions: every month, we have Hack Fridays, where you can spend your time diving into a project you're passionate about and get the opportunity to work with other teams–we value your initiatives, impact, and creativity.
30 days of annual leave: we value your peace of mind. With 30 days off (excluding public holidays) and access to mental health resources, we make sure you're as strong mentally as you are professionally.
Virtual Shares: an ownership mindset in every role. We believe everyone should share in our success, and that’s why every employee receives Virtual Shares, linking your contribution directly to DeepL’s growth and rewarding you with a stake in our future.
Competitive benefits: just as our team spans the globe, so does our benefits package. We've crafted it to reflect the diversity of our team and tailored it to align with your unique location, to ensure you feel supported every step of the way.

If this role and our mission resonate with you, but you're hesitant because you don't check all the boxes, don't let that hold you back. At DeepL, it's all about the value you bring and the growth we can foster together. Go ahead, apply—let's discover your potential together. We can't wait to meet you!

We are an equal opportunity employer

You are welcome at DeepL for who you are—we appreciate authenticity here. Our product is for everyone, and so is our workplace. The more voices we have represented and amplified in our business, the more we will all succeed, contribute, and think forward! So bring us your personal experience, your perspectives, and your background. It’s in our diversity that we will find the power to break down language barriers in the world.

Top Skills

Ceph

Docker

Ethernet

InfiniBand

Kubernetes

Linux

Lustre
