Fluidstack

Director, Network Engineering

Posted 8 Days Ago
In-Office or Remote
4 Locations
Expert/Leader
Lead architecture, design, and operations of networking services for AI infrastructure. Build and mentor a networking team, focusing on automation, performance, and reliability.
The summary above was generated by AI
About Fluidstack

We build and operate high-performance GPU clusters so the most ambitious teams can move fast, stay focused, and scale without friction. Our clusters power top AI labs, governments, and enterprises. Our customers include Mistral, Poolside, Black Forest Labs, Meta, and more.

Our team is highly motivated and focused on providing a world-class supercomputing experience. We put our customers first in everything we do, working hard not just to win the sale, but to win repeat business and customer referrals.

We hold ourselves and each other to high standards. We expect you to care deeply about the work you do, the products you build, and the experience our customers have in every interaction with us.

You must work hard, take ownership from inception to delivery, and approach every problem with an open mind and a positive attitude. We value effectiveness, competence, and a growth mindset.

About the Role

As Director of Network Engineering, you will lead the architecture, design, and operations of the network services that power our AI infrastructure platform. In this role, you will architect networks that move packets for frontier AI models while ensuring maximum reliability and performance through extensive automation.

You will build and lead a world-class network engineering team ranging from junior network engineers eager to learn high-performance computing, to senior architects who have scaled networks at hyperscalers, to specialized engineers with deep expertise in RDMA/InfiniBand for AI workloads. Your team will span network operations, architecture, automation engineering, and performance optimization roles. You'll be responsible for hiring, mentoring, and developing this team while establishing a culture of technical excellence and continuous learning.

Focus
  • Build networks that scale beyond hundreds of thousands of GPUs.

  • Collaborate with compute, storage, security, and data center teams to deliver integrated infrastructure solutions

  • Build and lead a team of network engineers and architects focused on performance, reliability, and automation.

  • Automate everything. Manual processes kill velocity. Build systems that configure themselves, heal themselves, and optimize themselves. Drive automation initiatives across service deployment, provisioning, and lifecycle management.

  • Design scalable network architectures supporting clusters from 2,000 to 200,000 GPUs

  • Optimize traffic patterns for AI/ML training workloads and high-performance computing

  • Lead the design and implementation of scalable, high-performance network architectures supporting GPU clusters and AI workloads

  • Establish comprehensive monitoring, alerting, and incident response procedures. Create remediation systems that detect and resolve issues before they impact customers.

  • Lead root cause analysis and implement preventive measures for network incidents

  • Ensure network reliability, security, and performance meet the demanding requirements of AI supercomputing workloads

  • Ensure compliance with data sovereignty and regulatory requirements

About You
  • 10+ years of experience designing and operating large-scale network infrastructure

  • 5+ years in leadership roles at cloud providers, hyperscalers, or technology companies

  • Deep expertise in routing protocols and distributed network design

  • Proven track record scaling networks for high-throughput, low-latency workloads

  • Experience with AI/ML infrastructure and GPU cluster networking (RoCE / InfiniBand)

  • Deep understanding of internet routing, switching, peering, and distributed network design.

  • Expert knowledge of routing protocols (BGP, EVPN), TCP/IP, and network services (DHCP, DNS)

  • Proven track record of designing and operating large-scale, high-performance networks in cloud or datacenter environments

  • Strong knowledge of automation frameworks (e.g., Ansible, Terraform) and infrastructure-as-code principles

  • Experience offloading services into smart NICs and working with hardware acceleration technologies

  • Excellent communication skills with ability to influence technical strategy across organizations

  • Experience with monitoring stacks (Prometheus, Grafana) and observability best practices

Nice to haves
  • Contributions to open-source networking projects

  • Experience with network source-of-truth platforms (such as NetBox or Nautobot) and integrating them with automation workflows

  • Familiarity with Kubernetes networking, overlay networks, and container networking solutions

Benefits
  • Competitive total compensation package (cash + equity).

  • Retirement or pension plan, in line with local norms.

  • Health, dental, and vision insurance.

  • Generous PTO policy, in line with local norms.

Fluidstack is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans’ status, or any other characteristic protected by law. Fluidstack will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.

Top Skills

AI/ML Infrastructure
Ansible
BGP
DHCP
Distributed Network Design
DNS
EVPN
GPU Clusters
Grafana
InfiniBand
Kubernetes
Prometheus
RoCE
Routing Protocols
Software-Defined Networking
TCP/IP
Terraform
