Lead Site Reliability Engineer

Remote
Big Data • Software
The Role
Seeking a Lead Site Reliability Engineer to ensure the reliability, scalability, and performance of software systems and infrastructure. Responsibilities include defining reliability standards, designing scalable architectures, developing automated tools, troubleshooting issues, and participating in incident response. Requirements include a Bachelor's degree or equivalent practical experience, experience with large-scale distributed systems, proficiency in Python, C++, or Go, and familiarity with monitoring tools.

We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) to join our growing team. As an SRE, you will play a critical role in ensuring the reliability, scalability, and performance of our software systems and infrastructure. The ideal candidate possesses a strong background in coding, automation, and system administration, combined with a passion for continuously improving system reliability.

Responsibilities:

  • Collaborate with development, operations, and product teams to define, review, and implement reliability standards and best practices.
  • Design, implement, and maintain highly available and scalable architectures for our applications and infrastructure.
  • Develop and enhance automated tools and frameworks to optimize system monitoring, deployment, and recovery.
  • Troubleshoot and resolve complex issues throughout the entire software stack, including networking, databases, and distributed systems.
  • Conduct performance analysis and capacity planning to ensure system scalability and resource optimization.
  • Take a proactive approach to continuously improving reliability.
  • Participate in incident response, root cause analysis, and postmortem activities to identify and rectify system failures.
  • Collaborate with cross-functional teams to implement and improve CI/CD pipelines, ensuring reliable and efficient software releases.
  • Stay up-to-date with emerging technologies and industry trends, actively contributing to ongoing system improvements.
  • Participate in on-call rotation.

Requirements:

  • Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.
  • Proven experience successfully deploying and managing large-scale distributed systems.
  • Understanding of SRE concepts (error budgets, SLIs/SLOs, blameless postmortems).
  • Proficiency in programming languages such as Python, C++, or Go.
  • Familiarity with monitoring and observability tools.
  • Excellent problem-solving skills and ability to troubleshoot complex issues efficiently.
  • Strong organizational and communication skills, with the ability to collaborate effectively in a cross-functional team environment.

Desirable Qualifications:

  • Familiarity with security best practices and experience implementing security measures in a production environment.
  • Experience with modern infrastructure technologies and tools, including cloud platforms (AWS, Azure, GCP), containers and orchestration (Docker, Kubernetes), and configuration management (Ansible, Chef, Puppet).
  • Solid understanding of networking protocols and technologies (TCP/IP, DNS, load balancing).
  • Demonstrated experience with infrastructure as code (IaC) and automation tools (e.g., Terraform, GitHub Actions).

Join our team and contribute to creating and maintaining a highly reliable and performant infrastructure that supports our growing platform. Help shape the future of our systems architecture while working in a collaborative and innovative environment. 

Top Skills

C++
Go
Python
The Company
HQ: Norwalk, CT
10,310 Employees
On-site Workplace
Year Founded: 1978

What We Do

FactSet creates flexible, open data and software solutions for tens of thousands of investment professionals around the world, providing instant access to financial data and analytics that investors use to make crucial decisions.

For 40 years, through market changes and technological progress, our focus has always been to provide exceptional client service. From more than 60 offices in 23 countries, we’re all working together toward the goal of creating value for our clients, and we’re proud that 95% of asset managers who use FactSet continue to use FactSet, year after year.

As big as we grow, as far as we reach, and as successful as we become, we stay connected to our clients and to each other.
