EOS IT Solutions

Collaboration Reliability Engineering Lead

Posted 13 Days Ago
Remote
Mid level
Lead and mentor a team to support advanced collaboration technologies, ensuring system reliability, managing incidents, and implementing monitoring tools.
The summary above was generated by AI
WHO WE ARE:

EOS IT Solutions is a Global Technology and Logistics company, providing Collaboration and Business IT Support services to some of the world’s largest industry leaders, delivering forward-thinking solutions based on multi-domain architecture. Customer satisfaction and commitment to superior quality of service are our top business priorities, along with investing in and supporting our partners and employees.
We are a true International IT provider and are proud to deliver our services through global simplicity with trusted transparency.

WHAT YOU’LL DO:
**While this position is remote, only candidates in the Pacific Time Zone will be considered.**

We are seeking an experienced and technically proficient Collaboration Reliability Engineering Lead to join our team. In this role, you will support advanced collaboration technologies in a fast-paced and industry-leading environment. The ideal candidate is a highly motivated technical enthusiast with a strong foundation in IT, operations, networking, scripting, and collaboration technologies, and a passion for continuous learning.

TEAM LEADERSHIP:

    • Lead, mentor, and manage a global team of 8-12 reliability engineers.
    • Foster ownership, accountability, and collaboration within the team.
    • Develop team members' technical and professional skills through coaching and performance reviews.

SYSTEM RELIABILITY AND PERFORMANCE:

    • Oversee maintenance of highly available and scalable architecture, including but not limited to Cisco server templates, endpoints, and edge and proxy appliances.
    • Develop, present, and achieve service-level objectives (SLOs), service-level agreements (SLAs), and key performance indicators (KPIs).
    • Perform quality assurance on video conferencing infrastructure, calendar tooling, touch panel hardware, automation bots, Cisco endpoints, and call center tooling.

INCIDENT MANAGEMENT AND RESOLUTION:

    • Drive incident response, root cause analysis, and post-mortem processes to identify and address reliability issues impacting users.
    • Implement proactive monitoring, alerting, and automation to minimize downtime and improve recovery times in live production environments.
    • Serve as an escalation point for video conferencing infrastructure and network troubleshooting, maintaining up-to-date documentation and on-call runbooks.

RELIABILITY IMPROVEMENTS:

    • Identify opportunities to improve system performance and reduce operational toil.
    • Develop and implement strategies for failure testing and future capacity planning.

CROSS-FUNCTIONAL COLLABORATION:

    • Work closely with engineering, security, networking, and third-party vendors (e.g., Cisco, BrightSign, Arista, Zoom, Webex) to resolve support cases and critical escalations.
    • Provide highly visible communications to hundreds of users regarding large-scale changes and updates.
    • Advocate for reliability-focused initiatives and communicate their value to stakeholders.

TOOLS AND AUTOMATION:

    • Leverage internal tooling to monitor, analyze, and improve system reliability.
    • Lead efforts to automate repetitive tasks, ensuring efficient system operations.

TECHNICAL REQUIREMENTS:

  • 3+ years of experience in Reliability Engineering or similar roles.
  • Health Monitoring: Experience implementing and coordinating telemetry using monitoring tools like Splunk, Grafana, and Prometheus, or similar technologies.
  • VMware expertise: Hands-on experience with VMware from a VM deployment, lifecycle, and API/CLI perspective.
  • ITIL Knowledge: Understanding of ITIL processes, service management principles, and IT service delivery best practices.
  • Automation: Experience as an automation advocate with a history of removing operational toil via software.
  • Experience supporting internet-facing production services and distributed systems, including deployments, on-call rotations, and incident management.

TECHNICAL SKILLS:

  • Familiarity with Bash, Python, Terraform, and REST APIs.
  • Fundamental understanding of networking protocols (e.g., HTTP, TCP/IP, WebRTC, SIP).
  • Knowledge of infrastructure components (e.g., load balancers, firewalls, DNS).

ADDITIONAL KEY PRIORITIES:

  • Expertise in disaster recovery and future capacity planning.
  • Excellent communication and interpersonal skills, with the ability to work effectively in a team-oriented environment.
  • Self-motivated and eager to learn new technologies, tools, and methodologies.
  • Experience with collaboration hardware, platforms (e.g., Zoom, Microsoft Teams, Webex), or media delivery networks.


EOS is committed to creating a diverse and inclusive work environment and is proud to be an equal opportunity employer. We invite you to consider opportunities at EOS regardless of your gender; gender identity; gender reassignment; age; religious or similar philosophical belief; race; national origin; political opinion; sexual orientation; disability; marital or civil partnership status or other non-merit factor. 

The EOS pay range for this job is a general guideline only and not a guarantee of compensation or salary. Additional factors considered in extending an offer include (but are not limited to) location, responsibilities of the job, experience, education, knowledge, skills, and abilities, as well as internal equity, market data, or other laws. 


Pay Range

$135,000-$150,000 USD

Top Skills

Bash
Cisco
Grafana
HTTP
Prometheus
Python
REST APIs
SIP
Splunk
TCP/IP
Terraform
VMware
WebRTC
HQ

EOS IT Solutions Banbridge, Northern Ireland Office

10 Cascum Cresent, The Boulevard, Banbridge, United Kingdom, BT32 4GL
