Factset

Lead Site Reliability Engineer

Posted 7 Hours Ago

Remote

Hiring Remotely in United Kingdom

Senior level

The Lead Site Reliability Engineer will ensure the reliability and performance of software systems by collaborating with various teams to implement best practices, designing scalable architectures, developing automated tools, and participating in incident response. Key tasks include system monitoring, deployment optimization, troubleshooting, performance analysis, and continuous improvement of reliability.

The summary above was generated by AI

We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) to join our growing team. As an SRE, you will play a critical role in ensuring the reliability, scalability, and performance of our software systems and infrastructure. The ideal candidate possesses a strong background in coding, automation, and system administration, combined with a passion for continuously improving system reliability.

Responsibilities:

Collaborate with development, operations, and product teams to define, review, and implement reliability standards and best practices.
Design, implement, and maintain highly available and scalable architectures for our applications and infrastructure.
Develop and enhance automated tools and frameworks to optimize system monitoring, deployment, and recovery.
Troubleshoot and resolve complex issues throughout the entire software stack, including networking, databases, and distributed systems.
Conduct performance analysis and capacity planning to ensure system scalability and resource optimization.
Take a proactive approach to continuously improving reliability.
Participate in incident response, root cause analysis, and postmortem activities to identify and rectify system failures.
Collaborate with cross-functional teams to implement and improve CI/CD pipelines, ensuring reliable and efficient software releases.
Stay up to date with emerging technologies and industry trends, actively contributing to ongoing system improvements.
Participate in the on-call rotation.

Requirements:

Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.
Proven experience successfully deploying and managing large-scale distributed systems.
Understanding of SRE concepts (error budgets, SLIs/SLOs, blameless postmortems).
Proficiency in at least one programming language such as Python, C++, or Go.
Familiarity with monitoring and observability tools.
Excellent problem-solving skills and ability to troubleshoot complex issues efficiently.
Strong organizational and communication skills, with the ability to collaborate effectively in a cross-functional team environment.

Desirable Qualifications:

Familiarity with security best practices and experience implementing security measures in a production environment.
Experience with modern infrastructure technologies and tools, including cloud platforms (AWS, Azure, GCP), containers and container orchestration (Docker, Kubernetes), and configuration management (Ansible, Chef, Puppet).
Solid understanding of networking protocols and technologies (TCP/IP, DNS, load balancing).
Demonstrated experience with infrastructure as code (IaC) and automation tools (e.g., Terraform, GitHub Actions).

Join our team and contribute to creating and maintaining a highly reliable and performant infrastructure that supports our growing platform. Help shape the future of our systems architecture while working in a collaborative and innovative environment.

Top Skills

C++

Python

Similar Jobs

GitLab

Intermediate Site Reliability Engineer, Software Delivery

30 Minutes Ago

Remote

Mid level

Cloud • Security • Software • Cybersecurity • Automation

The Intermediate Site Reliability Engineer will enhance GitLab's delivery platform by automating release processes, improving monitoring, and optimizing deployment strategies. Key tasks include collaborating with Engineering teams, creating new tools, and ensuring timely and efficient software releases.

Top Skills: Kubernetes

GitLab

Intermediate Site Reliability Engineer, Database Operations

25 Minutes Ago

Remote

Senior level

Cloud • Security • Software • Cybersecurity • Automation

As an Intermediate Site Reliability Engineer, you'll manage the PostgreSQL lifecycle for GitLab.com, collaborating with teams to enhance operations, plan capacity, respond to alerts, and implement security measures. You'll also automate tasks and improve monitoring systems for efficient database management.

Top Skills: Ansible, Chef, GCP, Grafana, Linux, Postgres, Prometheus, Terraform

Factset

Lead Site Reliability Engineer

2 Days Ago

Remote

United Kingdom

Senior level

Big Data • Software

As a Lead Site Reliability Engineer, you'll ensure the reliability and performance of software systems, collaborate with teams to establish best practices, develop automated tools, troubleshoot complex issues, and participate in incident response while continuously enhancing system reliability.

Top Skills: C++, Go, Python

What you need to know about the Bristol Tech Scene

Along with Gloucester, Swindon and Bath, Bristol is part of the "Silicon Gorge" tech hub, a region in the U.K. renowned for its high-tech and research-driven industries, with a particular emphasis on sustainability and reducing environmental impact. As the European Green Capital, Bristol is home to 25,000 cleantech companies, including Baker Hughes and unicorn Ovo Energy. The city has committed to achieving net-zero emissions within the next decade.