Focused

Staff SRE - Observability

Posted 8 Days Ago

In-Office

London, Greater London, England

Mid level

The Staff SRE will design and implement OpenTelemetry solutions, establishing telemetry strategies, enhancing infrastructure, and optimizing observability practices across platforms and architectures.

Who we are:

At Focused, we move quickly to deliver quality software that achieves client outcomes and meets their customers' needs. We strategically partner with our clients to leverage our expertise in design and software, while our clients bring their own domain expertise. We work with a variety of clients from different industries, collaborating to bring new products to market, modernize legacy systems, or help teams learn the skills they need to be successful.

Our values:

Listen first • We are experts in product practices but lifelong learners in the domain of our customers. We research, collaborate, and understand.
Learn why • We ask questions and talk to users to understand problem spaces, objectives, and goals, which allows us to deeply invest and drive towards the outcomes of our clients.
Love your craft • We love diving into a variety of domains and solving problems. We take pride in delivering value, in communicating progress, and in guiding our clients to success.

We are seeking an experienced Staff Observability Consultant with deep expertise in OpenTelemetry, leading clients and teams, and strong Platform Engineering capabilities to help organizations implement, optimize, and scale their observability infrastructure. This role requires a seasoned consultant who can design comprehensive telemetry strategies, implement distributed tracing solutions, establish robust monitoring practices, and interface closely with clients on the observability journey.

Key Responsibilities:

OpenTelemetry & Observability

Design and implement end-to-end OpenTelemetry solutions across diverse technology stacks
Configure and deploy OpenTelemetry Collectors for efficient data collection, processing, sampling, and routing
Establish telemetry pipelines for metrics, traces, and logs across microservices architectures
Optimize collector configurations for performance, reliability, and cost-effectiveness

Platform Engineering & Infrastructure

Augment existing infrastructure with integrated observability solutions
Implement Infrastructure as Code (IaC) solutions using Terraform, Pulumi, CloudFormation, etc.
Architect and manage Kubernetes clusters with comprehensive monitoring and logging
Build CI/CD pipelines with embedded observability and automated testing

Site Reliability Engineering (SRE)

Establish and maintain Service Level Indicators (SLIs), Objectives (SLOs), and Agreements (SLAs)
Implement error budgets, toil reduction strategies, and capacity planning
Support incident response procedures and post-mortem processes

Cloud & DevOps Engineering

Deploy and manage observability infrastructure across AWS, GCP, and Azure
Establish security, compliance, and governance frameworks for telemetry data
Automate agent evaluations in CI/CD pipelines and observability backends

Required Qualifications:

Core Observability & OpenTelemetry

3-7 years of experience in observability, monitoring, and distributed systems
Deep hands-on experience with OpenTelemetry ecosystem, including SDKs, APIs, and specifications
Proficiency with OpenTelemetry Collector configuration, processors, exporters, and receivers
Strong understanding of telemetry data models, semantic conventions, and instrumentation best practices

Platform Engineering & DevOps

7+ years of Platform Engineering or DevOps experience with focus on site reliability, observability, and incident response
Proficiency with Infrastructure as Code tools (Terraform, Pulumi, CloudFormation, CDK)
Strong experience with CI/CD platforms (GitHub Actions, GitLab CI, Jenkins, ArgoCD)

Cloud & Infrastructure

Hands-on experience with major cloud providers (AWS, GCP, Azure) and their observability services
Experience with container technologies (Docker, Podman) and container registries
Knowledge of networking, security, load balancing, and distributed systems concepts

Site Reliability Engineering

Experience implementing SRE practices including error budgets and toil metrics
Proficiency in incident management, on-call procedures, and post-mortem culture
Experience with capacity planning, performance optimization, and scalability design

Programming & Automation

Proficiency in multiple programming languages preferred (Go, Python, Java, Node.js, Rust)
Strong scripting and automation skills (Bash, Python, PowerShell)
Understanding of software engineering best practices and testing methodologies

Preferred Qualifications (Exceptional Candidates)

AI & Agentic Frameworks

Understanding of Large Language Models (LLMs) and their application in DevOps
Knowledge of vector databases, embeddings, and retrieval-augmented generation (RAG)
Experience with AI/ML model deployment and monitoring in production environments

Leadership & Communication

Experience leading teams and managing client relationships and expectations
Strong technical writing and documentation skills
Ability to present complex technical concepts to diverse stakeholders
A passion for knowledge sharing

Key Competencies

Systems thinking and ability to design holistic observability solutions
Strong analytical and troubleshooting skills for complex distributed systems
Curiosity about emerging technologies, particularly AI applications in operations
Adaptability to rapidly evolving cloud-native and observability technologies
Collaborative mindset with focus on enabling developer productivity and system reliability

What Sets Exceptional Candidates Apart:

Experience with Honeycomb
Contributions to open-source observability or AI framework projects
Track record of implementing platform engineering solutions that significantly improved developer experience
Experience scaling observability infrastructure to handle high event volume

What to know before you apply:

You will be expected to work in person up to four days a week, either at our London office or at client sites.
The London base salary range for this role is £95,000 - £130,000 GBP.

Top Skills

AWS

Azure

CloudFormation

Docker

GCP

Java

Kubernetes

Node.js

OpenTelemetry

Pulumi

Python

Rust

Terraform
