Site Reliability Engineer
Techotlist · Contract
Required Skillset:
Java, Python, Linux, Azure, Docker, Jenkins, Ansible, Kubernetes, Splunk, Go, New Relic, Bash, Chef, Puppet, Terraform, Prometheus, Datadog, Grafana, AWS, GitHub Actions, CloudFormation, GitLab CI, ELK
Job Description
Vanguard is seeking a Site Reliability Engineer (SRE) to enhance the reliability, scalability, performance, and security of mission-critical platforms supporting investment management and digital client experiences. This role blends software engineering, cloud infrastructure, and operations, with a strong emphasis on automation and resilience in a highly regulated environment.
Key Responsibilities
- Design and maintain highly available, fault-tolerant systems
- Implement SRE best practices: SLIs, SLOs, SLAs, error budgets
- Automate infrastructure provisioning and deployments
- Improve observability using metrics, logs, and tracing
- Participate in on-call rotations, incident response, and root cause analysis (RCA)
- Lead blameless postmortems and reliability improvements
- Partner with application teams to embed reliability early
- Ensure adherence to security, audit, and compliance standards
- Reduce operational toil via automation and self-healing systems
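The error-budget practice listed among the responsibilities reduces to simple arithmetic: an availability SLO implies a fixed allowance of failure over a window. The sketch below is illustrative only. The 99.9% SLO, the 30-day window, the request-based SLI, and the function names error_budget_minutes and budget_remaining are hypothetical examples and do not come from the posting.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability SLO over a window.

    Hypothetical helper: slo is a fraction such as 0.999 (99.9%).
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction between 0 and 1, e.g. 0.999")
    window_minutes = window_days * 24 * 60
    return (1.0 - slo) * window_minutes


def budget_remaining(slo: float, good_events: int, total_events: int) -> float:
    """Fraction of the error budget still unspent, using a request-based SLI.

    Hypothetical helper: 1.0 means no budget used; a negative value means
    the budget is exhausted and the SLO has been missed for the window.
    """
    if total_events == 0:
        return 1.0  # no traffic yet, so nothing has been spent
    allowed_bad = (1.0 - slo) * total_events  # failures the SLO tolerates
    actual_bad = total_events - good_events   # failures actually observed
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    # A 99.9% SLO over 30 days allows roughly 43.2 minutes of downtime.
    print(round(error_budget_minutes(0.999), 1))  # → 43.2
    # 500 failed requests out of 1,000,000 spends half of a 99.9% budget.
    print(round(budget_remaining(0.999, 999_500, 1_000_000), 2))  # → 0.5
```

Teams commonly gate risky releases on the remaining budget, for example pausing deploys once budget_remaining drops below zero; the exact policy would be an organizational choice rather than something this sketch prescribes.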
Required Technical Skills
Cloud & Infrastructure
- AWS and/or Azure (hybrid environments)
- Kubernetes & Docker
- Linux systems administration
- Infrastructure as Code: Terraform / CloudFormation
Programming & Automation
- Proficiency in Python, Java, Go, or Bash
- CI/CD pipelines (Jenkins, GitHub Actions, GitLab CI)
- Configuration management (Ansible, Chef, Puppet)
Monitoring & Observability
- Prometheus, Grafana
- Splunk / ELK
- Datadog or New Relic
Similar Jobs
- Site Reliability Engineer, California (Feb 17th, 2026)
- Site Reliability Engineer, Pennsylvania (Feb 13th, 2026)
- Site Reliability Engineer, North Carolina (Feb 12th, 2026)
- Site Reliability Engineer, Remote (Feb 11th, 2026)
- Site Reliability Engineer, New Jersey (Feb 3rd, 2026)