Site Reliability Engineer
Techotlist
Contract
Required Skillset:
Java, Python, Azure, Docker, Jenkins, Ansible, Kubernetes, Splunk, Go, Bash, Chef, Puppet, Terraform, Prometheus, Grafana, AWS, GitHub Actions, CloudFormation, GitLab CI, ELK, Linux systems administration
Job Description
- Design and maintain highly available, fault-tolerant systems
- Implement SRE best practices: SLIs, SLOs, SLAs, error budgets
- Automate infrastructure provisioning and deployments
- Improve observability using metrics, logs, and tracing
- Participate in on-call rotations, incident response, and root cause analysis (RCA)
- Lead blameless postmortems and reliability improvements
- Partner with application teams to embed reliability early
- Ensure adherence to security, audit, and compliance standards
- Reduce operational toil via automation and self-healing systems
Required Technical Skills
Cloud & Infrastructure
- AWS and/or Azure (hybrid environments)
- Kubernetes & Docker
- Linux systems administration
- Infrastructure as Code: Terraform / CloudFormation
Programming & Automation
- Proficiency in Python, Java, Go, or Bash
- CI/CD pipelines (Jenkins, GitHub Actions, GitLab CI)
- Configuration management (Ansible, Chef, Puppet)
Monitoring & Observability
- Prometheus, Grafana
- Splunk / ELK