System Reliability Engineer
Job Description
Position: System Reliability Engineer (SRE)
Location: Atlanta, GA (Onsite)
Required: candidates with prior T-Mobile experience
The System Reliability Engineer (SRE) is responsible for ensuring the availability, performance, scalability, and reliability of Client's customer‑facing digital web platforms. This role partners closely with Digital Web, Platform, and Product teams to support high‑traffic web experiences, proactively prevent incidents, and continuously improve operational excellence.
The SRE applies engineering practices to operations, focusing on automation, monitoring, incident management, and resiliency across modern cloud‑native environments.
Required Qualifications
Technical Skills
Experience supporting web‑scale, customer‑facing digital platforms
Strong knowledge of Linux, networking fundamentals, and distributed systems
Hands‑on experience with monitoring and alerting tools (Grafana, Splunk, AppDynamics)
Experience with Kubernetes and containerized applications
Familiarity with CI/CD pipelines and deployment automation
Experience
3+ years of experience in Site Reliability Engineering, DevOps, or Production Support
Experience supporting mission‑critical systems with strict SLAs
Proven ability to handle high‑severity production incidents
Soft Skills
Clear and calm communication during incidents
Strong problem‑solving and troubleshooting skills
Comfortable working in a fast‑paced, always‑on digital environment
Ability to collaborate across engineering, product, and operations teams
Preferred Qualifications
Experience supporting telecom or large‑scale consumer digital platforms
Exposure to AEM or modern web frameworks
Experience with infrastructure as code (Terraform or similar)
Prior experience supporting digital platforms