Network SRE (Site Reliability Engineer – Networking)
Job Description
Job Title: Network SRE (Site Reliability Engineer – Networking)
Location: Phoenix, AZ (Hybrid)
Role Summary
We are seeking a Network SRE to ensure the reliability, scalability, and performance of cloud and hybrid network platforms. This role applies SRE principles to networking by shifting from manual network operations to automated, observable, and resilient network services. The ideal candidate is a network engineer who thinks like a software engineer and an SRE.
Key Responsibilities
Network Reliability Engineering
Define SLIs, SLOs, and Error Budgets for network services.
Design networks for:
High availability
Fault tolerance
Low latency
Predictable performance
Improve network reliability while reducing operational toil.
Required Skills & Qualifications
Must-Have
Experience: 10–16+ years in Network Engineering with an SRE / Automation focus
Strong networking fundamentals (TCP/IP, DNS, BGP, routing)
AWS networking expertise
SRE concepts & practices
Network observability & monitoring
Infrastructure as Code
Production incident handling experience
Cloud & Hybrid Networking
Architect and operate AWS networking:
VPCs, Subnets, Route Tables
Transit Gateway
NAT, IGW
PrivateLink, VPC Endpoints
Design hybrid connectivity:
VPN
Direct Connect
Support multi-account and multi-region architectures.
Network Observability & Monitoring
Build deep network observability using:
VPC Flow Logs
CloudWatch
Datadog
Prometheus / Grafana
Analyze packet loss, latency, and throughput.
Implement proactive alerting based on SLOs.
Correlate network signals with application performance.
Automation & Infrastructure as Code
Automate network provisioning and changes using:
Terraform / CloudFormation
Implement CI/CD for network changes.
Reduce manual configuration and human error.
Version-control network definitions.
Incident Response & Troubleshooting
Lead network-related incident response.
Perform deep root-cause analysis for:
Packet drops
Routing issues
DNS failures
Load balancer degradation
Participate in on-call rotation and post-incident reviews.
Drive permanent fixes rather than workarounds.
Security & Traffic Management
Design and enforce:
Network segmentation
Zero-Trust principles
Firewall rules (Security Groups, NACLs)
Implement secure ingress/egress patterns.
Support DDoS protection (AWS Shield, WAF).
Work with Security teams on audits and remediation.
Performance & Capacity Planning
Conduct traffic modeling and capacity forecasting.
Tune load balancers (ALB, NLB).
Optimize routing and failover strategies.
Validate resilience through failure testing.
Collaboration & Enablement
Partner with:
Cloud Platform teams
Application SREs
Security & Infra teams
Enable application teams with network best practices.
Produce architecture diagrams, runbooks, and SOPs.
Influence platform design decisions.