QData Inc

Network SRE (Site Reliability Engineer – Networking)

Contract
AZ
H1B, GC, US Citizen, L2
12 - 30 Years
Jun 5th, 2026
Required Skillset:
Terraform, Prometheus, Datadog, Grafana, Direct Connect, Subnets, Transit Gateway, Route Tables, TCP/IP, DNS, AWS, CloudWatch, CloudFormation, VPN, BGP, NAT, PrivateLink, WAF, VPC Flow Logs, VPCs, ALB, NLB, VPC Endpoints, AWS Shield, IGW

Job Description

Job Title: Network SRE (Site Reliability Engineer – Networking)

Location: Phoenix, AZ (Hybrid)

 

Role Summary

We are seeking a Network SRE to ensure the reliability, scalability, and performance of cloud and hybrid network platforms. This role applies SRE principles to networking by shifting from manual network operations to automated, observable, and resilient network services. The ideal candidate is a network engineer who thinks like a software engineer and SRE.

 

Key Responsibilities

Network Reliability Engineering

Define SLIs, SLOs, and Error Budgets for network services.

Design networks for:

High availability

Fault tolerance

Low latency

Predictable performance

Improve network reliability while reducing operational toil.

 

Required Skills & Qualifications

Must-Have

Experience: 10–16+ years in Network Engineering with an SRE / automation focus

Strong networking fundamentals (TCP/IP, DNS, BGP, routing)

AWS networking expertise

SRE concepts & practices

Network observability & monitoring

Infrastructure as Code

Production incident handling experience

 

Cloud & Hybrid Networking

Architect and operate AWS networking:

VPCs, Subnets, Route Tables

Transit Gateway

NAT, IGW

PrivateLink, VPC Endpoints

Design hybrid connectivity:

VPN

Direct Connect

Support multi-account and multi-region architectures.

 

Network Observability & Monitoring

Build deep network observability using:

VPC Flow Logs

CloudWatch

Datadog

Prometheus / Grafana

Analyze packet loss, latency, and throughput.

Implement proactive alerting based on SLOs.

Correlate network signals with application performance.

 

Automation & Infrastructure as Code

Automate network provisioning and changes using:

Terraform / CloudFormation

Implement CI/CD for network changes.

Reduce manual configuration and human error.

Version-control network definitions.

 

Incident Response & Troubleshooting

Lead network-related incident response.

Perform deep root-cause analysis for:

Packet drops

Routing issues

DNS failures

Load balancer degradation

Participate in on-call rotation and post-incident reviews.

Drive permanent fixes rather than workarounds.

 

Security & Traffic Management

Design and enforce:

Network segmentation

Zero-Trust principles

Firewall rules (Security Groups, NACLs)

Implement secure ingress/egress patterns.

Support DDoS protection (AWS Shield, WAF).

Work with Security teams on audits and remediation.

 

Performance & Capacity Planning

Conduct traffic modeling and capacity forecasting.

Tune load balancers (ALB, NLB).

Optimize routing and failover strategies.

Validate resilience through failure testing.

 

Collaboration & Enablement

Partner with:

Cloud Platform teams

Application SREs

 

Security & Infra teams

Enable application teams with network best practices.

Produce architecture diagrams, runbooks, and SOPs.

Influence platform design decisions.
