Site Reliability Engineer
Job Description:
We are seeking a highly skilled Site Reliability Engineer (SRE) with a strong focus on availability, reliability, and performance to join our team. The ideal candidate will have extensive experience in production batch support, Unix shell scripting, and cloud technologies, particularly Google Cloud Platform (GCP). This role requires a proactive individual who can monitor and enhance system reliability while effectively managing batch production incidents.
Core Responsibilities:
Monitor batch flow to ensure system reliability and stability.
Handle batch production incidents and escalations promptly.
Create and support batch plans for both planned and unplanned outages.
Improve alert quality and reduce noise in monitoring systems.
Provide support for batch jobs in a 24x7 shift model.
Collaborate with onshore and offshore teams to ensure effective communication and coordination.
Participate in incident, problem, and change management processes.
Conduct root cause analysis (RCA) and post-incident reviews.
Support production release and change validation efforts.
Key Skills and Qualifications:
SRE & Production Batch Support: 8-10 years of proven experience with a strong focus on availability, reliability, and performance.
Unix Shell Scripting: Minimum of 5 years of experience with Unix commands and shell scripting.
Informatica: At least 5 years of experience working with Informatica, including the ability to create mappings.
Google Cloud Platform (GCP): Minimum of 3 years of experience, including proficiency in BigQuery, Cloud Spanner, Airflow, and monitoring and logging tools.
Database & Backend Technologies:
MS SQL: Experience in query writing.
PL/SQL: Knowledge of stored procedures and batch job support.
Snowflake: Proficient in query writing.
Scheduling Tools: Experience with at least one scheduling tool (e.g., Control-M, Tidal).
Operations & ITIL: Familiarity with incident, problem, and change management processes.