
GCP Agentic Platform Support Lead
Equitas IT INC
Contract
Required Skillset:
Jira, Dataflow, Cloud Composer, Cloud Logging, SAP, ServiceNow, GIS, Mean Time to Repair (MTTR), Mean Time Between Failures (MTBF), self-service billing, uptime dashboards, Monitoring reports, GCP billing, BigQuery slot spikes, Google Cloud Support, root cause analysis (RCA), IAM errors, configuration drifts, Cloud Monitoring Reports
Job Description
Job Summary
SLA & Reliability Reporting
- Establish the initial framework for tracking Mean Time to Repair (MTTR) and Mean Time Between Failures (MTBF)
- Configure self-service billing and uptime dashboards for Con Edison stakeholders
Foundation, Maintenance & Optimization
- Develop and deploy the initial suite of Cloud Logging and Monitoring reports to establish platform visibility
- Monitor GCP billing for anomalies (e.g., BigQuery slot spikes) and implement tactical fixes to ensure budget adherence
- Build and maintain the "Golden Path" runbooks to ensure operational procedures are documented as they are established
Platform Monitoring & Incident Management
- Conduct solo reviews of overnight batch processing logs (e.g., Cloud Composer/Dataflow) to verify completion and identify failures before business hours begin
- Receive and prioritize platform-related tickets; determine if issues stem from infrastructure, pipelines, or upstream sources
- Execute root cause analysis (RCA) and apply fixes for code-based failures, IAM errors, or configuration drifts
- Act as the primary technical point of contact for Google Cloud Support or Con Edison Source System teams (SAP, GIS) when issues are external to the platform
Minor Enhancements (Capacity-Based)
- Maintain a prioritized backlog of minor requests to be addressed only after platform stability and incidents are managed
- Within available bandwidth, execute minor schema updates, ingestion schedule tweaks, or IAM modifications
Workstream Deliverables:
- Operations Runbook: The definitive resource reflecting current operational procedures and recovery steps (MS Word)
- Integrated Health & Cost Reporting: Automated tracking of service uptime and GCP spend via Cloud Monitoring (Cloud Monitoring Reports)
- Unified Incident & RCA Logs: A centralized record of Critical/High severity incidents and their resolutions, stored in the agreed management tool (ServiceNow/Jira or similar)
- Recovery & Maintenance Code: Validated code merged into the repository for bug fixes and configuration updates, including detailed release notes (GCP Code)