Equitas IT INC

GCP Agentic Platform Support Lead

Equitas IT INC | Contract
New York
8-10 Years | Feb 20th, 2026
Required Skillset:
Jira, Dataflow, Cloud Composer, Cloud Logging, SAP, ServiceNow, GIS, Mean Time to Repair (MTTR), Mean Time Between Failures (MTBF), self-service billing, uptime dashboards, Monitoring reports, GCP billing, BigQuery slot spikes, Google Cloud Support, root cause analysis (RCA), IAM errors, configuration drift, Cloud Monitoring Reports

Job Description

Job Summary

SLA & Reliability Reporting

  • Establish the initial framework for tracking Mean Time to Repair (MTTR) and Mean Time Between Failures (MTBF)
  • Configure self-service billing and uptime dashboards for Con Edison stakeholders

Foundation, Maintenance & Optimization

  • Develop and deploy the initial suite of Cloud Logging and Monitoring reports to establish platform visibility
  • Monitor GCP billing for anomalies (e.g., BigQuery slot spikes) and implement tactical fixes to ensure budget adherence
  • Build and maintain the "Golden Path" runbooks to ensure operational procedures are documented as they are established

Platform Monitoring & Incident Management

  • Independently review overnight batch processing logs (e.g., Cloud Composer/Dataflow) to verify job completion and identify failures before the business day is underway
  • Receive and prioritize platform-related tickets; determine whether issues stem from infrastructure, pipelines, or upstream sources
  • Execute root cause analysis (RCA) and apply fixes for code-based failures, IAM errors, or configuration drift
  • Act as the primary technical point of contact for Google Cloud Support or Con Edison Source System teams (SAP, GIS) when issues are external to the platform

Minor Enhancements (Capacity-Based)

  • Maintain a prioritized backlog of minor requests to be addressed only after platform stability and incidents are managed
  • Within available bandwidth, execute minor schema updates, ingestion schedule tweaks, or IAM modifications

Workstream Deliverables:

  • Operations Runbook: The definitive reference for current operational procedures and recovery steps (MS Word)
  • Integrated Health & Cost Reporting: Automated tracking of service uptime and GCP spend via Cloud Monitoring (Cloud Monitoring Reports)
  • Unified Incident & RCA Logs: A centralized record of Critical/High severity incidents and their resolutions, stored in the agreed management tool (ServiceNow/Jira or similar)
  • Recovery & Maintenance Code: Validated code merged into the repository for bug fixes and configuration updates, including detailed release notes (GCP Code)


Similar Jobs

Citrix Services Engineer / Citrix Platform Lead

Remote

Feb 20th, 2026

Production Support Engineer

Texas, Nebraska, New Jersey … +2

Feb 20th, 2026

Webmethods Integration Support Lead

New York

Feb 20th, 2026

Agentic Platform Manager

New York

Feb 20th, 2026

Senior Cloud Platform Software Engineer

AZ

Feb 20th, 2026