Data Engineer
Job Description
Role: Data Engineer with Databricks and Spark
Location: Bellevue, WA (Onsite – Hybrid)
No. of Positions: 3
Mandatory Skills: Data Engineer, Databricks, ADF, PySpark, SQL, Power BI, Delta Lake
Two roles:
Role 1: Data Engineer focused on Power BI and Delta Lake (1 position)
Role 2: Data Engineer focused on PySpark and SQL (2 positions)
JOB SUMMARY
This role builds and maintains scalable data pipelines and lakehouse infrastructure on Microsoft Azure to support efficient extraction, transformation, and loading of data across batch and real-time workloads. It involves implementing and managing the Medallion Architecture (Bronze → Silver → Gold) using Azure Data Factory, Databricks (PySpark), Azure SQL Database, and Databricks Unity Catalog.
The role is responsible for keeping data quality within agreed SLAs. Success is measured by pipeline reliability, data freshness SLA compliance, and the quality of Gold-layer datasets powering Power BI executive dashboards.
The work supports organizational decision-making by delivering trusted, well-governed data to business executives and analytics consumers.
Required Skills:
Experience building and optimizing big data pipelines using Azure Data Factory, PySpark, and SQL across structured and semi-structured data sets
Hands-on experience implementing Medallion Architecture (Bronze/Silver/Gold)
Experience with Delta Lake — ACID transactions, incremental loading, schema evolution, partitioning strategies
Experience performing root cause analysis on pipeline failures and data quality issues to resolve SLA breaches and identify platform improvement opportunities
Azure Foundational Services:
Working knowledge of: Azure Data Factory (ADF), ADLS Gen2, Azure SQL Database, Azure Blob Storage, Azure Key Vault, Azure Monitor / Log Analytics, Azure Event Hubs, Microsoft Fabric Lakehouse, Azure Active Directory / Entra ID (RBAC, Service Principals)
Programming Languages:
Proficiency in Python and PySpark for data transformation, pipeline automation, and large-scale distributed processing; strong SQL skills including window functions, CTEs, and query optimization across relational and lakehouse engines
Data Architecture:
Solid understanding of Medallion Architecture, dimensional modeling (Star Schema, SCD Types 1/2/3), and the trade-offs between lakehouse, data warehouse, and data lake patterns
Pipeline Engineering:
Ability to build robust ADF pipelines with ForEach, Lookup, Copy Activity, and Data Flows; incremental loading via watermark or CDC; error handling, retry logic, and dead-letter patterns
Data Quality Experience:
Experience implementing SLA-based data quality checks (freshness, completeness, row count), monitoring via Azure Monitor and ADF diagnostic logs, and defining data quality agreements with business stakeholders.
DevOps for Data:
Experience with Git-based workflows, ADF Git integration, CI/CD pipeline promotion across Dev/Test/Prod using Azure DevOps or GitHub Actions
Reporting Layer Awareness:
Understanding of how Gold-layer data feeds Power BI — DirectQuery vs. Import mode trade-offs, dataset refresh patterns, and semantic model collaboration with BI teams
Ability to manage work across multiple concurrent pipeline projects, prioritize by business impact, and communicate status clearly to technical and non-technical stakeholders
Good-to-Have Skills:
Experience with Microsoft Fabric (Lakehouse, Notebooks, OneLake, Fabric Pipelines), ideally on an active migration or greenfield project
Experience with real-time / streaming workloads using Azure Event Hubs or Structured Streaming in PySpark
Experience delivering data platforms for executive-level reporting via Power BI semantic models