Data Engineer
Job Description
1. Design, build, and performance-tune Apache Spark workloads using Spark SQL and PySpark for complex transformations (JSON/semi-structured data, nested structures, window functions, joins, aggregations).
2. Profile and optimize Spark jobs: partitioning, shuffles, join strategies, skew, memory/spill, and right-sized resource usage—especially on EMR Serverless—for large-scale and petabyte-scale data.
3. Support customers and monitor pipelines around the clock, meeting strict SLAs for fixes and for reinstating failed runs.
4. Implement reusable patterns for incremental loads, deduplication and CDC-style processing.
5. Build and maintain ETL/ELT on AWS EMR Serverless (Spark), with S3 as the data lake layer: partitioning, compression, external tables, and layouts that support fast Spark and downstream SQL.
6. Design and tune Redshift workloads: sort keys, distribution styles, and SQL patterns that fit S3 → Spark → Redshift flows.
7. Optimize cost and performance across Spark jobs, S3 storage, and Redshift (including data retention and lifecycle policies where relevant).
8. Produce end-to-end designs: pipeline topology, data models, staging vs curated layers, incremental strategies, and clear tradeoffs (freshness, cost, complexity, reliability).
9. Apply access controls for sensitive financial and user data (least privilege, row/column-level patterns where required).
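For illustration, the deduplication/CDC-style responsibility above usually reduces to a keep-latest-record-per-key pattern. In Spark this is typically a `row_number()` window partitioned by the business key and ordered by a version column descending; here is a minimal plain-Python sketch of the same logic (field names `id` and `updated_at` are assumptions for the example):

```python
from itertools import groupby
from operator import itemgetter

def dedup_latest(records, key="id", version="updated_at"):
    # Keep only the most recent record per key -- the same result a Spark
    # row_number() window (partitionBy key, orderBy version desc, keep rn == 1)
    # produces on a DataFrame.
    ordered = sorted(records, key=itemgetter(key, version))
    return [list(group)[-1] for _, group in groupby(ordered, key=itemgetter(key))]

rows = [
    {"id": 1, "updated_at": "2024-01-01", "status": "new"},
    {"id": 1, "updated_at": "2024-02-01", "status": "shipped"},
    {"id": 2, "updated_at": "2024-01-15", "status": "new"},
]
latest = dedup_latest(rows)  # one row per id, the newest version of each
```

The same shape handles CDC feeds: treat each change event as a versioned record and keep the latest per key before merging into the curated layer.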
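The skew-handling work in item 2 is often addressed by key salting: a hot join key is split across N buckets on the fact side, and the dimension side is replicated once per bucket so the join still matches. A minimal plain-Python sketch of the idea (the helper names and the `key` field are hypothetical, not from any library):

```python
import random

def salt_fact_keys(rows, hot_keys, n_salts=8, seed=0):
    # Rewrite each skewed key "K" as "K#i" with a random i in [0, n_salts),
    # so its rows spread across n_salts shuffle partitions instead of one.
    rng = random.Random(seed)
    out = []
    for r in rows:
        k = r["key"]
        if k in hot_keys:
            out.append({**r, "key": f"{k}#{rng.randrange(n_salts)}"})
        else:
            out.append(r)
    return out

def replicate_dim_keys(rows, hot_keys, n_salts=8):
    # Replicate each hot dimension row once per salt value so every
    # salted fact key "K#i" still finds its match after the join.
    out = []
    for r in rows:
        if r["key"] in hot_keys:
            out.extend({**r, "key": f"{r['key']}#{i}"} for i in range(n_salts))
        else:
            out.append(r)
    return out

facts = salt_fact_keys([{"key": "A", "amt": 10}, {"key": "B", "amt": 5}], {"A"}, n_salts=4)
dims = replicate_dim_keys([{"key": "A", "name": "hot"}, {"key": "B", "name": "cold"}], {"A"}, n_salts=4)
```

In Spark the same trick is expressed with column expressions (a `concat` of the key and a random salt, and an `explode` of a salt array on the dimension side); the sketch only shows the partitioning arithmetic.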