Data Engineer

Aorzon
Contract
Remote
8 - 9 Years
May 1st, 2026
Required Skillset:
PySpark

Job Description


1. Design, build, and performance-tune Apache Spark workloads using Spark SQL and PySpark for complex transformations (JSON/semi-structured data, nested structures, window functions, joins, aggregations).
2. Profile and optimize Spark jobs: partitioning, shuffles, join strategies, skew, memory/spill, and right-sized resource usage—especially on EMR Serverless—for large-scale and petabyte-scale data.
3. Support customers and monitor pipelines around the clock, under strict SLAs for fixes and reinstating failed pipelines.
4. Implement reusable patterns for incremental loads, deduplication and CDC-style processing.
5. Build and maintain ETL/ELT on AWS EMR Serverless (Spark), with S3 as the data lake layer: partitioning, compression, external tables, and layouts that support fast Spark and downstream SQL.
6. Tune Amazon Redshift workloads: sort keys, distribution, and SQL patterns that fit S3 → Spark → Redshift flows.
7. Optimize cost and performance across Spark jobs, S3 storage, and Redshift (including retention and lifecycle thinking where relevant).
8. Produce end-to-end designs: pipeline topology, data models, staging vs curated layers, incremental strategies, and clear tradeoffs (freshness, cost, complexity, reliability).
9. Apply access controls for sensitive financial and user data (least privilege, row/column-level patterns where required).
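The deduplication and CDC-style processing in item 4 usually reduces to "keep the latest version of each key" — in Spark, a `row_number()` window partitioned by key and ordered by timestamp descending, filtered to row 1. A minimal Spark-free Python sketch of that logic (function and field names are illustrative, not from the posting):

```python
def latest_per_key(records, key_field, version_field):
    """Keep only the most recent record per key.

    Mirrors the Spark pattern: row_number() over a window
    partitioned by key_field, ordered by version_field desc,
    then filtering to row number 1.
    """
    latest = {}
    for rec in records:
        k = rec[key_field]
        if k not in latest or rec[version_field] > latest[k][version_field]:
            latest[k] = rec
    return list(latest.values())

# Two versions of id=1; only the newest (ts=3) survives.
rows = [
    {"id": 1, "ts": 1, "status": "new"},
    {"id": 1, "ts": 3, "status": "shipped"},
    {"id": 2, "ts": 2, "status": "new"},
]
deduped = latest_per_key(rows, "id", "ts")
```

In PySpark the same pattern scales out because the window shuffle groups each key onto one partition before ranking.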
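The incremental loads in item 4 are typically driven by a high-water-mark: persist the maximum timestamp seen on each run and select only newer records next time. A plain-Python sketch under that assumption (names are illustrative):

```python
def incremental_batch(records, ts_field, last_watermark):
    """Return only records newer than the previous run's watermark,
    plus the new watermark to persist for the next run — the same
    high-water-mark idea a Spark incremental load would use.
    """
    fresh = [r for r in records if r[ts_field] > last_watermark]
    new_wm = max((r[ts_field] for r in fresh), default=last_watermark)
    return fresh, new_wm

# With watermark 25, only the ts=40 event is new.
events = [{"id": 1, "ts": 10}, {"id": 2, "ts": 25}, {"id": 3, "ts": 40}]
fresh, wm = incremental_batch(events, "ts", 25)
```

Note the strict `>` comparison: records equal to the stored watermark were already processed, so they are excluded.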
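One common remedy for the join skew mentioned in item 2 is key salting: spread a hot join key across several synthetic sub-keys so one key no longer lands on a single shuffle partition. A toy illustration of the salting step only (the salt count and naming scheme are assumptions, not from the posting; in Spark the small side of the join must also be exploded across all salt values):

```python
import hashlib

def salted_key(key, row_id, num_salts=8):
    """Append a deterministic salt in [0, num_salts) to a join key.

    A single hot key then fans out over num_salts shuffle
    partitions instead of overloading one executor.
    """
    salt = int(hashlib.md5(str(row_id).encode()).hexdigest(), 16) % num_salts
    return f"{key}#{salt}"

# One logical hot key spreads across several salted buckets.
buckets = {salted_key("hot_customer", i) for i in range(1000)}
```

After the salted join, results are aggregated back on the original (unsalted) key.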
