Data Quality Engineer

Plugins Inc
Contract
Texas
8-10 Years
May 1st, 2026
Required Skillset:
Python, Scala, Kafka, Databricks, Prometheus, Grafana, Avro, Protobuf, SQL, AWS Lambda, CloudWatch, AWS Glue, AWS Athena, AWS S3, AWS DynamoDB, JSON, PySpark, AWS Redshift, AWS IAM, AWS EMR, AWS Step Functions, AWS Kinesis

Job Description

Location: Dallas, TX (Hybrid – 3 days onsite)

Job Type: Long-term Contract

Work Authorization: Open - W2 opportunity

Interview Process: In-person (client interview is mandatory)


We are looking for a Data Quality Engineer to drive quality across large-scale data platforms and pipelines.

This role focuses on validating batch and streaming data workflows built on AWS, Kafka, Databricks, SQL, and Python.

You will work closely with data engineering, platform, and analytics teams to ensure reliable, accurate, and production-ready data systems.


Key Responsibilities


Data Quality & Validation

* Validate batch and streaming pipelines for correctness, completeness, and timeliness

* Implement data quality checks (nulls, duplicates, schema drift, referential integrity)

* Perform SQL-based validation of business rules and transformations

* Ensure end-to-end data reconciliation and traceability

ETL / ELT Pipeline Testing

* Test pipelines built using AWS services (Glue, Lambda, EMR, Step Functions)

* Validate transformations written in SQL and Python

* Test ingestion, enrichment, aggregation, and publishing layers

* Validate backfills, reprocessing, and historical data loads

* Test Spark-based pipelines (PySpark/Scala) in Databricks

Streaming & Kafka Testing

* Validate Kafka-based pipelines for data integrity, ordering, and delivery semantics

* Test producers/consumers and serialization formats (Avro, JSON, Protobuf)

* Validate topics, partitions, offsets, retention policies, and schema evolution

* Simulate edge cases such as late arrivals, duplicates, and consumer failures

AWS Data Platform Validation

* Work with S3, Glue, Lambda, Redshift, Athena, Kinesis, and DynamoDB

* Validate IAM roles, permissions, and secure data access

* Verify data lifecycle policies, encryption, and storage optimization

Automation & Frameworks

* Build and maintain automated data testing frameworks using Python

* Develop reusable test utilities and synthetic datasets

* Integrate testing into CI/CD pipelines

* Enable automated alerts for data quality issues

Performance & Reliability Testing

* Validate pipeline performance for large datasets (throughput, latency, concurrency)

* Test retry logic, error handling, idempotency, and recovery mechanisms

* Perform regression, soak, and failover testing

Monitoring & Observability

* Validate metrics, logs, and alerts using CloudWatch, Prometheus, and Grafana

* Define and support data SLAs/SLOs

* Participate in incident response and root cause analysis

Required Qualifications

* 7+ years of experience in QA, SDET, or Data Quality Engineering

* 3+ years of hands-on experience with Databricks

* Strong SQL skills for complex data validation

* Proficiency in Python for automation and data testing

* Experience testing ETL/ELT pipelines

* Hands-on experience with Kafka or other streaming platforms

* Solid understanding of AWS data services (S3, Glue, Redshift, Lambda, Athena)

* Experience working with large-scale distributed systems

* Strong analytical, debugging, and problem-solving skills

Preferred Skills

* Experience with Spark (PySpark/Scala)

* Knowledge of CI/CD integration for data testing

* Familiarity with data observability tools and frameworks

Similar Jobs

Quality Engineering & Test Architecture

Connecticut

May 1st, 2026

Senior QA Engineer / QA Lead / Quality Engineering Lead

Virginia

May 1st, 2026

Data Quality Engineer

Texas

May 1st, 2026

Data Quality Analyst

GA

Apr 30th, 2026