Data Quality Engineer
Job Description
Location: Dallas, TX (Hybrid – 3 days onsite)
Job Type: Long-term Contract
Work Authorization: Open - W2 opportunity
Interview Process: In-person (client interview mandatory)
We are looking for a Data Quality Engineer to drive quality across large-scale data platforms and pipelines.
This role focuses on validating batch and streaming data workflows built on AWS, Kafka, Databricks, SQL, and Python.
You will work closely with data engineering, platform, and analytics teams to ensure reliable, accurate, and production-ready data systems.
Key Responsibilities
Data Quality & Validation
* Validate batch and streaming pipelines for correctness, completeness, and timeliness
* Implement data quality checks (nulls, duplicates, schema drift, referential integrity)
* Perform SQL-based validation of business rules and transformations
* Ensure end-to-end data reconciliation and traceability
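To illustrate the kind of checks this role involves, here is a minimal sketch in plain Python of null, duplicate, and referential-integrity validation. The column names and sample rows are hypothetical; in practice these checks would run against warehouse tables via SQL or Spark.

```python
# Illustrative data quality checks (hypothetical column names and data).

def check_nulls(rows, column):
    """Return rows where the given column is missing or None."""
    return [r for r in rows if r.get(column) is None]

def check_duplicates(rows, key):
    """Return key values that appear more than once."""
    seen, dupes = set(), set()
    for r in rows:
        k = r[key]
        if k in seen:
            dupes.add(k)
        seen.add(k)
    return dupes

def check_referential_integrity(rows, key, reference_keys):
    """Return foreign-key values not present in the reference set."""
    return {r[key] for r in rows if r[key] not in reference_keys}

orders = [
    {"order_id": 1, "customer_id": 10, "amount": 25.0},
    {"order_id": 2, "customer_id": None, "amount": 40.0},
    {"order_id": 2, "customer_id": 11, "amount": 15.0},  # duplicate order_id
]
customers = {10, 11}

print(check_nulls(orders, "customer_id"))        # one row with a null key
print(check_duplicates(orders, "order_id"))      # {2}
```

The same patterns scale up naturally: nulls and duplicates become `GROUP BY`/`HAVING` queries in SQL, and referential checks become anti-joins against dimension tables.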
ETL / ELT Pipeline Testing
* Test pipelines built using AWS services (Glue, Lambda, EMR, Step Functions)
* Validate transformations written in SQL and Python
* Test ingestion, enrichment, aggregation, and publishing layers
* Validate backfills, reprocessing, and historical data loads
* Test Spark-based pipelines (PySpark/Scala) in Databricks
Streaming & Kafka Testing
* Validate Kafka-based pipelines for data integrity, ordering, and delivery semantics
* Test producers/consumers and serialization formats (Avro, JSON, Protobuf)
* Validate topics, partitions, offsets, retention policies, and schema evolution
* Simulate edge cases such as late arrivals, duplicates, and consumer failures
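A common scenario when testing at-least-once delivery is verifying that a consumer is idempotent under redelivery. The sketch below (hypothetical event shapes, not a real Kafka client) simulates duplicate records and checks that deduplicating by highest processed offset per partition yields each payload exactly once; per-partition ordering is assumed, as Kafka guarantees.

```python
# Hypothetical sketch: deduplicating an at-least-once stream by tracking
# the highest offset processed per partition.

def consume(events, processed_offsets=None):
    """events: (partition, offset, payload) tuples, possibly redelivered.
    Returns the payloads applied exactly once, in order."""
    processed_offsets = processed_offsets or {}
    applied = []
    for partition, offset, payload in events:
        last = processed_offsets.get(partition, -1)
        if offset <= last:
            continue  # duplicate (already processed): skip
        processed_offsets[partition] = offset
        applied.append(payload)
    return applied

events = [
    (0, 0, "a"),
    (0, 1, "b"),
    (0, 1, "b"),  # redelivered duplicate after a simulated consumer restart
    (0, 2, "c"),
]
print(consume(events))  # each payload applied once: ['a', 'b', 'c']
```

A test harness in this spirit can also inject late arrivals or drop a consumer mid-batch to verify recovery behavior.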
AWS Data Platform Validation
* Work with S3, Glue, Lambda, Redshift, Athena, Kinesis, DynamoDB
* Validate IAM roles, permissions, and secure data access
* Verify data lifecycle policies, encryption, and storage optimization
Automation & Frameworks
* Build and maintain automated data testing frameworks using Python
* Develop reusable test utilities and synthetic datasets
* Integrate testing into CI/CD pipelines
* Enable automated alerts for data quality issues
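A lightweight pattern for CI/CD integration is to express each quality check as a function returning failure messages, then fail the build when any check reports a problem. The check names and sample data below are illustrative assumptions, not a prescribed framework.

```python
# Hypothetical sketch: wiring data quality checks into a CI job.
# Each check returns a list of failure messages; any failure
# aborts the pipeline so downstream alerting can fire.

def no_null_keys(rows):
    return [f"null id in row {i}" for i, r in enumerate(rows)
            if r.get("id") is None]

def amounts_non_negative(rows):
    return [f"negative amount in row {i}" for i, r in enumerate(rows)
            if r.get("amount", 0) < 0]

def run_checks(rows, checks):
    """Run every check and collect all failure messages."""
    failures = []
    for check in checks:
        failures.extend(check(rows))
    return failures

synthetic = [{"id": 1, "amount": 5.0}, {"id": 2, "amount": 3.5}]
failures = run_checks(synthetic, [no_null_keys, amounts_non_negative])
if failures:
    # Non-zero exit makes the CI stage fail and trigger an alert.
    raise SystemExit("DATA QUALITY FAILURES:\n" + "\n".join(failures))
print("all checks passed")
```

Running checks this way keeps them reusable across pipelines, and the same functions can be invoked on a schedule to drive automated data quality alerts.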
Performance & Reliability Testing
* Validate pipeline performance for large datasets (throughput, latency, concurrency)
* Test retry logic, error handling, idempotency, and recovery mechanisms
* Perform regression, soak, and failover testing
Monitoring & Observability
* Validate metrics, logs, and alerts using CloudWatch, Prometheus, Grafana
* Define and support data SLAs/SLOs
* Participate in incident response and root cause analysis
Required Qualifications
* 7+ years of experience in QA, SDET, or Data Quality Engineering
* 3+ years of hands-on experience with Databricks
* Strong SQL skills for complex data validation
* Proficiency in Python for automation and data testing
* Experience testing ETL/ELT pipelines
* Hands-on experience with Kafka or other streaming platforms
* Solid understanding of AWS data services (S3, Glue, Redshift, Lambda, Athena)
* Experience working with large-scale distributed systems
* Strong analytical, debugging, and problem-solving skills
Preferred Skills
* Experience with Spark (PySpark/Scala)
* Knowledge of CI/CD integration for data testing
* Familiarity with data observability tools and frameworks