
Job Description
Data Engineer (Python / PySpark / AWS)
Location: McLean, VA (Onsite)
Ex-Capital One
Responsibilities:
Develop and optimize scalable data pipelines using Python and PySpark, leveraging AWS services such as Glue, EMR, and Step Functions.
Design, implement, and maintain robust data processing systems using AWS Glue for ingestion, transformation, and orchestration.
Utilize EMR clusters for distributed data processing and analytics, ensuring optimal performance and resource utilization.
Build and manage serverless workflows using AWS Step Functions to orchestrate complex data pipelines.
Collaborate with cross-functional teams to gather data requirements and deliver scalable, efficient solutions.
Ensure data quality, governance, and security best practices across the data lifecycle.
Troubleshoot and resolve production data issues using AWS monitoring and logging tools.
Stay current with emerging big data technologies, AWS services, and data engineering best practices.
Required Qualifications:
Bachelor’s degree in Computer Science, Engineering, or a related field.
Proven expertise in building data pipelines using Python and PySpark, with strong hands-on experience in AWS Glue, EMR, and Step Functions.
Strong experience with relational databases and Snowflake (mandatory).
Hands-on experience with workflow orchestration tools such as Apache Airflow and CI/CD pipelines.
Proficiency in SQL for complex data manipulation and querying.
Experience with distributed computing frameworks such as Hadoop and Spark.
Strong understanding of cloud computing principles and AWS ecosystem.
Excellent problem-solving, analytical, and communication skills.
Experience working in Agile environments is a plus.
This is a mandatory onsite role in McLean, VA.
Preferred:
AWS Certified Solutions Architect certification (nice to have).