Data Engineer
Metasoftinc
Contract
Required Skillset:
Python, Azure, Redis, Lambda, Apache Airflow, S3, Redshift, Kafka, Glue, Step Functions, Data Governance, Data Cataloging, Metadata Management, Semantic Layer Design, AWS, FastAPI, MySQL, PostgreSQL, PySpark, BigQuery, pandas, MLflow, LLMs, SageMaker, MLOps, ElasticSearch, DynamoDB, Containerization (Docker), Orchestration (Kubernetes), TS/SCI
Job Description
Job Title: ITP_Data Engineer
Location: On-site; Springfield, VA or Nebraska Avenue Complex (NAC), Washington, DC (DHS Headquarters)
Client: DHS HQ
Employment Type: Full-Time
Clearance Required: Current TS/SCI. Must be a US citizen with the ability to obtain DHS EOD SCI.
Position Overview
We are seeking a skilled ITP_Data Engineer to design, build, and maintain robust data pipelines that enable scalable, secure, and intelligent data processing across cloud environments. The ideal candidate will have hands-on experience in data acquisition from diverse sources, deep familiarity with modern data storage paradigms, and a passion for building pipelines that support advanced analytics and AI/ML use cases.
Key Responsibilities
- Data Ingestion & Acquisition: Collect and integrate data from a wide variety of structured and unstructured sources including APIs, RDBMS, file systems, third-party services, and real-time streams.
- Pipeline Development: Design and implement scalable ETL/ELT pipelines to clean, enrich, normalize, and semantically align data (ontology-driven transformations).
- Cloud Deployment: Build and deploy data pipelines and associated infrastructure on AWS or Azure, using managed services like Lambda, Glue, Step Functions, Azure Data Factory, etc.
- Database Architecture: Understand and optimize for different storage engines, including relational (PostgreSQL, MySQL), columnar (Redshift, BigQuery), indexing engines (ElasticSearch), key-value stores (DynamoDB, Redis), object stores (S3 or similar), and caching layers.
- Streaming Data Processing: Work with Apache Kafka (or similar platforms) to handle high-volume, low-latency data streams (see the consumer sketch after this list).
- Workflow Orchestration: Utilize Apache Airflow (or an equivalent) to schedule and monitor complex data workflows (see the DAG sketch after this list).
- AI/ML Integration: Collaborate with data scientists to integrate LLMs and ML models into pipelines for inference, tagging, enrichment, or intelligent routing of data (see the enrichment sketch after this list).
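As an illustration of the streaming responsibility above, the sketch below shows a minimal Kafka consumer loop in Python. It is hypothetical: the topic name, broker address, and the choice of the kafka-python client are assumptions, not part of this role's actual stack.

```python
# Minimal Kafka consumer sketch using the kafka-python client (an assumption;
# confluent-kafka or another client may be used instead). Topic and broker
# names are hypothetical placeholders.
import json

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "raw-events",                          # hypothetical topic name
    bootstrap_servers=["localhost:9092"],  # placeholder broker address
    group_id="ingest-pipeline",            # consumer group for offset tracking
    auto_offset_reset="earliest",          # start from the oldest unread message
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

for message in consumer:
    record = message.value
    # Downstream cleaning/enrichment would go here; the sketch just prints
    # the message coordinates and payload.
    print(message.topic, message.partition, message.offset, record)
```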
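Workflow orchestration with Apache Airflow typically resembles the minimal DAG below. The DAG name, daily schedule, and extract/transform/load callables are placeholders; a pipeline for this role would more likely trigger Glue jobs, Step Functions, or Azure Data Factory activities than plain Python functions.

```python
# Minimal Airflow DAG sketch (assumes Airflow 2.4+ and the classic
# PythonOperator style). All task logic is placeholder code.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # Hypothetical: pull new records from an API or RDBMS.
    return "extracted"


def transform():
    # Hypothetical: clean, enrich, and semantically align the records.
    return "transformed"


def load():
    # Hypothetical: write the result to Redshift, S3, or another store.
    return "loaded"


with DAG(
    dag_id="example_etl",           # hypothetical DAG name
    schedule="@daily",              # 'schedule_interval' on Airflow versions before 2.4
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # Run extract -> transform -> load in order.
    t_extract >> t_transform >> t_load
```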
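For the AI/ML integration responsibility, an enrichment step in a pipeline often calls a deployed model endpoint. The sketch below invokes a SageMaker endpoint via boto3; the endpoint name, payload shape, and response format are assumptions, and an Azure ML or self-hosted LLM endpoint would follow the same pattern.

```python
# Hypothetical enrichment step: send a record to a deployed model endpoint
# (SageMaker here, via boto3) and attach the returned tags to the record.
# The endpoint name and response fields are assumptions for this sketch.
import json

import boto3

runtime = boto3.client("sagemaker-runtime")


def enrich_record(record: dict) -> dict:
    response = runtime.invoke_endpoint(
        EndpointName="doc-tagging-model",  # hypothetical endpoint name
        ContentType="application/json",
        Body=json.dumps({"text": record.get("text", "")}),
    )
    prediction = json.loads(response["Body"].read())
    # Attach model output (e.g., tags or routing labels) to the record.
    record["tags"] = prediction.get("tags", [])
    return record


if __name__ == "__main__":
    print(enrich_record({"id": 1, "text": "sample document"}))
```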
Required Qualifications
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- 10+ years of experience in data engineering or software development roles.
- Strong proficiency in Python, including experience with libraries like pandas, PySpark, FastAPI, or similar.
- Solid experience with cloud services (AWS or Azure) and cloud-native data engineering tools.
- Proven experience in building and maintaining data pipelines using Kafka, Airflow, and other open-source frameworks.
- Strong grasp of database internals and trade-offs between different storage technologies.
- Familiarity with data governance, lineage, and metadata management concepts.
- Experience or strong interest in integrating LLMs and AI/ML models into production-grade data systems.
Preferred Qualifications
- Knowledge of data cataloging tools and semantic layer design.
- Experience with containerization (Docker) and orchestration (Kubernetes).
- Familiarity with MLOps tools or platforms (e.g., SageMaker, MLflow).
- Prior experience working in regulated or secure environments.