AI Engineer (Data Pipelines & RAG)
About the job
We are seeking a versatile Data & AI Engineer with 4-7 years of experience to build, deploy, and maintain end-to-end data pipelines for downstream Gen AI applications. You'll design data models and transformations, build scalable ETL/ELT workflows, and work in the AI agent space while learning fast.

Key Responsibilities

Data Modeling & Pipeline Development
- Automate data ingestion from diverse sources (databases, APIs, files, SharePoint/document management tools, URLs). Most files are expected to be unstructured documents in varied formats containing tables, charts, process flows, schedules, construction layouts/drawings, etc.
- Own the chunking, embedding, and indexing strategy for all unstructured and structured data, enabling efficient retrieval by downstream RAG/agent systems
- Build, test, and maintain robust ETL/ELT workflows using Spark (batch and streaming)
- Define and implement logical/physical data models and schemas; develop schema-mapping and data-dictionary artifacts for cross-system consistency

Gen AI Integration
- Instrument data pipelines to surface real-time context into LLM prompts
- Implement prompt engineering and RAG for varied workflows within the RE/Construction industry vertical

Observability & Governance
- Implement monitoring, alerting, and logging (data quality, latency, errors)
- Apply access controls and data privacy safeguards (e.g., Unity Catalog, IAM)

CI/CD & Automation
- Develop automated testing, versioning, and deployment (Azure DevOps, GitHub Actions, Prefect/Airflow)
- Maintain reproducible environments with infrastructure as code (Terraform, ARM templates)

Required Skills & Experience
- 5 years in Data Engineering or a similar role, with at least 12-24 months of experience building pipelines for unstructured-data extraction, including document processing with OCR, cloud-native solutions, and chunking/indexing for downstream consumption by RAG/Gen AI applications
- Proficiency in Python, dlt for ETL/ELT pipelines, DuckDB or equivalent tools for in-process analytics, and DVC for managing large files efficiently
- Solid SQL skills and experience designing and scaling relational databases; familiarity with non-relational, column-based databases is preferred
- Familiarity with Prefect (preferred) or other orchestration tools (e.g., Azure Data Factory)
- Proficiency with the Azure ecosystem; should have worked with Azure services in production
- Familiarity with RAG indexing, chunking, and storage across file types for efficient retrieval
- Strong DevOps/Git workflows and CI/CD (CircleCI / Azure DevOps)
- Experience deploying ML artifacts using MLflow, Docker, or Kubernetes is good to have

Bonus Skillsets
- Experience with computer-vision-based extraction or building ML models for production
- Knowledge of agentic AI system design: memory, tools, context, orchestration
- Knowledge of data governance, privacy laws (GDPR), and enterprise security patterns
Requirements
- Data Engineering
- ETL/ELT workflows
- Python
- SQL
- cloud-native solutions
- OCR
Benefits
- remote work with quarterly meet-ups
- fast-growing, revenue-generating startup
- learning opportunities