
AI Engineer (Data Pipelines & RAG)

Uplers
Noida | ₹ Confidential (based on experience)
2 days ago
Remote

About the job

We are seeking a versatile Data & AI Engineer with 4-7 years of experience to build, deploy, and maintain end-to-end data pipelines for downstream Gen AI applications. You'll design data models and transformations, build scalable ETL/ELT workflows, and learn fast while working in the AI agent space.

Key Responsibilities

Data Modeling & Pipeline Development

  • Automate data ingestion from diverse sources (databases, APIs, files, SharePoint/document management tools, URLs). Most files are expected to be unstructured documents in varied formats, containing tables, charts, process flows, schedules, construction layouts/drawings, etc.
  • Own the chunking strategy, embedding, and indexing of all unstructured and structured data for efficient retrieval by downstream RAG/agent systems.
  • Build, test, and maintain robust ETL/ELT workflows using Spark (batch and streaming).
  • Define and implement logical/physical data models and schemas; develop schema-mapping and data-dictionary artifacts for cross-system consistency.

Gen AI Integration

  • Instrument data pipelines to surface real-time context into LLM prompts.
  • Implement prompt engineering and RAG for varied workflows within the RE/Construction industry vertical.

Observability & Governance

  • Implement monitoring, alerting, and logging (data quality, latency, errors).
  • Apply access controls and data-privacy safeguards (e.g., Unity Catalog, IAM).

CI/CD & Automation

  • Develop automated testing, versioning, and deployment (Azure DevOps, GitHub Actions, Prefect/Airflow).
  • Maintain reproducible environments with infrastructure as code (Terraform, ARM templates).

Required Skills & Experience

  • 5 years in Data Engineering or a similar role, with at least 12-24 months of exposure to building pipelines for unstructured-data extraction, including document processing with OCR, cloud-native solutions, and chunking, indexing, etc. for downstream consumption by RAG/Gen AI applications.
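As a rough illustration of the chunking work described under Data Modeling & Pipeline Development, a minimal sketch of a fixed-size chunking strategy with overlap is shown below (the chunk size and overlap values are placeholders, not a recommendation; production systems would typically chunk on semantic boundaries and then embed and index each chunk):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows, ready for embedding.

    Overlap preserves context that straddles a chunk boundary, so a
    retrieval query matching text near a boundary can still surface it.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap  # each new window starts `step` chars later
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

# A 1200-character document at chunk_size=500, overlap=50 yields windows
# starting at 0, 450, and 900.
doc = "x" * 1200
chunks = chunk_text(doc)
print(len(chunks))  # 3
```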
  • Proficiency in Python; dlt for ETL/ELT pipelines; DuckDB or equivalent tools for in-process analytical work; DVC for managing large files efficiently.
  • Solid SQL skills and experience designing and scaling relational databases. Familiarity with non-relational, column-based databases is preferred.
  • Familiarity with Prefect is preferred, or with other orchestrators (e.g., Azure Data Factory).
  • Proficiency with the Azure ecosystem; should have worked with Azure services in production.
  • Familiarity with RAG indexing, chunking, and storage across file types for efficient retrieval.
  • Strong DevOps/Git workflows and CI/CD (CircleCI/Azure DevOps).
  • Experience deploying ML artifacts using MLflow, Docker, or Kubernetes is good to have.

Bonus Skillsets

  • Experience with computer-vision-based extraction, or experience building ML models for production.
  • Knowledge of agentic AI system design: memory, tools, context, orchestration.
  • Knowledge of data governance, privacy laws (GDPR), and enterprise security patterns.

Requirements

  • Data Engineering
  • ETL/ELT workflows
  • Python
  • SQL
  • cloud-native solutions
  • OCR

Benefits

  • remote work with quarterly meet-ups
  • fast-growing, revenue-generating startup
  • learning opportunities
