About the job
Junior ML Engineer – LLM Infrastructure & Orchestration

About Us
We are a legal AI platform that ingests entire contracts and runs long-context, multimodal LLM pipelines on AWS Bedrock (Claude) and Vertex AI (Gemini). We operate schema-constrained LLM systems: prompts define intent, and Pydantic models enforce structure, validation, and reliability across production workflows. We're hiring an ML Engineer (~1 year of experience) to own LLM orchestration, latency, and scaling for workflows already live with customers. This role is production ML systems engineering, not model training. Candidates should be available to join immediately or within 1 month.

What You'll Do
- Build and operate end-to-end LLM pipelines for full-document analysis (100–500+ page contracts)
- Implement schema-first LLM inference using Pydantic to produce deterministic, typed outputs
- Own LLM orchestration logic: prompt routing, validation, retries, fallbacks, and partial re-execution
- Optimize latency, throughput, and cost for long-context inference (batching, streaming, async execution)
- Build and scale OCR → document parsing → LLM inference pipelines for scanned leases (Textract)
- Develop streaming and async APIs using FastAPI
- Manage distributed background workloads with Celery (queues, retries, idempotency, backpressure)
- Productionize report generation (DOCX/Excel) as deterministic pipeline outputs
- Deploy, monitor, and scale inference workloads on AWS (Bedrock, EC2, S3, Lambda)
- Debug production issues: timeouts, schema failures, partial extractions, cost spikes

What You'll Own Technically
- Pydantic-based schemas for all LLM outputs
- Prompt ↔ schema contracts and versioning
- Validation, retry, and fallback mechanisms
- Latency and cost optimization for long-context inference
- Reliability of OCR + LLM pipelines at scale

Must Have
- Strong Python and async programming fundamentals
- ~1 year of experience working on production ML or LLM systems
- Hands-on experience with Claude, Gemini, and AWS Bedrock
- Experience with schema-constrained LLM outputs (Pydantic, JSON Schema, or similar)
- Experience with OCR and document-heavy pipelines
- Experience with Celery or distributed async job systems
- Comfort treating LLMs as non-deterministic services requiring validation and retries
- Individual-contributor mindset in a lean startup
- Available to join immediately or within 1 month

Nice to Have (Strong ML Signals)
- Experience with streaming LLM responses
- Familiarity with long-context failure modes and truncation issues
- Experience with LLM output evaluation or regression testing
- Cost monitoring and optimization for LLM inference

Why Join Us
- Work on real production ML systems, not demos
- Own core LLM infrastructure end-to-end
- Direct exposure to long-context, document-scale AI
- Fully remote, fast-paced startup
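The "schema-constrained outputs with validation and retries" pattern this role centers on can be sketched roughly as below. This is a minimal illustration assuming Pydantic v2; the `ClauseExtraction` model, its field names, and the `call_model` callable are hypothetical stand-ins, not part of any actual codebase referenced in this posting.

```python
from pydantic import BaseModel, ValidationError


# Hypothetical output schema -- the model and field names are illustrative only.
class ClauseExtraction(BaseModel):
    clause_type: str
    page: int
    text: str


def parse_with_retry(call_model, max_attempts: int = 3) -> ClauseExtraction:
    """Treat the LLM as a non-deterministic service: validate every raw
    response against the schema and retry when validation fails."""
    last_err = None
    for _ in range(max_attempts):
        raw = call_model()  # stand-in for a Bedrock/Vertex invocation
        try:
            # Pydantic v2: parse + validate the JSON string in one step.
            return ClauseExtraction.model_validate_json(raw)
        except ValidationError as err:
            last_err = err  # malformed or partial output: try again
    raise RuntimeError("LLM output failed schema validation") from last_err
```

In this shape the prompt expresses intent while the Pydantic model is the contract: anything that does not validate is retried (or routed to a fallback) rather than passed downstream.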
Requirements
- Python
- async programming
- Claude
- Gemini
- AWS Bedrock
- Pydantic
- OCR
- Celery
- distributed async job systems
About the company
We are a legal AI platform that ingests entire contracts and runs long-context, multimodal LLM pipelines on AWS Bedrock (Claude) and Vertex AI (Gemini).