About the job
Junior ML Engineer – LLM Infrastructure & Orchestration

About Us
We are a legal AI platform that ingests entire contracts and runs long-context, multimodal LLM pipelines on AWS Bedrock (Claude) and Vertex AI (Gemini). We operate schema-constrained LLM systems: prompts define intent, and Pydantic models enforce structure, validation, and reliability across production workflows. We're hiring an ML Engineer (~1 year of experience) to own LLM orchestration, latency, and scaling for workflows already live with customers. This role is production ML systems engineering, not model training. Candidates should be available to join immediately or within 1 month.

What You'll Do
- Build and operate end-to-end LLM pipelines for full-document analysis (100–500+ page contracts)
- Implement schema-first LLM inference using Pydantic to produce deterministic, typed outputs
- Own LLM orchestration logic: prompt routing, validation, retries, fallbacks, and partial re-execution
- Optimize latency, throughput, and cost for long-context inference (batching, streaming, async execution)
- Build and scale OCR → document parsing → LLM inference pipelines for scanned leases (Textract)
- Develop streaming and async APIs using FastAPI
- Manage distributed background workloads with Celery (queues, retries, idempotency, backpressure)
- Productionize report generation (DOCX/Excel) as deterministic pipeline outputs
- Deploy, monitor, and scale inference workloads on AWS (Bedrock, EC2, S3, Lambda)
- Debug production issues: timeouts, schema failures, partial extractions, cost spikes

What You'll Own Technically
- Pydantic-based schemas for all LLM outputs
- Prompt ↔ schema contracts and versioning
- Validation, retry, and fallback mechanisms
- Latency and cost optimization for long-context inference
- Reliability of OCR + LLM pipelines at scale

Must Have
- Strong Python and async programming fundamentals
- ~1 year of experience working on production ML or LLM systems
- Hands-on experience with Claude, Gemini, and AWS Bedrock
- Experience with schema-constrained LLM outputs (Pydantic, JSON Schema, or similar)
- Experience with OCR and document-heavy pipelines
- Experience with Celery or distributed async job systems
- Comfort treating LLMs as non-deterministic services requiring validation and retries
- Individual-contributor mindset in a lean startup
- Available to join immediately or within 1 month

Nice to Have (Strong ML Signals)
- Experience with streaming LLM responses
- Familiarity with long-context failure modes and truncation issues
- Experience with LLM output evaluation or regression testing
- Cost monitoring and optimization for LLM inference

Why Join Us
- Work on real production ML systems, not demos
- Own core LLM infrastructure end-to-end
- Direct exposure to long-context, document-scale AI
- Fully remote, fast-paced startup
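The "schema-constrained outputs with validation and retries" pattern this role centers on can be sketched roughly as below. This is a minimal illustration assuming Pydantic v2; the `ClauseExtraction` model, its field names, and the `call_model` callable are hypothetical stand-ins, not part of any actual codebase referenced in this posting.

```python
from pydantic import BaseModel, ValidationError


# Hypothetical output schema -- the model and field names are illustrative only.
class ClauseExtraction(BaseModel):
    clause_type: str
    page: int
    text: str


def parse_with_retry(call_model, max_attempts: int = 3) -> ClauseExtraction:
    """Treat the LLM as a non-deterministic service: validate every raw
    response against the schema and retry when validation fails."""
    last_err = None
    for _ in range(max_attempts):
        raw = call_model()  # stand-in for a Bedrock/Vertex invocation
        try:
            # Pydantic v2: parse + validate the JSON string in one step.
            return ClauseExtraction.model_validate_json(raw)
        except ValidationError as err:
            last_err = err  # malformed or partial output: try again
    raise RuntimeError("LLM output failed schema validation") from last_err
```

In this shape the prompt expresses intent while the Pydantic model is the contract: anything that does not validate is retried (or routed to a fallback) rather than passed downstream.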
Requirements
- Python
- async programming
- Claude
- Gemini
- AWS Bedrock
- Pydantic
- OCR
- Celery
- distributed async job systems
About the company
We are a legal AI platform that ingests entire contracts and runs long-context, multimodal LLM pipelines on AWS Bedrock (Claude) and Vertex AI (Gemini).