LLM Engineer

Zorba AI

India • Not disclosed

Yesterday

On-Site

About the job

Position: LLM Engineer, On-site (India). We are hiring an experienced LLM engineer to design, fine-tune, and deploy LLM-based solutions that power search, summarization, agents, and domain-specific assistants.

Role & Responsibilities

• Design, fine-tune, and validate LLMs for production use cases: instruction tuning, supervised fine-tuning, and parameter-efficient tuning (LoRA/adapters).
• Implement retrieval-augmented generation (RAG) pipelines: embeddings, vector search, chunking, and context assembly for high-recall responses.
• Optimize inference for latency and cost: quantization, model pruning, batching, and deployment with optimized runtimes (CUDA, Triton, bitsandbytes where applicable).
• Build backend services and APIs to serve LLM inference and orchestration using containerized deployments (Docker/Kubernetes) and CI/CD pipelines.
• Collaborate with product, data engineering, and ML teams to integrate LLMs into production flows, monitor model performance, and set up automated retraining/rollbacks.
• Create reproducible training pipelines, implement evaluation suites, and produce documentation and runbooks for model governance and observability.

Skills & Qualifications

Must-Have

• 4+ years of hands-on experience working with LLMs or advanced NLP models in production contexts.
• Proficiency in Python for ML engineering and model development.
• Experience with PyTorch and Hugging Face Transformers for training and fine-tuning.
• Practical experience implementing RAG and vector search using tools like FAISS or similar vector databases.
• Familiarity with LangChain (or equivalent orchestration) and integration with LLM APIs (OpenAI, Anthropic, etc.).
• Experience containerizing and deploying ML services using Docker; familiarity with Kubernetes is a plus.

Preferred

• Experience with inference optimizations: quantization (bitsandbytes), Triton, or GPU-accelerated serving.
• Exposure to distributed training frameworks (DeepSpeed) and cloud MLOps platforms (SageMaker, Azure ML, GCP AI Platform).
• Knowledge of monitoring, logging, and model-evaluation frameworks for production LLMs (MLflow, Prometheus, Grafana).

Benefits & Culture Highlights

• Collaborative, engineering-driven culture with strong focus on ownership and rapid iteration.
• Opportunity to build end-to-end LLM products for enterprise clients and influence architecture decisions.
• On-site role with hands-on access to GPU infrastructure and cross-functional product teams.

Requirements

pytorch
python
docker
cuda
agentic
llm

About the company

A leading consulting firm operating in the Enterprise Generative AI and Large Language Model (LLM) services sector, delivering production-grade LLM solutions, retrieval-augmented systems, and custom generative AI products for enterprise clients across domains.
