About the job
About the Role
We're building the infrastructure layer for next-generation AI, and we need engineers who care deeply about how GPUs are used, not just that they run. You'll work at the intersection of infrastructure and ML, solving real-world challenges around GPU utilization, distributed training, and developer experience.
Key Responsibilities
- Build and optimize GPU-backed compute environments for ML workloads
- Develop systems for provisioning and managing GPU resources
- Create and maintain containerized ML environments (PyTorch, TensorFlow, etc.)
- Improve performance of training and inference workloads
- Work on distributed training setups across multiple GPUs
- Build internal tools for monitoring, profiling, and debugging GPU usage
- Contribute to developer tooling (CLI / SDK) for interacting with the platform
Required Skills
- Strong experience with PyTorch / TensorFlow / JAX
- Hands-on experience with CUDA, NVIDIA drivers, and cuDNN
- Experience with Docker and containerized environments
- Familiarity with Kubernetes or similar orchestration systems
- Strong Python skills
- Understanding of Linux systems and performance tuning
- Exposure to distributed systems or multi-GPU training
Nice to Have
- Experience with LLM fine-tuning or inference optimization
- Familiarity with tools like Ray, Horovod, or Dask
- Exposure to GPU monitoring / profiling tools
- Experience working with production ML systems
Benefits
- ESOPs