ZAVO

ML Platform Engineer

Delhi NCR Not disclosed
4 hours ago
Remote

About the job

About the Role

We're building the infrastructure layer for next-generation AI, and we need engineers who care deeply about how GPUs are used, not just that they run. You'll work at the intersection of infrastructure and ML, solving real-world challenges around GPU utilization, distributed training, and developer experience.

Key Responsibilities

  • Build and optimize GPU-backed compute environments for ML workloads
  • Develop systems for provisioning and managing GPU resources
  • Create and maintain containerized ML environments (PyTorch, TensorFlow, etc.)
  • Improve performance of training and inference workloads
  • Work on distributed training setups across multiple GPUs
  • Build internal tools for monitoring, profiling, and debugging GPU usage
  • Contribute to developer tooling (CLI / SDK) for interacting with the platform

Required Skills

  • Strong experience with PyTorch / TensorFlow / JAX
  • Hands-on with CUDA, NVIDIA drivers, cuDNN
  • Experience with Docker and containerized environments
  • Familiarity with Kubernetes or similar orchestration systems
  • Strong Python skills
  • Understanding of Linux systems and performance tuning
  • Exposure to distributed systems or multi-GPU training

Nice to Have

  • Experience with LLM fine-tuning or inference optimization
  • Familiarity with tools like Ray, Horovod, or Dask
  • Exposure to GPU monitoring / profiling tools
  • Experience working with production ML systems

Requirements

  • PyTorch
  • TensorFlow
  • CUDA
  • GPU Utilization

Benefits

  • ESOPs
