About the job
About the Role
We're building the infrastructure layer for next-generation AI, and we need engineers who care deeply about how GPUs are used, not just that they run. You'll work at the intersection of infrastructure and ML, solving real-world challenges around GPU utilization, distributed training, and developer experience.
Key Responsibilities
- Build and optimize GPU-backed compute environments for ML workloads
- Develop systems for provisioning and managing GPU resources
- Create and maintain containerized ML environments (PyTorch, TensorFlow, etc.)
- Improve performance of training and inference workloads
- Work on distributed training setups across multiple GPUs
- Build internal tools for monitoring, profiling, and debugging GPU usage
- Contribute to developer tooling (CLI / SDK) for interacting with the platform
Required Skills
- Strong experience with PyTorch / TensorFlow / JAX
- Hands-on experience with CUDA, NVIDIA drivers, and cuDNN
- Experience with Docker and containerized environments
- Familiarity with Kubernetes or similar orchestration systems
- Strong Python skills
- Understanding of Linux systems and performance tuning
- Exposure to distributed systems or multi-GPU training
Nice to Have
- Experience with LLM fine-tuning or inference optimization
- Familiarity with tools like Ray, Horovod, or Dask
- Exposure to GPU monitoring / profiling tools
- Experience working with production ML systems
Benefits
- ESOPs