About the job
About TrueFoundry
Every production AI system, whether it's powering customer support, writing code, analyzing financial data, or diagnosing medical conditions, needs the same foundational infrastructure. A way to route between models. A way to manage tools and integrate them securely. A way to orchestrate agents and enforce governance. A unified compute layer to run it all. That infrastructure layer is being built right now. We're TrueFoundry, and we're building it. We're looking for a Senior SRE/DevOps Engineer to join the team.
The Problem We're Solving
Companies are moving beyond simple chatbots to production agentic systems. These systems route between OpenAI, Anthropic, Google, and self-hosted models. They integrate dozens of tools via protocols like MCP. They orchestrate multi-agent workflows where agents coordinate with other agents. The infrastructure to support this doesn't exist yet. You can't just duct-tape together a few API calls and call it production-ready. You need a control plane that handles:
- Intelligent routing with observability, cost policies, and fallback logic
- Centralized tool and MCP server management with security and lifecycle controls
- Agent orchestration with governance and guardrails
- A unified compute layer to run self-hosted models, custom tools, and agents
We've built two products to solve this:
- AI Gateway is the control plane, five composable components (Prompts, LLM Gateway, MCP Gateway, Guardrails, Agent Gateway) that handle routing, orchestration, and governance.
- AI Deploy is the compute layer, a Kubernetes-based platform that abstracts ML workloads as standard software primitives, so everything runs on unified infrastructure.
We're Series A, backed by Intel Capital and Sequoia. Companies like CVS, Mastercard, Siemens, Paytm, Synopsys, and Zscaler run production AI workloads on our platform.
Roles / Responsibilities
- Write Terraform modules to deploy infrastructure components in AWS, such as Kubernetes, RDS, Prometheus, Grafana, and a static website
- Work closely with TrueFoundry customers, gaining a deep understanding of the TrueFoundry platform to ensure smooth deployments, reliable operations, and adoption of best practices
- Train and onboard new customers, help them implement TrueFoundry effectively, and drive platform adoption and operational excellence across customer teams
- Configure networking, autoscaling, continuous deployment, security, and multiple environments
- Ensure the infrastructure is SOC 2, ISO 27001, and HIPAA compliant
- Automate every step to provide a seamless experience for developers
Requirements
- Experience with Golang or Python is a must
- 4+ years of work experience writing clean production code
- Well versed in maintaining infrastructure as code (Terraform, CloudFormation, etc.); high proficiency with Terraform / Terragrunt is absolutely critical
- Experience setting up CI/CD pipelines from scratch
- Experience with ETL pipelines and big data infrastructure
- Understanding of common security issues
Benefits at TrueFoundry
- Work with top engineers who led the Facebook Videos and Infrastructure team
- Flexible working hours and working directly with the co-founders
- Team discussions on product and business growth strategies
- Insurance and other benefits like learning credits
Our Way Of Working
- An opportunity to work on something that really matters
- A fast-paced environment to learn and grow
- High transparency in decision-making
- High autonomy; freedom to take risks, to experiment, and to fail
- Full ownership and autonomy
- There is no glass ceiling for this role that limits your growth
- We promise a meaningful journey and opportunities to learn and grow
Requirements
- Golang
- Python
- Terraform
- CloudFormation
Preferred Technologies
- Golang
- Python
- Terraform
- CloudFormation
About the company
TrueFoundry is building foundational infrastructure for production AI systems, enabling companies to route between models, manage tools, and integrate agents securely.