TrueFoundry
TrueFoundry Senior SRE/DevOps Engineer
TrueFoundry is a PaaS for machine learning that enables teams to build, deploy, and scale AI applications with ease.
Get represented
5 minutes to evaluate. 6 months of representation.
What is TrueFoundry?
TrueFoundry is a cloud-native PaaS that automates the entire ML lifecycle, from model fine-tuning to production serving with GPU optimization. The platform introduces an 'AI Gateway' that provides a centralized control plane for connecting, governing, and observing agentic AI workloads.
You'll be a good fit if you have
- A strong dedication to the "everything-as-code" philosophy, with expert-level proficiency in Terraform for provisioning complex AWS environments, including Kubernetes and managed databases.
- Deep experience designing and implementing robust CI/CD pipelines from scratch, leveraging tools and programming skills (Go or Python) to achieve high levels of automation and operational efficiency.
- Proven success in ensuring platform infrastructure adheres to stringent compliance and security standards such as SOC 2, ISO 27001, and HIPAA.
- Excellent customer-facing technical skills, comfortable directly engaging with enterprise clients to scope, deploy, onboard, and troubleshoot production environments.
- Practical knowledge of observability tooling (Prometheus, Grafana) and best practices for configuring application monitoring, alerting, and logging in a multi-tenant or multi-environment setup.
- A foundational understanding of networking, SRE principles, and resilience patterns to proactively identify and mitigate single points of failure, autoscaling issues, and performance bottlenecks.
- Experience managing big data infrastructure or ETL pipelines, recognizing the unique scaling and reliability challenges of data-intensive applications.
Key Responsibilities
- Write and maintain Terraform modules that deploy infrastructure on AWS, such as Kubernetes clusters, RDS databases, Prometheus, Grafana, and static websites.
- Collaborate directly with TrueFoundry customers to ensure smooth onboarding, reliable deployments, and adoption of best practices.
- Configure networking, autoscaling, continuous deployment pipelines, and multi-environment setups.
- Ensure infrastructure compliance with SOC 2, ISO 27001, and HIPAA standards.
- Automate infrastructure provisioning and deployment processes to deliver a seamless developer experience.
- Participate in customer training and onboarding sessions to drive adoption and operational excellence.
This is not a cold application
You're in the top 1%. We represent you to TrueFoundry. Not the other way around.
Production-Grade AI
Optimizing infra for Fortune 500 enterprises.
Why TrueFoundry
Data scientists are often slowed down by complex DevOps tasks, making it difficult to move AI models from local machines to production.
TrueFoundry provides an 'enterprise-ready' layer that automates infrastructure management, allowing teams to deploy reliable AI agents in days instead of months.
Problems that matter. Company that matters. Infrastructure that matters.
Accelerate economic growth by connecting every talent to the work they are meant to do
Hiring got broken. Too focused on companies. Not enough on people. We're fixing it.
Human judgment plus AI. Talent first.
Built By
We built Grapevine. More than 1 million engineers use it to share real talk about companies, salaries, culture. We saw the problem up close.
Great engineers lost in broken systems.
So we built Round1. Backed by Peak XV and Kae Capital. One mission: connect exceptional engineers to work that matters.

