TrueFoundry

TrueFoundry Senior SRE/DevOps Engineer

TrueFoundry is a PaaS for machine learning that enables teams to build, deploy, and scale AI applications with ease.

Salary: 45L - 60L
Location: Bengaluru
Experience: 4-8 Years

Get represented

5 minutes to evaluate. 6 months of representation.

What is TrueFoundry?

TrueFoundry is a cloud-native PaaS that automates the entire ML lifecycle, from model fine-tuning to production serving with GPU optimization. The platform introduces an 'AI Gateway' that provides a centralized control plane for connecting, governing, and observing agentic AI workloads.

You'll be a good fit if you have

  • A strong dedication to the "everything-as-code" philosophy, with expert-level proficiency in Terraform for provisioning complex AWS environments, including Kubernetes and managed databases.
  • Deep experience designing and implementing robust CI/CD pipelines from scratch, leveraging tools and programming skills (Go or Python) to achieve high levels of automation and operational efficiency.
  • Proven success in ensuring platform infrastructure adheres to stringent compliance and security standards such as SOC2, ISO 27001, and HIPAA.
  • Excellent customer-facing technical skills, comfortable directly engaging with enterprise clients to scope, deploy, onboard, and troubleshoot production environments.
  • Practical knowledge of observability tooling (Prometheus, Grafana) and best practices for configuring application monitoring, alerting, and logging in a multi-tenant or multi-environment setup.
  • A foundational understanding of networking, SRE principles, and resilience patterns to proactively identify and mitigate single points of failure, autoscaling issues, and performance bottlenecks.
  • Experience managing big data infrastructure or ETL pipelines, recognizing the unique scaling and reliability challenges of data-intensive applications.

Key Responsibilities

  • Write and maintain Terraform modules for deploying various AWS infrastructure components such as Kubernetes, RDS, Prometheus, Grafana, and static websites.
  • Collaborate directly with TrueFoundry customers to ensure smooth onboarding, reliable deployments, and adoption of best practices.
  • Configure networking, autoscaling, continuous deployment pipelines, and multi-environment setups.
  • Ensure infrastructure compliance with SOC2, ISO 27001, and HIPAA standards.
  • Automate infrastructure provisioning and deployment processes to deliver a seamless developer experience.
  • Participate in customer training and onboarding sessions to drive adoption and operational excellence.

You are pre-screened

This is not a cold application

You're in the top 1%. We represent you to TrueFoundry. Not the other way around.

5 mins
Unlimited Retakes
48 Hour Responses
Interview Once
Get Evaluated
Get Represented

Production-Grade AI

Optimizing infra for Fortune 500 enterprises.

  • Deployment Speed: 10X Faster
  • Customer Growth: 4X YoY
  • ML Clusters: 1,000+

Why TrueFoundry

Data scientists are often slowed down by complex DevOps tasks, making it difficult to move AI models from local machines to production.

TrueFoundry provides an 'enterprise-ready' layer that automates infrastructure management, allowing teams to deploy reliable AI agents in days instead of months.

Problems that matter. Company that matters. Infrastructure that matters.

Accelerate economic growth by connecting every talent to the work they are meant to do

Hiring got broken. Too focused on companies. Not enough on people. We're fixing it.

Human judgment plus AI. Talent first.

Built By

We built Grapevine. More than 1 million engineers use it to share real talk about companies, salaries, culture. We saw the problem up close.
Great engineers lost in broken systems.

So we built Round1. Backed by Peak XV and Kae Capital. One mission: connect exceptional engineers to work that matters.