Platform Site Reliability Engineer (SRE)

Experience
3-6 years
Function
Engineering
Work mode
Onsite, India
Company
Tier 2
What you will work on
CirrusLabs is seeking a Platform Site Reliability Engineer to manage observability and day-2 operations for production AI platform environments. The role focuses on building monitoring systems, troubleshooting infrastructure, and automating operational tasks to maintain service health. Candidates must have solid experience with Linux, Kubernetes, and monitoring tools like Prometheus and Grafana. The position requires collaboration across various platform and infrastructure teams to improve reliability and incident response.
TAL's take
Solid mid-tier role with well-defined SRE responsibilities in a growing tech services company.
The JD provides a highly specific list of responsibilities, technical requirements, and operational workflows for an SRE role.
Must haves
- Strong Linux administration and troubleshooting
- Experience supporting production environments
- Experience with Kubernetes and containers
- Hands-on experience with monitoring and alerting in production
- Experience with Prometheus and Grafana
- Basic scripting or automation experience using Bash, Python, or Ansible
Tools and skills
Nice to have: ELK, Loki, OpenSearch, NVIDIA GPU infrastructure, DCGM, GPU Operator, NVIDIA AI Enterprise (NVAIE).
About the company
CirrusLabs is an established technology services and product engineering firm, but it lacks the engineering brand prestige of tier 1 organizations.