Onsite · Mid Level · devtools

Platform Site Reliability Engineer (SRE)

CirrusLabs · Hyderabad, Telangana, India · Posted 20 May 2026

Experience: 3-6 years
Function: Engineering
Work mode: Onsite, India
Company: Tier 2

What you will work on

CirrusLabs is seeking a Platform Site Reliability Engineer to manage observability and day-2 operations for production AI platform environments. The role focuses on building monitoring systems, troubleshooting infrastructure, and automating operational tasks to maintain service health. Candidates must have solid experience with Linux, Kubernetes, and monitoring tools like Prometheus and Grafana. The position requires collaboration across various platform and infrastructure teams to improve reliability and incident response.

TAL's take

Quality 58/100 · 5/5 clarity · Tier 2 company

Solid mid-tier role with well-defined SRE responsibilities in a growing tech services company.

The job description gives a detailed list of responsibilities, technical requirements, and operational workflows for an SRE role.

Must haves

  • Strong Linux administration and troubleshooting
  • Experience supporting production environments
  • Experience with Kubernetes and containers
  • Hands-on experience with monitoring and alerting in production
  • Experience with Prometheus and Grafana
  • Basic scripting or automation experience using Bash, Python, or Ansible

Tools and skills

Linux, Kubernetes, containers, Prometheus, Grafana, Bash, Python, Ansible

Nice to have: ELK, Loki, OpenSearch, NVIDIA GPU infrastructure, DCGM, GPU Operator, NVAIE.

About the company

CirrusLabs is an established technology services and product engineering firm, but it lacks the engineering brand prestige of tier 1 organizations.