Senior Cloud Site Reliability Engineer
Bengaluru, Karnataka, India · Full Time
- Experience: Any
- Salary: Not disclosed
- Openings: 1
- Posted: 2 days ago
Job description
About the Role
This position is for a senior SRE focused on keeping cloud platforms and core product systems dependable, scalable, secure, and efficient. The work blends software engineering, cloud operations, automation, observability, and governance to create resilient, self-healing environments across both hybrid and cloud-native setups.
The role is expected to raise the overall reliability maturity of the organization by introducing strong SRE practices, improving service stability through automation, and setting clear standards for observability. You will work closely with Engineering, Product, Security, DBA, and DevEx stakeholders.
Key Responsibilities
- Apply SRE methods across cloud platforms and services to improve reliability and operational performance.
- Set up, track, and continuously refine SLIs, SLOs, SLAs, and error budgets.
- Improve service resiliency, scalability, and day-to-day operational effectiveness.
- Define reliability governance, production-readiness standards, and operational best practices.
- Lead root-cause analysis, post-incident reviews, and reliability improvement follow-ups.
- Take part in on-call coverage, incident handling, and major incident resolution.
- Work to lower MTTD and MTTR through better processes and faster response mechanisms.
- Develop and maintain observability and telemetry solutions for enterprise environments.
- Create dashboards, reliability scorecards, and health monitoring views for services.
- Implement alerting, anomaly detection, and event correlation for proactive operations.
- Use tools such as Grafana, Prometheus, Azure Monitor, Log Analytics, Elasticsearch/ELK, and Power BI for centralized monitoring and reporting.
- Produce actionable operational metrics for engineering leaders and technical teams.
- Improve visibility into infrastructure, applications, Kubernetes, and platform health.
- Promote automation-first operations across infrastructure and platform services.
- Build Infrastructure-as-Code assets using Terraform, ARM/Bicep, and Ansible.
- Write automation scripts with Python, Bash, and PowerShell for operational workflows.
- Create self-healing and auto-remediation solutions for repeated incidents.
- Automate provisioning, monitoring, scaling, backup, recovery, and deployment steps.
- Reduce manual effort and improve engineering productivity through smart automation.
- Partner with cloud engineering, product engineering, DevEx, security, DBA, and operations teams.
- Help teams improve production readiness and overall operational maturity.
- Support reliability reviews, continuous improvement efforts, and operational excellence programs.
Required Experience and Technical Background
- Bachelor’s degree in Computer Science, Engineering, or a related discipline, or equivalent experience/certification.
- Hands-on knowledge of Python, scripting, and Infrastructure-as-Code tools such as Terraform, Ansible, or ARM/Bicep.
- Experience managing cloud environments such as Azure, AKS, Pivotal Cloud Foundry, or similar platforms.
- Strong understanding of Kubernetes and containerization.
- Background in application packaging, deployment automation, and release management.
- Working knowledge of relational databases, especially Microsoft SQL Server, with exposure to NoSQL systems such as Redis, Elasticsearch, or MongoDB.
- Experience with CI/CD platforms such as Azure DevOps, Jenkins, GitHub Actions, or comparable tools.
- Comfort with monitoring and logging tools including Grafana, ELK, Prometheus, and Power BI.
- Good command of Git and modern branching and merge practices.
- Strong Linux administration and troubleshooting capability.
- Excellent analytical, communication, and collaboration skills.
Additional Information
Location: Bengaluru, India.
Employment type: Full-time.
No salary details were provided.
No start date or application deadline was stated.
No separate perks or benefits were listed.