Job Title

Site Reliability Engineer

Role Overview

Infosys is hiring a Site Reliability Engineer for its Mangaluru, Karnataka, India location. This is a full-time, on-site role focused on keeping systems reliable, scalable, and efficient across cloud and operational environments.

Core Responsibilities

Build and maintain CI/CD/CT pipelines using tools such as Jenkins, Bamboo, Azure DevOps, or AWS CodePipeline.
Automate cloud infrastructure management with infrastructure-as-code tools such as Terraform, CloudFormation, and Azure Resource Manager (ARM) templates.
Create, operate, and refine monitoring and log-analysis systems using platforms such as AppDynamics, Datadog, Splunk, Kibana, Prometheus, Grafana, and Elasticsearch.
Use AIOps platforms such as Dynatrace, Splunk, and ServiceNow to support incident handling, observability, and alert-noise reduction.
Oversee infrastructure capacity and performance to support growth across public and private cloud environments.
Set and enforce standards for system architecture, deployment practices, metrics, and operational activities.
Track service availability and system health, and participate in incident response activities.
Improve delivery speed, performance, and efficiency through automation, process improvements, postmortem analysis, and configuration reviews.
Coordinate and communicate effectively across teams and functions within the organization.
Troubleshoot and monitor production systems to maintain strong uptime and stability.
Strengthen and evolve high-availability architecture and day-to-day operational processes.
Integrate GenAI and AIOps capabilities to automate incident detection, root cause analysis, and resolution workflows, including self-healing scripts and intelligent runbooks.
Use prompt engineering to improve the quality and usefulness of responses from AI-driven observability and automation tools.
Apply cloud AI services such as Amazon Bedrock, Azure OpenAI, and Google Cloud Vertex AI to design smarter SRE solutions for cloud platforms.
Apply working knowledge of agentic AI approaches in operations and support environments.

Required Background

Hands-on experience with at least one high-level programming language such as Python, Ruby, or Go.
Good understanding of object-oriented programming concepts.
Practical experience with CI/CD/CT pipeline design and implementation.
Strong familiarity with infrastructure automation and cloud operations tools.
Experience using monitoring, observability, and log-management solutions for production support.
Exposure to AIOps, GenAI, and AI/ML platforms used in operations workflows.
Ability to manage production systems, support availability targets, and respond to incidents effectively.

Additional Information

This position emphasizes reliability engineering, cloud-scale operations, automation, and the adoption of AI-driven support practices. The role also calls for collaboration across the organization and a focus on continuous improvement in system uptime, performance, and operational efficiency.
