AI Infrastructure & Platform Operations Engineer
Remote • Full-time
- Experience: 3+ years
- Salary: not specified
- Openings: 1
- Posted: 1 hour ago
- Work mode: Work from home
- Eligibility: Applicants with at least 3 years of relevant technical experience in infrastructure, operations, networking, cloud, datacenter, SRE, or similar environments can apply. The role is intended for professionals able to work in a shift-based operational setting and who can work remotely from within the…
- Resume: Required to apply
Job Description
About the Role
Mirantis builds Kubernetes-native AI infrastructure that helps organizations run secure, scalable, and sovereign environments for modern AI, machine learning, and data-heavy workloads. The company combines open source technology with strong Kubernetes expertise to support composable developer platforms across on-premises, cloud, edge, and sovereign data center setups. It focuses on automation, GPU orchestration, and policy-based control so enterprises can manage complex AI infrastructure with confidence and flexibility.
The company works with major global enterprises across a range of industries and is building a European AI Infrastructure & Platform Operations team to run large-scale AI environments powered by NVIDIA GPUs, high-performance networking, Kubernetes, and newer platform technologies.
This position sits at the intersection of infrastructure, networking, and platform operations, with responsibility for keeping critical AI platforms highly available, reliable, and performant across multiple datacenters. The role also offers the chance to work on emerging AI infrastructure services and contribute to operational capabilities built around platforms such as k0rdent AI.
What You Will Do
- Oversee, run, and support production AI infrastructure platforms.
- Diagnose and fix incidents involving infrastructure, networking, hardware, and platform services.
- Support NVIDIA GPU infrastructure and related platform components.
- Monitor and troubleshoot Kubernetes-based production environments.
- Analyze and resolve issues affecting performance, uptime, and reliability across platform layers.
- Work closely with engineering teams, hardware suppliers, datacenter teams, and service delivery groups to resolve technical issues.
- Take part in incident handling, root-cause analysis, and continuous operational improvements.
- Help strengthen monitoring, observability, automation, and day-to-day operational workflows.
- Keep operational documents, runbooks, and knowledge resources current.
What We Are Looking For
- At least 3 years of experience in infrastructure operations, platform operations, network operations, site reliability engineering, cloud operations, datacenter operations, or a similar technical function.
- Strong hands-on Linux administration and troubleshooting ability.
- Solid understanding of networking fundamentals and experience investigating infrastructure issues.
- Practical knowledge of Kubernetes in production settings.
- Background supporting live production infrastructure and services.
- Strong analytical thinking and problem-solving capability.
- Experience working in structured operations and incident management processes.
- Excellent communication and teamwork skills.
- Ability to work in a shift-based operational setup.
Nice to Have
- Exposure to NVIDIA GPU infrastructure and accelerated computing environments.
- Experience with InfiniBand networking and NVIDIA UFM.
- Kubernetes platform operations experience.
- Background in AI infrastructure or HPC environments.
- Experience in site reliability engineering or platform engineering.
- Familiarity with observability tools such as Grafana, Prometheus, ELK, or OpenTelemetry.
- Knowledge of infrastructure automation and Infrastructure-as-Code practices.
- Experience with large-scale distributed systems and production platforms.
Additional Information
You will have the opportunity to work with advanced AI infrastructure in production, gain hands-on exposure to NVIDIA GPU technologies, Kubernetes platforms, and high-performance networking, and help shape how next-generation AI infrastructure is operated and supported. You will also contribute to the evolution of AI-driven operations through k0rdent AI and join a company investing heavily in AI infrastructure and platform services.
Data Processing and Automated Decisions
The employer may use automated decision-making technology for certain employment-related decisions. If you want to opt out of automated decision-making for evaluation and review connected to this role, you may request to do so. You also have the right to appeal any decision made through automated decision-making by contacting the provided employer email address.
Privacy Notice
By submitting your resume, you agree that your personal data may be processed and stored in line with applicable data protection laws for consideration for this and future job opportunities.
Employer Recognition
The company notes that it is recognized as a leader in container management on G2, ranking second after AWS.