Principal Infrastructure Engineer

Job Type: Full Time
Job Location: Qatar
Company Name: asobbi

Company Overview

asobbi is a boutique talent partner based in Berkhamsted, Hertfordshire, focused on bringing the very best commercial and technical talent together to help build some of the most exciting HPC, AI, ML & DC scale-ups in EMEA and the US. With over 30 years of experience recruiting top talent in the service provider and vendor space, we’ve tailored our offering to ensure we deliver the very best service to our clients and candidates.

Across our Engaged, Campaign, Navigator and On-Demand services, we help our clients accelerate their growth, not only with the best passive talent but with talent that fits their business values. Acting as an extension of, or partner to, the people team, we work closely with leaders to help shape their business.

If you would like to learn more about how we can help you expand your existing team or meet new hiring goals, you can reach out directly to Nick Asbridge at nick.asbridge@asobbi.com or Daniel Tydeman at daniel.tydeman@asobbi.com. If you are looking for your next career opportunity in the HPC, ML, or AI space, you can reach out to Clint Gibbins at clint.gibbins@asobbi.com.

About the job

Our client is a scaling Cloud Service Provider specialising in delivering High-Performance Computing as a Service (HPCaaS) to enterprises globally. Their platform supports cutting-edge AI-native workloads and HPC environments, leveraging modern cloud-native technologies to drive innovation. This is an opportunity to work on infrastructure that powers the future of AI, ML, and advanced computational workloads.

The Role:

Infrastructure Design & Virtualisation

  • Architect and implement virtualisation solutions optimised for AI and HPC workloads, with a focus on hypervisor performance tuning.
  • Design dynamic, scalable infrastructure that meets evolving customer demands for storage and networking.

Bare-Metal and Operating System Management

  • Lead provisioning, orchestration, and optimisation of bare-metal systems across global deployments.
  • Ensure secure, high-performance configurations for Unix/Linux environments at scale.

Networking and High-Performance Storage

  • Design and deploy cloud-native, high-performance storage and networking solutions tailored to demanding workloads.
  • Leverage expertise in networking protocols (TCP, UDP, DNS, BGP) and software-defined networking (SDN) technologies.

Kubernetes and Cloud-Native Platforms

  • Manage Kubernetes clusters across hybrid and multi-cloud environments, including Container Network Interface (CNI) plugins and service meshes.
  • Develop CI/CD pipelines to automate infrastructure delivery and enhance operational reliability.

Observability and Automation

  • Build observability pipelines integrating logging, metrics, and distributed tracing tools.
  • Automate deployments and streamline operations with tools like Terraform, Ansible, Python, and Go.

Architecture & Solution Design

  • Evaluate emerging technologies for scalability, security, and performance within the client’s platform.
  • Create detailed technical and business-aligned architectural proposals.
  • Collaborate with cross-functional teams to ensure successful solution delivery.

Collaboration and Leadership

  • Foster a solution-driven mindset, championing innovative approaches to challenges.
  • Mentor team members in infrastructure best practices and emerging technologies.
  • Align infrastructure projects with broader organisational goals in partnership with engineering leaders.

Skills and Experience

  • Expertise in AI/ML workloads, GPU-accelerated systems, or HPC infrastructure.
  • Proven experience leading infrastructure architecture initiatives in agile environments.
  • Advanced proficiency in Kubernetes, container networking (CNI), and service mesh technologies.
  • Strong background in virtualisation technologies and hypervisor optimisation.
  • Extensive experience in large-scale global deployments, especially in HPC or AI-native environments.
  • In-depth knowledge of IaC tools like Terraform and Ansible.
  • Proficiency in programming languages like Go or Python for automation.
  • Familiarity with observability tools (e.g., Prometheus, Grafana) and distributed tracing systems.

Preferred Qualifications

  • Bachelor’s degree in Computer Science, Information Technology, or a related field.
  • 8+ years of experience in global-scale infrastructure design and deployment.
  • Strong communication skills with the ability to convey complex concepts to diverse teams.
  • Commitment to creating clear documentation for infrastructure processes and designs.
