Senior Service Reliability Operations Administrator

Job Category: Technology and IT
Job Location: United States
Company Name: NVIDIA

About the Role

NVIDIA’s NGC team is building a global Service Reliability Operations Center to deliver near-100% availability for its cloud products and services. As part of the CIS Team, you will collaborate with Site Reliability Engineering, Security Operations, DevOps, and other partners to ensure resilient and secure operations. You’ll be the front line during incidents, reducing downtime and driving continuous improvement in reliability and customer experience.

What You Will Be Doing

  • Provide 24/7 support in a follow-the-sun model (US & India teams).

  • Work a 4×10 schedule (some shifts may include weekends and early/late hours).

  • Monitor systems with alerts and alarms to prevent and mitigate incidents.

  • Perform system, network, and security administration tasks.

  • Partner with developers to create and maintain runbooks for troubleshooting and operations.

  • Detect and escalate incidents, coordinating with subject matter experts and service owners.

  • Analyze incidents and feed improvements back into reliability processes.

  • Contribute to predictive monitoring and automation for proactive issue detection.

What We Need To See

  • 5+ years administering open system servers in production.

  • 3+ years in demanding environments (Cloud, Internet, or Telecom) in SysAdmin, DevOps, SRE, or NOC roles.

  • Bachelor’s degree in a relevant field or equivalent experience.

  • Strong skills in monitoring tools, ticketing systems, and troubleshooting.

  • Expertise in server administration, shell scripting, automation, DNS, DHCP, storage, networking, and iptables.

  • RHCE-level knowledge or equivalent.

  • Experience with Python scripting (preferred), virtualization, cloud environments, containers, and orchestration systems.

  • Familiarity with Git and Ansible.

  • Ability to analyze system/network performance via alerts, data, and graphs.

  • Strong interpersonal and collaboration skills in high-pressure situations.

Why NVIDIA?

NVIDIA is building the world’s most advanced compute platforms, powering breakthroughs for scientists, researchers, designers, and gamers. Our culture values innovation, excellence, determination, and teamwork, making us one of the most dynamic places to work. We are also deeply committed to diversity, equity, and inclusion and are proud to be an equal opportunity employer.

