Senior DevOps Engineer, Deep Learning Frameworks

Job Type: Full Time
Job Location: United States
Company Name: NVIDIA

Company Overview

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.

About the job

NVIDIA’s Deep Learning Optimized Frameworks Group is looking for an excellent DevOps Engineer to enable the next wave of NVIDIA’s highest-performing deep learning software stacks. Your role spans multiple products, such as TensorFlow and PyTorch, and is instrumental in streamlining development, builds, and releases with modern DevOps tools. Join our hardworking, technically skilled team of software engineers and infrastructure experts to design the systems that enable NVIDIA to stay ahead of the competition as we deliver the world’s fastest deep learning frameworks.

What You’ll Be Doing

  • Automating and optimizing build, test, integration, and release processes for optimized NVIDIA Deep Learning Frameworks
  • Configuring, maintaining, and building upon deployments of industry-standard tools (e.g. GitLab, Jenkins, Docker, LXC, Hyper-V, CMake, Bazel)
  • Developing shared utilities for setting up systems, running tests, and recording results
  • Leading best practices for building, testing, and releasing software
  • Identifying infrastructure needs and translating them into action

What We Need To See

  • BS or higher degree in computer science (or equivalent experience)
  • 5+ years of relevant experience
  • Strong experience setting up, maintaining, and automating continuous integration systems
  • Fluency in source control management (e.g. GitHub, GitLab, Git) and build systems (e.g. Make, CMake, Bazel, Docker)
  • Strong programming skills in Python (or Perl or shell scripting, such as bash, tcsh, or sh)
  • Pragmatic approach to problem solving and collaboration
  • Real passion for “it just works” automation and enabling team members

Ways To Stand Out From The Crowd

  • Experience with CUDA and the deep learning software stack
  • Good knowledge of container and cluster technologies like Slurm, Kubernetes, Jenkins, GitLab CI, and Zabbix
  • Experience with GPU computing systems
  • Track record of identifying useful new technologies and incorporating them into software development flows
  • Experience as an active contributor to a software project involving many developers
