DevOps Engineer II

Job Category: Technology and IT
Job Type: Full time
Job Location: USA
Company Name: Chewy

Company Overview

We don’t just talk about our mission—we live it. Our goal is to be the most reliable and accessible destination for pet parents and partners, and we’re always finding new ways to make that happen.

At Chewy, we understand the joys and challenges of pet parenthood, and we’re here to support pet parents through both the highs and lows. As a leading online retailer for pet products, supplies, and prescriptions, we offer a wide selection of high-quality products and services at competitive prices. Our customer care is exceptional, offering a personal touch to ensure every pet’s happiness and health.

Key Responsibilities

  • Develop and maintain scalable MLOps pipelines, including model versioning, testing, and monitoring.

  • Design containerized solutions using Docker and Kubernetes for efficient deployment and scalability.

  • Implement Infrastructure as Code (IaC) solutions with tools like Terraform or CloudFormation to streamline infrastructure management.

  • Develop and manage CI/CD workflows tailored to AI models and data pipelines for rapid iteration and deployment.

  • Optimize cloud-based applications on AWS for cost-efficiency and performance improvements.

  • Collaborate with applied scientists and data engineers to ensure seamless integration of research prototypes into production systems.

  • Build robust ETL/ELT pipelines to process structured and unstructured HR data using tools like Apache Airflow or AWS Glue.

  • Ensure quality through automated testing frameworks and implement quality assurance processes for machine learning pipelines.

  • Translate complex technical requirements into practical, business-aligned solutions.

Qualifications

  • 5+ years of experience with containerization and orchestration tools such as Docker and Kubernetes.

  • Expertise in Infrastructure as Code (IaC) tools, including Terraform and CloudFormation.

  • Strong background in Continuous Integration and Continuous Deployment (CI/CD) systems and workflows, particularly with GitHub Actions and Jenkins.

  • Proven experience integrating web services and applications into existing private networks.

  • Proficiency in Python and SQL, with experience using libraries like pandas, scikit-learn, and PyTorch or TensorFlow.

  • Experience deploying machine learning models in production environments using tools like SageMaker or MLflow.

  • Strong problem-solving skills and ability to collaborate effectively in multi-functional teams.

Preferred Qualifications

  • Experience with monitoring and observability tools such as Prometheus, Grafana, or similar.

  • Knowledge of security standards in cloud and machine learning deployments.

  • Strong understanding of data engineering principles and tools like Apache Spark or Kafka.

  • Familiarity with MLOps frameworks such as Kubeflow, MLflow, or similar platforms.

  • Experience creating cost-optimized and scalable solutions for machine learning workflows in the cloud.

  • Ability to communicate complex technical concepts to non-technical team members.

Compensation and Benefits

  • The salary offered will depend on factors such as relevant experience, education, and work location. This role also offers a 401(k) plan and both new hire and annual equity grants.

  • Benefits include medical, dental, vision, life, disability, hospital indemnity, critical illness, and accident insurance, along with parental leave and family services benefits.

  • Additional perks include backup dependent care, flexible spending accounts, telemedicine, pet adoption reimbursement, and employee assistance programs.
