Company Overview
At Gloo Digital, you’re in expert hands. Backed by over 25 years of experience in technology recruitment, we deliver a refined and proven approach to sourcing top talent. As a boutique agency, we have a passion for working with innovative startups while also partnering with some of the world’s largest and most respected companies.
What You’ll Do
- Design and build high-throughput pipelines for video ingestion, transcoding, feature extraction, and metadata generation.
- Architect distributed storage systems optimized for a range of video containers and codecs (e.g., MP4, AV1, HEVC), as well as derived data such as embeddings, frames, and metadata.
- Develop and implement ETL/ELT processes that transform raw video content into analytics-ready datasets.
- Build feature stores and data APIs that deliver structured video data to machine learning and product teams.
- Collaborate closely with ML engineers, researchers, and platform teams to optimize data access patterns for model training and inference on video datasets.
- Manage real-time and batch processing systems, using technologies such as Kafka, Spark, and Flink, to stream or schedule video processing tasks.
What We’re Looking For
- 5+ years of professional experience in data engineering, particularly with unstructured data such as video, audio, or images.
- Strong expertise with distributed data processing frameworks like Apache Spark, Beam, or Flink.
- In-depth knowledge of video formats, codecs, and transcoding workflows.
- Proficiency in programming languages including Python, SQL, and either Scala or Java.
- Experience designing and building streaming data pipelines using Kafka, Pulsar, or similar platforms.
- Strong understanding of cloud infrastructure (AWS, GCP, or Azure), particularly with storage services like S3, GCS, or specialized video storage systems.
- Experience with metadata stores, feature stores, and data lake architectures.
- Familiarity with ML/AI workflows involving video data (such as frame extraction, embedding generation, and model preprocessing) is a significant advantage.