Machine Learning Engineer

Cantina Labs

Singapore • Full-time

Experience: Any
Salary: not listed
Openings: 1
Posted: 1 hour ago

Job description

About the Company

Cantina Labs is a social AI business focused on building advanced real-time models that expand what expression, personality, and realism can look like in digital experiences. Its products aim to bring characters to life and change the way people tell stories, connect, and create. The company develops the underlying systems that power these ecosystems, with Cantina as its flagship social AI platform.

Role Overview

The Singapore team is hiring an ML Engineer to help build and scale systems for ingesting, processing, and delivering large volumes of video and multimodal data used in model training. The role owns the full pipeline lifecycle, from raw source material to curated, filtered, training-ready datasets. The emphasis is on fast execution, dependable systems, reproducibility, and cost control. You will work closely with curation and modeling teams to turn changing dataset recipes into reliable production workflows and continuously improve training results.

Key Responsibilities

  • Build and scale distributed pipelines for preprocessing, dataset creation, and recurring dataset refresh cycles.
  • Take ownership of orchestration, scheduling, observability, and recovery mechanisms for large-scale processing jobs.
  • Develop and maintain containerized pipeline infrastructure using Kubernetes or similar orchestration platforms.
  • Improve cloud data movement and storage across AWS, GCS, or Azure with attention to cost, throughput, and operational efficiency.
  • Set and enforce standards for dataset layout, version control, caching, retention, and access behavior.
  • Create curation workflows that decide which video and image assets are selected, filtered, and retained for model training, including image-text pair datasets used in joint training setups.
  • Build scalable captioning and metadata-generation systems using VLMs for both video and image datasets.
  • Develop quality scoring, aesthetic scoring, CLIP-based semantic filtering, and other signal-extraction methods to improve data selection.
  • Design tooling for large-scale deduplication, including exact-match and near-duplicate detection across extensive video collections.
  • Review dataset composition, surface quality issues, and refine curation logic to improve training performance.
  • Define and update standards for high-quality, training-ready video data across different training approaches.

Candidate Profile

The ideal candidate brings strong practical experience in building or scaling large data systems and machine learning pipelines, especially for dataset curation, filtering, and quality improvement. You should be comfortable working with distributed processing, workflow orchestration, containers, and cloud infrastructure, and be able to reason about storage design and access tradeoffs. Experience with captioning systems, quality/aesthetic scoring, semantic filtering, and media processing tools is important, along with solid Python skills and strong communication and documentation habits.

Required Technical Skills

  • Distributed data processing for machine learning workflows
  • Dataset curation and filtering systems
  • Workflow orchestration and scheduling
  • Containerization with Docker
  • Container orchestration with Kubernetes
  • Cloud storage and compute on AWS, GCS, or Azure
  • Python programming
  • Video and image processing tools
  • Semantic filtering and embedding-based retrieval techniques
  • Documentation and cross-functional communication

Benefits

  • Competitive pay along with meaningful company equity
  • Personal leave plus paid public holidays
  • Health coverage
  • Global travel insurance for international trips
  • Monthly stipend of $500, which is approximately S$635
  • All necessary equipment for a home office setup

Additional Information

This position is based in Singapore and is a full-time onsite role.
