Data Engineer – ML Training Infrastructure

SpAItial AI
City of London
2 weeks ago

SpAItial is pioneering the development of a frontier 3D foundation model, pushing the boundaries of AI, computer vision, and spatial computing. Our mission is to redefine how industries, from robotics and AR/VR to gaming and movies, generate and interact with 3D content.


We’re seeking a Data Engineer to build the pipelines and infrastructure that fuel our large‑scale model training. As the first engineer focused on data, you’ll shape the backbone of how we handle terabytes of multimodal training data (images, video, and 3D). This role is ideal for someone who thrives at the intersection of data systems and machine learning—designing reliable, scalable, and efficient ways to get high‑quality data into cutting‑edge training runs.


Responsibilities

  • Architect and manage data infrastructure for large‑scale ML training datasets (e.g., Apache Iceberg, Parquet, Spark).
  • Build and operate ingestion pipelines for multimodal data (e.g., images, videos, 3D), including metadata generation and quality signals.
  • Design data loaders, caching, and serving strategies optimized for ML training.
  • Develop tools for dataset inspection, experiment tracking, and evaluation workflows.
  • Partner closely with ML researchers to ensure infrastructure scales with training demands.
  • Uphold code quality and best practices in testing, CI/CD, and reproducibility.

Key Qualifications

  • 3+ years of professional software/data engineering experience with production systems.
  • Proven experience in large‑scale data processing for ML training (not just analytics/BI).
  • Hands‑on experience with distributed data frameworks (e.g., Spark, Beam, Cloud SQL) and modern data formats (Parquet, Iceberg).
  • Proficiency in cloud platforms (AWS, GCP, or Azure).
  • Strong Python development skills, including testing and code quality.
  • Experience building and maintaining CI/CD pipelines.

Preferred Qualifications

  • Familiarity with ML frameworks (e.g., PyTorch, TensorFlow).
  • Experience preparing multimodal datasets (images, video, 3D) for ML pipelines.
  • Background in computer vision or 3D reconstruction (e.g., Structure-from-Motion).
  • Interest in AI‑assisted developer tools (Cursor, Windsurf, etc.).

At SpAItial, we are committed to creating a diverse and inclusive workplace. We welcome applications from people of all backgrounds, experiences, and perspectives. We are an equal opportunity employer and ensure all candidates are treated fairly throughout the recruitment process.

