Member of Technical Staff, Data Engineering

Cohere
London
1 day ago

Who are we?

Our mission is to scale intelligence to serve humanity. We’re training and deploying frontier models for developers and enterprises who are building AI systems to power magical experiences like content generation, semantic search, RAG, and agents. We believe that our work is instrumental to the widespread adoption of AI.

We obsess over what we build. Each one of us is responsible for increasing the capabilities of our models and the value they drive for our customers. We like to work hard and move fast to do what’s best for our customers.

Cohere is a team of researchers, engineers, designers, and more, who are passionate about their craft. Each person is one of the best in the world at what they do. We believe that a diverse range of perspectives is a requirement for building great products.

Join us on our mission and shape the future!

Why this role?

As a Data Engineer specializing in pretraining data, you will play a pivotal role in developing the data pipeline that underpins Cohere’s advanced language models. Your responsibilities will encompass the end-to-end management of training data, including ingestion, cleaning, filtering, and optimization, as well as data modeling to ensure datasets are structured and formatted for optimal model performance. You will work with diverse data sources, such as web data, code data, and multilingual corpora, to ensure their quality, diversity, and reliability. By combining research and engineering, you will bridge the gap between raw data and cutting-edge AI models, directly contributing to improvements in critical training metrics like throughput and accelerator utilization.

Your work will be essential to Cohere’s mission of delivering efficient and reliable language understanding and generation capabilities, driving innovation in natural language processing. If you are passionate about transforming data into the foundation of AI systems, this role offers a unique opportunity to make a meaningful impact.

Please note: We have offices in London, Paris, Toronto, San Francisco, and New York, but we also embrace being remote-friendly! For this role, there are no restrictions on where you can be located, as long as you are in a time zone between EST and the EU.

As a Member of Technical Staff, Data Engineering, you will:

  • Design and build scalable data pipelines to ingest, parse, filter, and optimize diverse web datasets.
  • Conduct data ablations to assess data quality and experiment with data mixtures to enhance model performance.
  • Develop robust data modeling techniques to ensure datasets are structured and formatted for optimal training efficiency.
  • Research and implement innovative data curation methods, leveraging Cohere’s infrastructure to drive advancements in natural language processing.
  • Collaborate with cross-functional teams, including researchers and engineers, to ensure data pipelines meet the demands of cutting-edge language models.

You May Be a Good Fit If You Have

  • Strong software engineering skills, with proficiency in Python and experience building data pipelines.
  • Familiarity with data processing frameworks such as Apache Spark, Apache Beam, Pandas, or similar tools.
  • Experience working with large-scale web datasets like CommonCrawl.
  • A passion for bridging research and engineering to solve complex data-related challenges in AI model training.

Bonus: papers at top-tier venues (such as NeurIPS, ICML, ICLR, AISTATS, MLSys, JMLR, AAAI, Nature, COLING, ACL, EMNLP).

If some of the above doesn’t line up perfectly with your experience, we still encourage you to apply!

We value and celebrate diversity and strive to create an inclusive work environment for all. We welcome applicants from all backgrounds and are committed to providing equal opportunities. Should you require any accommodations during the recruitment process, please submit an Accommodations Request Form, and we will work together to meet your needs.

Full-Time Employees At Cohere Enjoy These Perks

🤝 An open and inclusive culture and work environment

🧑‍💻 Work closely with a team on the cutting edge of AI research

🍽 Weekly lunch stipend, in-office lunches & snacks

🦷 Full health and dental benefits, including a separate budget to take care of your mental health

🐣 100% Parental Leave top-up for up to 6 months

🎨 Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement

🏙 Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend

✈️ 6 weeks of vacation (30 working days!)
