Senior Scientific Data Engineer, Data Platform

Recursion
Oxford

Your work will change lives. Including your own.


Recursion is decoding biology to industrialize drug discovery. We are looking for a Senior Scientific Data Engineer. As part of a team, you will own a suite of business‑critical data products, including our Structure‑Activity Relationship data mart.


This is a high‑impact role requiring a strong synthesis of robust software engineering capabilities and deep drug discovery domain expertise. You will take ownership of the data architecture responsible for ingesting, standardizing, and serving both public and proprietary datasets. These systems directly power our competitor intelligence, chemical tractability assessments, and compound design models.


Please note: This is a specialized Data Engineering position focused strictly on data infrastructure and product ownership. While your work will directly enable our machine learning and predictive modeling efforts, the responsibilities do not encompass building or training models. This opportunity is ideally suited for engineers dedicated to architecting complex scientific data systems, rather than data scientists seeking modeling‑focused roles.


The Systems You Will Own

You will join the Data Platform team and maintain an ecosystem of ~100 ingested datasets, while taking specific ownership of high‑value products including:



  • Flagship SAR Data Mart: A unified bioactivity warehouse merging commercial and public (e.g., ChEMBL) databases with internal assay data.
  • Commercial Vendor Data Mart: A massive catalog of purchasable compounds used to guide our internal compound design tools and tractability assessments.
  • Biomedical Knowledge Graph: The critical data feeds and infrastructure that power our semantic graph and associated AI agents, linking targets, diseases, and compounds.
  • Chemical Synthesis Data: The foundational dataset of chemical reactions used for training retrosynthesis models and tractability prediction.
  • Patent Intelligence System: A pipeline transforming patent feeds and competitor data into actionable intelligence.
  • Compound Standardization Registry: A large‑scale chemical structure warehouse ensuring consistency across billions of compounds (similar to UniChem).

What You’ll Do

  • Pipeline Ownership at Scale: Act as a key owner for our core bioactivity pipeline, processing 75M+ records and managing ~100 distinct data feeds. You will navigate intricate logic and orchestration, including 4,000+ lines of SQL spanning 20+ transformation steps.
  • Scientific Data Standardization: Resolve ambiguity by reconciling heterogeneous data formats from diverse commercial and public sources. You will design and implement logic to standardize chemical structures (SMILES, InChI, tautomers), biological targets (UniProt mapping, gene families, species homology), and assay data (IC50/Ki normalization, unit conversion).
  • Engineer for Distributed Compute: Optimize tasks using Python and Snowpark for heavy‑lifting operations, such as large‑scale text mining (extracting dose/concentration from unstructured text) and molecular property calculation.
  • Drive Data Quality: Implement rigorous data quality frameworks (DQF) to handle the nuance of biological data, ensuring our downstream models are trained on clean, semantic‑aware data.
  • Cross‑Functional Consulting: Interface directly with discovery scientists to understand their diverse data needs and translate complex scientific requirements into robust engineering solutions.
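To give a flavor of the assay standardization described above, here is a minimal, hypothetical sketch of one such step: normalizing potency values (IC50/Ki) reported in heterogeneous concentration units onto a common pXC50 scale. The unit table and function names are illustrative only, not Recursion's actual pipeline logic.

```python
import math

# Hypothetical lookup table: multipliers that convert a reported
# concentration in a given unit into molar (M) units.
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}

def to_pxc50(value: float, unit: str) -> float:
    """Normalize a reported IC50/Ki concentration to pXC50 = -log10(molar)."""
    try:
        molar = value * UNIT_TO_MOLAR[unit]
    except KeyError:
        raise ValueError(f"unrecognized concentration unit: {unit!r}")
    if molar <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(molar)

# The same activity reported in different units lands on one scale:
print(round(to_pxc50(10.0, "nM"), 6))   # 8.0
print(round(to_pxc50(0.01, "uM"), 6))   # 8.0
```

Collapsing vendor-specific units onto a single log scale like this is what makes bioactivity records from commercial, public (e.g., ChEMBL), and internal assay sources directly comparable downstream.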

The Experience You’ll Need

  • Core Engineering:

    • Advanced SQL & Warehousing: Deep expertise in modern cloud data warehousing (e.g. Snowflake, BigQuery). You should be comfortable with complex window functions, CTEs, and schema design for multi‑layer environments.
    • Python & Distributed Compute: Strong proficiency in Python for data processing. Experience with warehouse‑native compute such as Snowpark is a huge plus, but general distributed processing experience is also valuable.
    • Orchestration: Experience managing complex DAGs and asynchronous task coordination (e.g. Prefect, Argo Workflows).


  • Domain Expertise:

    • Medicinal Chemistry Context: You understand how chemistry is represented in data (SMILES, scaffolds) and the nuance of bioactivity measurements (potency vs. efficacy, IC50 vs. pXC50).
    • Biological Context: Familiarity with gene/protein families, species homology, and target nomenclature (e.g., how orthologous genes are represented across species).
    • Assay Knowledge: Ability to distinguish between assay types (e.g., binding, functional), formats, and the units/measurements associated with them. Ideally familiar with ontologies (e.g., BioAssay Ontology, cell line taxonomies).
    • Data Landscape: Knowledge of public drug discovery datasets and how they can be used to support the drug discovery pipeline.


  • Nice‑to‑Haves:

    • Experience with chemical toolkits (e.g. OpenEye or RDKit).
    • Experience using text mining or LLMs for structured data extraction from scientific text.



Working Location & Compensation

This position can be based at either our London or Milton Park office. Please note that we are a hybrid environment and ask that employees spend 50% of their time in the office.


At Recursion, we believe that every employee should be compensated fairly. Based on the skill and level of experience required for this role, the estimated current annual base range for this role is £75,900 - £101,900. You will also be eligible for an annual bonus and equity compensation, as well as a comprehensive benefits package.


The Values We Hope You Share

  • We act boldly with integrity. We are unconstrained in our thinking, take calculated risks, and push boundaries, but never at the expense of ethics, science, or trust.
  • We care deeply and engage directly. Caring means holding a deep sense of responsibility and respect: showing up, speaking honestly, and taking action.
  • We learn actively and adapt rapidly. Progress comes from doing. We experiment, test, and refine, embracing iteration over perfection.
  • We move with urgency because patients are waiting. Speed isn’t about rushing but about moving the needle every day.
  • We take ownership and accountability. Through ownership and accountability, we enable trust and autonomy—leaders take accountability for decisive action, and teams own outcomes together.
  • We are One Recursion. True cross‑functional collaboration is about trust, clarity, humility, and impact. Through sharing, we can be greater than the sum of our individual capabilities.

Our values underpin the employee experience at Recursion. They are the character and personality of the company demonstrated through how we communicate, support one another, spend our time, make decisions, and celebrate collectively.


Recursion is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, veteran status, or any other characteristic protected under applicable federal, state, local, or provincial human rights legislation.


Accommodations are available on request for candidates taking part in all aspects of the selection process.


Learn more at www.recursion.com, or connect on X and LinkedIn.

