Senior Data Scientist - AI Evaluation

RELX
Bradford
2 months ago
Applications closed

Do you have hands-on experience designing reliable evaluations for LLM/NLP features? Do you enjoy turning messy product questions into clear study designs, metrics, and production-ready code?


About our Team

Elsevier’s AI Evaluation team designs, builds, and operates NLP/LLM evaluation solutions used across multiple product lines. We partner with Product, Technology, Domain SMEs, and Governance to ensure our AI features are safe, effective, and continuously improving.


About the Role

As a Senior Data Scientist III, you will design and implement end-to-end evaluation studies and pipelines for AI products. You’ll translate product requirements into statistically sound test designs and metrics, build reproducible Python/SQL pipelines, run analyses and QC, and deliver concise readouts that drive roadmap decisions and risk mitigation. You’ll collaborate closely with SMEs, contribute to our shared evaluation libraries, and produce audit-ready documentation aligned with Responsible AI and governance expectations.


Responsibilities

  • Study design & metrics: Translate product questions into hypotheses, tasks/rubrics, datasets, and success criteria; define metrics (accuracy/correctness, groundedness, reliability, safety/bias/toxicity) with acceptance thresholds.
  • Pipelines & tooling: Build and maintain Python/SQL evaluation pipelines (data prep, prompt/rubric generation, LLM-as-judge with guardrails, scoring, QC, reporting); contribute to shared packages and CI.
  • Statistical rigor: Plan for power, confidence intervals, inter-rater reliability (e.g., Cohen’s κ/ICC), calibration, and significance testing; document assumptions and limitations.
  • SME integration: Partner with SME Ops and domain leads to create clear rater guidance, run calibration, monitor IRR, and incorporate feedback loops.
  • Analytics & reporting: Create analyses that highlight regressions, safety risks, and improvement opportunities; deliver crisp write-ups and executive-level summaries.
  • Governance & compliance: Produce audit-ready artifacts (evaluation plans, datasheets/model cards, risk logs); follow privacy/security guardrails and Responsible AI practices.
  • Quality & reliability: Implement test hygiene (dataset/versioning, golden sets, seed control), observability, and failure analysis; help run post-release regression monitoring.
  • Collaboration: Work closely with Product and Engineering to scope, estimate, and land evaluation work; participate in code reviews and design sessions alongside fellow Data Scientists.

Requirements

  • Education/Experience: Master’s + 3 years, or Bachelor’s + 5 years, in CS, Data Science, Statistics, Computational Linguistics, or a related field; strong track record of shipping evaluation or ML analytics work.
  • Technical: Strong Python and SQL; experience with LLM/NLP evaluation, data/versioning, testing/CI, and cloud-based workflows; familiarity with prompt/rubric design and LLM-as-judge patterns.
  • Statistics: Comfortable with power analysis, CIs, hypothesis testing, inter-rater reliability, and error/slice analysis.
  • Practices: Git, code reviews, reproducibility, documentation; ability to turn ambiguous product needs into executable study plans.
  • Communication: Clear written/oral communication; ability to produce crisp dashboards and decision-ready summaries for non-technical stakeholders.
  • Mindset: Ownership, curiosity, bias-for-action, and collaborative ways of working.

Nice to have

  • Experience with evaluation of retrieval-augmented or agentic systems and/or with safety/bias/toxicity measurements.
  • Familiarity with lightweight orchestration (e.g., Airflow/Prefect) and containerization basics.
  • Exposure to healthcare or education content or working with clinician/academic SMEs.

We are committed to providing a fair and accessible hiring process. If you have a disability or other need that requires accommodation or adjustment, please let us know by completing our Applicant Request Support Form or by calling 1-855-833-5120.


Criminals may pose as recruiters asking for money or personal information. We never request money or banking details from job applicants. Learn more about spotting and avoiding scams.


Please read our Candidate Privacy Policy.


We are an equal opportunity employer: qualified applicants are considered for and treated during employment without regard to race, color, creed, religion, sex, national origin, citizenship status, disability status, protected veteran status, age, marital status, sexual orientation, gender identity, genetic information, or any other characteristic protected by law.


USA Job Seekers: EEO Know Your Rights.

