Data Engineer (OCR & Data Pipelines, Contract)

Intelance
City of London
2 days ago

Intelance is a specialist architecture and AI consultancy working with clients in regulated, high-trust environments (healthcare, pharma, life sciences, financial services). We are assembling a lean senior team to deliver an AI-assisted clinical report marking tool for a UK-based, UKAS-accredited organisation in human genetic testing.

We are looking for a Data Engineer (OCR & Pipelines) who can turn messy PDFs and documents into clean, reliable, auditable data flows for ML and downstream systems. This is a contract / freelance role (2-3 days/week) working closely with our AI Solution Architect, Lead ML Engineer, and Integration Engineer.

Tasks
  • Design and implement the end-to-end data pipeline for the project:
    • Ingest PDF/Word reports from secure storage
    • Run OCR / text extraction and layout parsing
    • Normalise, structure, and validate the data
    • Store outputs in a form ready for ML and integration.
  • Evaluate and configure OCR / document AI services (e.g. Azure Form Recognizer or similar), and wrap them in robust, retry-safe, cost-aware scripts/services (a rough sketch follows this list).
  • Define and implement data contracts and schemas between ingestion, ML, and integration components (JSON/Parquet/relational as appropriate); an illustrative contract-and-validation sketch also follows this list.
  • Build quality checks and validation rules (field presence, format, range checks, duplicate detection, basic anomaly checks).
  • Implement logging, monitoring, and lineage so every processed document can be traced from source → OCR → structured output → model input.
  • Work with the ML Engineer to ensure the pipeline exposes exactly the features and metadata needed for training, evaluation, and explainability.
  • Collaborate with the Integration Engineer to deliver clean batch or streaming feeds into the client’s assessment system (API, CSV exports, or SFTP drop-zone).
  • Follow good security and privacy practices in all pipelines: encryption, access control, least privilege, and redaction where needed.
  • Contribute to infrastructure decisions (storage layout, job orchestration, simple CI/CD for data jobs).
  • Document the pipeline clearly: architecture diagrams, table/field definitions, data dictionaries, operational runbooks.
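
By way of illustration, here is a minimal sketch of the retry-safe OCR step described above. It assumes pytesseract and pdf2image purely as stand-ins; the document AI service actually selected (Form Recognizer, Textract, etc.) would sit behind the same interface.

```python
"""Illustrative retry-safe OCR extraction step (pytesseract + pdf2image as placeholders)."""
import logging
import time
from pathlib import Path

import pytesseract                       # assumed available; any OCR client could sit here
from pdf2image import convert_from_path  # assumed available (requires poppler)

log = logging.getLogger("ocr_pipeline")


def extract_text(pdf_path: Path, max_attempts: int = 3, backoff_s: float = 2.0) -> dict:
    """OCR every page of a PDF, retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            pages = convert_from_path(str(pdf_path))           # rasterise pages to images
            text = [pytesseract.image_to_string(page) for page in pages]
            return {
                "source_path": str(pdf_path),                  # kept for lineage: source -> OCR
                "page_count": len(pages),
                "pages": text,
            }
        except Exception as exc:                               # real code would catch narrower errors
            log.warning("OCR attempt %d/%d failed for %s: %s",
                        attempt, max_attempts, pdf_path, exc)
            if attempt == max_attempts:
                raise
            time.sleep(backoff_s * 2 ** (attempt - 1))         # back off 2s, 4s, 8s, ...
```

Cost awareness would then come from caching extractions keyed on a hash of the source file, so the same document is never sent to a paid OCR service twice.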
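
In the same spirit, a rough sketch of what a data contract plus basic quality checks could look like. The field names (sample_id, marker_value, report_date) are hypothetical placeholders; the real schema would be agreed with the ML Engineer and Integration Engineer.

```python
"""Illustrative data contract and quality checks for one extracted report (hypothetical fields)."""
import re
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ExtractedReport:
    doc_id: str                  # stable ID for lineage: source -> OCR -> structured output -> model input
    sample_id: str               # e.g. "LAB-2024-00123" (format-checked below)
    report_date: date
    marker_value: float          # numeric field, range-checked below
    source_path: str
    errors: list[str] = field(default_factory=list)


SAMPLE_ID_RE = re.compile(r"^[A-Z]{2,5}-\d{4}-\d{3,6}$")


def validate(report: ExtractedReport) -> ExtractedReport:
    """Field presence, format, and range checks; failures are recorded, not silently dropped."""
    if not report.sample_id:
        report.errors.append("sample_id missing")
    elif not SAMPLE_ID_RE.match(report.sample_id):
        report.errors.append(f"sample_id format invalid: {report.sample_id!r}")
    if not 0.0 <= report.marker_value <= 1000.0:               # illustrative range check
        report.errors.append(f"marker_value out of range: {report.marker_value}")
    if report.report_date > date.today():
        report.errors.append("report_date is in the future")
    return report
```

Records that fail validation keep their error list attached, so downstream stages can quarantine them while preserving the lineage trail.
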
Requirements

Must-have

  • 3-5+ years of hands-on Data Engineering experience.
  • Strong Python skills, including building and packaging data processing scripts or services.
  • Practical experience with OCR / document processing (e.g. Tesseract, Azure Form Recognizer, AWS Textract, Google Document AI, or equivalent).
  • Solid experience building ETL / ELT pipelines on a major cloud platform (ideally Azure, but AWS/GCP is fine if you’re comfortable switching).
  • Good knowledge of data modelling and file formats (JSON, CSV, Parquet, relational schemas).
  • Experience implementing data quality checks, logging, and monitoring for pipelines.
  • Understanding of security and privacy basics: encryption at rest/in transit, access control, secure handling of potentially sensitive data.
  • Comfortable working in a small, senior, remote team; able to take a loosely defined problem and design a clean, maintainable solution.
  • Available for 2-3 days per week on a contract basis, working largely remotely from the UK or nearby European time zones.

Nice-to-have

  • Experience in healthcare, life sciences, diagnostics, or other regulated environments.
  • Familiarity with Azure Data Factory, Azure Functions, Databricks, or similar orchestration/compute tools.
  • Knowledge of basic MLOps concepts (feature stores, model input/output formats).
  • Experience with SFTP-based exchanges and batch integrations with legacy systems.
Benefits
  • Core impact role: you own the pipeline that makes the entire AI solution possible – without you, nothing moves.
  • Meaningful domain: your work supports external quality assessment in human genetic testing for labs worldwide.
  • Lean, senior team: work alongside experienced architects and ML engineers; minimal bureaucracy, direct access to decision-makers.
  • Remote-first, flexible: work from anywhere compatible with UK hours, 2-3 days/week.
  • Contract / freelance: competitive day rate, with potential extension into further phases and additional schemes if the pilot is successful.
  • Opportunity to build reusable data pipeline components that Intelance will deploy across future AI engagements.

We review every application personally. If there’s a good match, we’ll invite you to a short call to walk through the project, expectations, and next steps.

