ML Platform Engineer


Job details
  • Tag
  • London
  • 1 week ago

Location: Poland Street, London

(2 days minimum in office)


Why choose Tag?

We are the long-standing production partner of choice, and for over half a century we have helped brands across borders and cultures achieve their business goals. With 2,700 experts in 29 countries across the world, we are a global team of collaborators, innovators, and motivators.

We pride ourselves on creating empowering, safe, and supportive environments for all our employees, regardless of race, gender, sexual orientation, ability, or any other defining factor. We embrace difference through diversity of thought, experience, and expertise to maximise potential and bring the greatest benefits for our people and our clients.

We can’t bring big ideas to life without exceptional people. At Tag, we respect individuality and the power of the collective. We want people to be themselves, unafraid to voice ideas, no matter how big they are or who they come from.

In June 2023, Tag was acquired by dentsu Group, Inc, though we remain a distinct brand led by David Kassler, Tag Global CEO, and headquartered in London. dentsu’s acquisition of Tag significantly expands content delivery capabilities, and Tag’s expertise in delivering personalized, omnichannel content in real time and at scale for clients remains unparalleled across the entire customer journey, unlocking marketing effectiveness and efficiency.

Tag and dentsu bring together our innovation and technology infrastructure to help solve clients’ toughest challenges. United in business acumen, we share similar core values and company culture, and embrace a vision “to be at the forefront of people-centric transformations that shape society.”

dentsu was founded over 120 years ago and proudly counts nearly 72,000 employees around the world.


The role


Responsibilities

  • Platform Development: Design, build, and maintain scalable machine learning platforms to support model development, experimentation, and production workflows.
  • Infrastructure Automation: Automate the deployment and scaling of ML infrastructure, including data pipelines, model training, validation, and deployment.
  • Model Lifecycle Management: Manage the end-to-end lifecycle of machine learning models, including versioning, deployment, monitoring, and retraining.
  • LLM Operations (LLM Ops): Implement systems and practices for managing large language models (LLMs), ensuring efficient fine-tuning, deployment, and monitoring of these models in production.
  • Collaboration with Data Scientists and Engineers: Provide infrastructure and tools that enable seamless collaboration between data science teams and engineering for the development and deployment of machine learning models.
  • Performance Optimization: Optimize model inference and training performance on a range of hardware architectures, including GPU and cloud-based environments.
  • Security and Compliance: Ensure the security of the ML platform and compliance with relevant regulations and standards, especially in environments dealing with sensitive data.
  • Tooling and Frameworks: Evaluate and integrate MLOps tools, frameworks, and libraries to continuously improve platform capabilities and efficiency.
  • Monitoring and Alerting: Implement robust monitoring and alerting systems for production models, ensuring reliability and timely detection of performance drift or anomalies.
  • User-Centric Development: Emphasize user needs and experiences in platform design and implementation.
  • Adaptive Problem-Solving: Quickly adapt to changing requirements and technological landscapes in ML and AI.
  • Product Focus: Maintain a strong product-oriented mindset, aligning technical solutions with business goals and user needs.


Skills and Experience required

Experience:

  • 3+ years of experience in software engineering or infrastructure roles, with a focus on machine learning platforms or MLOps.
  • Proven experience in building, deploying, and maintaining ML platforms or systems at scale.
  • Strong experience with cloud platforms such as AWS, GCP, or Azure, particularly for machine learning and data processing tasks.
  • Experience with containerization technologies (Docker) and orchestration tools (Kubernetes) for ML workloads.
  • Proficiency in programming languages such as Python, and familiarity with ML libraries and frameworks (e.g., TensorFlow, PyTorch).
  • Familiarity with CI/CD pipelines tailored for machine learning (e.g., model validation, deployment automation).

Technical Expertise:

  • Experience with distributed systems, model serving, and scaling ML models in production.
  • Hands-on experience with MLOps tools and frameworks such as MLflow, Kubeflow, or similar.
  • Strong understanding of model monitoring, performance optimization, and retraining strategies.
  • Exposure to LLM Ops, including fine-tuning, deploying, and maintaining large language models.
  • Strong focus on automation and experience with infrastructure-as-code tools such as Terraform or CloudFormation.
  • Strong problem-solving skills and experience troubleshooting infrastructure and platform issues.

Key Attributes:

  • Ability to thrive in fast-paced environments and deliver with high velocity
  • Strong product focus and ability to empathize with end-users of ML platforms
  • Adaptability to rapidly changing ML landscapes and emerging technologies
  • Excellent communication skills to bridge gaps between technical and non-technical stakeholders

Preferred Qualifications:

  • Master’s degree in Computer Science, Data Engineering, Machine Learning, or a related field
  • Experience managing the infrastructure for large language models (LLMs) and their specialized operational needs.
  • Experience with big data processing frameworks like Apache Spark, Kafka, or similar.
  • Expertise in optimizing ML workloads for


As an ethical employer, Tag will never ask job applicants to provide private, sensitive information upfront or make offers of employment contingent on financial requests or responsibilities from any candidate.
