Lead Machine Learning Engineer

Thoughtworks Inc.
Manchester
3 days ago

This team will provide 24x7 white-glove support to people using large blocks of GPUs (6,000+ contiguous GPUs) for a short period of time (e.g. 6 weeks or 12 weeks) to perform Managed Post Training. This includes helping with preparation, providing 24x7 support during training to ensure full utilization of the GPU clusters, and off-boarding. The team is spread across three time zones (US, Europe and India), with hand-off protocols to enable 24x7 support.

The Role

While you can be a specialist in machine learning engineering (MLE), you also need a working knowledge of cluster operations.

Location

This role can be based at any of our offices across Europe.

Job responsibilities

  • You will help shape and iterate on this new white-glove support service.
  • You will work in close collaboration with a Lead Cluster Operations Support Engineer.
  • You will contribute to accelerator development: identifying gaps in tooling, missing automation, or recurring patterns for which we could develop accelerators that make the next engagement faster and more efficient. For example, we may need to improve observability, automate user onboarding, or bring in a new tool that everyone wants to use.
  • You will help assess model training readiness and data preparation.
  • You will provide model training support on rotating daytime weekend shifts, with pagers, for any issues users may encounter. These can range from infrastructure issues to data science issues or anything in between, e.g. AWS changing a configuration in EKS that affects training.
  • You will facilitate collaborative problem solving within the team by actively listening, communicating effectively and mentoring other engineers.
  • You will contribute to the development and execution of the team's overall ML strategy, aligning technical capabilities with business objectives.
  • You will proactively identify and address challenges related to the white-glove service for continued pre-training, proposing solutions and implementing improvements.

Job qualifications

Technical Skills

  • You have proven experience in distributed training of large language models (LLMs) across multiple worker nodes and GPUs.
  • You have a deep understanding of LLM architectures, including transformer-based models, and a demonstrated ability to design and implement custom models.
  • You have expertise in monitoring large training jobs in a distributed environment and the ability to debug job failures.
  • You have deep expertise in PyTorch (or TensorFlow) and in debugging training failure modes.
  • You have deep knowledge of fine-tuning or training with open-weight Gen AI models (e.g. Llama, Mistral, Gemma).
  • You have previous experience with Weights & Biases, Run.ai, PyTorch, TensorFlow and Hugging Face libraries.
  • You have experience with tools including, but not limited to, the NVIDIA NeMo stack (for both training and inference).

Professional Skills

  • You will be part of a client-facing white-glove service where a high level of professionalism is required.
  • You understand the importance of stakeholder management and can easily liaise between clients and other key stakeholders throughout projects, ensuring buy-in and gaining trust along the way.
  • You are resilient in ambiguous situations and can adapt your role to approach challenges from multiple perspectives.
  • You don’t shy away from risks or conflicts; instead, you take them on and skillfully manage them.
  • You are eager to coach, mentor and motivate others and you aspire to influence teammates to take positive action and accountability for their work.
  • You enjoy influencing others and always advocate for technical excellence while being open to change when needed.

Other things to know

Learning & Development

There is no one-size-fits-all career path at Thoughtworks: however you want to develop your career is entirely up to you. But we also balance autonomy with the strength of our cultivation culture. This means your career is supported by interactive tools, numerous development programs and teammates who want to help you grow. We see value in helping each other be our best and that extends to empowering our employees in their career journeys.

About Thoughtworks

Thoughtworks is a dynamic and inclusive community of bright and supportive colleagues who are revolutionizing tech. As a leading technology consultancy, we’re pushing boundaries through our purposeful and impactful work. For 30+ years, we’ve delivered extraordinary impact together with our clients by helping them solve complex business problems with technology as the differentiator. Bring your brilliant expertise and commitment to continuous learning to Thoughtworks. Together, let’s be extraordinary.

Thanks for your interest in joining Thoughtworks. A member of our Recruiting team will review your application as soon as possible.

Please note that we value privacy: all information submitted to us via your online application will be kept confidential to Thoughtworks.


