ML Infrastructure Engineer


Job details
  • Millennium Management
  • London
  • 1 day ago

This role sits within the AI/ML Infrastructure Engineering team and is dedicated to implementing and supporting AI/ML infrastructure solutions in cloud and on-premise environments. The engineer will work directly with infrastructure teams and may also interface with data scientists, machine learning engineers, application developers, and quantitative analysts. In doing so, the engineer acts both as a solutions architect, helping these teams implement their own AI/ML solutions, and as a professional services engineer, implementing solutions for them on platforms such as AWS, GCP, and Kubernetes.

This is a hands-on developer role. Ideal candidates have experience deploying and supporting their own production-ready AI/ML models in cloud environments, as well as automating the build and management of a broad range of cloud infrastructure using tools like Terraform. Candidates should be familiar with developing unit and functional tests, have experience designing and implementing CI/CD tooling with infrastructure-as-code pipelines, and have knowledge of Linux systems administration, containerization, networking, security, automated configuration and state management, cross-system orchestration, logging, metrics, monitoring, and alerting.

Principal Responsibilities:

  1. Architect, develop and maintain internal AI/ML infrastructure components, frameworks, and offerings
  2. Architect, develop and maintain AI/ML solutions for customers in cloud environments
  3. Help customers architect, develop and maintain their own AI/ML solutions in cloud environments
  4. Implement CI/CD pipelines that include application tests, security tests, and gates
  5. Implement availability, security, and performance monitoring and alerting for AI/ML solutions
  6. Automate data resiliency and replication for AI/ML models
  7. Manage multiple environments and promote code between them
  8. Automate systems configuration and orchestration using tools such as Terraform, Chef, Ansible, or Salt
  9. Automate creation of machine images and containers

Required Qualifications/Skills:

  1. 6+ years of experience designing and supporting production cloud environments
  2. Experience consulting with customers to develop AI/ML solutions
  3. Experience developing collaboratively, including infrastructure as code, preferably in Python
  4. Systems engineering knowledge, including understanding of Linux, security, and networking
  5. Experience with cloud templating tools such as Terraform
  6. Experience with AI/ML frameworks (e.g., TensorFlow, PyTorch)
  7. Experience with distributed computing tools (e.g., Ray, Dask)
  8. Experience with model serving tools (e.g., vLLM, KFServing)
  9. Experience with building, monitoring, and alerting on logs and metrics
  10. Knowledge of cloud networking, including connectivity, routing, DNS, VPCs, proxies, and load balancers
  11. Knowledge of cloud security, including IAM, certificate management, and key management
  12. Excellent written and verbal communication skills
  13. Excellent troubleshooting and analytical skills
  14. Self-starter able to execute independently, on a deadline, and under pressure
