ML Infrastructure Engineer Jobs

Discover the latest remote and onsite ML Infrastructure Engineer roles at top AI companies that are actively hiring. Updated hourly.

Check out 20 new ML Infrastructure Engineer opportunities posted on The Homebase

NPI Engineer

New
Top rated
Figure AI
Full-time

Design, deploy, and maintain Figure's training clusters. Architect and maintain scalable deep learning frameworks for training on massive robot datasets. Collaborate with AI researchers to train new model architectures at large scale. Implement distributed training and parallelization strategies to shorten model development cycles. Build tooling for data processing, model experimentation, and continuous integration.
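The distributed-training work described above centers on keeping model replicas in sync. A minimal sketch of synchronous data parallelism in plain Python, with made-up gradients — each worker computes gradients on its own data shard, gradients are averaged (the "all-reduce" step), and every replica applies the same update. Real systems would use PyTorch DDP over NCCL rather than this toy:

```python
# Toy synchronous data-parallel step: average per-worker gradients,
# then apply one shared SGD update. Workers and gradient values are
# invented for illustration.

def allreduce_mean(per_worker_grads):
    """Average gradients elementwise across workers (the all-reduce)."""
    n_workers = len(per_worker_grads)
    n_params = len(per_worker_grads[0])
    return [
        sum(g[i] for g in per_worker_grads) / n_workers
        for i in range(n_params)
    ]

def sgd_step(params, grads, lr=0.1):
    """One SGD update with the synchronized gradients."""
    return [p - lr * g for p, g in zip(params, grads)]

# Two workers, each with local gradients for the same two parameters.
grads = [[1.0, 2.0], [3.0, 4.0]]
avg = allreduce_mean(grads)          # [2.0, 3.0]
params = sgd_step([0.0, 0.0], avg)   # approximately [-0.2, -0.3]
```

Because every replica sees the same averaged gradients, parameters stay identical across workers without any further coordination.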

$150,000 – $350,000 per year (USD)

San Jose, United States
Onsite

Helix Data Creator

New
Top rated
Figure AI
Full-time

Design, deploy, and maintain Figure's training clusters. Architect and maintain scalable deep learning frameworks for training on massive robot datasets. Collaborate with AI researchers to train new model architectures at large scale. Implement distributed training and parallelization strategies to shorten model development cycles. Build tooling for data processing, model experimentation, and continuous integration.

$150,000 – $350,000 per year (USD)

Spartanburg, United States
Onsite

Systems Integration Engineer - Actuation Systems

New
Top rated
Figure AI
Full-time

Design, deploy, and maintain Figure's training clusters. Architect and maintain scalable deep learning frameworks for training on massive robot datasets. Collaborate with AI researchers to train new model architectures at large scale. Implement distributed training and parallelization strategies to shorten model development cycles. Build tooling for data processing, model experimentation, and continuous integration.

$150,000 – $350,000 per year (USD)

San Jose, United States
Onsite

Tech Lead, LLM & Generative AI (Full Remote - Andorra)

New
Top rated
EverAI
Full-time

Lead the LLM team of 3 engineers, owning the architecture, training, and deployment of the models powering the core product. Architect the system and mentor the team while spending significant time hands-on writing production code in Python/PyTorch. Own the core chat loop to optimize context windows, memory/RAG retrieval, and inference latency for a real-time experience. Drive the strategy for Supervised Fine-Tuning (SFT) and RLHF/DPO (preference optimization), deciding when to prompt, fine-tune, or create new RAG pipelines. Manage the sourcing, labeling, and cleaning of diverse datasets to improve model steerability and multicultural performance. Design and train custom classifiers to detect and filter non-consensual or illegal content within an explicit environment, creating nuanced, context-aware moderation systems that go beyond simple safe/unsafe classifications.
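The DPO (Direct Preference Optimization) objective the role refers to can be sketched in a few lines: the loss rewards the policy for widening its log-likelihood margin over a frozen reference model on the chosen response versus the rejected one. The log-probabilities below are illustrative numbers, not real model outputs:

```python
import math

# Sketch of the DPO loss on a single preference pair. Inputs are total
# sequence log-probabilities under the trainable policy and a frozen
# reference model; beta scales how strongly preferences are enforced.

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))  # -log(sigmoid)

# Here the policy prefers the chosen response more than the reference does
# (positive margin), so the loss drops below -log(0.5) ~= 0.693.
loss = dpo_loss(policy_chosen=-10.0, policy_rejected=-14.0,
                ref_chosen=-12.0, ref_rejected=-12.0)
```

When policy and reference agree exactly, the margin is zero and the loss sits at -log(0.5); training pushes it lower by shifting probability mass toward preferred responses.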

Undisclosed

Andorra
Remote

Tech Lead, LLM & Generative AI (Full Remote - Serbia)

New
Top rated
EverAI
Full-time

As a Tech Lead for the LLM team, you will architect the system and mentor the team while spending significant time hands-on in the codebase using Python and PyTorch. You will own the core chat loop by optimizing context windows, memory/RAG retrieval, and inference latency to ensure a seamless, real-time experience. You will drive the strategy for Supervised Fine-Tuning (SFT) and Reinforcement Learning with Human Feedback / Direct Preference Optimization (RLHF/DPO), deciding when to prompt, fine-tune, or build new RAG pipelines. Additionally, you will manage the data engine by overseeing the sourcing, labeling, and cleaning of diverse datasets to improve model steerability and multicultural performance. You will design and train custom classifiers to detect and filter non-consensual or illegal content within an explicit environment, moving beyond simple binary safe/unsafe flags to create nuanced, context-aware moderation systems.
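The memory/RAG retrieval step mentioned above reduces, at its core, to scoring stored memory embeddings against a query embedding and keeping the top-k. A stdlib-only sketch with tiny made-up vectors — production systems would use a vector database and learned embeddings:

```python
import math

# Toy RAG retrieval: rank memories by cosine similarity to the query
# embedding. Vectors and memory texts are invented for illustration.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, memory, k=2):
    """Return the k memory texts most similar to the query."""
    scored = sorted(memory.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [text for text, _ in scored[:k]]

memory = {
    "user likes hiking": [0.9, 0.1],
    "user is vegetarian": [0.1, 0.9],
    "user lives in Lyon": [0.7, 0.3],
}
top = retrieve([1.0, 0.0], memory, k=2)   # most similar memories first
```

The retrieved texts are then injected into the prompt, which is why this step sits directly on the chat loop's latency budget.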

Undisclosed

Serbia
Remote

Tech Lead, LLM & Generative AI (Full Remote - Croatia)

New
Top rated
EverAI
Full-time

The Tech Lead will act as a player/coach by architecting the system and mentoring the team while spending significant time hands-on in the codebase using Python and PyTorch. They will own the core chat loop, optimizing context windows, memory/RAG retrieval, and inference latency to ensure a seamless, real-time experience. The Tech Lead will drive the strategy for supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF/DPO), deciding when to prompt, fine-tune, or architect a new RAG pipeline. They will manage the sourcing, labeling, and cleaning of diverse datasets to improve model steerability and multicultural performance. Additionally, they will design and train custom classifiers to detect and filter non-consensual or illegal content within an explicit environment and create nuanced, context-aware moderation systems that go beyond binary safe/unsafe flags.
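"Optimizing context windows" in the chat loop usually means deciding which turns fit the token budget. A minimal sketch: keep the system prompt, then admit recent turns newest-first until the budget is exhausted. Token counts are approximated by word counts here; a real system would use the model's tokenizer:

```python
# Toy context-window trimming: always keep the system prompt, then pack
# conversation turns from newest to oldest until the budget runs out.
# Word count stands in for token count purely for illustration.

def trim_context(system_prompt, turns, budget):
    used = len(system_prompt.split())
    kept = []
    for turn in reversed(turns):            # newest first
        cost = len(turn.split())
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    return [system_prompt] + list(reversed(kept))  # restore chronological order

turns = ["hi there", "hello how can I help", "tell me a long story please"]
ctx = trim_context("you are helpful", turns, budget=12)
# Only the newest turn fits alongside the system prompt.
```

Production chat loops layer summarization and RAG-retrieved memories on top of this, but the budget-packing core looks the same.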

Undisclosed

Croatia
Remote

Tech Lead, LLM & Generative AI (Full Remote - Gibraltar)

New
Top rated
EverAI
Full-time

As a player/coach, the Tech Lead will lead and write code within the LLM team, owning the architecture, training, and deployment of the models behind the core product. Responsibilities include mentoring the team while writing production code (Python/PyTorch); owning the core chat loop to optimize context windows, memory/RAG retrieval, and inference latency; driving the strategy for Supervised Fine-Tuning (SFT) and RLHF/DPO; and managing data sourcing, labeling, and cleaning to improve model steerability and multicultural performance. The role also involves designing and training custom classifiers to detect and filter non-consensual or illegal content in an explicit environment, going beyond simple safe/unsafe classifications to build context-aware moderation systems.
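The "beyond safe/unsafe" moderation idea can be illustrated as a classifier score mapped onto graded policy tiers rather than a single flag. The feature names, weights, and thresholds below are entirely invented; the role trains learned classifiers on labeled data instead:

```python
import math

# Toy graded moderation: a logistic score over hand-picked features is
# mapped to policy tiers instead of a binary safe/unsafe flag. All
# features, weights, and thresholds are hypothetical.

def classify(features, weights, bias=-1.0):
    """Logistic score in (0, 1) from a weighted feature sum."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def moderation_tier(score):
    """Map a score to a graded decision rather than a binary flag."""
    if score < 0.3:
        return "allow"
    if score < 0.7:
        return "allow_with_context_check"
    return "block"

weights = {"violence_terms": 2.0, "consent_signals": -1.5}
score = classify({"violence_terms": 1.0, "consent_signals": 1.0}, weights)
tier = moderation_tier(score)   # mid-range score triggers a context check
```

The middle tier is what makes the system "context-aware": ambiguous content is routed to further checks instead of being hard-blocked or waved through.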

Undisclosed

Gibraltar
Remote

Member of Technical Staff, GPU Optimization

New
Top rated
Mirage
Full-time

Optimize model training and inference pipelines, including data loading, preprocessing, checkpointing, and deployment, to improve throughput, latency, and memory efficiency on NVIDIA GPUs; design, implement, and benchmark custom CUDA and Triton kernels for performance-critical operations; integrate low-level optimizations into PyTorch-based codebases, including custom operators, low-precision formats, and TorchInductor passes; profile and debug the entire stack from kernel launches to multi-GPU I/O paths using various profiling tools such as Nsight, nvprof, PyTorch Profiler, and custom tools; collaborate with colleagues to co-design model architectures and data pipelines that are hardware-friendly while maintaining state-of-the-art quality; stay updated on the latest GPU and compiler technologies and assess their impact; work closely with infrastructure and backend teams to improve cluster orchestration, scaling strategies, and observability for large experiments; provide clear, data-driven insights regarding performance, quality, and cost trade-offs; contribute to a culture emphasizing fast iteration, thoughtful profiling, and performance-centric design.
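The benchmarking half of this role rests on disciplined timing methodology. A minimal CPU-side harness is sketched below (warmup runs, then best-of-N to suppress noise); GPU kernels would instead be timed with CUDA events or Nsight, since kernel launches are asynchronous. The two compared functions are trivial stand-ins:

```python
import time

# Minimal benchmarking harness: warm up first (caches, allocator, JIT),
# then take the best of N repeats, which is less noisy than the mean.

def benchmark(fn, *args, warmup=3, repeats=10):
    for _ in range(warmup):
        fn(*args)
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - t0)
    return best

def slow_sum(xs):
    """Deliberately naive baseline to compare against the builtin."""
    total = 0
    for x in xs:
        total += x
    return total

xs = list(range(10_000))
t_loop = benchmark(slow_sum, xs)
t_builtin = benchmark(sum, xs)    # builtin is typically faster
```

The same warmup/best-of-N discipline carries over to kernel-level work, where the timer is replaced by device-side events and the workload by the candidate kernel.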

$200,000 – $350,000 per year (USD)

New York, United States
Onsite

Python / PyTorch Developer – Frontend Inference Compiler – Dubai

New
Top rated
Cerebras Systems
Full-time

You will develop and maintain the frontend compiler infrastructure that ingests PyTorch models and produces intermediate representations to optimize performance on Cerebras' AI hardware platforms. This includes collaborating with ML and compiler teams, extending PyTorch-based tooling, and working with the latest open and closed generative AI models for optimal inference.
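The core of a frontend like this is capturing framework operations into an intermediate representation instead of executing them. A toy version of that idea: a proxy object that records each arithmetic op into a linear IR. (PyTorch's actual capture mechanisms are torch.fx and torch.export; the IR format here is invented.)

```python
# Toy op tracing: Node overloads arithmetic to append (op, in1, in2, out)
# tuples to a shared IR list rather than computing values. A real frontend
# would capture a full graph with shapes, dtypes, and control flow.

class Node:
    def __init__(self, ir, name):
        self.ir, self.name = ir, name

    def _emit(self, op, other):
        out = Node(self.ir, f"%{len(self.ir)}")   # fresh SSA-style name
        self.ir.append((op, self.name, other.name, out.name))
        return out

    def __add__(self, other):
        return self._emit("add", other)

    def __mul__(self, other):
        return self._emit("mul", other)

ir = []
x, w = Node(ir, "%x"), Node(ir, "%w")
y = x * w + x                 # traced, not executed
# ir == [("mul", "%x", "%w", "%0"), ("add", "%0", "%x", "%1")]
```

Once the program exists as data like this, backend passes can rewrite, fuse, and schedule it for the target hardware.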

Undisclosed

Dubai, United Arab Emirates
Onsite

Backend ML Engineer at Robyn AI

New
Top rated
M13
Full-time

The Backend ML Engineer at Robyn AI builds the backend infrastructure that powers the application: conversations, memory, real-time personalization, voice and chat interfaces, scalable infrastructure for emotional intelligence, secure and fast APIs for the iOS app, and a robust machine learning inference and fine-tuning pipeline. This role involves working in and extending the C#/.NET/ASP.NET backend API layer while progressively adding Python microservices with an AI-native architecture. The engineer will own the full backend surface area, including authentication, APIs, infrastructure, and orchestration, designing features for scale and velocity.

Responsibilities include building and maintaining REST and GraphQL APIs; architecting a microservice-style ML model serving backend deployed via Docker containers or AWS Lambda with async eventing and pub/sub; and managing CI/CD, rollback strategies, logging, and error handling. The engineer will integrate AI and ML systems: managing vector databases for retrieval-augmented generation and personalization, building custom memory pipelines, integrating and scaling inference with various models, and maintaining API orchestration with third-party model providers. Further duties include managing AWS infrastructure and related technologies, implementing search databases and infrastructure-as-code with Terraform, ensuring observability with metrics and logging tools, optimizing latency and caching, setting up secure infrastructure for SOC-2 readiness, and designing and shipping emotion-aware backend systems that update in real time.

The role also involves working closely with product and AI teams to tune the system's behavior based on user feedback, emotion logs, and interaction history, and owning all personalization logic.
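The async eventing and pub/sub pattern this posting describes can be sketched in-process with asyncio queues — each subscriber gets its own queue, and publishing fans a message out to every subscriber of a topic. The topic name and payload are invented; a production stack would use managed messaging (e.g. on AWS) rather than in-process queues:

```python
import asyncio

# Minimal in-process pub/sub: subscribe returns a private queue per
# subscriber; publish fans the message out to all queues for the topic.

class PubSub:
    def __init__(self):
        self.subscribers = {}          # topic -> list of queues

    def subscribe(self, topic):
        q = asyncio.Queue()
        self.subscribers.setdefault(topic, []).append(q)
        return q

    async def publish(self, topic, message):
        for q in self.subscribers.get(topic, []):
            await q.put(message)       # each subscriber gets its own copy

async def main():
    bus = PubSub()
    inbox = bus.subscribe("inference.done")        # hypothetical topic
    await bus.publish("inference.done", {"request_id": "abc", "score": 0.93})
    return await inbox.get()

event = asyncio.run(main())
```

Decoupling producers from consumers this way is what lets an inference service, a personalization worker, and a logging pipeline all react to the same event independently.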

$150,000 – $250,000 per year (USD)

United States
Remote

Want to see more ML Infrastructure Engineer jobs?

View all jobs

Access all 4,256 remote & onsite AI jobs.

Join our private AI community to unlock full job access, and connect with founders, hiring managers, and top AI professionals.
(Yes, it’s still free—your best contributions are the price of admission.)

Frequently Asked Questions

Have questions about roles, locations, or requirements for ML Infrastructure Engineer jobs?


What does an ML Infrastructure Engineer do?

ML Infrastructure Engineers design, build, and maintain the systems that support machine learning workflows from development to production. They create scalable platforms for model training and serving, implement distributed training systems, and develop monitoring to track model performance. They also build data pipelines, optimize ML systems for performance, and implement automated testing and deployment, collaborating with data scientists and researchers to productionize ML models.

What skills are required for an ML Infrastructure Engineer?

ML Infrastructure Engineers need strong programming skills in Python and sometimes Go, Rust, or C++. Proficiency with ML frameworks like PyTorch and TensorFlow is essential, alongside expertise in cloud platforms (AWS, GCP), containers (Docker), and orchestration (Kubernetes). They should understand distributed systems, data engineering concepts, and model serving techniques. Experience with infrastructure-as-code tools and monitoring systems rounds out the technical requirements, complemented by problem-solving and collaboration skills.

What qualifications are needed for an ML Infrastructure Engineer role?

Most positions require a Bachelor's or Master's degree in Computer Science or a related field, plus 4–5+ years of experience building production ML systems. Employers typically expect demonstrable experience with cloud platforms, containerization tools, and ML frameworks, along with a strong understanding of system-level software, machine learning concepts, and resource utilization. Experience with distributed systems and high-throughput workloads is highly valued, especially for senior positions.

What is the salary range for ML Infrastructure Engineer jobs?

Compensation varies with location, company size, experience level, and specific technical expertise. Listings on this page range from roughly $150,000 to $350,000 per year (USD); many companies leave salary undisclosed but offer competitive packages reflecting the specialized nature of ML infrastructure skills and current market demand.

How long does it take to get hired as an ML Infrastructure Engineer?

Hiring timelines vary by company. The process typically includes technical interviews covering systems design, ML fundamentals, and programming skills. Given the specialized nature of the role, companies often evaluate candidates' experience with production ML systems, distributed computing, and relevant technologies in depth, which can extend the process compared to more general engineering roles.

Are ML Infrastructure Engineer jobs in demand?

Yes, ML Infrastructure Engineer jobs show strong demand, with active openings at companies like DataXight, Scale AI, Anthropic, Apple, and Character.AI. The field is growing particularly in specialized areas such as LLM serving infrastructure, on-device ML optimization, and safety-critical ML systems. Positions span major tech hubs, from mid-level to senior roles, reflecting the industry's increasing need for engineers who can build reliable ML systems at scale.