ML Infrastructure Engineer Jobs

Discover the latest remote and onsite ML Infrastructure Engineer roles across top active AI companies. Updated hourly.

Check out 20 new ML Infrastructure Engineer opportunities posted on The Homebase

Software Engineer, macOS Core Product - Stamford, USA

New
Top rated
Speechify
Full-time
Full-time

Work alongside machine learning researchers, engineers, and product managers to bring AI Voices to customers for diverse use cases. Deploy and operate core ML inference workloads for the AI Voices serving pipeline. Introduce new techniques, tools, and architecture to improve performance, latency, throughput, and efficiency of deployed models. Build tools to identify bottlenecks and sources of instability and design and implement solutions to address high priority issues.

$140,000 – $200,000 per year (USD)

Stamford, United States
Maybe global
Remote

Software Engineer, macOS Core Product - Glendale, USA

New
Top rated
Speechify
Full-time
Full-time

Work alongside machine learning researchers, engineers, and product managers to bring AI Voices to customers for a diverse range of use cases. Deploy and operate the core ML inference workloads for the AI Voices serving pipeline. Introduce new techniques, tools, and architecture that improve the performance, latency, throughput, and efficiency of deployed models. Build tools to provide visibility into bottlenecks and sources of instability, then design and implement solutions to address the highest priority issues.

$140,000 – $200,000 per year (USD)

Glendale, United States
Maybe global
Remote

Software Engineer, macOS Core Product - Jackson, USA

New
Top rated
Speechify
Full-time
Full-time

Work alongside machine learning researchers, engineers, and product managers to bring AI Voices to customers for diverse use cases. Deploy and operate the core ML inference workloads for the AI Voices serving pipeline. Introduce new techniques, tools, and architecture to improve performance, latency, throughput, and efficiency of deployed models. Build tools to identify bottlenecks and sources of instability and design and implement solutions to address the highest priority issues.

$140,000 – $200,000 per year (USD)

Jackson, United States
Maybe global
Remote

Machine Learning Engineer: ML Infra and Model Optimization

New
Top rated
Genies
Intern
Full-time

Develop and deploy LLM agent systems within the AI-powered avatar framework. Design and implement scalable and efficient backend systems to support AI applications. Collaborate with AI and NLP experts to integrate LLM and LLM-based systems and algorithms into the avatar ecosystem. Work with Docker, Kubernetes, and AWS for AI model deployment and scalability. Contribute to code reviews, debugging, and testing to ensure high-quality deliverables. Document work for future reference and improvement.

$40 – $50 per hour (USD)

Los Angeles, United States
Maybe global
Hybrid

Machine Learning Engineer (AI detection, Toronto)

New
Top rated
GPTZero
Full-time
Full-time

Design, train, and fine-tune state-of-the-art language models; develop AI agents combined with retrieval-augmented language models; build efficient and scalable machine learning training and inference systems; stay up-to-date with the latest literature and emerging technologies to solve novel problems; work closely with product and design teams to develop intuitive applications that create societal impact.

CA$140,000 – CA$260,000 per year (CAD)

Toronto, Canada
Maybe global
Hybrid

NPI Engineer

New
Top rated
Figure AI
Full-time
Full-time

Design, deploy, and maintain Figure's training clusters. Architect and maintain scalable deep learning frameworks for training on massive robot datasets. Work together with AI researchers to implement training of new model architectures at a large scale. Implement distributed training and parallelization strategies to reduce model development cycles. Implement tooling for data processing, model experimentation, and continuous integration.

$150,000 – $350,000 per year (USD)

San Jose, United States
Maybe global
Onsite

Helix Data Creator

New
Top rated
Figure AI
Full-time
Full-time

Design, deploy, and maintain Figure's training clusters. Architect and maintain scalable deep learning frameworks for training on massive robot datasets. Work together with AI researchers to implement training of new model architectures at a large scale. Implement distributed training and parallelization strategies to reduce model development cycles. Implement tooling for data processing, model experimentation, and continuous integration.

$150,000 – $350,000 per year (USD)

Spartanburg, United States
Maybe global
Onsite

Systems Integration Engineer - Actuation Systems

New
Top rated
Figure AI
Full-time
Full-time

Design, deploy, and maintain Figure's training clusters. Architect and maintain scalable deep learning frameworks for training on massive robot datasets. Work together with AI researchers to implement training of new model architectures at a large scale. Implement distributed training and parallelization strategies to reduce model development cycles. Implement tooling for data processing, model experimentation, and continuous integration.

$150,000 – $350,000 per year (USD)

San Jose, United States
Maybe global
Onsite

Tech Lead, LLM & Generative AI (Full Remote - Andorra)

New
Top rated
EverAI
Full-time
Full-time

Lead the LLM team of 3 engineers, owning the architecture, training, and deployment of the models powering the core product. Architect the system and mentor the team while spending significant time hands-on writing production code in Python/PyTorch. Own the core chat loop to optimize context windows, memory/RAG retrieval, and inference latency for a real-time experience. Drive the strategy for Supervised Fine-Tuning (SFT) and RLHF/DPO (Preference Optimization) deciding when to prompt, fine-tune, and create new RAG pipelines. Manage the sourcing, labeling, and cleaning of diverse datasets to improve model steerability and multicultural performance. Design and train custom classifiers to detect and filter non-consensual or illegal content within an explicit environment, creating nuanced, context-aware moderation systems beyond simple safe/unsafe classifications.

Undisclosed

Andorra
Maybe global
Remote

Tech Lead, LLM & Generative AI (Full Remote - Serbia)

New
Top rated
EverAI
Full-time
Full-time

As a Tech Lead for the LLM team, you will architect the system and mentor the team while spending significant time hands-on in the codebase using Python and PyTorch. You will own the core chat loop by optimizing context windows, memory/RAG retrieval, and inference latency to ensure a seamless, real-time experience. You will drive the strategy for Supervised Fine-Tuning (SFT) and Reinforcement Learning with Human Feedback / Direct Preference Optimization (RLHF/DPO), deciding when to prompt, fine-tune, or build new RAG pipelines. Additionally, you will manage the data engine by overseeing the sourcing, labeling, and cleaning of diverse datasets to improve model steerability and multicultural performance. You will design and train custom classifiers to detect and filter non-consensual or illegal content within an explicit environment, moving beyond simple binary safe/unsafe flags to create nuanced, context-aware moderation systems.

Undisclosed

Serbia
Maybe global
Remote

Want to see more ML Infrastructure Engineer jobs?

View all jobs

Access all 4,256 remote & onsite AI jobs.

Join our private AI community to unlock full job access, and connect with founders, hiring managers, and top AI professionals.
(Yes, it’s still free—your best contributions are the price of admission.)

Frequently Asked Questions

Have questions about roles, locations, or requirements for ML Infrastructure Engineer jobs?


What does an ML Infrastructure Engineer do?

ML Infrastructure Engineers design, build, and maintain systems that support machine learning workflows from development to production. They create scalable platforms for model training and serving, implement distributed training systems, and develop monitoring solutions to track model performance. These engineers also build data pipelines, optimize ML systems for performance, and implement automated testing and deployment processes while collaborating with data scientists and researchers to productionize ML models.

What skills are required for an ML Infrastructure Engineer?

ML Infrastructure Engineers need strong programming skills in Python and sometimes Go, Rust, or C++. Proficiency with ML frameworks like PyTorch and TensorFlow is essential, alongside expertise in cloud platforms (AWS, GCP), containers (Docker), and orchestration (Kubernetes). They should understand distributed systems, data engineering concepts, and model serving techniques. Experience with infrastructure-as-code tools and monitoring systems rounds out the technical requirements, complemented by problem-solving abilities and collaboration skills.

What qualifications are needed for an ML Infrastructure Engineer role?

Most ML Infrastructure Engineer positions require a Bachelor's or Master's degree in Computer Science or a related field, plus 4-5+ years of experience building production ML systems. Employers typically expect demonstrable experience with cloud platforms, containerization tools, and ML frameworks. A strong understanding of system-level software, machine learning concepts, and resource utilization is necessary. Experience with distributed systems and high-throughput workloads is highly valued, especially for senior positions.

What is the salary range for ML Infrastructure Engineer jobs?

Compensation varies with location, company size, experience level, and specific technical expertise. Full-time roles listed on this page post ranges from $140,000 – $200,000 up to $150,000 – $350,000 per year (USD), and one Toronto role lists CA$140,000 – CA$260,000 per year (CAD). Organizations like Anthropic, Scale AI, Apple, and other technology companies actively hiring for these positions likely offer competitive compensation packages reflecting the specialized nature of ML infrastructure skills and the current market demand.

How long does it take to get hired as an ML Infrastructure Engineer?

Hiring timelines vary by company. The process typically includes technical interviews focused on systems design, ML fundamentals, and programming skills. Given the specialized nature of the role, companies often conduct thorough evaluations of candidates' experience with production ML systems, distributed computing, and relevant technologies. These specialized requirements may extend the hiring process compared to more general engineering roles.

Are ML Infrastructure Engineer jobs in demand?

Yes, ML Infrastructure Engineer jobs show strong demand based on active openings at major companies like DataXight, Scale AI, Anthropic, Apple, and Character.AI. The field is growing particularly in specialized areas such as LLM serving infrastructure, on-device ML optimization, and safety-critical ML systems. These positions are distributed across major tech hubs with opportunities ranging from mid-level to senior roles, reflecting the industry's increasing need for engineers who can build reliable ML systems at scale.