AI Infrastructure Engineer Jobs

Discover the latest remote and onsite AI Infrastructure Engineer roles at top AI companies that are actively hiring. Updated hourly.

Check out 13 new AI Infrastructure Engineer opportunities posted on The Homebase

Software Engineer, macOS Core Product - Gilbert, USA

New
Top rated
Speechify
Full-time

Work alongside machine learning researchers, engineers, and product managers to bring AI Voices to customers for diverse use cases. Deploy and operate the core machine learning inference workloads for the AI Voices serving pipeline. Introduce new techniques, tools, and architecture to improve the performance, latency, throughput, and efficiency of deployed models. Build tools to identify bottlenecks and sources of instability and design and implement solutions to address the highest priority issues.

$140,000 – $200,000 / year (USD)

Gilbert, United States
Maybe global
Remote

Manufacturing Engineer

New
Top rated
Figure AI
Full-time

Design, deploy, and maintain Figure's training clusters. Architect and maintain scalable deep learning frameworks for training on massive robot datasets. Work together with AI researchers to implement training of new model architectures at a large scale. Implement distributed training and parallelization strategies to reduce model development cycles. Implement tooling for data processing, model experimentation, and continuous integration.

$150,000 – $350,000 / year (USD)

San Jose, United States
Maybe global
Onsite

MCP & Tools Python Developer - Agent Evaluation Infrastructure

New
Top rated
Mindrift
Part-time
Full-time

Develop and maintain MCP-compatible evaluation servers. Implement logic to check agent actions against scenario definitions. Create or extend tools that writers and QAs use to test agents. Work closely with infrastructure engineers to ensure compatibility, and occasionally help with test writing or debugging sessions when needed.

$12 / hour (USD)

Hyderabad, India
Maybe global
Remote

VP of Engineering – AI

New
Top rated
Bjak
Full-time

Build, scale, and uphold the technical backbone of a global AI product. Personally build and maintain core AI infrastructure; design model training, evaluation, and deployment pipelines; debug and resolve production AI failures; and review and merge critical PRs. Define standards for model lifecycle and experimentation, design org structure and hiring strategy, and align the AI roadmap with business goals. Lead by example through real systems and real code while shaping engineering culture, hiring strategy, and long-term technical direction. Act as the final technical decision-maker, responsible for AI quality, reliability, and scalability end-to-end.

Undisclosed

Jakarta, Indonesia
Maybe global
Remote

VP of Engineering – AI

New
Top rated
Bjak
Full-time

Build, scale, and uphold the technical backbone of a global AI product. Personally build critical AI systems while shaping engineering culture, hiring strategy, and long-term technical direction. Set the technical and cultural foundation of the AI organization. Own AI quality, reliability, and scalability end-to-end. Balance research ambition with real product delivery. Act as the final technical decision-maker. Personally build and maintain core AI infrastructure. Design model training, evaluation, and deployment pipelines. Debug and resolve production AI failures. Review and merge critical PRs. Define standards for model lifecycle and experimentation. Design org structure and hiring strategy. Align AI roadmap with business goals.

Undisclosed

Bandar Utama or Petaling Jaya, Malaysia
Maybe global
Onsite

Mechanical Engineer, Packaging Systems

New
Top rated
Figure AI
Full-time

$150,000 – $350,000 / year (USD)

San Jose, United States
Maybe global
Onsite

Helix Data Creator

New
Top rated
Figure AI
Full-time

Design, deploy, and maintain Figure's training clusters. Architect and maintain scalable deep learning frameworks for training on massive robot datasets. Work together with AI researchers to implement training of new model architectures at a large scale. Implement distributed training and parallelization strategies to reduce model development cycles. Implement tooling for data processing, model experimentation, and continuous integration.

$150,000 – $350,000 / year (USD)

Los Angeles
Maybe global
Onsite

Senior Platform/DevOps Engineer (Kubernetes-Linux-Azure Local)

New
Top rated
Armada
Full-time

Translating business requirements into requirements for AI/ML models. Preparing data to train and evaluate AI/ML/DL models. Building AI/ML/DL models by applying state-of-the-art algorithms, especially transformers, sometimes leveraging existing algorithms from academic or industrial research. Testing and evaluating AI/ML/DL models, benchmarking their quality, and publishing the models, data sets, and evaluations. Deploying models in production by containerizing them. Working with customers and internal employees to refine the quality of the models. Establishing continuous learning pipelines for models with online learning or transfer learning. Building and deploying containerized applications in cloud or on-premise environments.

$134,400 – $168,000 / year (USD)

Bellevue, United States
Maybe global
Onsite

QA Engineer (Agents)

New
Top rated
Sana
Full-time

Design and implement test plans for agent infrastructure, LLM-based APIs, and end-to-end user journeys. Build and maintain automated test suites for backend, frontend, and integration layers, including prompt and response validation for generative models. Develop tools and frameworks to accelerate testing and catch regressions early, especially in agent reasoning, tool use, and context handling. Collaborate closely with engineers to embed quality into every stage of development, focusing on the unique challenges of AI/LLM systems such as non-determinism, hallucinations, and safety. Lead root cause analysis and drive resolution for critical issues and incidents, including those arising from model updates or agent behaviors. Advocate for best practices in code quality, observability, and CI/CD pipelines, ensuring quality signals are actionable and visible.

Undisclosed

Stockholm, Sweden
Maybe global
Onsite

Systems Architect - Active Safety

New
Top rated
Figure AI
Full-time

Design, deploy, and maintain Figure's training clusters. Architect and maintain scalable deep learning frameworks for training on massive robot datasets. Work together with AI researchers to implement training of new model architectures at a large scale. Implement distributed training and parallelization strategies to reduce model development cycles. Implement tooling for data processing, model experimentation, and continuous integration.

$150,000 – $350,000 / year (USD)

San Jose, United States
Maybe global
Onsite

Access all 4,256 remote & onsite AI jobs.

Frequently Asked Questions

Have questions about roles, locations, or requirements for AI Infrastructure Engineer jobs?

What does an AI Infrastructure Engineer do?

AI Infrastructure Engineers design and build the systems that power machine learning workloads. They optimize performance by resolving bottlenecks, implement scaling solutions through load balancing and redundancy, and deploy cloud infrastructure specifically for AI applications. These specialists build fault-tolerant systems for serving large language models, maintain continuous integration pipelines, and collaborate with AI teams to translate research needs into production-ready infrastructure.

What skills are required for an AI Infrastructure Engineer?

Key skills for this role include proficiency with cloud platforms (AWS SageMaker, Azure ML, Vertex AI), infrastructure as code tools like Terraform, and containerization technologies such as Docker and Kubernetes. Strong programming abilities in Python, Go, or C++ are essential, with CUDA knowledge for GPU optimization. Experience with monitoring tools (Prometheus, Grafana), distributed systems, deep learning frameworks, and Linux/UNIX environments is highly valued in candidates.

What qualifications are needed for an AI Infrastructure Engineer role?

Employers typically require a bachelor's degree in Computer Science, AI, Machine Learning, or a related technical field. Most positions demand 4+ years of experience in cloud infrastructure, large-scale systems, or software engineering with an infrastructure focus. Practical expertise in cloud computing, Linux administration, network architecture, and container technologies is essential. Specialized knowledge in GPU programming, distributed systems, and LLM serving strengthens applications considerably.

What is the salary range for an AI Infrastructure Engineer job?

Compensation typically varies based on location, experience level, company size, and the specific technical skills required. As this role combines specialized AI knowledge with infrastructure expertise, salaries generally reflect the high demand for professionals who can effectively build and optimize systems for machine learning workloads at scale.

How long does it take to get hired as an AI Infrastructure Engineer?

The length of the hiring process varies by company. It often includes technical assessments of cloud architecture knowledge, infrastructure as code experience, and machine learning operations skills. Given the specialized nature of AI infrastructure roles and their typical requirement of 4+ years of relevant experience, candidates should expect thorough evaluation of their technical capabilities and problem-solving abilities.

Are AI Infrastructure Engineer jobs in demand?

Yes, AI Infrastructure Engineer positions show strong demand signals. Major companies like Accenture, Scale AI, and Zoom are actively recruiting for these specialized roles. The increasing deployment of large language models and AI applications across industries creates a consistent need for professionals who can build optimized infrastructure. The specialized skill intersection of cloud platforms, containerization, GPU optimization, and machine learning operations makes qualified candidates particularly valuable in today's job market.