Kubernetes AI Jobs

Discover the latest remote and onsite Kubernetes AI roles at top AI companies that are actively hiring. Updated hourly.

Check out 301 new Kubernetes AI role opportunities posted on The Homebase

Platform Engineer, Forward Deployed Engineering

New
Top rated
OpenAI
Full-time

The Platform Incubation Engineer role within Forward Deployed Engineering (FDE) involves architecting and building new platform capabilities by turning frontier customer signals into concrete designs, implementations, and APIs. Responsibilities include incubating platform bets end-to-end by forming hypotheses, shipping initial capabilities, and iterating based on real usage feedback. The engineer will embed with design partners to conduct technical discovery and translate needs into product and platform requirements, partner with customer-tagged FDEs to deploy, debug, capture repeatable patterns, and improve the platform based on field learnings. They will also design and run pilot programs, collaborate closely with core product and engineering teams to align architecture and production efforts, and drive adoption outcomes by measuring usage, identifying blockers and failure modes, and prioritizing platform increments to unlock repeatable value.

$230,000 – $385,000 per year (USD)

San Francisco, United States
Maybe global
Hybrid
Python
Go
Docker
Kubernetes
CI/CD

Software Engineer, Agent - Healthcare

New
Top rated
Sierra
Full-time

Design and deliver production-grade AI agents for healthcare that handle sensitive patient and member interactions while maintaining strict HIPAA compliance. Drive the Agent Development Life Cycle with ownership from pilot through deployment and iteration. Partner with healthcare leaders to understand challenges and build AI agents that address operational needs. Develop expertise in healthcare systems, workflows, and data/privacy standards to create trustworthy AI experiences. Guide and contribute to the evolution of Sierra's core platform based on customer feedback. Examples of projects include building AI agents for insurance networks, providers, primary and urgent care clinics, and healthcare financial platforms, as well as experimenting with voice models for secure interactions.

$180,000 – $390,000 per year (USD)

San Francisco, United States
Maybe global
Onsite
Python
Go
TypeScript
React
Prompt Engineering

Software Engineer, Agent - Healthcare

New
Top rated
Sierra
Full-time

Design and deliver production-grade AI agents for healthcare that handle sensitive patient and member interactions while maintaining strict HIPAA compliance. Build and ship highly performant, reliable, and empathetic AI agents that help with understanding coverage, finding providers, scheduling appointments, navigating billing, and more. Have complete ownership and autonomy over the Agent Development Life Cycle from pilot through deployment and continuous iteration, building, tuning, and evolving AI agents in production environments serving healthcare payers, providers, and platforms. Work directly with healthcare leaders including executives and technical teams at health plans, provider networks, and healthcare technology companies to understand and solve their most pressing challenges. Develop deep expertise in healthcare systems and workflows, including integrations across EHR and patient access platforms, payer and provider operations, and healthcare data and interoperability standards. Translate complex healthcare knowledge into trustworthy AI experiences. Use customer insights to guide the evolution of Sierra's core platform by surfacing unmet needs, prototyping new tools and features, and collaborating with research, product, and platform teams to shape the future of AI agent development in healthcare.

$180,000 – $390,000 per year (USD)

New York, United States
Maybe global
Onsite
Python
JavaScript
TypeScript
Go
Prompt Engineering

Senior ML Platform Engineer (Autonomous Driving)

New
Top rated
42dot
Full-time

Set technical strategy and oversee development of a high-scale, reliable data platform to manage, visualize, and serve large-scale datasets for machine learning model training and validation. Build the data lakehouse for autonomous driving scene datasets, including sensor data, calibration data, and annotation data. Drive development of the Autonomous Driving Data SDK, including scene data search, dataset preparation, and dataset loading. Investigate and resolve performance bottlenecks throughout the data processing pipelines, including data processing latency, data search latency, and test procedure coverage. Bootstrap and maintain infrastructure for Data Platform components such as the Data Processing Pipeline, Database, Data Lakehouse, and Data Serving. Collaborate with cross-functional teams including ML algorithm, ML application, and Cloud Infrastructure to align ML Platforms with the overall Autonomous Driving System Architecture.

Undisclosed

Sunnyvale or San Francisco, United States
Maybe global
Onsite
Python
PyTorch
TensorFlow
Data Pipelines
MLOps

Backend Engineer, AI

New
Top rated
Bjak
Full-time

Build and operate backend systems that serve AI-powered features in production; design inference pipelines, orchestration layers, and service boundaries around models; own production concerns including monitoring, logging, alerting, and incident response; optimize latency and throughput across inference, caching, batching, and streaming.

Undisclosed

Beijing, China
Maybe global
Remote
Python
PyTorch
OpenAI API
Docker
Kubernetes
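The batching this listing mentions can be sketched in a few lines: a micro-batcher that collects concurrent inference requests and sends them to the model in one call, trading a small wait for higher throughput. This is a simplified illustration, not Bjak's implementation; `run_model` is a stand-in for a real model call.

```python
import queue
import threading
import time

def run_model(batch):
    # Stand-in for a real batched model call; here we just echo inputs.
    return [f"result:{x}" for x in batch]

class MicroBatcher:
    """Collects individual requests and flushes them to the model
    when the batch is full or a small time budget expires."""

    def __init__(self, max_batch=8, max_wait_s=0.01):
        self.max_batch = max_batch
        self.max_wait_s = max_wait_s
        self.requests = queue.Queue()

    def submit(self, item):
        # Each caller gets an Event plus a slot for its result.
        done = threading.Event()
        holder = {}
        self.requests.put((item, done, holder))
        done.wait()
        return holder["result"]

    def serve_forever(self):
        while True:
            batch = [self.requests.get()]  # block for the first request
            deadline = time.monotonic() + self.max_wait_s
            while len(batch) < self.max_batch:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(self.requests.get(timeout=remaining))
                except queue.Empty:
                    break
            inputs = [item for item, _, _ in batch]
            outputs = run_model(inputs)  # one model call for the whole batch
            for (_, done, holder), out in zip(batch, outputs):
                holder["result"] = out
                done.set()
```

In a service, `serve_forever` runs in a background thread while request handlers call `submit`; the same shape extends naturally to GPU inference, where per-call overhead makes batching pay off.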

Backend Engineer, AI

New
Top rated
Bjak
Full-time

Build and operate backend systems that serve AI-powered features in production. Design inference pipelines, orchestration layers, and service boundaries around models. Own production concerns including monitoring, logging, alerting, and incident response. Optimize latency and throughput across inference, caching, batching, and streaming.

Undisclosed

New York, United States
Maybe global
Remote
Python
PyTorch
OpenAI API
Docker
Kubernetes

Solutions Architect

New
Top rated
LangChain
Full-time

The Solutions Architect is responsible for designing scalable, highly available infrastructure for AI platform deployments, including compute, storage, networking, security, enterprise integration patterns, Infrastructure as Code (Terraform, Helm), multi-region HA/DR strategies, and CI/CD pipelines. They design multi-agent systems using different patterns, implement agent logic with frameworks like LangChain and LangGraph, design evaluation frameworks, optimize prompts with A/B testing, and guide deployment and operations. The role involves leading technical maturity assessments, working directly with enterprise customers to understand requirements and provide recommendations, and partnering with Engagement Managers and Product/Engineering teams. Responsibilities combine software development, infrastructure/platform engineering, and customer-facing skills, spanning everything from Kubernetes cluster design to multi-agent system architecture, to solve real business problems.

Undisclosed

London, United Kingdom
Maybe global
Hybrid
Python
TypeScript
Kubernetes
Terraform
CI/CD
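The prompt A/B testing this role describes can be sketched as a toy harness that scores two prompt variants against labeled cases and picks the better one. This is a simplified illustration, not LangChain's evaluation framework; `call_model` stands in for a real LLM call, and a production harness would add larger case sets and statistical significance testing.

```python
def evaluate_prompt(prompt_template, cases, call_model):
    """Score a prompt variant as the fraction of test cases where the
    model's answer contains the expected string."""
    hits = 0
    for question, expected in cases:
        answer = call_model(prompt_template.format(question=question))
        if expected.lower() in answer.lower():
            hits += 1
    return hits / len(cases)

def ab_test_prompts(variant_a, variant_b, cases, call_model):
    """Compare two prompt variants on the same cases; return the
    winning label plus both scores."""
    score_a = evaluate_prompt(variant_a, cases, call_model)
    score_b = evaluate_prompt(variant_b, cases, call_model)
    return ("A" if score_a >= score_b else "B"), score_a, score_b
```

Keeping the evaluation cases fixed across variants is what makes the comparison fair; the same loop generalizes from substring matching to rubric- or model-graded scoring.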

Software Engineer, AI Agent

New
Top rated
Resolve AI
Full-time

Own customer outcomes end to end by working directly with customers, design partners, and internal stakeholders to define technical scope, success criteria, and delivery milestones, then building and shipping the solution. Design and implement features across the full stack with a focus on solving real problems observed in production environments. Integrate deeply with customer environments by working hands-on with cloud platforms, observability systems, CI/CD pipelines, and incident response workflows to ensure product fit. Diagnose and resolve complex issues across customer deployments, turning support interactions into product insights and durable fixes. Build evaluations and feedback loops to quantify customer value and ensure new capabilities are genuinely impactful. Write clean, maintainable, well-tested code, lead design discussions and code reviews, and help shape the technical direction of the product and the engineering culture of the team.

Undisclosed

San Francisco, United States
Maybe global
Onsite
Python
Kubernetes
AWS
CI/CD
Docker

Engineering Manager, Product Engineering

New
Top rated
Harvey
Full-time

The Engineering Manager, Product at Harvey is responsible for owning end-to-end delivery of core product initiatives, from technical design through execution and iteration, while managing a high-performing fullstack engineering team. This includes setting technical direction for large-scale, AI-powered systems such as retrieval over petabyte-scale document collections, product interfaces for AI collaboration, long-horizon planning agents for critical workflows, government-grade security for sensitive data, evaluation of LLMs across extensive taxonomies, and internet-scale data collection across multiple jurisdictions. They must translate product vision into architecture balancing speed, quality, and scalability, lead hands-on contributions to design, code, and architecture reviews, and actively engage in implementation to unblock the team or solve difficult problems. Additionally, they build and grow the team by hiring engineers, setting technical and behavioral standards, and mentoring for career development. The role involves close partnership with Product, Design, and AI teams to identify opportunities and deliver intuitive user experiences, establishing an engineering culture focused on simplicity, ownership, craftsmanship, and continuous improvement, and aligning execution with company goals to support product strategy and long-term impact.

Undisclosed

Toronto, Canada
Maybe global
Hybrid
Python
JavaScript
Go
Docker
Kubernetes

AI Research Engineer - ML Engineering

New
Top rated
Helsing
Full-time

You will develop ML/AI systems that leverage and extend the latest state-of-the-art methods and architectures, design experiments and conduct benchmarks to evaluate and improve their performance in real-world scenarios, collaborate with people across several teams and backgrounds to integrate cutting-edge ML/AI into production systems, and work on AI-based capabilities and enabling infrastructure that allow semi-autonomous platforms to localise, navigate, and perceive the world in real time.

Undisclosed

Berlin or London or Munich or Paris
Maybe global
Onsite
Python
PyTorch
TensorFlow
Reinforcement Learning
MLOps

Want to see more AI Engineer jobs?

View all jobs

Access all 4,256 remote & onsite AI jobs.

Join our private AI community to unlock full job access, and connect with founders, hiring managers, and top AI professionals.
(Yes, it’s still free—your best contributions are the price of admission.)

Frequently Asked Questions

Need help with something? Here are our most frequently asked questions.


What are Kubernetes AI jobs?

Kubernetes AI jobs involve orchestrating containerized machine learning applications at scale. Professionals in these roles manage container deployment for AI workloads, distribute computational tasks across nodes for model training, allocate GPU resources efficiently, and automate ML pipelines. They typically work with frameworks like TensorFlow and PyTorch while ensuring high availability for production AI systems through automated scaling and self-healing capabilities.

What roles commonly require Kubernetes skills?

Roles requiring Kubernetes skills include Machine Learning Engineers who deploy models to production, MLOps Engineers working with platforms like Kubeflow, Data Engineers managing processing pipelines, Platform Engineers supporting agentic AI applications, DevOps/SRE professionals handling containerized deployments, and Cloud Architects designing scalable environments. These positions typically involve maintaining infrastructure that supports the complete machine learning lifecycle.

What skills are typically required alongside Kubernetes?

Alongside Kubernetes, employers typically look for container fundamentals (especially Docker), distributed systems knowledge, CI/CD pipeline experience, and cloud platform familiarity. Programming skills are essential for deployment scripts, while experience with ML frameworks like TensorFlow or PyTorch is valuable for AI-specific implementations. Understanding storage solutions, Kubernetes operators, and automated infrastructure management rounds out the typical skill requirements.

What experience level do Kubernetes AI jobs usually require?

Kubernetes AI jobs typically require mid to senior-level experience. Employers look for professionals who understand containerization concepts, have worked with distributed systems, and can manage complex ML workflows. Prior exposure to cloud environments where Kubernetes runs is important. Candidates should demonstrate practical experience with CI/CD pipelines and familiarity with at least one major ML framework.

What is the salary range for Kubernetes AI jobs?

Kubernetes AI jobs command competitive salaries due to the specialized intersection of container orchestration and machine learning skills. Compensation varies based on experience level, location, and specific industry. Roles requiring both strong AI expertise and Kubernetes infrastructure management typically offer premium compensation compared to general software engineering positions, reflecting the high market value of these combined skill sets.

Are Kubernetes AI jobs in demand?

Kubernetes AI jobs are in high demand as organizations increasingly adopt containerized applications for machine learning workloads. The growth is driven by enterprises scaling their AI operations, edge computing applications, and the need for platform-agnostic infrastructure. Companies seek professionals who can manage the complexity of distributed ML systems, particularly for high-availability production environments and automated ML pipelines.

What is the difference between Kubernetes and Docker in AI roles?

Docker creates containerized applications while Kubernetes orchestrates those containers at scale. In AI roles, Docker is used to package ML applications with their dependencies, while Kubernetes manages deployment across clusters, automates scaling during training, and handles resource allocation for GPUs. Docker provides consistency between environments, while Kubernetes adds critical production capabilities like load balancing, self-healing, and distributed computing for AI workloads.
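As a concrete illustration of the GPU allocation described above, here is a minimal Kubernetes pod manifest requesting one GPU, written as the Python dict that a YAML manifest deserializes to. The pod name, container name, and image are illustrative; `nvidia.com/gpu` is the extended resource name exposed by the NVIDIA device plugin.

```python
import json

# Minimal pod manifest requesting one GPU. GPUs are requested only via
# resources.limits; Kubernetes then schedules the pod onto a node with
# a free GPU. Names and image below are illustrative.
gpu_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "train-job"},
    "spec": {
        "restartPolicy": "Never",
        "containers": [
            {
                "name": "trainer",
                "image": "pytorch/pytorch:latest",
                "command": ["python", "train.py"],
                "resources": {
                    "limits": {"nvidia.com/gpu": 1},
                },
            }
        ],
    },
}

print(json.dumps(gpu_pod, indent=2))
```

The equivalent YAML file would be applied with `kubectl apply -f pod.yaml`; unlike CPU and memory, GPU resources are not overcommittable, so a pod either gets a whole device or stays pending.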