Kubernetes AI Jobs

Discover the latest remote and onsite Kubernetes AI roles at top AI companies that are actively hiring. Updated hourly.

Check out 301 new Kubernetes AI role opportunities posted on The Homebase

Senior Software Engineer - Internal Tools & Productivity

New
Top rated
Scale AI
Full-time

Translate research into product by working with client-side researchers on post-training, evaluations, safety/alignment, and building necessary primitives, data, and tooling. Partner closely with core customers and frontier AI labs to address complex technical problems related to model improvement, performance, and deployment. Shape and propose model improvement work through clearly defined technical proposals, scope of work, and execution plans. Own the end-to-end lifecycle of projects, including discovery, writing product requirement documents (PRDs) and technical specifications, prioritizing trade-offs, conducting experiments, shipping solutions, and scaling pilots into repeatable offerings. Lead high-stakes technical engagements by running working sessions with senior stakeholders, defining success metrics, identifying risks early, and driving programs to measurable outcomes. Collaborate across multiple departments such as research, platform, operations, security, and finance to deliver reliable production-grade results. Build rigorous evaluation frameworks, maintain data quality feedback loops, and share insights to improve technical execution across accounts.

$201,600 – $241,920 per year (USD)

San Francisco, United States
Maybe global
Onsite
Python
Prompt Engineering
Model Evaluation
MLOps
MLflow

Site Reliability Engineer / DevOps

New
Top rated
Scale AI
Full-time

The role involves translating AI research into product solutions by working with client-side researchers on post-training, evaluations, safety, and alignment, building the necessary primitives, data, and tooling. The engineer partners closely with leading AI teams and frontier research labs to solve complex technical problems related to model improvement, performance, and deployment, shaping and proposing technically rigorous model improvement work. Responsibilities include leading the end-to-end lifecycle from discovery to scalable pilots, conducting technical working sessions with senior stakeholders, defining success metrics, managing risks, and driving programs to measurable outcomes. The role requires collaboration with research, platform, operations, security, and finance teams to deliver reliable, production-grade solutions. Additionally, the engineer designs and establishes robust evaluation frameworks, closes feedback loops on data quality, and shares best practices across accounts.

$201,600 – $241,920 per year (USD)

Mexico City, Mexico
Maybe global
Onsite
Python
MLflow
Docker
Kubernetes
AWS

Data Engineer | Power

New
Top rated
Gecko Robotics
Full-time

As a Data Engineer, you will build and evolve the data backbone of an AI-first product including document intelligence, time-series IoT data, and agentic AI systems. You will design, implement, and operate data systems across the full lifecycle from raw ingestion to AI-driven outputs used by customers. You will work directly with customers and internal stakeholders to understand problems and translate them into technical solutions, iterating quickly. Responsibilities include building pipelines that support document processing, sensor data, and ML workflows, contributing to feature engineering and model experimentation when needed, and owning systems in production. You will make architectural decisions, improve system reliability over time, and help define best practices as the team and product scale.

$154,000 – $204,000 per year (USD)

New York, United States
Maybe global
Onsite
Python
MLflow
Docker
Kubernetes
GCP

Senior Software Engineer, Connectivity

New
Top rated
Scale AI
Full-time

The role involves partnering closely with ML teams and AI research teams to translate research needs related to post-training, evaluations, safety/alignment into clear product roadmaps and measurable outcomes. Responsibilities include working hands-on with leading AI teams and frontier research labs to tackle technical problems in model improvement and deployment, shaping and proposing model improvement work by translating objectives into well-defined statements of work and execution plans, and collaborating on designing data, primitives, and tooling required to improve frontier models in practice. The position also requires owning the end-to-end lifecycle of projects, including discovery, writing PRDs and technical specs, prioritizing trade-offs, running experiments, shipping initial solutions, and scaling successful pilots into repeatable offerings. Leading complex, high-stakes engagements by running technical working sessions with senior stakeholders, defining success metrics, surfacing risks early, and driving programs to measurable outcomes is part of the role. Additionally, the role requires partnering closely across research, platform, operations, security, and finance to deliver production-grade results for demanding customers and building rigorous evaluation frameworks such as benchmarks and RLVR to improve technical execution across accounts.

$201,600 – $241,920 per year (USD)

San Francisco or New York, United States
Maybe global
Onsite
Python
Prompt Engineering
Model Evaluation
MLOps
MLflow

Software Engineer - Sensing, Consumer Products

New
Top rated
OpenAI
Full-time

As a Software Engineer on Consumer Products Research, the responsibilities include building and shipping production software for sensing algorithms by translating algorithm prototypes into reliable end-to-end systems, implementing and owning key parts of the Python shipping pipeline including integration surfaces, evaluation hooks, and quality/performance guardrails. The role also involves developing embedded/on-device software in an RTOS environment (such as Zephyr) and deploying models to device runtimes and hardware accelerators. Additional responsibilities include optimizing real-time on-device perception loops for stability, latency, power, and memory constraints, creating data collection and instrumentation tooling to bring up new sensing modalities and accelerate iteration from prototype to dataset to model to device, and partnering cross-functionally with algorithms, human data, firmware/hardware teams to debug, profile, and harden systems against real-world variability.

$325,000 per year (USD)

San Francisco, United States
Maybe global
Hybrid
Python
C++
Docker
Kubernetes
CI/CD

Production Engineer - Maritime

New
Top rated
Helsing
Full-time

The role involves developing machine learning and artificial intelligence systems by leveraging and extending state-of-the-art methods and architectures, designing experiments, and conducting benchmarks to evaluate and improve AI performance in real-world scenarios. The candidate will participate in impactful projects and collaborate with multiple teams and backgrounds to integrate cutting-edge ML/AI into production systems. Responsibilities also include ensuring AI software is deployed to production with proper testing, quality assurance, and monitoring.

Undisclosed

Plymouth
Maybe global
Onsite
Python
PyTorch
TensorFlow
Reinforcement Learning
MLOps

Senior Software Engineer, ML Core

New
Top rated
Zoox
Full-time

Design, develop, and deploy custom and off-the-shelf ML libraries and tooling to improve ML development, training, deployment, and on-vehicle model inference latency. Build tooling and establish development best practices to manage and upgrade foundational libraries such as the NVIDIA driver, PyTorch, and TensorRT, improving the ML developer experience and expediting debugging efforts. Collaborate closely with cross-functional teams including applied ML research, high-performance compute, advanced hardware engineering, and data science to define requirements and align on architectural decisions. Work across multiple ML teams within Zoox, supporting in-vehicle and off-vehicle ML use cases and coordinating to meet the needs of vehicle and ML teams, reducing the time from ideation to productionization of AI innovations.

$214,000 – $290,000 per year (USD)

Foster City, United States
Maybe global
Onsite
Python
C++
PyTorch
TensorFlow
JAX

Marketing Intern - Seoul

New
Top rated
Dataiku
Intern
Full-time

Help users discover and master the Dataiku platform through user training, office hours, demos, and ongoing consultative support. Analyse and investigate various kinds of data and machine learning applications across industries and use cases. Provide strategic input to the customer and account teams that help our customers achieve success. Scope and co-develop production-level data science projects with our customers. Mentor and help educate data scientists and other customer team members to aid in career development and growth.

Undisclosed

Seoul, South Korea
Maybe global
Hybrid
Python
JavaScript
SQL
PySpark
Machine Learning

Marketing Intern - Tokyo

New
Top rated
Dataiku
Intern
Full-time

Help users discover and master the Dataiku platform through user training, office hours, demos, and ongoing consultative support. Analyse and investigate various kinds of data and machine learning applications across industries and use cases. Provide strategic input to the customer and account teams that help our customers achieve success. Scope and co-develop production-level data science projects with our customers. Mentor and help educate data scientists and other customer team members to aid in career development and growth.

Undisclosed

Tokyo, Japan
Maybe global
Hybrid
Python
JavaScript
SQL
PyTorch
Spark

Engineering Manager - Engine and Platform

New
Top rated
Arcade.dev
Full-time

The Engineering Manager for the Engine and Platform leads the team responsible for building, maintaining, and deploying the runtime for customers to run, manage, secure, and understand AI tools, enabling advanced agentic use-cases. This role involves scaling the team owning the development of the platform and services, which includes distributed systems engineers and authorization/identity experts developing features like MCP gateways, roles and permissions, and platform-as-a-service capabilities for tool executions. The manager ensures the team is unblocked, aligns the team's work with the product organization, and stays technically engaged through code reviews, critical contributions, and occasional hands-on coding. Responsibilities include owning deliverables, stability, and uptime, shaping product vision and architecture, owning technical direction and prioritization, hiring and mentoring engineers, defining and delivering platform features, and ensuring reliability, security, and enterprise readiness. The manager also focuses on building leverage into systems through automation and agents to improve efficiency and is expected to navigate ambiguity and evolving standards in AI tools.

$200,000 – $275,000 per year (USD)

San Francisco, United States
Maybe global
Onsite
Go
TypeScript
Python
CI/CD
Docker

Want to see more AI Engineer jobs?

View all jobs

Access all 4,256 remote & onsite AI jobs.

Join our private AI community to unlock full job access and connect with founders, hiring managers, and top AI professionals.
(Yes, it’s still free—your best contributions are the price of admission.)

Frequently Asked Questions

Need help with something? Here are our most frequently asked questions.


What are Kubernetes AI jobs?

Kubernetes AI jobs involve orchestrating containerized machine learning applications at scale. Professionals in these roles manage container deployment for AI workloads, distribute computational tasks across nodes for model training, allocate GPU resources efficiently, and automate ML pipelines. They typically work with frameworks like TensorFlow and PyTorch while ensuring high availability for production AI systems through automated scaling and self-healing capabilities.

What roles commonly require Kubernetes skills?

Roles requiring Kubernetes skills include Machine Learning Engineers who deploy models to production, MLOps Engineers working with platforms like Kubeflow, Data Engineers managing processing pipelines, Platform Engineers supporting agentic AI applications, DevOps/SRE professionals handling containerized deployments, and Cloud Architects designing scalable environments. These positions typically involve maintaining infrastructure that supports the complete machine learning lifecycle.

What skills are typically required alongside Kubernetes?

Alongside Kubernetes, employers typically look for container fundamentals (especially Docker), distributed systems knowledge, CI/CD pipeline experience, and cloud platform familiarity. Programming skills are essential for deployment scripts, while experience with ML frameworks like TensorFlow or PyTorch is valuable for AI-specific implementations. Understanding storage solutions, Kubernetes operators, and automated infrastructure management rounds out the typical skill requirements.

What experience level do Kubernetes AI jobs usually require?

Kubernetes AI jobs typically require mid- to senior-level experience. Employers look for professionals who understand containerization concepts, have worked with distributed systems, and can manage complex ML workflows. Prior exposure to cloud environments where Kubernetes runs is important. Candidates should demonstrate practical experience with CI/CD pipelines and familiarity with at least one major ML framework.

What is the salary range for Kubernetes AI jobs?

Kubernetes AI jobs command competitive salaries due to the specialized intersection of container orchestration and machine learning skills. Compensation varies based on experience level, location, and specific industry. Roles requiring both strong AI expertise and Kubernetes infrastructure management typically offer premium compensation compared to general software engineering positions, reflecting the high market value of these combined skill sets.

Are Kubernetes AI jobs in demand?

Kubernetes AI jobs are in high demand as organizations increasingly adopt containerized applications for machine learning workloads. The growth is driven by enterprises scaling their AI operations, edge computing applications, and the need for platform-agnostic infrastructure. Companies seek professionals who can manage the complexity of distributed ML systems, particularly for high-availability production environments and automated ML pipelines.

What is the difference between Kubernetes and Docker in AI roles?

Docker creates containerized applications while Kubernetes orchestrates those containers at scale. In AI roles, Docker is used to package ML applications with their dependencies, while Kubernetes manages deployment across clusters, automates scaling during training, and handles resource allocation for GPUs. Docker provides consistency between environments, while Kubernetes adds critical production capabilities like load balancing, self-healing, and distributed computing for AI workloads.
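Several of the answers above mention GPU resource allocation in Kubernetes. As a minimal illustration (the image name, pod name, and resource values are hypothetical, not taken from any listing above), a pod manifest requesting GPUs for a training container can be sketched as a plain Python dict; `nvidia.com/gpu` is the extended resource name exposed by the standard NVIDIA device plugin:

```python
# Sketch: build a minimal Kubernetes pod manifest that requests GPUs for an
# ML training container. Values here are illustrative assumptions.
import json


def training_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Return a pod manifest dict requesting `gpus` GPUs via resource limits."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",  # batch-style training run, not a service
            "containers": [
                {
                    "name": "trainer",
                    "image": image,
                    # GPUs are requested through resource limits; the scheduler
                    # places the pod on a node advertising enough of this
                    # extended resource.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


manifest = training_pod_manifest("train-demo", "pytorch/pytorch:latest")
print(json.dumps(manifest, indent=2))
```

In practice a manifest like this would be applied with `kubectl apply -f`, and production teams usually wrap such specs in Helm charts or operators (e.g. Kubeflow's training operators) rather than writing raw pods by hand.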