Kubernetes AI Jobs

Discover the latest remote and onsite Kubernetes AI roles across top active AI companies. Updated hourly.

Check out 301 new Kubernetes AI job opportunities posted on The Homebase

Senior Software Engineering Director, Developer Experience

New
Top rated
Crusoe
Full-time

As the Senior Director of Engineering for Developer Experience at Crusoe, you will own and drive the strategy, execution, and culture of the team responsible for how Crusoe's engineers and non-engineers build, ship, and operate software. Responsibilities include defining and executing the long-term vision for Crusoe's internal developer platform, which encompasses shared services, internal APIs, repositories, and self-service infrastructure to enable engineering teams to move quickly and confidently. You will also rapidly develop and productionize AI-powered tools for the entire company, creating and evangelizing best practices for productionizing AI-developed tools and evaluating SaaS purchases. Additionally, you will oversee the design, reliability, and continuous improvement of CI/CD pipelines, build systems, and deployment infrastructure to ensure safe and rapid scaling of engineering teams' shipping processes. Your role will also involve defining and driving organization-wide engineering productivity initiatives by establishing metrics, identifying bottlenecks, and implementing tooling and process improvements that enhance developer experience across Crusoe. People leadership is a key responsibility, including managing and growing a team of engineers and fostering a high-performance culture based on accountability, innovation, and continuous learning. Furthermore, you will collaborate with senior leaders across Engineering, Infrastructure, Security, and Product to align Developer Experience investments with company-wide engineering goals and priorities.

$301,750 – $355,000 per year (USD)

San Francisco, United States
Maybe global
Onsite
Python
JavaScript
CI/CD
Docker
Kubernetes

Deployed Engineer (Toronto)

New
Top rated
LangChain
Full-time

The Deployed Engineer will co-architect and co-build production AI agents with customer engineering teams, and own the technical win in pre-sales by designing POCs, answering deep technical questions, and guiding evaluations. They will help customers deploy and operate agent-based applications, including conversational agents, research agents, and multi-step workflows, and advise customers post-sale on architecture, best practices, and roadmap-level decisions. They will also run technical demos, trainings, and workshops for developer audiences; surface field feedback; contribute reusable patterns, cookbooks, and example code that scale across customers; and occasionally contribute code upstream when it meaningfully improves customer outcomes.

Undisclosed

Toronto, Canada
Maybe global
Remote
Python
JavaScript
Prompt Engineering
MLOps
AWS

Software Engineering Manager, Autonomous

New
Top rated
Magical
Full-time

As the Engineering Manager on the Autonomous team, you will lead and scale a high-calibre team of engineers dedicated to defining the future of AI agent development and advancing AI and backend systems. You will oversee the technical roadmap for the Autonomous team, translating architectural complexity into clear product strategies. You will mentor a diverse group of engineers, supporting their professional growth. You will partner closely with Product and Design to ensure the agent-building tools remain intuitive while supporting technical capabilities. You will champion a 'show > tell' culture by ensuring rapid shipping with a high standard for technical stability and user experience. You will clear technical and operational roadblocks to ensure the team operates with high agency and clarity.

Undisclosed

San Francisco, United States
Maybe global
Hybrid
Python
Kubernetes
Docker
AWS
CI/CD

Software Engineering Manager, Autonomous

New
Top rated
Magical
Full-time

As an Engineering Manager on the Autonomous team, you will lead and scale a high-calibre team of engineers dedicated to defining the future of AI agent development and advancing AI and backend systems. You will oversee the technical roadmap for the team by translating architectural complexity into clear product strategies, mentor and support the professional growth of a diverse group of engineers, and partner closely with Product and Design to ensure the agent-building tools remain intuitive and technically robust. You will champion a "show > tell" culture to ensure rapid shipping while maintaining high technical stability and user experience standards, and clear technical and operational roadblocks to enable the team to operate with high agency and clarity. You will act as the bridge between product vision and technical execution.

Undisclosed

Toronto, Canada
Maybe global
Hybrid
Python
Docker
Kubernetes
AWS
CI/CD

Senior Forward-Deployed Engineer, Federal

New
Top rated
Deepgram
Full-time

The Senior Forward-Deployed Engineer, Federal at Deepgram is responsible for owning technical delivery across federal deployments from initial prototype to stable production. They embed deeply with federal customers to design and build mission-critical applications using Deepgram's Voice AI models, lead technical discovery and solution design for federal prospects and customers, and prototype and build full-stack integrations with technologies such as Python, JavaScript, or Rust. They enable successful deployments by delivering observable systems spanning infrastructure through applications and proactively guide federal stakeholders on platform operational value, including performance optimization and deployment strategies. Responsibilities include scoping work, sequencing delivery, removing blockers, managing relationships with customer leadership and technical stakeholders, contributing to code when necessary, codifying working patterns into reusable tools and playbooks, sharing field feedback with Product and Engineering, serving as an escalation point for technical issues, and analyzing deployment patterns to inform product and go-to-market strategies. The role involves significant collaboration internally and externally, including technical engagements pre-sales, building reusable solutions, and contributing to Applied Engineering strategy.

$160,000 – $200,000 per year (USD)

Washington D.C., United States
Maybe global
Remote
Python
JavaScript
NLP
Kubernetes
AWS

Senior Engineering Manager, Handshake AI

New
Top rated
Handshake
Full-time

The Senior Engineering Manager leads a core product and platform engineering team building systems that integrate human expertise into AI development workflows. This team owns critical infrastructure connecting talent networks, data operations, and research needs into scalable, reliable, and high-quality platforms. The manager leads, hires, and develops a high-performing engineering team, owns the roadmap and execution in partnership with Product, Research, and Operations, drives architecture and technical strategy for scalable and extensible systems, builds modular platforms for new domains, workflows, and partners, raises engineering quality across reliability, observability, performance, and data integrity, and fosters a culture of ownership, velocity, and strong engineering fundamentals in a fast-moving environment.

$230,000 – $300,000 per year (USD)

San Francisco, United States
Maybe global
Onsite
Python
Docker
Kubernetes
AWS
MLOps

Lead AI/ML Engineer

New
Top rated
ASAPP
Full-time

Lead the design and implementation of scalable ML/AI systems focused on large language models, vector databases, and retrieval-based architectures. Integrate and apply foundation models from providers like OpenAI, AWS Bedrock, and Anthropic for prototyping and production use cases. Adapt, evaluate, and optimize large language models for domain-specific enterprise applications. Build and maintain infrastructure for AI model experimentation, deployment, and monitoring in production. Improve model performance and inference workflows addressing latency, cost, and reliability. Provide technical leadership by mentoring engineers and promoting best ML engineering practices. Partner with product and cross-functional stakeholders to translate requirements into scalable ML solutions. Contribute to the evolution of internal standards for AI experimentation, evaluation, and deployment. Lead the design and delivery of end-to-end voice AI solutions combining large language models with speech technologies including speech-to-text, text-to-speech, and real-time streaming audio pipelines, architecting low-latency, highly reliable conversational voice systems and guiding a team through ambiguity toward production excellence. Understand and apply constraints of voice experiences such as latency, turn-taking, interruption handling, streaming inference, and audio quality to create scalable, enterprise-grade systems.
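The retrieval-based architectures mentioned above pair a large language model with a vector search step that pulls in relevant context before generation. A minimal sketch of that retrieval step, using toy hand-made embeddings and plain-Python cosine similarity rather than any specific vector database (the documents and vectors here are illustrative, not part of ASAPP's stack):

```python
# Minimal sketch of the retrieval step in a retrieval-augmented pipeline.
# A real system would use a learned embedding model and a vector database;
# the hand-made 3-dimensional vectors below are purely illustrative.
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, corpus, k=2):
    """Return the k documents whose embeddings are closest to the query."""
    ranked = sorted(corpus, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return [d["text"] for d in ranked[:k]]

corpus = [
    {"text": "reset your password", "vec": [0.9, 0.1, 0.0]},
    {"text": "billing and invoices", "vec": [0.0, 0.9, 0.2]},
    {"text": "account recovery steps", "vec": [0.8, 0.2, 0.1]},
]

# A query vector near the "account" direction surfaces the two account docs.
top = retrieve([1.0, 0.0, 0.0], corpus, k=2)
```

The retrieved texts would then be stuffed into the LLM prompt; production systems add re-ranking, caching, and latency budgets on top of this core loop.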

$170,000 – $190,000 per year (USD)

New York or Mountain View, United States
Maybe global
Hybrid
Python
PyTorch
TensorFlow
OpenAI API
RAG

Senior Product Designer, Mobile

New
Top rated
Grammarly
Full-time

Own the observability and lifecycle management of AI features across the organization. Build tools and infrastructure to enable teams to develop, monitor, and optimize LLM-powered features. Design and implement closed-loop evaluation pipelines that automatically validate prompt changes. Develop comprehensive metrics and dashboards to track LLM usage including cost per feature, token patterns, and latency. Create systems that tie user feedback to specific prompts and LLM calls. Establish best practices and processes for the full lifecycle of prompts including development, testing, deployment, and monitoring. Collaborate with engineering teams to ensure they have the tools and visibility needed to build high-quality AI features.
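The cost-per-feature, token, and latency dashboards described above start from rolled-up LLM call logs. A hedged sketch of that aggregation step, with an assumed flat token price and illustrative record fields (none of this reflects Grammarly's actual pipeline or pricing):

```python
# Sketch: rolling raw LLM call records up into per-feature usage metrics.
# PRICE_PER_1K_TOKENS and the record fields are assumptions for illustration.
from collections import defaultdict

PRICE_PER_1K_TOKENS = 0.002  # assumed flat rate; real models price prompt/completion separately

def summarize(calls):
    """Aggregate call logs into per-feature token, cost, and latency totals."""
    summary = defaultdict(lambda: {"tokens": 0, "cost": 0.0, "calls": 0, "latency_ms": 0})
    for c in calls:
        s = summary[c["feature"]]
        s["tokens"] += c["tokens"]
        s["cost"] += c["tokens"] / 1000 * PRICE_PER_1K_TOKENS
        s["calls"] += 1
        s["latency_ms"] += c["latency_ms"]
    for s in summary.values():
        s["avg_latency_ms"] = s["latency_ms"] / s["calls"]
    return dict(summary)

calls = [
    {"feature": "rewrite", "tokens": 1200, "latency_ms": 400},
    {"feature": "rewrite", "tokens": 800, "latency_ms": 600},
    {"feature": "summarize", "tokens": 2000, "latency_ms": 900},
]
report = summarize(calls)
```

Tying each record to a prompt version, as the role requires, would just add a `prompt_id` field to the records and the group-by key.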

$103,000 – $128,000 per year (USD)

North America
Maybe global
Remote
Go
Kubernetes
Google Cloud
OpenAI API
Prompt Engineering

Software Engineer, Inference Platform

New
Top rated
Fluidstack
Full-time

The Software Engineer for the Inference Platform at Fluidstack will own inference deployments end-to-end, including initial configuration, performance tuning, production SLA maintenance, and incident response. They will drive measurable improvements in throughput, time-to-first-token (TTFT), and cost-per-token across diverse model families and customer workload patterns. Responsibilities include building and operating key-value (KV) cache and scheduling infrastructure to maximize utilization across concurrent requests, implementing and validating disaggregated prefill/decode pipelines, and managing Kubernetes-based orchestration at scale. The role requires profiling and resolving bottlenecks at compute, memory, and communication layers, instrumenting deployments for end-to-end observability, partnering with customers to translate model architectures, access patterns, and latency requirements into deployment configurations, and contributing to the inference platform architecture and roadmap focused on reducing deployment complexity, improving hardware utilization, and expanding support for new model classes and accelerators. Additionally, participation in an on-call rotation (up to one week per month) to maintain reliability and SLA commitments of production deployments is required.
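Time-to-first-token and throughput, two of the metrics this role optimizes, fall out of timing a streaming response. An illustrative sketch of the measurement, with a simulated token stream standing in for a real inference endpoint (the stream and delays are invented for the example):

```python
# Sketch: measuring time-to-first-token (TTFT) and token throughput for a
# streaming endpoint. fake_stream simulates generation; a real harness would
# consume tokens from the serving API instead.
import time

def measure_stream(token_iter):
    """Consume a token stream; return (ttft_seconds, tokens_per_second)."""
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in token_iter:
        count += 1
        if ttft is None:
            ttft = time.perf_counter() - start  # latency to the first token
    total = time.perf_counter() - start
    return ttft, count / total

def fake_stream(n=5, delay=0.01):
    """Yield n tokens, sleeping `delay` seconds before each one."""
    for _ in range(n):
        time.sleep(delay)
        yield "tok"

ttft, tps = measure_stream(fake_stream())
```

Disaggregated prefill/decode setups report these separately per phase, since prefill dominates TTFT while decode throughput drives cost-per-token.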

$165,000 – $500,000 per year (USD)

San Francisco, United States
Maybe global
Onsite
Python
Go
PyTorch
JAX
Kubernetes

Head of Product, AI

New
Top rated
Bjak
Full-time

Own the end-to-end AI product strategy, grounded in technical feasibility and real-world constraints. Translate model capabilities, data limitations, and evaluation results into clear product decisions, making trade-offs across quality, latency, cost, reliability, and user experience. Work daily with ML, backend, and mobile engineers on design, evaluation, and iteration. Define success metrics and feedback loops across offline evaluation, online experiments, and human feedback. Drive execution with clear specifications, risk awareness, and disciplined prioritization, ensuring AI features ship quickly, safely, and reliably into production. Own AI product quality across UX, correctness, and outcomes.

Undisclosed

Jakarta, Indonesia
Maybe global
Remote
Python
MLflow
Model Evaluation
Prompt Engineering
MLOps

Want to see more AI Engineer jobs?

View all jobs

Access all 4,256 remote & onsite AI jobs.

Join our private AI community to unlock full job access, and connect with founders, hiring managers, and top AI professionals.
(Yes, it’s still free—your best contributions are the price of admission.)

Frequently Asked Questions

Need help with something? Here are our most frequently asked questions.


What are Kubernetes AI jobs?

Kubernetes AI jobs involve orchestrating containerized machine learning applications at scale. Professionals in these roles manage container deployment for AI workloads, distribute computational tasks across nodes for model training, allocate GPU resources efficiently, and automate ML pipelines. They typically work with frameworks like TensorFlow and PyTorch while ensuring high availability for production AI systems through automated scaling and self-healing capabilities.

What roles commonly require Kubernetes skills?

Roles requiring Kubernetes skills include Machine Learning Engineers who deploy models to production, MLOps Engineers working with platforms like Kubeflow, Data Engineers managing processing pipelines, Platform Engineers supporting agentic AI applications, DevOps/SRE professionals handling containerized deployments, and Cloud Architects designing scalable environments. These positions typically involve maintaining infrastructure that supports the complete machine learning lifecycle.

What skills are typically required alongside Kubernetes?

Alongside Kubernetes, employers typically look for container fundamentals (especially Docker), distributed systems knowledge, CI/CD pipeline experience, and cloud platform familiarity. Programming skills are essential for deployment scripts, while experience with ML frameworks like TensorFlow or PyTorch is valuable for AI-specific implementations. Understanding storage solutions, Kubernetes operators, and automated infrastructure management rounds out the typical skill requirements.

What experience level do Kubernetes AI jobs usually require?

Kubernetes AI jobs typically require mid- to senior-level experience. Employers look for professionals who understand containerization concepts, have worked with distributed systems, and can manage complex ML workflows. Prior exposure to cloud environments where Kubernetes runs is important. Candidates should demonstrate practical experience with CI/CD pipelines and familiarity with at least one major ML framework.

What is the salary range for Kubernetes AI jobs?

Kubernetes AI jobs command competitive salaries due to the specialized intersection of container orchestration and machine learning skills. Compensation varies based on experience level, location, and specific industry. Roles requiring both strong AI expertise and Kubernetes infrastructure management typically offer premium compensation compared to general software engineering positions, reflecting the high market value of these combined skill sets.

Are Kubernetes AI jobs in demand?

Kubernetes AI jobs are in high demand as organizations increasingly adopt containerized applications for machine learning workloads. The growth is driven by enterprises scaling their AI operations, edge computing applications, and the need for platform-agnostic infrastructure. Companies seek professionals who can manage the complexity of distributed ML systems, particularly for high-availability production environments and automated ML pipelines.

What is the difference between Kubernetes and Docker in AI roles?

Docker creates containerized applications while Kubernetes orchestrates those containers at scale. In AI roles, Docker is used to package ML applications with their dependencies, while Kubernetes manages deployment across clusters, automates scaling during training, and handles resource allocation for GPUs. Docker provides consistency between environments, while Kubernetes adds critical production capabilities like load balancing, self-healing, and distributed computing for AI workloads.