Kubernetes AI Jobs

Discover the latest remote and onsite Kubernetes AI roles across top active AI companies. Updated hourly.

Check out 301 new Kubernetes AI role opportunities posted on The Homebase

Principal Data Scientist

New
Top rated
PhysicsX
Full-time

Take part in building a platform used by Data Scientists and Simulation Engineers to build, train and deploy Deep Physics Models. Work on a focused, stream-aligned and cross-functional team (back-end, front-end, design) that is empowered to make its implementation decisions towards meeting its objectives. Gather and leverage domain knowledge and experience from the Data Scientists and Simulation Engineers using your product.

Undisclosed

Shoreditch, Singapore
Maybe global
Hybrid
Python
Go
Docker
Kubernetes
CI/CD

Delivery Engineer

New
Top rated
PhysicsX
Full-time

Take part in building a platform used by Data Scientists and Simulation Engineers to build, train and deploy Deep Physics Models. Work on a focused, stream-aligned and cross-functional team (back-end, front-end, design) empowered to make its implementation decisions towards meeting its objectives. Gather and leverage domain knowledge and experience from the Data Scientists and Simulation Engineers using your product.

Undisclosed

Singapore
Maybe global
Hybrid
Python
Go
Docker
Kubernetes
CI/CD

Software Engineer, ML Data Infrastructure

New
Top rated
Ideogram
Full-time

The Software Engineer, ML Data Infrastructure will collaborate with engineers to build AI design experiences and tackle complex technical challenges, including scaling distributed systems. The role involves building robust data infrastructure for foundation models at petabyte scale, ensuring reliability and performance across multi-modal training pipelines, and optimizing data processing workflows for massive throughput. The engineer will work with distributed systems, TPU infrastructure, and large-scale storage solutions, and partner with research scientists to translate data requirements into production-grade systems that accelerate model development cycles.

Undisclosed

Toronto, Canada
Maybe global
Onsite
Python
Kubernetes
GCP
Docker
Data Pipelines

AI / ML Solutions Engineer

New
Top rated
Anyscale
Full-time

The AI / ML Solutions Engineer at Anyscale is responsible for designing, implementing, and scaling machine learning and AI workloads using Ray and Anyscale directly with customers. This includes implementing production AI / ML workloads such as distributed model training, scalable inference and serving, and data preprocessing and feature pipelines. The role involves working hands-on with customer codebases to refactor or adapt existing workloads to Ray. The engineer advises customers on ML system architecture including application design for distributed execution, resource management and scaling strategies, and reliability, fault tolerance, and performance tuning. They guide customers through architectural and operational changes needed to adopt Ray and Anyscale effectively. Additionally, the engineer partners with customer MLE and MLOps teams to integrate Ray into existing platforms and workflows, supports CI/CD, monitoring, retraining, and operational best practices, and helps customers transition from experimentation to production-grade ML systems. They also enable customer teams through working sessions, design reviews, training delivery, and hands-on guidance, contribute feedback to product, engineering, and education teams, and help develop reference architectures, examples, and best practices based on real customer use cases.

Undisclosed

Maybe global
Remote
Python
Kubernetes
AWS
GCP
MLflow

Software Engineer, Codex for Teams

New
Top rated
OpenAI
Full-time

As a Software Engineer on the Codex for Teams team, you will be responsible for shaping the evolution of Codex by identifying how teams actually use and sometimes break AI-powered software engineering tools, driving changes across product, infrastructure, and model behavior to make Codex a reliable teammate for organizations. You will build core team and enterprise primitives that enable Codex to scale, including role-based access control (RBAC), admin and audit surfaces, usage and rate limits, pricing controls, managed configuration and constraints, and analytics for deep visibility into Codex usage. You will design and own secure, observable, full-stack systems that power Codex across web, IDEs, CLI, and CI/CD environments, integrating with enterprise identity and governance systems (SSO/SAML/OIDC, SCIM, policy enforcement) and developing data-access patterns that are performant, compliant, and trustworthy. The role involves leading real-world deployments and launches by working directly with customers and the Go To Market team to roll out Codex, using live usage and operational feedback to rapidly iterate and improve the product and platform capabilities. This position owns systems end-to-end, from architecture and implementation to production operations, emphasizing quality and velocity.
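The role-based access control (RBAC) primitives this role builds can be sketched in miniature. Below is a toy, hypothetical permission check in Python; the role names, actions, and mapping are illustrative assumptions, not OpenAI's actual model.

```python
# Hypothetical RBAC sketch: map each role to the set of actions it may perform.
# Role and action names here are invented for illustration only.
ROLE_PERMISSIONS = {
    "admin": {"read", "write", "configure", "audit"},
    "member": {"read", "write"},
    "viewer": {"read"},
}

def is_allowed(role: str, action: str) -> bool:
    """Return True if the given role is permitted to perform the action."""
    # Unknown roles get an empty permission set, so they are denied everything.
    return action in ROLE_PERMISSIONS.get(role, set())

print(is_allowed("viewer", "write"))  # False
print(is_allowed("admin", "audit"))   # True
```

Real enterprise systems layer identity (SSO/SAML/OIDC, SCIM) on top of a check like this, but the core question at enforcement time is the same membership test.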

$255,000 – $325,000 / year (USD)

San Francisco, United States
Maybe global
Onsite
Python
Go
Docker
Kubernetes
CI/CD

Solutions Engineer (AI/ML, Pre-Sales)

New
Top rated
DatologyAI
Full-time

The Solutions Engineer (AI/ML, Pre-Sales) will work closely with strategic customers to understand their data curation needs, business challenges, and technical requirements. The role involves leading end-to-end customer proofs of concept (PoCs) that connect data curation to training behavior and evaluation outcomes, including dataset analysis, training plan design, and interpreting results. They will partner with customer machine learning teams to map data and curation strategies, design and execute evaluation plans for base and post-trained models, select appropriate benchmarks and metrics, and run model evaluations. Additionally, the engineer will produce customer-ready evaluation reports detailing methodology, metrics, baselines, ablations (e.g., curated vs raw data), conclusions, and recommendations for productionization. They must communicate technical results effectively to both ML experts and executive stakeholders, explaining tradeoffs in compute, latency, and deployment cost. Collaboration with go-to-market, engineering, and research teams is essential to deliver compelling demos, align on requirements, and incorporate customer insights into model training and product strategies. The role also includes providing technical guidance, training, and documentation to enable prospects to confidently assess the solution.

$230,000 – $300,000 / year (USD)

Redwood City, United States
Maybe global
Onsite
Python
PyTorch
Hugging Face
Distributed Training
Cloud Platforms

Senior Software Engineer, Applied AI

New
Top rated
Lumi AI
Full-time

As a Software Engineer working on AI systems, you will play a foundational role in the research, experimentation, and rapid improvement of AI systems, helping build a capable, reliable AI automation platform used worldwide in mission-critical production environments. Tasks include designing experiments and testing ideas to optimize key internal AI benchmarks; designing and improving evaluation frameworks to accelerate experimentation speed and direction; training, fine-tuning, and optimizing machine learning models; performing rigorous evaluation and testing for model accuracy, generalization, and performance; collaborating on core product development to enhance platform capabilities; and setting up observability and monitoring systems to safety-check model behavior in critical settings.

$170,000 – $250,000 / year (USD)

United States
Maybe global
Onsite
Python
C++
Model Evaluation
MLOps
Docker

Software Engineer - Frontend, Security Products

New
Top rated
OpenAI
Full-time

As a Full-Stack Software Engineer on the Security Products team, you will build, deploy, and maintain applications and systems that bring advanced AI-driven security capabilities to real users. You will work directly with internal and external customers to understand their workflows and translate them into intuitive, powerful product experiences. Your responsibilities include designing and building efficient and reusable frontend systems that support complex web applications, planning and deploying frontend infrastructure necessary for building, testing, and deploying products, collaborating across OpenAI’s product, research, engineering, and security organizations to maximize impact, and helping to shape the engineering culture, architecture, and processes of this new business unit.

$255,000 – $325,000 / year (USD)

San Francisco, United States
Maybe global
Onsite
TypeScript
Python
AWS
Kubernetes
Terraform

Product Security Applied AI Intern, Summer 2026

New
Top rated
Crusoe
Intern
Full-time

Assist in designing and implementing custom large language models (LLMs) and fine-tuning models for specific tasks. Build and experiment with agent libraries and workflow orchestration frameworks. Explore neo-cloud technologies, containerized environments, and virtualized infrastructure. Learn and apply security and privacy best practices in AI pipelines and deployments. Collaborate with the team to document, test, and optimize agent behaviors and models. Participate in knowledge sharing and mentorship sessions to gain exposure to AI, cloud, and security tradecraft.

$1,905 / week (USD)

San Francisco, United States
Maybe global
Onsite
Python
PyTorch
TensorFlow
OpenAI API
Hugging Face

Mechanical Engineer - Hands

New
Top rated
Figure AI
Full-time

Design, deploy, and maintain Figure's training clusters. Architect and maintain scalable deep learning frameworks for training on massive robot datasets. Work together with AI researchers to implement training of new model architectures at a large scale. Implement distributed training and parallelization strategies to reduce model development cycles. Implement tooling for data processing, model experimentation, and continuous integration.

$150,000 – $350,000 / year (USD)

San Jose, United States
Maybe global
Onsite
Python
PyTorch
AWS
GCP
Kubernetes

Want to see more AI Engineer jobs?

View all jobs

Access all 4,256 remote & onsite AI jobs.

Join our private AI community to unlock full job access, and connect with founders, hiring managers, and top AI professionals.
(Yes, it’s still free—your best contributions are the price of admission.)

Frequently Asked Questions

Need help with something? Here are our most frequently asked questions.

What are Kubernetes AI jobs?

Kubernetes AI jobs involve orchestrating containerized machine learning applications at scale. Professionals in these roles manage container deployment for AI workloads, distribute computational tasks across nodes for model training, allocate GPU resources efficiently, and automate ML pipelines. They typically work with frameworks like TensorFlow and PyTorch while ensuring high availability for production AI systems through automated scaling and self-healing capabilities.

What roles commonly require Kubernetes skills?

Roles requiring Kubernetes skills include Machine Learning Engineers who deploy models to production, MLOps Engineers working with platforms like Kubeflow, Data Engineers managing processing pipelines, Platform Engineers supporting agentic AI applications, DevOps/SRE professionals handling containerized deployments, and Cloud Architects designing scalable environments. These positions typically involve maintaining infrastructure that supports the complete machine learning lifecycle.

What skills are typically required alongside Kubernetes?

Alongside Kubernetes, employers typically look for container fundamentals (especially Docker), distributed systems knowledge, CI/CD pipeline experience, and cloud platform familiarity. Programming skills are essential for deployment scripts, while experience with ML frameworks like TensorFlow or PyTorch is valuable for AI-specific implementations. Understanding storage solutions, Kubernetes operators, and automated infrastructure management rounds out the typical skill requirements.

What experience level do Kubernetes AI jobs usually require?

Kubernetes AI jobs typically require mid- to senior-level experience. Employers look for professionals who understand containerization concepts, have worked with distributed systems, and can manage complex ML workflows. Prior exposure to cloud environments where Kubernetes runs is important. Candidates should demonstrate practical experience with CI/CD pipelines and familiarity with at least one major ML framework.

What is the salary range for Kubernetes AI jobs?

Kubernetes AI jobs command competitive salaries due to the specialized intersection of container orchestration and machine learning skills. Compensation varies based on experience level, location, and specific industry. Roles requiring both strong AI expertise and Kubernetes infrastructure management typically offer premium compensation compared to general software engineering positions, reflecting the high market value of these combined skill sets.

Are Kubernetes AI jobs in demand?

Kubernetes AI jobs are in high demand as organizations increasingly adopt containerized applications for machine learning workloads. The growth is driven by enterprises scaling their AI operations, edge computing applications, and the need for platform-agnostic infrastructure. Companies seek professionals who can manage the complexity of distributed ML systems, particularly for high-availability production environments and automated ML pipelines.

What is the difference between Kubernetes and Docker in AI roles?

Docker creates containerized applications while Kubernetes orchestrates those containers at scale. In AI roles, Docker is used to package ML applications with their dependencies, while Kubernetes manages deployment across clusters, automates scaling during training, and handles resource allocation for GPUs. Docker provides consistency between environments, while Kubernetes adds critical production capabilities like load balancing, self-healing, and distributed computing for AI workloads.
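The GPU-allocation point from the FAQ can be made concrete. Below is a minimal, hypothetical Pod manifest, written as a plain Python dict rather than YAML, that requests one NVIDIA GPU for a training container. The pod name, image, and training script are illustrative assumptions; GPUs are surfaced to Kubernetes as the `nvidia.com/gpu` extended resource by the NVIDIA device plugin, and extended resources are requested under `resources.limits`.

```python
# Sketch of a Kubernetes Pod manifest (as a Python dict) requesting one GPU
# for an ML training workload. Names, image, and command are hypothetical.
training_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "train-job"},
    "spec": {
        "restartPolicy": "Never",  # batch-style training job: run once
        "containers": [
            {
                "name": "trainer",
                "image": "pytorch/pytorch:latest",     # example image
                "command": ["python", "train.py"],     # hypothetical script
                "resources": {
                    # GPUs use the device-plugin extended resource name;
                    # for extended resources, the limit acts as the request.
                    "limits": {"nvidia.com/gpu": 1},
                },
            }
        ],
    },
}

def gpu_count(pod: dict) -> int:
    """Sum GPU limits across all containers in a pod manifest dict."""
    return sum(
        int(c.get("resources", {}).get("limits", {}).get("nvidia.com/gpu", 0))
        for c in pod["spec"]["containers"]
    )

print(gpu_count(training_pod))  # 1
```

In practice this dict would be serialized to YAML and applied with `kubectl apply -f`, or submitted through a client library; the scheduler then places the pod only on a node with a free GPU.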