Kubernetes AI Jobs

Discover the latest remote and onsite Kubernetes AI roles across top active AI companies. Updated hourly.

Check out 301 new Kubernetes AI role opportunities posted on The Homebase

Principal Data Scientist

New
Top rated
PhysicsX
Full-time

Take part in building a platform used by Data Scientists and Simulation Engineers to build, train and deploy Deep Physics Models. Work on a focused, stream-aligned and cross-functional team (back-end, front-end, design) that is empowered to make its implementation decisions towards meeting its objectives. Gather and leverage domain knowledge and experience from the Data Scientists and Simulation Engineers using your product.

Undisclosed

Shoreditch, Singapore
Maybe global
Hybrid
Python
Go
Docker
Kubernetes
CI/CD

Delivery Engineer

New
Top rated
PhysicsX
Full-time

Take part in building a platform used by Data Scientists and Simulation Engineers to build, train and deploy Deep Physics Models. Work on a focused, stream-aligned and cross-functional team (back-end, front-end, design) empowered to make its implementation decisions towards meeting its objectives. Gather and leverage domain knowledge and experience from the Data Scientists and Simulation Engineers using your product.

Undisclosed

Singapore
Maybe global
Hybrid
Python
Go
Docker
Kubernetes
CI/CD

Software Engineer, ML Data Infrastructure

New
Top rated
Ideogram
Full-time

The Software Engineer, ML Data Infrastructure will collaborate with engineers to build AI design experiences and tackle complex technical challenges, including scaling distributed systems. The role involves building robust data infrastructure for foundation models at petabyte scale, ensuring reliability and performance across multi-modal training pipelines, and optimizing data processing workflows for massive throughput. The engineer will work with distributed systems, TPU infrastructure, and large-scale storage solutions, and partner with research scientists to translate data requirements into production-grade systems that accelerate model development cycles.

Undisclosed

Toronto, Canada
Maybe global
Onsite
Python
Kubernetes
GCP
Docker
Data Pipelines

AI / ML Solutions Engineer

New
Top rated
Anyscale
Full-time

The AI / ML Solutions Engineer at Anyscale is responsible for designing, implementing, and scaling machine learning and AI workloads using Ray and Anyscale directly with customers. This includes implementing production AI / ML workloads such as distributed model training, scalable inference and serving, and data preprocessing and feature pipelines. The role involves working hands-on with customer codebases to refactor or adapt existing workloads to Ray. The engineer advises customers on ML system architecture including application design for distributed execution, resource management and scaling strategies, and reliability, fault tolerance, and performance tuning. They guide customers through architectural and operational changes needed to adopt Ray and Anyscale effectively. Additionally, the engineer partners with customer MLE and MLOps teams to integrate Ray into existing platforms and workflows, supports CI/CD, monitoring, retraining, and operational best practices, and helps customers transition from experimentation to production-grade ML systems. They also enable customer teams through working sessions, design reviews, training delivery, and hands-on guidance, contribute feedback to product, engineering, and education teams, and help develop reference architectures, examples, and best practices based on real customer use cases.

Undisclosed

Maybe global
Remote
Python
Kubernetes
AWS
GCP
MLflow

Software Engineer, Codex for Teams

New
Top rated
OpenAI
Full-time

As a Software Engineer on the Codex for Teams team, you will be responsible for shaping the evolution of Codex by identifying how teams actually use and sometimes break AI-powered software engineering tools, driving changes across product, infrastructure, and model behavior to make Codex a reliable teammate for organizations. You will build core team and enterprise primitives that enable Codex to scale, including role-based access control (RBAC), admin and audit surfaces, usage and rate limits, pricing controls, managed configuration and constraints, and analytics for deep visibility into Codex usage. You will design and own secure, observable, full-stack systems that power Codex across web, IDEs, CLI, and CI/CD environments, integrating with enterprise identity and governance systems (SSO/SAML/OIDC, SCIM, policy enforcement) and developing data-access patterns that are performant, compliant, and trustworthy. The role involves leading real-world deployments and launches by working directly with customers and the Go To Market team to roll out Codex, using live usage and operational feedback to rapidly iterate and improve the product and platform capabilities. This position owns systems end-to-end, from architecture and implementation to production operations, emphasizing quality and velocity.
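The role-based access control (RBAC) primitives this role builds can be sketched in miniature. Below is a toy, hypothetical permission check in Python; the role names, actions, and mapping are illustrative assumptions, not OpenAI's actual model.

```python
# Hypothetical RBAC sketch: map each role to the set of actions it may perform.
# Role and action names here are invented for illustration only.
ROLE_PERMISSIONS = {
    "admin": {"read", "write", "configure", "audit"},
    "member": {"read", "write"},
    "viewer": {"read"},
}

def is_allowed(role: str, action: str) -> bool:
    """Return True if the given role is permitted to perform the action."""
    # Unknown roles get an empty permission set, so they are denied everything.
    return action in ROLE_PERMISSIONS.get(role, set())

print(is_allowed("viewer", "write"))  # False
print(is_allowed("admin", "audit"))   # True
```

Real enterprise systems layer identity (SSO/SAML/OIDC, SCIM) on top of a check like this, but the core question at enforcement time is the same membership test.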

$255,000 – $325,000 / year (USD)

San Francisco, United States
Maybe global
Onsite
Python
Go
Docker
Kubernetes
CI/CD

Solutions Engineer (AI/ML, Pre-Sales)

New
Top rated
DatologyAI
Full-time

The Solutions Engineer (AI/ML, Pre-Sales) will work closely with strategic customers to understand their data curation needs, business challenges, and technical requirements. The role involves leading end-to-end customer proofs of concept (PoCs) that connect data curation to training behavior and evaluation outcomes, including dataset analysis, training plan design, and interpreting results. They will partner with customer machine learning teams to map data and curation strategies, design and execute evaluation plans for base and post-trained models, select appropriate benchmarks and metrics, and run model evaluations. Additionally, the engineer will produce customer-ready evaluation reports detailing methodology, metrics, baselines, ablations (e.g., curated vs raw data), conclusions, and recommendations for productionization. They must communicate technical results effectively to both ML experts and executive stakeholders, explaining tradeoffs in compute, latency, and deployment cost. Collaboration with go-to-market, engineering, and research teams is essential to deliver compelling demos, align on requirements, and incorporate customer insights into model training and product strategies. The role also includes providing technical guidance, training, and documentation to enable prospects to confidently assess the solution.

$230,000 – $300,000 / year (USD)

Redwood City, United States
Maybe global
Onsite
Python
PyTorch
Hugging Face
Distributed Training
Cloud Platforms

Senior Software Engineer, Applied AI

New
Top rated
Lumi AI
Full-time

As a Software Engineer working on AI systems, you will play a foundational role in the research, experimentation, and rapid improvement of AI systems, helping build a capable, reliable AI automation platform used worldwide in mission-critical production environments. Tasks include designing experiments and testing ideas to optimize key internal AI benchmarks; designing and improving evaluation frameworks to accelerate experimentation speed and direction; training, fine-tuning, and optimizing machine learning models; performing rigorous evaluation and testing for model accuracy, generalization, and performance; collaborating on core product development to enhance platform capabilities; and setting up observability and monitoring systems to safety-check model behavior in critical settings.

$170,000 – $250,000 / year (USD)

United States
Maybe global
Onsite
Python
C++
Model Evaluation
MLOps
Docker

Software Engineer - Frontend, Security Products

New
Top rated
OpenAI
Full-time

As a Full-Stack Software Engineer on the Security Products team, you will build, deploy, and maintain applications and systems that bring advanced AI-driven security capabilities to real users. You will work directly with internal and external customers to understand their workflows and translate them into intuitive, powerful product experiences. Your responsibilities include designing and building efficient and reusable frontend systems that support complex web applications, planning and deploying frontend infrastructure necessary for building, testing, and deploying products, collaborating across OpenAI’s product, research, engineering, and security organizations to maximize impact, and helping to shape the engineering culture, architecture, and processes of this new business unit.

$255,000 – $325,000 / year (USD)

San Francisco, United States
Maybe global
Onsite
TypeScript
Python
AWS
Kubernetes
Terraform

Product Security Applied AI Intern, Summer 2026

New
Top rated
Crusoe
Intern
Full-time

Assist in designing and implementing custom large language models (LLMs) and fine-tuning models for specific tasks. Build and experiment with agent libraries and workflow orchestration frameworks. Explore neo-cloud technologies, containerized environments, and virtualized infrastructure. Learn and apply security and privacy best practices in AI pipelines and deployments. Collaborate with the team to document, test, and optimize agent behaviors and models. Participate in knowledge sharing and mentorship sessions to gain exposure to AI, cloud, and security tradecraft.

$1,905 / week (USD)

San Francisco, United States
Maybe global
Onsite
Python
PyTorch
TensorFlow
OpenAI API
Hugging Face

Mechanical Engineer - Hands

New
Top rated
Figure AI
Full-time

Design, deploy, and maintain Figure's training clusters. Architect and maintain scalable deep learning frameworks for training on massive robot datasets. Work together with AI researchers to implement training of new model architectures at a large scale. Implement distributed training and parallelization strategies to reduce model development cycles. Implement tooling for data processing, model experimentation, and continuous integration.

$150,000 – $350,000 / year (USD)

San Jose, United States
Maybe global
Onsite
Python
PyTorch
AWS
GCP
Kubernetes

Want to see more AI Engineer jobs?

View all jobs

Access all 4,256 remote & onsite AI jobs.

Join our private AI community to unlock full job access, and connect with founders, hiring managers, and top AI professionals.
(Yes, it’s still free—your best contributions are the price of admission.)

Frequently Asked Questions

Need help with something? Here are our most frequently asked questions.

What are Kubernetes AI jobs?

Kubernetes AI jobs involve orchestrating containerized machine learning applications at scale. Professionals in these roles manage container deployment for AI workloads, distribute computational tasks across nodes for model training, allocate GPU resources efficiently, and automate ML pipelines. They typically work with frameworks like TensorFlow and PyTorch while ensuring high availability for production AI systems through automated scaling and self-healing capabilities.

What roles commonly require Kubernetes skills?

Roles requiring Kubernetes skills include Machine Learning Engineers who deploy models to production, MLOps Engineers working with platforms like Kubeflow, Data Engineers managing processing pipelines, Platform Engineers supporting agentic AI applications, DevOps/SRE professionals handling containerized deployments, and Cloud Architects designing scalable environments. These positions typically involve maintaining infrastructure that supports the complete machine learning lifecycle.

What skills are typically required alongside Kubernetes?

Alongside Kubernetes, employers typically look for container fundamentals (especially Docker), distributed systems knowledge, CI/CD pipeline experience, and cloud platform familiarity. Programming skills are essential for deployment scripts, while experience with ML frameworks like TensorFlow or PyTorch is valuable for AI-specific implementations. Understanding storage solutions, Kubernetes operators, and automated infrastructure management rounds out the typical skill requirements.

What experience level do Kubernetes AI jobs usually require?

Kubernetes AI jobs typically require mid- to senior-level experience. Employers look for professionals who understand containerization concepts, have worked with distributed systems, and can manage complex ML workflows. Prior exposure to cloud environments where Kubernetes runs is important. Candidates should demonstrate practical experience with CI/CD pipelines and familiarity with at least one major ML framework.

What is the salary range for Kubernetes AI jobs?

Kubernetes AI jobs command competitive salaries due to the specialized intersection of container orchestration and machine learning skills. Compensation varies based on experience level, location, and specific industry. Roles requiring both strong AI expertise and Kubernetes infrastructure management typically offer premium compensation compared to general software engineering positions, reflecting the high market value of these combined skill sets.

Are Kubernetes AI jobs in demand?

Kubernetes AI jobs are in high demand as organizations increasingly adopt containerized applications for machine learning workloads. The growth is driven by enterprises scaling their AI operations, edge computing applications, and the need for platform-agnostic infrastructure. Companies seek professionals who can manage the complexity of distributed ML systems, particularly for high-availability production environments and automated ML pipelines.

What is the difference between Kubernetes and Docker in AI roles?

Docker creates containerized applications while Kubernetes orchestrates those containers at scale. In AI roles, Docker is used to package ML applications with their dependencies, while Kubernetes manages deployment across clusters, automates scaling during training, and handles resource allocation for GPUs. Docker provides consistency between environments, while Kubernetes adds critical production capabilities like load balancing, self-healing, and distributed computing for AI workloads.
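The GPU-allocation point from the FAQ can be made concrete. Below is a minimal, hypothetical Pod manifest, written as a plain Python dict rather than YAML, that requests one NVIDIA GPU for a training container. The pod name, image, and training script are illustrative assumptions; GPUs are surfaced to Kubernetes as the `nvidia.com/gpu` extended resource by the NVIDIA device plugin, and extended resources are requested under `resources.limits`.

```python
# Sketch of a Kubernetes Pod manifest (as a Python dict) requesting one GPU
# for an ML training workload. Names, image, and command are hypothetical.
training_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "train-job"},
    "spec": {
        "restartPolicy": "Never",  # batch-style training job: run once
        "containers": [
            {
                "name": "trainer",
                "image": "pytorch/pytorch:latest",     # example image
                "command": ["python", "train.py"],     # hypothetical script
                "resources": {
                    # GPUs use the device-plugin extended resource name;
                    # for extended resources, the limit acts as the request.
                    "limits": {"nvidia.com/gpu": 1},
                },
            }
        ],
    },
}

def gpu_count(pod: dict) -> int:
    """Sum GPU limits across all containers in a pod manifest dict."""
    return sum(
        int(c.get("resources", {}).get("limits", {}).get("nvidia.com/gpu", 0))
        for c in pod["spec"]["containers"]
    )

print(gpu_count(training_pod))  # 1
```

In practice this dict would be serialized to YAML and applied with `kubectl apply -f`, or submitted through a client library; the scheduler then places the pod only on a node with a free GPU.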