AI MLOps / DevOps Engineer Jobs | Top AI MLOps / DevOps Engineer Openings in 2025

Customer Support Engineer

Together AI

201-500

USD 180,000-260,000

United States

Full-time

Remote

Location: San Francisco, CA (Hybrid)

About the role:
As a Customer Support Engineer at a pioneering AI company, you'll be the first line of defense to support customers as they build out training, fine-tuning, and inference solutions with Together AI. You'll dive deep into complex technical challenges, providing swift and effective solutions while serving as a product expert. As part of the Customer Experience organization, you will collaborate closely with product and sales, driving continuous improvement of our offerings. This is an exciting opportunity for a deeply technical professional passionate about AI and customer success to make a significant impact in a fast-paced, innovative environment.

Responsibilities
Engage directly with customers to tackle and resolve complex technical challenges involving our cutting-edge GPU clusters and our inference and fine-tuning services; ensure swift and effective solutions every time.
Become a product expert in all of our Gen AI solutions, serving as the last line of technical defense before issues are escalated to Engineering and Product teams.
Collaborate seamlessly across Engineering, Research, and Product teams to address customer concerns; collaborate with senior leaders both internally and externally to ensure the highest levels of customer satisfaction.
Transform customer insights into action by identifying patterns in support cases and working with Engineering and Go-To-Market teams to drive Together's roadmap (e.g., future models to support).
Maintain detailed documentation of system configurations, procedures, troubleshooting guides, and FAQs to facilitate knowledge sharing with the team and customers.
Be flexible in providing support coverage during holidays, nights, and weekends as required by business needs to ensure consistent and reliable service for our customers.

Qualifications
5+ years of experience in a customer-facing technical role, with at least 1 year in a support function in AI.
Strong technical background, with knowledge of AI, ML, and GPU technologies and their integration into high-performance computing (HPC) environments.
Familiarity with infrastructure services (e.g., Kubernetes, SLURM), infrastructure-as-code solutions (e.g., Ansible), high-performance network fabrics, NFS-based storage management, container infrastructure, and scripting and programming languages.
Familiarity with operating storage systems in HPC environments, such as Vast and Weka.
Familiarity with inspecting and resolving network-related errors.
Strong knowledge of Python, TypeScript, and/or JavaScript, with testing/debugging experience using curl and Postman-like tools.
Foundational understanding of the installation, configuration, administration, troubleshooting, and securing of compute clusters.
Complex technical problem solving and troubleshooting, with a proactive approach to issue resolution.
Ability to work cross-functionally with teams such as Sales, Engineering, Support, Product, and Research to drive customer success.
Strong sense of ownership and willingness to learn new skills to ensure both team and customer success.
Excellent communication and interpersonal skills, with the ability to explain complex technical concepts to non-technical stakeholders.
Ability to operate in dynamic environments, adept at managing multiple projects, and comfortable with frequent context switching and prioritization.

About Together AI
Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society, and together we are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed to leading open-source research, models, and datasets to advance the frontier of AI, and our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers in our journey in building the next-generation AI infrastructure.

Compensation
We offer competitive compensation, startup equity, health insurance, and other benefits, as well as flexibility in terms of remote work. The US base salary range for this full-time position is: $180K-260K + equity + benefits. Our salary ranges are determined by location, level, and role. Individual compensation will be determined by experience, skills, and job-related knowledge.

Equal Opportunity
Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.

MLOps / DevOps Engineer

Software Engineer

May 27, 2025

Engineering Manager, API Experience

Anthropic

1001-5000

United States

Full-time

Remote

MLOps / DevOps Engineer

May 27, 2025

Senior Engineering Manager, Multimodal AI Editors

Labelbox

201-500

United States

Full-time

Remote

MLOps / DevOps Engineer

May 23, 2025

Senior Applied AI Engineer

Horizon3ai

201-500

United States

Full-time

Remote

MLOps / DevOps Engineer

May 23, 2025

Member of Technical Staff, Search

Cohere

501-1000

United States

Full-time

Remote

MLOps / DevOps Engineer

May 23, 2025

TPU Kernel Engineer

Anthropic

1001-5000

United States

Full-time

Remote

MLOps / DevOps Engineer

May 22, 2025

Platform Engineer

Norm AI

101-200

United States

Full-time

Remote

MLOps / DevOps Engineer

May 22, 2025

Senior Platform Engineer

Abridge

201-500

United States

Full-time

Remote

MLOps / DevOps Engineer

May 21, 2025

Senior Site Reliability Engineer - Fleet Reliability

Lambda AI

501-1000

United States

Full-time

Remote

MLOps / DevOps Engineer

April 8, 2025

AI Engineer — LLM Infra

Yutori

11-50

United States

Full-time

Remote

Yutori is reimagining how people interact with the web by building AI agents that can reliably do everyday digital tasks. We are building the entire stack to be agent-first, from training our own models to generative product interfaces.

Towards this goal, we are looking for a member of the AI technical staff to join the founding team: someone technically strong and excited about building superhuman AI agents that take actions on the web.

Our founders, Devi Parikh, Abhishek Das, and Dhruv Batra, have decades of experience in AI research and product spanning generative, multimodal, and embodied AI at Meta. Our team combines AI experience with design-minded product thinking to build and deliver on Yutori's mission.

Yutori is backed by a stellar set of visionary investors, including Elad Gil, Sarah Guo, Jeff Dean, Fei-Fei Li, Amjad Masad, Guillermo Rauch, Akshay Kothari, Soleio, Oliver Cameron, Julien Chaumond, Logan Kilpatrick, Bryan McCann, Vladlen Koltun, Jamie Cuffe, and Michele Catasta.

Responsibilities:
Scale infra for post-training of multimodal LLMs (CPT, SFT, RL, search, reward models).
Scale infra for agentic inference (throughput and latency of perception-planning-action loops).
Build the foundations of a superhuman generalist web agent.
Work closely with product engineers to translate cutting-edge AI capabilities into reliable product experiences.

What we're looking for:
Experience with ML infrastructure (GPU clusters) and supporting networking (NCCL).
Experience optimizing post-training and inference performance of multimodal LLMs (data/tensor/pipeline/context/expert parallelism, optimizing MFU, throughput, latency).
Low-level systems experience (Triton, CUDA).
High IQ, high EQ, high agency, high craftsmanship, low ego.
Proactive, clear communication.

Benefits and perks:
Competitive salary and equity.
Visa sponsorship and relocation stipend to bring you to SF.
Generous health, dental, and vision insurance for you and your dependents.
20 days of paid time off per year.
Work laptop and budget to set up your work office.
Daily team lunches.
Commuter benefits.
Small, focused team of high-potential individuals. In-person in SF.

MLOps / DevOps Engineer

March 26, 2025
