Top MLOps / DevOps Engineer Jobs Openings in 2025

Looking for opportunities as an MLOps / DevOps Engineer? This curated list features the latest MLOps / DevOps Engineer job openings from AI-native companies. Whether you're an experienced professional or just entering the field, find roles that match your expertise, from startups to global tech leaders. Updated every day.


Member of Technical Staff, Infrastructure & Scaling

Parallel
United States
Full-time
On-site
At Parallel Web Systems, we are bringing a new web to life: it's built with, by, and for AIs. Our work spans innovations across crawling, indexing, ranking, retrieval, and reasoning systems. Our first product is a set of APIs for AIs to do more with web data. We are a fully in-person team based in Palo Alto, CA. Our organization is flat; our team is small and talent dense.

We want to talk to you if you are someone who can bring us closer to living our aspirational values:
- Own customer impact: It's on us to ensure real-world outcomes for our customers.
- Obsess over craft: Perfect every detail because quality compounds.
- Accelerate change: Ship fast, adapt faster, and move frontier ideas into production.
- Create win-wins: Creatively turn trade-offs into upside.
- Make high-conviction bets: Try and fail. But succeed an unfair amount.

Job: You will build, operate, and scale our infrastructure, including our infrastructure around large language models, and ensure that our systems are reliable and cost-efficient as we grow. You will anticipate bottlenecks before they appear, ensure that our architecture evolves to meet increasing demands, and build the tools and systems that keep engineering velocity high.

You: Have deep intuition on distributed systems, cloud platforms, performance tuning, and scalable architecture. You like to reason about trade-offs between cost, reliability, and speed of iteration. You care about your work enabling every team to build faster and ship confidently, and about infrastructure that can support products used by millions without breaking a sweat.

Our founder is Parag Agrawal. Previously, he was the CEO and CTO at Twitter. Our investors include First Round Capital, Index Ventures, Khosla Ventures, and many others.
MLOps / DevOps Engineer
Data Science & Analytics
Software Engineer
Software Engineering
Apply

Founding DevOps Engineer

Retell AI
USD 215,000 - 290,000
United States
Full-time
On-site
About Retell AI
At Retell AI, we're not just automating calls: we're transforming how the world communicates. Our AI voice agents are reshaping sales, support, and customer engagement for leading brands. Backed by Alt Capital, Y Combinator, and top-tier investors, we've raised $4.7M in seed funding and hit $14M ARR with just 12 people. We're one of the fastest-growing Voice AI startups, and we're on a mission to become the standard for voice automation at scale. We're also one of the top-ranking startups at https://leanaileaderboard.com/.

About the Role
As a Founding DevOps Engineer, you'll be the owner of our build, release, and runtime foundations. You'll design and automate deployment pipelines for both cloud SaaS and on-prem environments, orchestrate containers at scale, and ship reliable releases that meet compliance requirements. You'll work cross-functionally with product, security, and customer teams, then turn what you learn in the field into reusable platform capabilities.

Key Responsibilities
- Own CI/CD end-to-end: design, implement, and operate pipelines with blue/green, canary, and phased rollouts; define graceful draining for HA systems.
- Architect, maintain, and harden our Kubernetes-based runtime (Docker, Kubernetes, Helm), including multi-cluster and multi-tenant concerns.
- Manage cloud deployments across AWS/Azure/GCP and coordinate with on-prem infrastructure teams; standardize with IaC (e.g., Terraform).
- Implement robust observability (metrics, logs, traces), SLOs/error budgets, and automated rollback/one-click restore.
- Partner with compliance to integrate SOC 2 / ISO 27001 / HIPAA controls into pipelines (artifact signing, SBOMs, change management, access/keys).
- Deploy at customer sites (cloud or on-prem), collaborating with client teams on integration, runbooks, and handover.
- Lead incident response and postmortems; drive resilience, cost, and performance improvements.
- Document release processes and platform conventions; codify best practices into tooling and templates.

You Might Thrive If You
- Have deep hands-on experience with a major cloud (AWS, Azure, or GCP) and container orchestration (Kubernetes, Helm).
- Build production-grade CI/CD with GitHub Actions / GitLab CI / Jenkins (or similar), including complex rollout strategies.
- Have shipped both SaaS and on-prem solutions, navigating networking, security, and environment drift.
- Can integrate compliance and security into delivery (secret management, image signing, policy-as-code).
- Are comfortable with networking fundamentals, security hardening, and performance tuning.
- Communicate clearly, move fast in ambiguity, and enjoy being the responsible adult in prod.

Job Details
- Job Type: Full-time, 70 hr/week (50 hr/week onsite with flexible hours + 20 hr/week work from home)
- Cash: $215k - $290k
- Equity: 0.3 - 0.6%
- Location: Redwood City, CA, US
- US Visas: Sponsors visa & green card

Other Benefits
- 100% medical, dental, and vision insurance coverage
- Unlimited breakfast, lunch, dinner, and snacks
- Gym and daily commute fee reimbursement
- Internet and phone bill covered

Compensation Philosophy
- Best Offer Upfront: Choose from three cash-equity balance options, no negotiation needed.
- Top 1% Talent: Above-market pay (top 5th percentile) to attract high performers.
- High Ownership: Small teams, >$1M revenue/employee, and significant equity.
- Performance-Based: Offers tied to interview performance, not experience or past salaries.

Interview Process
- Online Assessment (25-30 min): One HackerRank coding question on practical problem-solving (7 days to complete).
- Technical Phone Interview 1 (30 min): Live coding on CoderPad, focusing on data structures and algorithms.
- Technical Phone Interview 2 (30-45 min): Full-stack development with JavaScript, TypeScript, React, and Node.js in a local environment.
- Onsite/Virtual Interviews (2-3 hrs): Hosted in our office if you are located in the Bay Area, or virtual, with three rounds:
  - DevOps Build & Run: Design a Kubernetes deployment with blue/green, draining, and a 2-hour instance lifetime constraint; walk through rollout/rollback.
  - Communication (FDE-style): Partner exercise on explaining trade-offs and aligning stakeholders.
  - Systems Design (DevOps): Architect a generalized on-prem solution deployable across multiple clouds with different data stores, key vaults, encryption, availability/failover, monitoring, upgrades, and maintenance.

Learn More
- Retell AI - API That Turns Your LLM Into A Human-Like Voice Agent
- Retell AI Basics: Everything You Need to Start Building Voice Agents

Join Retell AI to shape the future of voice automation, building scalable, impactful full-stack systems that redefine AI-driven communication.
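The interview's DevOps round, like the CI/CD responsibilities above, revolves around canary rollouts with automated rollback. As a rough, hypothetical sketch of the gating logic behind a canary (the step sizes and error-rate threshold here are invented for illustration, not Retell's actual pipeline):

```python
# Illustrative canary-gate logic: shift traffic in steps, rolling back as
# soon as the canary's error rate exceeds a budget. All numbers are made up.

def canary_rollout(error_rate_at, steps=(5, 25, 50, 100), max_error_rate=0.01):
    """error_rate_at: callable taking a traffic percentage and returning the
    observed canary error rate at that step. Returns the final rollout state."""
    for pct in steps:
        observed = error_rate_at(pct)
        if observed > max_error_rate:
            return {"status": "rolled_back", "failed_at_pct": pct,
                    "observed": observed}
    return {"status": "promoted", "final_pct": steps[-1]}

# A healthy canary is promoted to full traffic...
print(canary_rollout(lambda pct: 0.002))
# ...while one whose error rate grows with load is rolled back mid-ramp.
print(canary_rollout(lambda pct: 0.0005 * pct))
```

A real pipeline would read the error rate from its observability stack rather than a callback, and would drain in-flight connections from the old version before retiring it.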
MLOps / DevOps Engineer
Data Science & Analytics
Apply

Senior Software Engineer, Cloud Security

Otter.ai
USD 185,000 - 210,000
United States
Full-time
On-site
The Opportunity
We are seeking an experienced Cloud Security Engineer to join our team. The successful candidate will be responsible for designing, implementing, and maintaining the security of our cloud infrastructure and applications. This includes ensuring compliance with regulatory requirements, identifying and mitigating security risks, and collaborating with DevOps teams to ensure secure cloud deployments.

Your Impact
- Design and implement secure cloud architectures and configurations
- Conduct cloud security assessments and risk analyses
- Implement and manage cloud security controls, such as firewalls, access controls, and encryption technologies
- Monitor cloud security logs and investigate security alerts
- Respond to security incidents and develop incident response plans
- Ensure cloud compliance with regulatory requirements, such as HIPAA, PCI-DSS, and GDPR
- Collaborate with DevOps teams to ensure secure cloud deployments
- Develop and deliver security awareness training programs
- Stay up to date with emerging cloud security threats and technologies

We're Looking for Someone Who Has
- 4+ years of experience in cloud security engineering
- Strong knowledge of cloud security architectures, controls, and compliance requirements
- Expertise in the security of public cloud platforms (e.g. AWS, Microsoft Azure), especially securing multi-cloud networks and infrastructure and designing cloud-agnostic systems
- An understanding of container security, network security, and cloud security services
- Experience building cloud security infrastructure (e.g. logging, monitoring, vuln management, DLP)
- A strong understanding of security frameworks, such as NIST and ISO 27001
- Excellent problem-solving and analytical skills
- Strong communication and collaboration skills
- A Bachelor's degree in Computer Science, Cybersecurity, or a related field

About Otter.ai
We are in the business of shaping the future of work. Our mission is to make conversations more valuable. With over 1B meetings transcribed, Otter.ai is the world's leading tool for meeting transcription, summarization, and collaboration. Using artificial intelligence, Otter generates real-time automated meeting notes, summaries, and other insights from in-person and virtual meetings, turning meetings into accessible, collaborative, and actionable data that can be shared across teams and organizations. The company is backed by early investors in Google, DeepMind, Zoom, and Tesla. Otter.ai is an equal opportunity employer. We proudly celebrate diversity and are dedicated to inclusivity.

*Otter.ai does not accept unsolicited resumes from 3rd-party recruitment agencies without a written agreement in place for permanent placements. Any resume or other candidate information submitted outside of established candidate submission guidelines (including through our website or via email to any Otter.ai employee) and without a written agreement will be deemed to be our sole property, and no fee will be paid should we hire the candidate.

Salary Range
$185,000 to $210,000 USD per year. This salary range represents the low and high end of the estimated salary range for this position. The actual base salary offered for the role depends on several factors. Our base salary is just one component of our comprehensive total rewards package.
MLOps / DevOps Engineer
Data Science & Analytics
Software Engineer
Software Engineering
Apply

Embedded Systems Integration Engineer

Figure AI
USD 140,000 - 180,000
United States
Full-time
On-site
Figure is an AI robotics company developing a general-purpose humanoid. Our humanoid robot, Figure 02, is designed for commercial tasks and the home. We are based in San Jose, CA and require 5 days/week in-office collaboration. It's time to build.

About the Role
We're seeking an Embedded Systems Integration Engineer to build the backend infrastructure that validates the interaction between our hardware, firmware, and software. You will be the connective tissue across disciplines, owning how changes to firmware or software are tested against real hardware. Your work ensures we ship reliable, integrated systems that just work. This is a hands-on role where you will design and implement automated test frameworks, bring-up flows, and validation pipelines for embedded subsystems. You'll be responsible for catching regressions early, enabling fast iteration, and giving clear system-level pass/fail signals across the stack.

Key Responsibilities
- Architect test infrastructure that exercises end-to-end functionality of embedded systems across hardware, firmware, and software boundaries.
- Develop backend systems (Python, CLI tools, internal APIs) to run tests, log results, and determine pass/fail conditions.
- Bring up and validate subsystem- and system-level changes, tracking changes in behavior and performance across releases.
- Automate testing pipelines for regression detection and continuous integration.
- Debug and triage failures across layers: hardware faults, firmware bugs, or software integration issues.
- Collaborate with firmware, software, and hardware teams to define interface contracts and testable behaviors.
- Instrument devices under test using scopes, logic analyzers, and custom harnesses to characterize system response.

Minimum Qualifications
- Bachelor's degree in EE, CE, CS, or a related field.
- 3+ years of experience working with embedded systems.
- Strong understanding of how firmware interacts with hardware peripherals (I2C, Ethernet, SPI, CAN, UART, ADCs, GPIO, etc.).
- Proficiency in Python or a similar scripting language for test automation.
- Experience bringing up custom embedded boards and working across firmware/software stacks.
- Familiarity with Linux-based development environments.

Preferred Qualifications
- Experience with CI/CD tools (e.g., GitHub Actions, Jenkins, TeamCity).
- Knowledge of test automation frameworks (e.g., PyTest, Robot Framework).
- Exposure to hardware-in-the-loop (HIL) systems.
- Familiarity with board-level validation, power-on sequencing, or sensor verification.
- Prior experience in robotics, automotive, aerospace, or other complex embedded systems.
- Comfort working hands-on at the bench with test equipment.

The US base salary range for this full-time position is between $140,000 and $180,000 annually. The pay offered for this position may vary based on several individual factors, including job-related knowledge, skills, and experience. The total compensation package may also include additional components/benefits depending on the specific role. This information will be shared if an employment offer is extended.
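The role above centers on turning per-subsystem checks into one clear system-level pass/fail signal. A minimal sketch of how a harness might aggregate such checks (the check names and the 3.3 V limit are illustrative only; a real harness would talk to hardware over serial, CAN, or a logic analyzer rather than return canned values):

```python
# Hypothetical pass/fail aggregator for an embedded test run: each check
# returns (name, ok, detail); the run passes only if every check passes,
# and failures are surfaced with their details for triage.

def run_suite(checks):
    results = [check() for check in checks]
    failures = [(name, detail) for name, ok, detail in results if not ok]
    return {"passed": not failures,
            "total": len(results),
            "failures": failures}

# Stand-ins for real bench checks.
def check_i2c_bus():
    return ("i2c_bus", True, "all devices ACKed")

def check_adc_range():
    reading = 3.7  # pretend volts; out of the assumed 0-3.3 V range
    return ("adc_range", 0.0 <= reading <= 3.3, f"read {reading} V")

report = run_suite([check_i2c_bus, check_adc_range])
print(report["passed"], report["failures"])
```

The point of the shape is that CI only needs `report["passed"]`, while engineers triaging a red run get the failing subsystem names and details without re-running the bench.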
MLOps / DevOps Engineer
Data Science & Analytics
Robotics Engineer
Software Engineering
Software Engineer
Software Engineering
Apply

Sr. Security Engineer

Thoughtful
USD 170,000 - 220,000
United States
Full-time
On-site
Join Our Mission to Revolutionize Healthcare
Thoughtful is pioneering a new approach to automation for all healthcare providers! Our AI-powered Revenue Cycle Automation platform enables the healthcare industry to automate and improve its core business operations.

We're hiring a Senior Security Engineer to secure and scale our stack. You'll own platform security, system reliability, audit readiness, and long-term integration strategy across cloud, hybrid, and legacy environments. We're unifying cloud-native and legacy systems into a secure, high-availability platform that powers AI-driven automation across healthcare. You'll lead foundational work in infrastructure hardening, audit controls, and production observability, directly supporting mission-critical AI agents. You'll have executive support and budget to modernize everything from our VPN tunnels to our alerting stack.

What You'll Own
- Integration Strategy: Lead infrastructure and tooling decisions as we unify multiple environments into a single, scalable platform.
- Audit Readiness: Own and drive SOC 2 Type II and HITRUST prep, working across engineering, compliance, and security.
- System Reliability: Ensure uptime, scalability, and fault tolerance across services. Set and enforce SLAs.
- On-Call Infrastructure: Stand up our alerting, escalation, and incident response systems.
- Observability: Improve logging, metrics, and dashboards using tools like HyperDX.
- Infrastructure Provisioning: Spin up and manage production-grade infrastructure using OpenTofu/Terraform.
- Security & Networking: Architect infrastructure with security best practices, including VPNs, IPsec tunnels, and hybrid network topologies.

Your Qualifications
- 8+ years of experience spanning Security, DevOps, and/or SRE roles in high-availability cloud and hybrid environments, with a strong track record of leading integrations, hardening infrastructure, and ensuring audit/compliance readiness.
- Start-up mentality: desire to tackle an ambiguous scope of work and willingness to do whatever is necessary to drive the company/mission forward.
- Track record of leading complex infrastructure integrations.
- Deep AWS expertise; strong experience with Azure and/or GCP a bonus.
- Proficiency in OpenTofu or Terraform for Infrastructure-as-Code.
- Comfortable navigating hybrid cloud environments (e.g. EKS, legacy VMs, VPN tunnels).
- Solid Kubernetes experience (Knative experience a plus).
- Strong networking fundamentals and experience with on-prem systems.
- Familiarity with incident tooling (PagerDuty, Opsgenie) and setting SLOs/SLAs.
- Personable and cross-functional: able to build rapport with stakeholders across engineering, compliance, and executive leadership.
- Security-first mindset, with an eye for compliance and audit readiness.
- Proficiency in SOC 2 Type II and HITRUST preparation.
- Comfortable spinning up new infrastructure as needed.

What Sets You Apart
- You've integrated cutting-edge cloud environments with customers' legacy environments.
- You've built platforms, not just maintained them.
- You treat DevOps as a product, not just a support function.
- You care about developer experience, observability, and operational excellence.

Why Thoughtful?
- Competitive compensation
- Equity participation: employee stock options
- Health benefits: comprehensive medical, dental, and vision insurance
- Time off: generous leave policies and paid company holidays

California Salary Range: $170,000 - $220,000 USD
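Setting SLOs/SLAs, as this role requires, usually comes down to error-budget arithmetic. A minimal sketch of that arithmetic (the 99.9% target and 30-day window are example numbers, not Thoughtful's actual SLOs):

```python
# Error-budget math behind an availability SLO: a 99.9% target over a
# 30-day window leaves 43.2 minutes of allowed downtime; burn-rate
# alerting keys off how much of that budget an incident consumed.

def error_budget_minutes(slo, window_days=30):
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - slo)

def budget_burned(downtime_minutes, slo, window_days=30):
    """Fraction of the window's error budget consumed by the downtime."""
    return downtime_minutes / error_budget_minutes(slo, window_days)

print(round(error_budget_minutes(0.999), 1))   # 43.2 minutes per 30 days
print(round(budget_burned(10, 0.999), 3))      # a 10-minute outage burns ~23%
```

The same two functions explain why tightening the target to 99.99% is so expensive: the budget shrinks tenfold, to about 4.3 minutes per month.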
MLOps / DevOps Engineer
Data Science & Analytics
Software Engineer
Software Engineering
Apply

Storage Engineering Manager

Lambda AI
USD 330,000 - 495,000
United States
Full-time
On-site
Lambda is the #1 GPU Cloud for ML/AI teams training, fine-tuning, and inferencing AI models, where engineers can easily, securely, and affordably build, test, and deploy AI products at scale. Lambda's product portfolio includes on-prem GPU systems, hosted GPUs across public & private clouds, and managed inference services, servicing government, researchers, startups, and enterprises worldwide. If you'd like to build the world's best deep learning cloud, join us.

*Note: This position requires presence in our San Jose office location 4 days per week; Lambda's designated work-from-home day is currently Tuesday.

Engineering at Lambda is responsible for building and scaling our cloud offering. Our scope includes the Lambda website, cloud APIs, and systems, as well as internal tooling for system deployment, management, and maintenance. In the world of distributed AI, raw GPU and CPU horsepower is just part of the story. High-performance networking and storage are the critical components that enable and unite these systems, making groundbreaking AI training and inference possible. The Lambda Infrastructure Engineering organization forges the foundation of high-performance AI clusters by welding together the latest in AI storage, networking, GPU, and CPU hardware.

Our expertise lies at the intersection of:
- High-Performance Distributed Storage Solutions and Protocols: We engineer the protocols and systems that serve massive datasets at the speeds demanded by modern clustered GPUs.
- Dynamic Networking: We design advanced networks that provide multi-tenant security and intelligent routing without compromising performance, using the latest in AI networking hardware.
- Compute Virtualization: We enable cutting-edge virtualization and clustering that allows AI researchers and engineers to focus on AI workloads, not AI infrastructure, unleashing the full compute bandwidth of clustered GPUs.

About the Role
We are seeking a seasoned Storage Engineering Manager with experience in the specification, evaluation, deployment, and management of HPC storage solutions across multiple datacenters to build out a world-class team. You will hire and guide a team of storage engineers in building storage infrastructure that serves our AI/ML infrastructure products, ensuring the seamless deployment and operational excellence of both the physical and logical storage infrastructure (including proprietary and open-source solutions). Your role is not just to manage people, but to serve as the ultimate technical and operational authority for our high-performance, petabyte-scale storage solutions. Your leadership will be pivotal in ensuring our systems are not just high-performing, but also reliable, scalable, and manageable as we grow toward exascale. This is a unique opportunity to work at the intersection of large-scale distributed systems and the rapidly evolving field of artificial intelligence infrastructure, and to have a significant impact on the future of AI: you will be building the foundational infrastructure that powers some of the most advanced AI research and products in the world.

What You'll Do

Team Leadership & Management:
- Grow, hire, lead, and mentor a top-talent team of high-performing storage engineers delivering HPC, petabyte-scale storage solutions.
- Foster a high-velocity culture of innovation, technical excellence, and collaboration.
- Conduct regular one-on-one meetings, provide constructive feedback, and support career development for team members.
- Drive outcomes by managing project priorities, deadlines, and deliverables using Agile methodologies.

Technical Strategy & Execution:
- Drive the technical vision and strategy for Lambda's distributed storage solutions.
- Lead storage vendor selection criteria, vendor selection, and vendor relationship management (support, installation, scheduling, specification, procurement).
- Manage the team in storage lifecycle management (installation, cabling, capacity upgrades, service, RMA, and updating both hardware and software components as needed).
- Guide choices around optimization of storage pools, sharding, and tiering/caching strategies.
- Lead the team in tasks related to multi-tenant security, tenant provisioning, metering integration, storage protocol interconnection, and customer data migration.
- Guide storage SREs in the development of scripting and automation tools for configuration management, monitoring, and operational tasks.
- Guide the team in problem identification, requirements gathering, solution ideation, and stakeholder alignment on engineering RFCs.
- Lead the team in supporting customers.

Cross-Functional Collaboration:
- Collaborate with the HPC Architecture team on drive selection, capacity determination, storage networking, cache placement, and rack layouts.
- Work closely with the storage software and networking teams to execute cross-functional infrastructure initiatives and new data-center deployments, including integration of storage protocols across a variety of on-prem storage solutions.
- Work with procurement, data-center operations, and fleet engineering teams to deploy storage solutions into new and existing data centers.
- Work with vendors to troubleshoot customer performance, reliability, and data-integrity issues.
- Work closely with Networking, Compute, and Storage Software Engineering teams to deploy high-performance distributed storage solutions that serve AI/ML workloads.
- Partner with the fleet engineering team to ensure seamless deployment, monitoring, and maintenance of the distributed storage solutions.

Innovation & Research:
- Stay current with the latest trends and research in AI and HPC storage technologies and vendor solutions.
- Guide the team in investigating strategies for using NVIDIA SuperNIC DPUs for storage edge-caching, offloading, and GPUDirect Storage capabilities.
- Work with the Lambda product team to uncover new trends in the AI inference and training product category that will inform emerging storage solutions.
- Encourage and support the team in exploring new technologies and approaches to improve system performance and efficiency.

You

Experience:
- 10+ years of experience in storage engineering, with at least 5 years in a management or lead role.
- Demonstrated experience leading a team of storage engineers and storage SREs on complex, cross-functional projects in a fast-paced startup environment.
- Extensive hands-on experience designing, deploying, and maintaining distributed storage solutions at a CSP (Cloud Service Provider), NCP (Neo-Cloud Provider), HPC-infrastructure integrator, or AI-infrastructure company.
- Experience with storage solutions serving storage volumes at a scale greater than 20PB.
- Strong project management skills, leading high-confidence planning, project execution, and delivery of team outcomes on schedule.
- Extensive experience with storage site reliability engineering.
- Experience with one or more of the following in an HPC or AI infrastructure environment: VAST, DDN, Pure Storage, NetApp, Weka.
- Experience deploying Ceph at a scale greater than 25PB.

Technical Skills:
- Experience serving one or more of the following storage protocols: object storage (e.g., S3), block storage (e.g., iSCSI), or file storage (e.g., NFS, SMB, Lustre).
- Professional individual-contributor experience as a storage engineer or storage SRE.
- Familiarity with modern storage technologies (e.g., NVMe, RDMA, DPUs) and their role in optimizing performance.

People Management:
- Experience building a high-performance team through deliberate hiring, upskilling, planned skills redundancy, performance management, and expectation setting.

Nice to Have

Experience:
- Experience driving cross-functional engineering management initiatives (coordinating events, strategic planning, coordinating large projects).
- Experience with NVIDIA SuperNIC DPUs for edge-caching (such as implementing GPUDirect Storage).

Technical Skills:
- Deep experience with VAST, Weka, and/or NetApp in an HPC or AI infrastructure environment.
- Deep experience implementing Ceph in an HPC or AI infrastructure environment at a scale greater than 100PB.

People Management:
- Experience driving organizational improvements (processes, systems, etc.).
- Experience training or managing managers.

Salary Range Information
The annual salary range for this position has been set based on market data and other factors. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.

About Lambda
- Founded in 2012, ~400 employees (2025) and growing fast
- We offer generous cash & equity compensation
- Our investors include Andra Capital, SGW, Andrej Karpathy, ARK Invest, Fincadia Advisors, G Squared, In-Q-Tel (IQT), KHK & Partners, NVIDIA, Pegatron, Supermicro, Wistron, Wiwynn, US Innovative Technology, Gradient Ventures, Mercato Partners, SVB, 1517, Crescent Cove
- We are experiencing extremely high demand for our systems, with quarter-over-quarter, year-over-year profitability
- Our research papers have been accepted into top machine learning and graphics conferences, including NeurIPS, ICCV, SIGGRAPH, and TOG
- Health, dental, and vision coverage for you and your dependents
- Wellness and commuter stipends for select roles
- 401k plan with 2% company match (USA employees)
- Flexible Paid Time Off plan that we all actually use

A Final Note:
You do not need to match all of the listed expectations to apply for this position. We are committed to building a team with a variety of backgrounds, experiences, and skills.

Equal Opportunity Employer
Lambda is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.
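The storage-pool sharding this role oversees is often built on consistent hashing, so that adding a pool relocates only a fraction of objects rather than reshuffling everything. A minimal sketch of the idea (pool names and vnode count are illustrative; production systems such as Ceph use richer placement algorithms like CRUSH):

```python
# Consistent-hash ring for placing objects across storage pools. Each pool
# gets many virtual nodes on the ring; an object lands on the first vnode
# clockwise from its own hash.
import bisect
import hashlib

def _h(key):
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, pools, vnodes=64):
        self.ring = sorted((_h(f"{p}#{i}"), p)
                           for p in pools for i in range(vnodes))
        self.keys = [h for h, _ in self.ring]

    def pool_for(self, obj_key):
        # Wrap around to the start of the ring past the last vnode.
        idx = bisect.bisect(self.keys, _h(obj_key)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["pool-a", "pool-b", "pool-c"])
print(ring.pool_for("model.ckpt"))
```

Because placement is a pure function of the key and the pool set, every client computes the same answer with no central lookup table, and growing from three pools to four moves roughly a quarter of the keys.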
MLOps / DevOps Engineer
Data Science & Analytics
Software Engineer
Software Engineering
Apply

Member of Technical Staff - Training Infrastructure Engineer

Liquid AI
United States
Full-time
Remote
Work With Us
At Liquid, we're not just building AI models; we're redefining the architecture of intelligence itself. Spun out of MIT, our mission is to build efficient AI systems at every scale. Our Liquid Foundation Models (LFMs) operate where others can't: on-device, at the edge, under real-time constraints. We're not iterating on old ideas; we're architecting what comes next. We believe great talent powers great technology. The Liquid team is a community of world-class engineers, researchers, and builders creating the next generation of AI. Whether you're helping shape model architectures, scaling our dev platforms, or enabling enterprise deployments, your work will directly shape the frontier of intelligent systems.

This Role Is For You If:
- You have extensive experience building distributed training infrastructure for language and multimodal models, with hands-on expertise in frameworks like PyTorch Distributed, DeepSpeed, or Megatron-LM
- You're passionate about solving complex systems challenges in large-scale model training, from efficient multimodal data loading to sophisticated sharding strategies to robust checkpointing mechanisms
- You have a deep understanding of hardware accelerators and networking topologies, with the ability to optimize communication patterns for different parallelism strategies
- You're skilled at identifying and resolving performance bottlenecks in training pipelines, whether they occur in data loading, computation, or communication between nodes
- You have experience working with diverse data types (text, images, video, audio) and can build data pipelines that handle heterogeneous inputs efficiently

Desired Experience:
- You've implemented custom sharding techniques (tensor/pipeline/data parallelism) to scale training across distributed GPU clusters of varying sizes
- You have experience optimizing data pipelines for multimodal datasets with sophisticated preprocessing requirements
- You've built fault-tolerant checkpointing systems that can handle complex model states while minimizing training interruptions
- You've contributed to open-source training infrastructure projects or frameworks
- You've designed training infrastructure that works efficiently for both parameter-efficient specialized models and massive multimodal systems

What You'll Actually Do:
- Design and implement high-performance, scalable training infrastructure that efficiently utilizes our GPU clusters for both specialized and large-scale multimodal models
- Build robust data loading systems that eliminate I/O bottlenecks and enable training on diverse multimodal datasets
- Develop sophisticated checkpointing mechanisms that balance memory constraints with recovery needs across different model scales
- Optimize communication patterns between nodes to minimize the overhead of distributed training for long-running experiments
- Collaborate with ML engineers to implement new model architectures and training algorithms at scale
- Create monitoring and debugging tools to ensure training stability and resource efficiency across our infrastructure

What You'll Gain:
- The opportunity to solve some of the hardest systems challenges in AI, working at the intersection of distributed systems and cutting-edge multimodal machine learning
- Experience building infrastructure that powers the next generation of foundation models across the full spectrum of model scales
- The satisfaction of seeing your work directly enable breakthroughs in model capabilities and performance

About Liquid AI
Spun out of MIT CSAIL, we're a foundation model company headquartered in Boston. Our mission is to build capable and efficient general-purpose AI systems at every scale, from phones and vehicles to enterprise servers and embedded chips. Our models are designed to run where others stall: on CPUs, with low latency, minimal memory, and maximum reliability. We're already partnering with global enterprises across consumer electronics, automotive, life sciences, and financial services. And we're just getting started.
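The fault-tolerant checkpointing this role calls for rests on one basic primitive: a crash mid-write must never corrupt the latest checkpoint. A minimal sketch of the write-to-temp-then-atomic-rename pattern, using JSON as a stand-in for real tensor state (a production trainer would additionally shard state across ranks and checkpoint asynchronously):

```python
# Crash-safe checkpoint writes: serialize to a temporary file in the same
# directory, fsync it, then atomically rename over the previous checkpoint.
# Readers therefore always see either the old or the new file, never a
# half-written one.
import json
import os
import tempfile

def save_checkpoint(state, path):
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())        # make sure bytes hit disk first
        os.replace(tmp_path, path)      # atomic on POSIX and Windows
    except BaseException:
        os.unlink(tmp_path)             # don't leave temp files behind
        raise

def load_checkpoint(path):
    with open(path) as f:
        return json.load(f)

path = os.path.join(tempfile.mkdtemp(), "ckpt.json")
save_checkpoint({"step": 1000, "loss": 0.42}, path)
print(load_checkpoint(path))
```

The temp file must live on the same filesystem as the target, since `os.replace` across filesystems is not atomic; that is why the sketch creates it in the checkpoint's own directory.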
MLOps / DevOps Engineer
Data Science & Analytics
Machine Learning Engineer
Data Science & Analytics
Apply

X.jpg

Core Network Development Engineer - xAI Networking

X AI
-
US.svg
United States
IE.svg
Ireland
Full-time
Remote
false
About xAI

xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company’s mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All engineers are expected to have strong communication skills: they should be able to concisely and accurately share knowledge with their teammates.

About the Role

xAI is building at a furious pace with the latest hardware to help people understand the universe, and we need Network Development Engineers (NDEs) with at least 5 years of experience in hyperscale data center networks and the ancillary networks that power them. You will own all the production networks that reside in the data centers, including our primary frontend and backend networks that train Grok and that customers use for inference. Engineers will own all aspects from design and development to build and operations. You will be expected to participate in a team on-call rota and to contribute to scaling and maintenance efforts.

Responsibilities Will Include
- Designing and implementing scalable network architectures for AI/HPC workloads.
- Automating network operations, monitoring, and troubleshooting using scripting and tools.
- Collaborating with cross-functional teams on data center buildouts and optimizations.
- Analyzing network performance metrics to identify and resolve bottlenecks.
- Ensuring high availability and security of production networks.

Location

Work will be in-office, based out of either Palo Alto, California or Dublin, Ireland. Significant travel to Memphis, Tennessee is expected for data center buildouts; travel to Memphis may account for up to 25-30% of time during peak buildout phases. For Dublin-based employees, additional travel to our head office in Palo Alto will be required for team collaboration.

Required Qualifications
- A minimum of 5 years designing and operating hyperscale networks.
- At least 3 years working with Ethernet and/or InfiniBand in the AI/HPC space.
- Familiarity with networking protocols and tools (e.g., BGP, OSPF, ZTP).
- Experience with Python scripting, with the ability to automate repetitive tasks, acquire pertinent metrics, and analyze large sets of data.
- Deep understanding of RoCEv2.
- Bachelor's degree in Computer Science, Electrical Engineering, or a related field (or equivalent experience).

Preferred Experience
- Experience in AI/ML infrastructure or large-scale GPU clusters.
- Proven track record in on-call rotations and incident response in high-stakes environments.
- Strong problem-solving skills and ability to thrive in a fast-paced, ambiguous setting.

Interview Process

After submitting your application, the team reviews your CV and statement of exceptional work. If your application passes this stage, you will be invited to an initial interview (45 minutes to 1 hour) during which a member of our team will ask some basic questions. If you clear the initial phone interview, you will enter the main process, which consists of four interviews:
- Coding assessment in a language of your choice.
- Data center network technologies.
- Manager interview.
- Meet and greet with the team, with a presentation of a large-scale solution or problem you owned, start to finish.

Our goal is to finish the main process within one week. We don’t rely on recruiters for assessments; every application is reviewed by a member of our technical team. All interviews will be conducted via Google Meet.

Benefits

Base salary is just one part of our total rewards package at xAI, which also includes equity; comprehensive medical, vision, and dental coverage; access to a 401(k) retirement plan; short- and long-term disability insurance; life insurance; and various other discounts and perks.

xAI is an equal opportunity employer.

California Consumer Privacy Act (CCPA) Notice
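Designing the scalable fabrics this role covers usually starts with back-of-envelope Clos math: in a non-blocking two-tier leaf-spine fabric of identical switches, each leaf splits its ports evenly between hosts and spine uplinks, so a fabric of k-port switches tops out at k²/2 hosts. A small sketch of that sizing arithmetic (the function names and the strict 1:1 oversubscription assumption are illustrative, not any particular vendor's design rules):

```python
import math

def fabric_capacity(ports: int) -> int:
    """Max hosts in a non-blocking two-tier leaf-spine fabric of
    identical `ports`-port switches: up to `ports` leaves (one spine
    port per leaf on each spine), each leaf serving ports // 2 hosts."""
    return ports * (ports // 2)

def fabric_for(hosts: int, ports: int) -> dict:
    """Leaf/spine counts needed to serve `hosts` at 1:1 oversubscription."""
    if hosts > fabric_capacity(ports):
        raise ValueError("host count exceeds a two-tier fabric; add a third tier")
    down = ports // 2                  # host-facing ports per leaf
    leaves = math.ceil(hosts / down)
    spines = ports // 2                # one uplink from each leaf to each spine
    return {"leaves": leaves, "spines": spines, "hosts_per_leaf": down}
```

For example, 64-port switches give a ceiling of 2,048 hosts in two tiers; beyond that, the same arithmetic recurses into a three-tier design.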
MLOps / DevOps Engineer
Data Science & Analytics
Apply
Deepgram.jpg

Platform Engineer – AI/ML Infrastructure

Deepgram
USD
160000
-
220000
Full-time
Remote
true
Company Overview

Deepgram is the leading voice AI platform for developers building speech-to-text (STT), text-to-speech (TTS) and full speech-to-speech (STS) offerings. 200,000+ developers build with Deepgram’s voice-native foundational models – accessed through APIs or as self-managed software – due to our unmatched accuracy, latency and pricing. Customers include software companies building voice products, co-sell partners working with large enterprises, and enterprises solving internal voice AI use cases. The company ended 2024 cash-flow positive with 400+ enterprise customers, 3.3x annual usage growth across the past 4 years, over 50,000 years of audio processed and over 1 trillion words transcribed. There is no organization in the world that understands voice better than Deepgram.

Opportunity

We're looking for an expert (Senior/Staff-level) Platform Engineer to build and operate the hybrid infrastructure foundation for our advanced AI/ML research and product development. You'll architect, build, and run the platform spanning AWS and our bare metal data centers, empowering our teams to train and deploy complex models at scale. This role is focused on creating a robust, self-service environment using Kubernetes, AWS, and Infrastructure-as-Code (Terraform), and orchestrating high-demand GPU workloads using schedulers like Slurm.

What You’ll Do
- Architect and maintain our core computing platform using Kubernetes on AWS and on-premise, providing a stable, scalable environment for all applications and services.
- Develop and manage our entire infrastructure using Infrastructure-as-Code (IaC) principles with Terraform, ensuring our environments are reproducible, versioned, and automated.
- Design, build, and optimize our AI/ML job scheduling and orchestration systems, integrating Slurm with our Kubernetes clusters to efficiently manage GPU resources.
- Provision, manage, and maintain our on-premise bare metal server infrastructure for high-performance GPU computing.
- Implement and manage the platform's networking (CNI, service mesh) and storage (CSI, S3) solutions to support high-throughput, low-latency workloads across hybrid environments.
- Develop a comprehensive observability stack (monitoring, logging, tracing) to ensure platform health, and create automation for operational tasks, incident response, and performance tuning.
- Collaborate with AI researchers and ML engineers to understand their infrastructure needs and build the tools and workflows that accelerate their development cycle.
- Automate the lifecycle of single-tenant, managed deployments.

You’ll Love This Role If You
- Are passionate about building platforms that empower developers and researchers.
- Enjoy creating elegant, automated solutions for complex infrastructure challenges in both cloud and data center environments.
- Thrive on optimizing hybrid infrastructure for performance, cost, and reliability.
- Are excited to work at the intersection of modern platform engineering and cutting-edge AI.
- Love to treat infrastructure as a product, continuously improving the developer experience.

It’s Important To Us That You Have
- 5+ years of experience in Platform Engineering, DevOps, or Site Reliability Engineering (SRE).
- Proven, hands-on experience building and managing production infrastructure with Terraform.
- Expert-level knowledge of Kubernetes architecture and operations in a large-scale environment.
- Experience with high-performance compute (HPC) job schedulers, specifically Slurm, for managing GPU-intensive AI workloads.
- Experience managing bare metal infrastructure, including server provisioning (e.g., PXE boot, MAAS), configuration, and lifecycle management.
- Strong scripting and automation skills (e.g., Python, Go, Bash).

It Would Be Great if You Had
- Experience with CI/CD systems (e.g., GitLab CI, Jenkins, ArgoCD) and building developer tooling.
- Familiarity with FinOps principles and cloud cost optimization strategies.
- Knowledge of Kubernetes networking (e.g., Calico, Cilium) and storage (e.g., Ceph, Rook) solutions.
- Experience in a multi-region or hybrid cloud environment.

Backed by prominent investors including Y Combinator, Madrona, Tiger Global, Wing VC and NVIDIA, Deepgram has raised over $85 million in total funding. If you're looking to work on cutting-edge technology and make a significant impact in the AI industry, we'd love to hear from you!

Deepgram is an equal opportunity employer. We want all voices and perspectives represented in our workforce. We are a curious bunch focused on collaboration and doing the right thing. We put our customers first, grow together and move quickly. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, gender identity or expression, age, marital status, veteran status, disability status, pregnancy, parental status, genetic information, political affiliation, or any other status protected by the laws or regulations in the locations where we operate. We are happy to provide accommodations for applicants who need them.
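The GPU scheduling work described here ultimately reduces to a resource-packing problem. Real schedulers like Slurm layer partitions, fairshare, and topology awareness on top, but the core can be sketched as a greedy first-fit placement (all names below are illustrative, not Slurm's API):

```python
def first_fit(jobs, nodes):
    """Greedy first-fit placement of GPU jobs onto nodes.
    jobs:  list of (job_id, gpus_needed)
    nodes: dict of node_name -> free GPU count (mutated in place)
    Returns (placements, pending): where each job landed, and which
    jobs must wait for capacity."""
    placements, pending = {}, []
    # Place large jobs first so small ones don't fragment the cluster.
    for job_id, need in sorted(jobs, key=lambda j: -j[1]):
        for node, free in nodes.items():
            if free >= need:
                nodes[node] = free - need
                placements[job_id] = node
                break
        else:
            pending.append(job_id)
    return placements, pending
```

Even this toy version shows why ordering matters: placing a 2-GPU job before a 6-GPU job on 8-GPU nodes can strand capacity that the big job then cannot use.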
MLOps / DevOps Engineer
Data Science & Analytics
Software Engineer
Software Engineering
Apply
Tandem Diabetes Care.jpg

Infrastructure Engineer

Tandem
USD
150000
-
250000
US.svg
United States
Full-time
Remote
false
Why you should join us

Tandem is a generational opportunity to rethink how we bring new therapies to market, and our path to doing so is significantly de-risked – we have:
- Exponential organic growth: We have product-market fit and are growing rapidly through word-of-mouth. Tandem supports thousands of patients every day, is doubling doctor users every quarter, and is working with the largest biopharma companies in the world.
- An AI-first business model: Our approach is distinctly enabled by AI, but our business will get stronger (not commoditized) as foundation models improve. We are building durability through two-sided network effects that will compound over time.
- Top-tier investors: With the traction to support conviction in our model, we raised significant funding from investors (including Thrive Capital, General Catalyst, Bain Capital Ventures, and Pear VC) to build an exceptional team of engineers and operators.

Our number one priority is scaling to market demand. We are looking for individuals who are high horsepower, high throughput, and hyper-resourceful to help us increase capacity and grow. We move fast and need to move faster. All full-time roles are in person in New York. You can learn more about working with us in the last section of this page.

About the role

As an Infrastructure Engineer at Tandem, you’ll be our first dedicated hire focused on infrastructure and developer experience. You’ll own and evolve the core systems that make our engineering team faster, our platform more reliable, and our company more scalable. This role sits at the foundation of everything we build — from AI product development to partner integrations to high-stakes, high-throughput operations. You’ll have the opportunity to define our foundational infra practices — from CI/CD to observability to cloud architecture — and set the tone for how we scale. You’ll work across infrastructure, DevOps, and developer productivity, building systems that let us move fast without compromising stability or clarity.

This is a demanding role, with a high level of autonomy and responsibility. You will be expected to "act like an owner" and commit yourself to Tandem's success. If you are low-ego, hungry to learn, and excited about intense, impactful work that drives both company growth and accelerated career progression, we want to hear from you.

If you join, you will:
- Design and evolve CI/CD pipelines to improve speed, safety, and developer experience
- Scale and maintain core infrastructure — including Kubernetes clusters, PostgreSQL databases, ephemeral browsers, and data replication workflows
- Build and own our monitoring, alerting, and observability systems to ensure platform reliability and uptime
- Integrate AI-powered development tools to streamline engineering workflows
- Analyze and control cloud infrastructure spend while maintaining performance and reliability
- Improve internal tooling and developer environments to unblock execution at every layer
- Partner with engineering and product teams to anticipate scaling challenges and harden critical systems

We’ll be most excited if you have:
- Experience in infrastructure, DevOps, or SRE roles at fast-moving, product-driven tech companies
- Hands-on experience with cloud infrastructure (we use AWS; GCP and Azure are nice to have)
- Proficiency with infrastructure-as-code tooling (Terraform preferred)
- Deep familiarity with containerization and orchestration (Docker, Kubernetes)
- Experience with observability and monitoring systems (e.g., Grafana, Datadog)
- Proven ability to design and improve CI/CD pipelines (we use GitHub Actions)
- High NPS with your former teammates

This is a list of ideal qualifications for this position. If you don't meet every single one of them, you should still consider applying! We’re excited to work with people from underrepresented backgrounds, and we encourage people from all backgrounds to apply.

Working with us

Tandem is based in New York, with our full team working out of a beautiful and spacious office in SoHo. We run as a high-trust environment with high autonomy, which requires that everyone is fully competent and operates in line with our principles:
- Commit to audacity. "Whether you think you can, or you think you can't – you're right.”
- Do the math. Be rigorous, assume nothing.
- Find the shortest path. Use hacks, favors, and backdoors. Only take a longer road on purpose.
- Spit it out. Be direct, invite critique, avoid equivocation – we want right answers.
- Be demanding and supportive. Expect excellence from everyone and offer help to achieve it.
- Do what it takes to be number 1. We work hard to make sure we win.

We provide competitive compensation with meaningful equity (for full-time employees). Everyone who joins early will be a major contributor to our success, and we reflect this through ownership and pay. We also provide rich benefits to ensure you can focus on creating impact (for full-time employees):
- Fully covered medical, vision, and dental insurance.
- Memberships for One Medical, Talkspace, Teladoc, and Kindbody.
- Unlimited paid time off (PTO) and 16 weeks of parental leave.
- 401K plan setup, FSA option, commuter benefits, and DashPass.
- Lunch at the office every day and dinner at the office after 7 pm.

Our salary ranges are based on paying competitively for our company’s size and industry, and are one part of the total compensation package that also includes equity, benefits, and other opportunities at Tandem (for full-time employees). Individual pay decisions are ultimately based on a number of factors, including qualifications for the role, experience level, skillset, geography, and balancing internal equity.
Tandem is an equal opportunity employer and does not discriminate on the basis of race, gender, sexual orientation, gender identity/expression, national origin, disability, age, genetic information, veteran status, marital status, pregnancy or related condition, or any other basis protected by law.
MLOps / DevOps Engineer
Data Science & Analytics
Software Engineer
Software Engineering
Apply
Lambda.jpg

SRE - Observability (Senior)

Lambda AI
USD
267000
-
401000
US.svg
United States
Full-time
Remote
false
Lambda is the #1 GPU Cloud for ML/AI teams training, fine-tuning and inferencing AI models, where engineers can easily, securely and affordably build, test and deploy AI products at scale. Lambda’s product portfolio includes on-prem GPU systems, hosted GPUs across public & private clouds and managed inference services – servicing government, researchers, startups and enterprises worldwide. If you'd like to build the world's best deep learning cloud, join us.

*Note: This position requires presence in our San Francisco office location 4 days per week; Lambda’s designated work-from-home day is currently Tuesday.

Engineering at Lambda is responsible for building and scaling our cloud offering. Our scope includes the Lambda website, cloud APIs and systems, as well as internal tooling for system deployment, management and maintenance.

What You’ll Do
- Deploy and operate observability platforms for logging, metrics, and distributed tracing.
- Automate the deployment and operation of these observability systems.
- Set up monitoring for modern AI/HPC clusters.
- Develop platform software to make observability adoptable and improve system reliability across Lambda engineering.
- Lead members of other engineering teams to design and develop solutions for their monitoring challenges.

You
- Have 8+ years of experience in software engineering, with 3+ years in Go
- Have 5+ years of experience in Site Reliability Engineering practices
- Possess a proven understanding of observability tools and practices
- Have experience with application deployment and monitoring using Kubernetes
- Have experience building CI/CD pipelines
- Expect quality and reliability from the solutions you build
- Enjoy collaborating across team boundaries to help our engineering teams meet their observability needs

Nice to Have
- Experience monitoring AI systems or HPC clusters
- Experience with Prometheus and writing queries in PromQL
- Experience with messaging systems like NATS
- Understanding of the OpenTelemetry ecosystem and experience with both OTel instrumentation and the OTel collector
- Experience with network monitoring, Ethernet and InfiniBand
- Understanding of dashboard design principles
- Strong understanding of Linux fundamentals and system administration
- Experience with infrastructure automation tooling such as Ansible and Terraform

Salary Range Information

Based on market data and other factors, the annual salary range for this position is $267K-$401K. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.

About Lambda
- Founded in 2012, ~350 employees (2024) and growing fast
- We offer generous cash & equity compensation
- Our investors include Andra Capital, SGW, Andrej Karpathy, ARK Invest, Fincadia Advisors, G Squared, In-Q-Tel (IQT), KHK & Partners, NVIDIA, Pegatron, Supermicro, Wistron, Wiwynn, US Innovative Technology, Gradient Ventures, Mercato Partners, SVB, 1517, Crescent Cove
- We are experiencing extremely high demand for our systems, with quarter-over-quarter, year-over-year profitability
- Our research papers have been accepted into top machine learning and graphics conferences, including NeurIPS, ICCV, SIGGRAPH, and TOG
- Health, dental, and vision coverage for you and your dependents
- Wellness and commuter stipends for select roles
- 401k plan with 2% company match (USA employees)
- Flexible Paid Time Off plan that we all actually use

A Final Note: You do not need to match all of the listed expectations to apply for this position. We are committed to building a team with a variety of backgrounds, experiences, and skills.

Equal Opportunity Employer

Lambda is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.
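Much of the metrics work in a role like this revolves around monotonic counters (bytes sent, requests served) that must be turned into per-second rates while surviving restarts. A simplified sketch of the idea behind PromQL's `rate()` over `(timestamp, value)` samples (real Prometheus also extrapolates to the edges of the query range, which is omitted here; the function name is illustrative):

```python
def counter_rate(samples):
    """Per-second rate over (timestamp, value) samples of a monotonic
    counter, tolerating resets the way counter-aware functions do:
    when a value drops, assume the counter restarted from zero and
    count the new value as fresh increase."""
    if len(samples) < 2:
        return 0.0
    increase, (t0, prev) = 0.0, samples[0]
    for t, v in samples[1:]:
        increase += v if v < prev else v - prev  # v < prev => reset detected
        prev = v
    elapsed = samples[-1][0] - t0
    return increase / elapsed
```

Naively subtracting first from last would report a large negative rate across a process restart; the reset handling is what makes counter metrics safe to aggregate.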
MLOps / DevOps Engineer
Data Science & Analytics
Software Engineer
Software Engineering
Apply
Together AI.jpg

Platform Engineer, Model Shaping

Together AI
USD
200000
-
290000
US.svg
United States
Full-time
Remote
false
About Model Shaping

The Model Shaping team at Together AI works on products and research for tailoring open foundation models to downstream applications. We build services that allow machine learning developers to choose the best models for their tasks and further improve these models using domain-specific data. In addition, we develop new methods for more efficient model training and evaluation, drawing inspiration from a broad spectrum of ideas across machine learning, natural language processing, and ML systems.

About the Role

As a Platform Engineer at Model Shaping, you will work on the foundational layers of Together’s platform for model customization and evaluation. You will design the infrastructure and backend services that will allow us to sustainably and reliably scale the systems powering production workflows launched by our users, as well as internal research experiments. You will operate in a cross-functional environment, collaborating with other engineers and researchers on the team to improve the infrastructure based on the needs of the projects they work on. You will also interact with other engineering teams at Together (such as Commerce, Data Engineering, and Cloud Infrastructure) to integrate the services developed by Model Shaping with systems developed by those teams.

Responsibilities
- Design and build Together’s systems and infrastructure for model customization, including user-facing features and internal improvements
- Contribute to reliability improvements for the platform, participating in an on-call rotation and improving processes for incident response
- Create and improve internal tooling for deployment, continuous integration, and observability
- Build a job orchestration platform spanning multiple data centers, supporting a highly heterogeneous hardware landscape
- Partner with teams developing internal services, co-designing these services and incorporating them in systems built by Model Shaping

Requirements
- 3+ years of experience in building infrastructure or backend components of production services
- Comfortable with the fundamentals of Linux environments and modern container/orchestration stacks (e.g., Docker and Kubernetes)
- Strong software engineering background in Python or Go
- Experienced with infrastructure automation tools (Terraform, Ansible), monitoring/observability stacks (Prometheus, Grafana), and CI/CD pipelines (GitHub Actions, ArgoCD)
- Skilled at analyzing non-trivial issues in complex software systems and documenting your findings
- Cloud environment (e.g., AWS/GCP/Azure) administration experience, preferably with a hybrid bare-metal/cloud environment
- Strong communication skills, willing to document systems and processes and collaborate with peers of varying technical expertise

Stand-out Experience
- Developing large-scale production systems with high reliability requirements
- Pipeline orchestration frameworks (e.g., Kubeflow, Argo Workflows, Flyte)
- Managing GPU workloads on HPC clusters, ideally with hands-on experience operating NVIDIA’s networking stack (e.g., NCCL, Mellanox firmware, GPUDirect RDMA)
- Deployment of services for AI training or inference
- Maintaining or contributing to open-source projects

About Together AI

Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society, and together we are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed leading open-source research, models, and datasets to advance the frontier of AI, and our team has been behind technological advancements such as FlashAttention, RedPajama, SWARM Parallelism, and SpecExec. We invite you to join a passionate group of researchers on our journey to build the next generation of AI infrastructure.

Compensation

We offer competitive compensation, startup equity, health insurance, and other benefits, as well as flexibility in terms of remote work. The US base salary range for this full-time position is $200,000 - $290,000. Our salary ranges are determined by location, level and role. Individual compensation will be determined by experience, skills, and job-related knowledge.

Equal Opportunity

Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more. Please see our privacy policy at https://www.together.ai/privacy
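The pipeline orchestration frameworks named above (Kubeflow, Argo Workflows, Flyte) all share one scheduling core: a workflow is a DAG of steps, and steps run in an order where every prerequisite finishes first. A minimal sketch of that ordering via Kahn's algorithm, with ties broken alphabetically for determinism (the step names are hypothetical examples, not any framework's API):

```python
from collections import deque

def topo_order(deps):
    """Run order for a pipeline DAG.
    deps: step -> set of prerequisite steps (every step appears as a key).
    Returns a list of steps such that each step follows all of its
    prerequisites; raises ValueError on a cyclic pipeline."""
    indeg = {s: len(d) for s, d in deps.items()}
    children = {s: [] for s in deps}
    for s, d in deps.items():
        for p in d:
            children[p].append(s)
    ready = deque(sorted(s for s, n in indeg.items() if n == 0))
    order = []
    while ready:
        s = ready.popleft()
        order.append(s)
        for c in sorted(children[s]):
            indeg[c] -= 1
            if indeg[c] == 0:
                ready.append(c)
    if len(order) != len(deps):
        raise ValueError("cycle in pipeline definition")
    return order
```

A production orchestrator launches everything currently in `ready` concurrently rather than one at a time, but the dependency bookkeeping is the same.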
MLOps / DevOps Engineer
Data Science & Analytics
Software Engineer
Software Engineering
Apply
OpenAI.jpg

Enterprise Security Engineer

OpenAI
USD
260000
-
325000
US.svg
United States
Full-time
Remote
true
About the Team

Within the OpenAI Security organization, our IT team works to ensure our team of researchers, engineers, and staff have the tools they need to work comfortably, securely, and with minimal interruptions. As an Enterprise Security Engineer, you will work in a highly technical and employee-focused environment. Our IT team is a small and nimble team, where you’ll have the opportunity to dive into a wide breadth of areas and build from the ground up. We’re well supported and well resourced, and have a mandate to deliver a world-class enterprise security program to our teams.

About the Role

As an Enterprise Security Engineer, you will be responsible for implementing and managing the security of OpenAI's internal information systems’ infrastructure and processes. You will work closely with our IT and Security teams to develop security capabilities, enforce security policies, and monitor internal systems for security threats. This role is open to remote employees, or relocation assistance is available to Seattle.

In this role, you will:
- Develop and implement security measures to protect our company's information assets against unauthorized access, disclosure, or misuse.
- Monitor internal and external systems for security threats and respond to alerts.
- Contribute to and enforce our company's IT and Security policies and procedures.
- Work closely with our IT department to harden our infrastructure using best practices in AzureAD, GSuite, GitHub, and other SaaS tooling.
- Advise our employees on best practices for maintaining the security of their endpoints, and office AV and network infrastructure.
- Devise novel sharing controls and associated monitoring to protect company data, including intelligent groups management, Data Loss Prevention (DLP), and other security controls as appropriate.
- Employ forward-thinking models like “secure by default” and “zero trust” to create sustainably secure environments for knowledge workers and developers.
- Identify and remediate vulnerabilities in our internal systems, adhering to best practices for data security.
- Use our own AI-driven models to develop systems for improved security detection and response, data classification, and other security-related tasks.
- Educate employees on the importance of data security, and advise them on best practices for maintaining a secure environment.
- Contribute to OpenAI's endpoint and cloud security roadmaps by staying up to date with the latest security threats, and making recommendations for improving our security posture.

You might thrive in this role if you have:
- Experience in protecting and managing macOS fleets.
- Experience deploying and managing endpoint security solutions (e.g. management frameworks, EDR tools).
- Experience with public cloud service providers (e.g. Amazon AWS, Microsoft Azure).
- Experience with identity and access management frameworks and protocols, including SAML, OAuth, and SCIM.
- Experience with e-mail security protocols (e.g. SPF, DKIM, DMARC) and controls.
- Intermediate or advanced proficiency with a scripting language (e.g. Python, Bash, or similar).
- Knowledge of modern adversary tactics, techniques, and procedures.
- Ability to empathize and collaborate with colleagues, independently manage and run projects, and prioritize efforts for risk reduction.

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.

We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristic. For additional information, please see OpenAI’s Affirmative Action and Equal Employment Opportunity Policy Statement. Qualified applicants with arrest or conviction records will be considered for employment in accordance with applicable law, including the San Francisco Fair Chance Ordinance, the Los Angeles County Fair Chance Ordinance for Employers, and the California Fair Chance Act. For unincorporated Los Angeles County workers: we reasonably believe that criminal history may have a direct, adverse and negative relationship with the following job duties, potentially resulting in the withdrawal of a conditional offer of employment: protect computer hardware entrusted to you from theft, loss or damage; return all computer hardware in your possession (including the data contained therein) upon termination of employment or end of assignment; and maintain the confidentiality of proprietary, confidential, and non-public information. In addition, job duties require access to secure and protected information technology systems and related data security obligations.

We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.

OpenAI Global Applicant Privacy Policy

At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.
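The e-mail security protocols listed in the qualifications (SPF, DKIM, DMARC) are published as DNS TXT records, so a common first task is parsing a domain's policy. A small sketch of DMARC tag parsing under RFC 7489's `tag=value; tag=value` syntax (the helper names are hypothetical, and a real validator would also check tag order, defaults, and alignment modes):

```python
def parse_dmarc(txt: str) -> dict:
    """Split a DMARC TXT record ('v=DMARC1; p=reject; ...') into a
    lowercase-tag dict. Malformed segments are skipped rather than
    rejected, which is lenient compared to a strict validator."""
    tags = {}
    for part in txt.split(";"):
        part = part.strip()
        if not part or "=" not in part:
            continue
        k, v = part.split("=", 1)
        tags[k.strip().lower()] = v.strip()
    return tags

def enforces_rejection(txt: str) -> bool:
    """True when the record requests enforcement (quarantine/reject)
    rather than monitoring-only (p=none)."""
    t = parse_dmarc(txt)
    return t.get("v") == "DMARC1" and t.get("p") in ("quarantine", "reject")
```

In practice this sits behind a DNS lookup of `_dmarc.<domain>` and feeds monitoring that alerts when a domain drifts back to `p=none`.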
MLOps / DevOps Engineer
Data Science & Analytics
Security Engineer
Software Engineering
Apply
OpenAI.jpg

Enterprise Security Engineer

OpenAI
USD
260000
-
325000
US.svg
United States
Full-time
Remote
true
About the TeamWithin the OpenAI Security organization, our IT team works to ensure our team of researchers, engineers, and staff have the tools they need to work comfortably, securely, and with minimal interruptions. As an Enterprise Security Engineer, you will work in a highly technical and employee-focused environment.Our IT team is a small and nimble team, where you’ll have the opportunity to dive into a wide breadth of areas and build from the ground up. We’re well supported and well resourced, and have a mandate to deliver a world-class enterprise security program to our teams.About the RoleAs an Enterprise Security Engineer, you will be responsible for implementing and managing the security of OpenAI's internal information systems’ infrastructure and processes. You will work closely with our IT and Security teams to develop security capabilities, enforce security policies, and monitor internal systems for security threats.This role is open to remote employees; relocation assistance to San Francisco is also available.In this role, you will:Develop and implement security measures to protect our company's information assets against unauthorized access, disclosure, or misuse.Monitor internal and external systems for security threats and respond to alerts.Contribute to and enforce our company's IT and Security policies and procedures.Work closely with our IT department to harden our infrastructure using best practices in AzureAD, GSuite, GitHub, and other SaaS tooling.Advise our employees on best practices for maintaining the security of their endpoints, and office AV and network infrastructure.Devise novel sharing controls and associated monitoring to protect company data, including intelligent groups management, Data Loss Prevention (DLP), and other security controls as appropriate.Employ forward-thinking models like “secure by default” and “zero trust” to create sustainably secure environments for knowledge workers and developers.Identify and remediate vulnerabilities in our internal systems, adhering to best practices for data security.Use our own AI-driven models to develop systems for improved security detection and response, data classification, and other security-related tasks.Educate employees on the importance of data security, and advise them on best practices for maintaining a secure environment.Contribute to OpenAI's endpoint and cloud security roadmaps by staying up to date with the latest security threats, and making recommendations for improving our security posture.You might thrive in this role if you have: Experience in protecting and managing macOS fleets.Experience deploying and managing endpoint security solutions (e.g. management frameworks, EDR tools).Experience with public cloud service providers (e.g. Amazon AWS, Microsoft Azure).Experience with identity and access management frameworks and protocols, including SAML, OAuth, and SCIM.Experience with e-mail security protocols (e.g. SPF, DKIM, DMARC) and controls.Intermediate or advanced proficiency with a scripting language (e.g. Python, Bash, or similar).Knowledge of modern adversary tactics, techniques, and procedures.Ability to empathize and collaborate with colleagues, independently manage and run projects, and prioritize efforts for risk reduction.About OpenAIOpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity. 
We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristic. For additional information, please see OpenAI’s Affirmative Action and Equal Employment Opportunity Policy Statement.Qualified applicants with arrest or conviction records will be considered for employment in accordance with applicable law, including the San Francisco Fair Chance Ordinance, the Los Angeles County Fair Chance Ordinance for Employers, and the California Fair Chance Act. For unincorporated Los Angeles County workers: we reasonably believe that criminal history may have a direct, adverse and negative relationship with the following job duties, potentially resulting in the withdrawal of a conditional offer of employment: protect computer hardware entrusted to you from theft, loss or damage; return all computer hardware in your possession (including the data contained therein) upon termination of employment or end of assignment; and maintain the confidentiality of proprietary, confidential, and non-public information. In addition, job duties require access to secure and protected information technology systems and related data security obligations.We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.OpenAI Global Applicant Privacy PolicyAt OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.
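The posting above asks for experience with e-mail security protocols (SPF, DKIM, DMARC) alongside scripting proficiency. As a purely illustrative sketch (not part of the posting), a DMARC policy is published as a DNS TXT record of semicolon-separated tag=value pairs, which a few lines of Python can split apart; the example record below is hypothetical:

```python
# Minimal, illustrative parser for a DMARC TXT record's tag=value syntax.
# In practice the record is fetched from _dmarc.<domain> via a DNS TXT query.
def parse_dmarc(record: str) -> dict:
    """Split 'tag=value' pairs separated by semicolons into a dict."""
    tags = {}
    for part in record.split(";"):
        part = part.strip()
        if not part:
            continue  # skip empty segments (e.g. trailing semicolon)
        key, _, value = part.partition("=")
        tags[key.strip()] = value.strip()
    return tags

# Hypothetical record: enforce rejection, send aggregate reports.
policy = parse_dmarc("v=DMARC1; p=reject; rua=mailto:dmarc-reports@example.com; pct=100")
print(policy["p"])  # reject
```

The `p` tag is the policy the receiving server is asked to apply (`none`, `quarantine`, or `reject`); SPF and DKIM records follow similar TXT-record conventions.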
MLOps / DevOps Engineer
Data Science & Analytics
Apply

Site Reliability Engineer

X AI
-
United States
Remote
false
About xAI xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company’s mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All engineers and researchers are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates. About the Role As a Data Center Site Reliability Engineer (SRE) at xAI, you will play a pivotal role in ensuring the reliability, scalability, and performance of our state-of-the-art data center infrastructure, including the Colossus supercluster in Memphis—the world's largest AI training cluster with over 100,000 liquid-cooled Nvidia GPUs and plans for expansion to 1 million. This infrastructure powers advanced AI workloads, massive-scale model training, and products like Grok, enabling breakthroughs in understanding the universe. You will collaborate with cross-functional teams to automate operations, enhance observability, and maintain high availability for large-scale distributed systems. This is a hands-on technical position in a dynamic environment, offering the opportunity to tackle complex challenges at the intersection of AI, data center operations, and software reliability. Key Responsibilities Maintain and improve the reliability and uptime of xAI’s on-premises and cloud-based data center environments, including high-density GPU clusters for AI training. Design, implement, and manage monitoring, logging, and alerting systems (e.g., Prometheus, Grafana, PagerDuty). 
Develop and maintain infrastructure-as-code (Pulumi, Terraform) and continuous deployment pipelines (Buildkite, ArgoCD). Participate in on-call rotations, respond to incidents, perform root cause analysis, and drive post-mortem processes. Analyze system performance, forecast capacity needs, and optimize resource utilization for massive AI/ML workloads. Collaborate with hardware, networking, and software engineering teams to design and implement resilient, scalable solutions, such as RDMA fabrics and liquid-cooling systems. Create and maintain documentation and standard operating procedures. Contribute to the efficiency of AI training pipelines by identifying and mitigating bottlenecks in compute, storage, and networking at unprecedented scales. Required Qualifications Bachelor’s degree in Computer Science, Engineering, or a related technical field (or equivalent experience). 5+ years in site reliability engineering, data center operations, or large-scale infrastructure management. Expert-level knowledge of Kubernetes (on-prem and cloud), infrastructure-as-code tools (Pulumi, Terraform), and CI/CD systems (Buildkite, ArgoCD). Proficiency in at least one systems programming language (Rust, C++, Go) and strong scripting/automation skills. Deep understanding of monitoring and observability technologies. Strong troubleshooting skills across hardware, networking, and distributed software systems. Proven experience with incident response, including on-call rotations, rapid incident resolution, root cause analysis, and implementation of preventative measures. Excellent communication and documentation skills, with the ability to share knowledge concisely and accurately. Preferred Qualifications Experience supporting AI/ML workloads or high-density compute environments, including large-scale GPU clusters and HPC systems. Familiarity with data center electrical, cooling, and network systems, such as liquid-cooling and high-bandwidth interconnects. 
Certifications in SRE, Kubernetes, or data center operations. Experience with both on-premises and cloud infrastructure at scale. xAI is an equal opportunity employer. California Consumer Privacy Act (CCPA) Notice
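The SRE role above centers on monitoring and alerting with Prometheus, Grafana, and PagerDuty. As a hypothetical illustration (the metric, job, and label names are invented, not taken from the posting), a Prometheus alerting rule that pages when a GPU node's exporter goes dark might look like:

```yaml
# Illustrative Prometheus alerting rule (names are hypothetical).
# Fires after a node exporter has been unreachable for 5 minutes;
# Alertmanager routing would forward "severity: page" to PagerDuty.
groups:
  - name: gpu-node-availability
    rules:
      - alert: GPUNodeDown
        expr: up{job="gpu-node-exporter"} == 0
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "GPU node {{ $labels.instance }} is unreachable"
```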
MLOps / DevOps Engineer
Data Science & Analytics
Apply

Connectivity Systems Engineer

X AI
-
United States
Remote
false
About xAI xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company’s mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All engineers and researchers are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.Job Summary The Connectivity Systems Engineer will be responsible for designing, specifying, and implementing structured cabling and fiber optic systems for xAI’s hyperscale data center build-outs. This role requires expertise in high-density cabling solutions, optical network design, and collaboration with cross-functional teams to deliver scalable, high-performance infrastructure that supports xAI’s advanced AI and GPU cluster workloads. You will play a critical role in ensuring the reliability and efficiency of our data center connectivity to drive our mission forward. Responsibilities Design and develop structured cabling systems, including copper (Cat6/6A) and fiber optic solutions (OM4, OM5, single-mode), for xAI’s data center environments, ensuring compliance with industry standards such as TIA-942 and BICSI. Create detailed optical system architectures, including high-density fiber optic networks (MPO/MTP, LC, DWDM), to support high-bandwidth, low-latency connectivity for AI and GPU clusters. Produce comprehensive design documentation, including schematics, rack elevations, pathway plans, and bills of materials (BOMs), using tools like AutoCAD, Revit, or Visio. 
Collaborate with network, electrical, and mechanical engineering teams to integrate cabling and optical designs into overall data center layouts, optimizing for power, cooling, and space constraints. Conduct capacity planning to ensure cabling infrastructure supports current and future AI workload demands, including 400G/800G network deployments. Oversee installation, testing, and validation of cabling and optical systems, using tools like OTDR, power meters, and VFL to ensure performance and reliability. Provide technical expertise during procurement, evaluating vendors and selecting cabling components, optical transceivers, and connectivity hardware. Develop and maintain cabling standards, labeling schemes, and documentation processes to ensure consistency across multi-site data center deployments. Support rapid design iterations to meet aggressive project timelines, addressing challenges such as high-density rack configurations and advanced cooling systems (e.g., liquid cooling). Troubleshoot and resolve cabling-related issues during commissioning and operations to ensure minimal downtime for mission-critical systems. Contribute to process improvements and automation initiatives (e.g., Python scripting for design validation) to enhance cabling design workflows. Qualifications Basic Qualifications Bachelor’s degree in Electrical Engineering, Telecommunications Engineering, or a related field; or equivalent professional experience. 3+ years of experience in structured cabling and fiber optic system design, preferably in data center or hyperscale environments. Proficiency in design tools such as AutoCAD, Revit, or Visio for creating detailed schematics and layouts. Strong knowledge of fiber optic technologies, including single-mode, multi-mode, DWDM, and high-density connectivity solutions (MPO/MTP, LC). Familiarity with data center standards (e.g., TIA-942, Uptime Institute) and networking protocols (e.g., Ethernet, InfiniBand). 
Preferred Qualifications 5+ years of experience in data center cabling design and deployment for hyperscale or cloud provider environments. Experience with AI/HPC infrastructure, GPU cluster connectivity, or liquid-cooled data center designs. Proficiency in scripting or automation tools (e.g., Python, Bash) for design or testing workflows. Certifications such as RCDD (Registered Communications Distribution Designer), DCDC (Data Center Design Consultant), or CFOT (Certified Fiber Optic Technician). Knowledge of network hardware (e.g., NVIDIA, Cisco, Arista) and optical transceivers (QSFP-DD, OSFP). xAI is an equal opportunity employer. California Consumer Privacy Act (CCPA) Notice
MLOps / DevOps Engineer
Data Science & Analytics
Software Engineer
Software Engineering
Apply

Site Reliability Engineer - Storage

X AI
USD
180000
-
440000
United States
Full-time
Remote
false
About xAI xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company’s mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All engineers and researchers are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.About the role As a Site Reliability Storage Engineer, you will play a pivotal role in designing, building, and operating exascale storage systems to manage our cutting-edge AI research data with unparalleled scalability and reliability across multiple regions. The core responsibility of this role is to ensure that our heterogeneous storage systems, both on-prem and in the cloud, are reliable and performant. We’re seeking engineers with expertise in exascale data management systems or distributed filesystems to join our mission-driven team. What you’ll do Develop and optimize software to manage exascale data, enabling efficient and reliable access for xAI researchers working on advanced AI models. Enhance the reliability, performance, and cost-effectiveness of xAI’s storage infrastructure to support large-scale AI research workloads. Collaborate closely with researchers to understand their data use cases and tailor storage solutions to meet their needs. Implement robust security measures to safeguard critical datasets, ensuring data integrity and confidentiality. 
Ideal Experience You’d be an exceptional candidate if you possess some (or all) of the following: Writing scalable, high-performance code in Rust or Go for storage-related applications or tooling. Managing storage infrastructure with IaC tools like Pulumi, Terraform, or Ansible. Past experience working with storage vendors, facilitating partnership alignment and integrating their tooling within xAI’s infrastructure. Familiarity with Kubernetes storage primitives (e.g., Persistent Volumes, CSI drivers) and integrating storage with containerized workloads. Bonus: Experience with AI/ML data pipelines, including handling large datasets for training and inference. Tech Stack Kubernetes Pulumi Rust and Go Interview Process After submitting your application, the team reviews your CV and statement of exceptional work. If your application passes this stage, you will be invited to a 45-minute interview (“phone interview”) during which a member of our team will ask some basic questions. If you clear the initial phone interview, you will enter the main process, which consists of four technical interviews: Coding assessment in Python, Golang, or Rust Systems hands-on: Demonstrate practical skills in a live problem-solving session. Coding assessment or system design discussion based on the candidate's background. Project deep-dive: Present your past exceptional work to a small audience. Every application is reviewed by a member of our technical team. All interviews will be conducted via Google Meet. We do not condone the use of AI in interviews and have tools to detect AI usage. Benefits Base salary is just one part of our total rewards package at xAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short & long-term disability insurance, life insurance, and various other discounts and perks. Annual Salary Range $180,000 - $440,000 USDxAI is an equal opportunity employer. 
California Consumer Privacy Act (CCPA) Notice
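The storage role above names Kubernetes storage primitives such as Persistent Volumes and CSI drivers. For illustration only (the storage class name below is hypothetical), workloads request such storage declaratively through a PersistentVolumeClaim, which Kubernetes binds to a volume provisioned by a CSI driver:

```yaml
# Illustrative PersistentVolumeClaim: requests 500Gi from a CSI-backed
# StorageClass. The class name "fast-nvme" is hypothetical.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: training-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-nvme
  resources:
    requests:
      storage: 500Gi
```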
MLOps / DevOps Engineer
Data Science & Analytics
Software Engineer
Software Engineering
Apply

Security Engineer, Agent Security

OpenAI
USD
325000
-
405000
United States
Full-time
Remote
false
About the Team The team’s mission is to accelerate the secure evolution of agentic AI systems at OpenAI. To achieve this, the team designs, implements, and continuously refines security policies, frameworks, and controls that defend OpenAI’s most critical assets—including the user and customer data embedded within them—against the unique risks introduced by agentic AI.About the Role As a Security Engineer on the Agent Security Team, you will be at the forefront of securing OpenAI’s cutting-edge agentic AI systems. Your role will involve designing and implementing robust security frameworks, policies, and controls to safeguard OpenAI’s critical assets and ensure the safe deployment of agentic systems. You will develop comprehensive threat models, partner tightly with our Agent Infrastructure group to fortify the platforms that power OpenAI’s most advanced agentic systems, and lead efforts to enhance safety monitoring pipelines at scale.We are looking for a versatile engineer who thrives in ambiguity and can make meaningful contributions from day one. You should be prepared to ship solutions quickly while maintaining a high standard of quality and security.We’re looking for people who can drive innovative solutions that will set the industry standard for agent security. You will need to bring your expertise in securing complex systems and designing robust isolation strategies for emerging AI technologies, all while being mindful of usability. You will communicate effectively across various teams and functions, ensuring your solutions are scalable and robust while working collaboratively in an innovative environment. In this fast-paced setting, you will have the opportunity to solve complex security challenges, influence OpenAI’s security strategy, and play a pivotal role in advancing the safe and responsible deployment of agentic AI systems.This role is based in San Francisco, CA. 
We use a hybrid work model of 3 days in the office per week and offer relocation assistance to new employees.You’ll be responsible for:Architecting security controls for agentic AI – design, implement, and iterate on identity, network, and runtime-level defenses (e.g., sandboxing, policy enforcement) that integrate directly with the Agent Infrastructure stack.Building production-grade security tooling – ship code that hardens safety monitoring pipelines across agent executions at scale.Collaborating cross-functionally – work daily with Agent Infrastructure, product, research, safety, and security teams to balance security, performance, and usability.Influencing strategy & standards – shape the long-term Agent Security roadmap, publish best practices internally and externally, and help define industry standards for securing autonomous AI.We’re looking for someone with:Strong software-engineering skills in Python or at least one systems language (Go, Rust, C/C++), plus a track record of shipping and operating secure, high-reliability services.Deep expertise in modern isolation techniques – experience with container security, kernel-level hardening, and other isolation methods.Hands-on network security experience – implementing identity-based controls, policy enforcement, and secure large-scale telemetry pipelines.Clear, concise communication that bridges engineering, research, and leadership audiences; comfort influencing roadmaps and driving consensus.Bias for action & ownership – you thrive in ambiguity, move quickly without sacrificing rigor, and elevate the security bar company-wide from day one.Cloud security depth on at least one major provider (Azure, AWS, GCP), including identity federation, workload IAM, and infrastructure-as-code best practices.Familiarity with AI/ML security challenges – experience addressing risks associated with advanced AI systems (nice-to-have but valuable).About OpenAIOpenAI is an AI research and deployment company dedicated to 
ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity. We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristic. For additional information, please see OpenAI’s Affirmative Action and Equal Employment Opportunity Policy Statement.Qualified applicants with arrest or conviction records will be considered for employment in accordance with applicable law, including the San Francisco Fair Chance Ordinance, the Los Angeles County Fair Chance Ordinance for Employers, and the California Fair Chance Act. For unincorporated Los Angeles County workers: we reasonably believe that criminal history may have a direct, adverse and negative relationship with the following job duties, potentially resulting in the withdrawal of a conditional offer of employment: protect computer hardware entrusted to you from theft, loss or damage; return all computer hardware in your possession (including the data contained therein) upon termination of employment or end of assignment; and maintain the confidentiality of proprietary, confidential, and non-public information. 
In addition, job duties require access to secure and protected information technology systems and related data security obligations.We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.OpenAI Global Applicant Privacy PolicyAt OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.
MLOps / DevOps Engineer
Data Science & Analytics
Software Engineer
Software Engineering
Apply

Platform Engineer

E2B
-
United States
Full-time
Remote
false
About the roleYou will be building the next cloud platform for running AI software: a cloud where AI apps build other software apps.Your job will be:Building the E2B backendSolving general infrastructure problemsWorking with Kubernetes or NomadCollaborating with our Systems Engineers and Distributed Systems EngineersWe’re looking for a skilled developer with a diabolical obsession with making things run fast and efficiently and running A LOT of them at the same time.If this sounds exciting to you, we want to hear from you!What we’re looking for5+ years building infrastructure, especially distributed systemsExperience building infrastructure at scaleExcited to work in person from San Francisco on a devtool productDetail-oriented with great tasteExcited to work closely with our usersNot afraid to take ownership of your part of the productIf you join E2B, you’ll get a lot of freedom. We expect you to be proactive and take ownership. You’ll be taking projects from 0 to 1 with the support of the rest of the team.What it’s like to work at E2BWork at a fast-growing startup as part of an early team (we grow 20%-100% MoM)We ship fast but don’t release junkWe like hard work and problems. Challenges mean potential value.We have a long runway and can offer a competitive salary for a startup at our stageWork closely with other AI companies on the edge of what’s possible todayDogfooding our own product on projects like FragmentsNo meetings; a writing-heavy, transparent cultureYou’re the decision-maker day-to-day; important product and roadmap decisions rest with Vasek (CEO) and Tomas (CTO)Spend 10-20% of the roadmap on highly experimental projectsHiring processWe aim to have the whole process done in 7-10 days. We understand that it’s important to move fast and try to follow up within 24 hours after each stage.30-minute call with Vasek (CEO). 
We’ll go over your past work experience and what you’re looking for to make sure this would be a good fit for both of us.First technical interview with Tomas (CTO). About a 1-hour call. You’ll be asked thorough technical questions. Often these are questions about problems we ourselves ran into while building E2B.Second technical interview. Another 1-2 hour call. Expect live coding on this call. We’ll ask you to solve specific problems (don’t worry, it’s not LeetCode) that are related to your role.One day of in-person hacking at our office (paid). We invite you to our office to work on the product with us. This is a great opportunity for all of us to see how it feels to work together and for you to meet the team.Last call with Vasek. A final 30-minute call with the CEO to talk more about the role and answer any of your questions.Decision and potential offer.
MLOps / DevOps Engineer
Data Science & Analytics
Software Engineer
Software Engineering
Apply

Capacity Engineer, Compute

Anthropic
USD
320000
-
405000
United States
Full-time
Remote
false
About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.About the role The mission of the Compute team is to provide input into our company-wide cloud infrastructure strategy and efficiency deliverables, advise on key decisions affecting budget, and provide capacity planning and performance expertise to various Anthropic-wide stakeholders in finance and engineering leadership. As an early member of this team, you will work with engineering teams to ensure optimal operation and growth of our infrastructure from both a cost and technology perspective and collaborate cross-functionally with finance and data science partners to analyze and forecast growth. Responsibilities: Develop self-service tools and processes to enable Anthropic engineers to understand their capacity, efficiency, and costs Design, develop, and lead necessary automation to help capacity plan for both near- and long-term outcomes Institute and design governance workflows to help manage additional capacity request approvals Investigate new capacity requests to ensure the best use of resources and that instances are sized appropriately Build and drive cost-to-serve analytics programs to guide engineering, finance, and leadership on the total cost of ownership (TCO) and infrastructure impact of our scaling factors. Inform pricing conversations through customer-profile-sensitive gross margin analysis. 
Act as technical lead with outside vendors to manage Anthropic's capacity needs Proactively identify infrastructure inefficiency opportunities, document proposals, and be a key contributor in driving a positive outcome Serve as an advisor to engineering and finance functions and the executive team for one of the largest areas of expenditure Work closely with TPMs on special efficiency projects and help deliver committed outcomes You may be a good fit if you have: 5+ years experience in capacity engineering 5+ years experience in a technical role Intermediate knowledge of various public cloud providers Experience with data modeling for public cloud Experience with budgeting, capacity planning, and cloud efficiency optimization workflows Experience in scripting and building automation tools Self-discipline and the ability to thrive in fast-paced environments Excellent communication skills Familiarity with cloud compute, storage, network, and services Attention to detail and a passion for correctness Deadline to apply: None. Applications will be reviewed on a rolling basis. The expected salary range for this position is:Annual Salary:$320,000—$405,000 USDLogistics Education requirements: We require at least a Bachelor's degree in a related field or equivalent experience. Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices. Visa sponsorship: We do sponsor visas! However, we aren't able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this. We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. 
Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you're interested in this work. We think AI systems like the ones we're building have enormous social and ethical implications. We think this makes representation even more important, and we strive to include a range of diverse perspectives on our team. How we're different We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact — advancing our long-term goals of steerable, trustworthy AI — rather than work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. We're an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time. As such, we greatly value communication skills. The easiest way to understand our research directions is to read our recent research. This research continues many of the directions our team worked on prior to Anthropic, including: GPT-3, Circuit-Based Interpretability, Multimodal Neurons, Scaling Laws, AI & Compute, Concrete Problems in AI Safety, and Learning from Human Preferences. Come work with us! Anthropic is a public benefit corporation headquartered in San Francisco. We offer competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a lovely office space in which to collaborate with colleagues. Guidance on Candidates' AI Usage: Learn about our policy for using AI in our application process
MLOps / DevOps Engineer
Data Science & Analytics
Software Engineer
Software Engineering
Apply