Top AI MLOps / DevOps Engineer Jobs Openings in 2025
Looking for opportunities as an AI MLOps / DevOps Engineer? This curated list features the latest AI MLOps / DevOps Engineer job openings at AI-native companies. Whether you're an experienced professional or just entering the field, you'll find roles that match your expertise, from startups to global tech leaders. Updated every day.
Latest AI Jobs
Showing 61 – 79 of 79 jobs
HW/SW Co-Design Engineer
OpenAI
5000+
USD
350000
-
530000
United States
Full-time
Remote
false
About the Team

Our mission at OpenAI is to discover and enact the path to safe, beneficial AGI. To do this, we believe that many technical breakthroughs are needed in generative modeling, reinforcement learning, large-scale optimization, and active learning, among other topics. The Scaling team builds robust and scalable software to support our research efforts. It also offers core development services for mission-critical goals and applications. We're forming a new team to work with our partners on hardware optimization and co-design, and we are looking for founding engineers. This team will be responsible for working with partners to optimize their hardware for our workloads, identifying promising new deep learning accelerators, and bringing those hardware platforms to production.

About the Role

As an Engineer on our Hardware Team, you will co-design future hardware from different vendors for programmability and performance. You will work with our kernel, compiler, and machine learning engineers to understand their needs related to ML techniques, algorithms, numerical approximations, programming expressivity, and compiler optimizations. You will evangelize these constraints with various vendors to develop future hardware architectures amenable to efficient training and inference. If you are excited about maximizing HBM bandwidth, optimizing for low arithmetic intensity, expressive SIMD ISAs, low-precision formats, optimizing for memory hierarchies, simulating workloads at various resolutions of the hardware, and evangelizing these ideas with hardware engineers, this is the perfect opportunity!

This role is based in San Francisco, CA. We use a hybrid work model of 3 days in the office per week and offer relocation assistance to new employees.

In this role, you will:
- Co-design future hardware for programmability and performance with our hardware vendors
- Assist hardware vendors in developing optimal kernels and add support for them in our compiler
- Develop performance estimates for critical kernels on different hardware configurations
- Work with machine learning engineers, kernel engineers, and compiler developers to understand their vision and needs from high-performance accelerators
- Manage communication and coordination with internal and external partners
- Influence the roadmaps of hardware partners to optimize their designs for OpenAI's workloads
- Evaluate potential partners' accelerators and platforms
- Build simulations and performance models to progressively improve decision-making fidelity
- As the scope of the role and team grows, understand and influence hardware partners' roadmaps for our datacenter networks, racks, and buildings

You might thrive in this role if you:
- Have 4+ years of industry experience, including experience harnessing compute at scale or building semiconductors
- Have strong experience in software/hardware co-design
- Have a deep understanding of GPUs and/or other AI accelerators
- Have experience with CUDA or a related accelerator programming language
- Have experience driving machine learning accuracy with low-precision formats
- Have experience aligning future hardware with a well-established HPC infrastructure
- Are familiar with the fundamentals of deep learning computing and chip microarchitecture
- Are able to actively collaborate with ML engineers, kernel writers, and compiler developers

These attributes are nice to have:
- A PhD in Computer Science and Engineering with a specialization in Computer Architecture, Parallel Computing, Compilers, or other Systems areas
- Strong coding skills in C/C++ and Python
- A strong understanding of LLMs and the challenges related to their training and inference

Benefits and Perks
- Medical, dental, and vision insurance for you and your family
- Mental health and wellness support
- 401(k) plan with 4% matching
- Unlimited time off and 18+ company holidays per year
- Paid parental leave (20 weeks) and family-planning support
- Annual learning & development stipend ($1,500 per year)

To comply with U.S. export control laws and regulations, candidates for this role may need to meet certain legal status requirements as provided in those laws and regulations.

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.

We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristic. For additional information, please see OpenAI's Affirmative Action and Equal Employment Opportunity Policy Statement.

Qualified applicants with arrest or conviction records will be considered for employment in accordance with applicable law, including the San Francisco Fair Chance Ordinance, the Los Angeles County Fair Chance Ordinance for Employers, and the California Fair Chance Act. For unincorporated Los Angeles County workers: we reasonably believe that criminal history may have a direct, adverse and negative relationship with the following job duties, potentially resulting in the withdrawal of a conditional offer of employment: protect computer hardware entrusted to you from theft, loss or damage; return all computer hardware in your possession (including the data contained therein) upon termination of employment or end of assignment; and maintain the confidentiality of proprietary, confidential, and non-public information. In addition, job duties require access to secure and protected information technology systems and related data security obligations.

We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.

OpenAI Global Applicant Privacy Policy

At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.
MLOps / DevOps Engineer
Data Science & Analytics
Machine Learning Engineer
Data Science & Analytics
Software Engineer
Software Engineering
Apply
July 17, 2025
Staff Security Engineer, Container & VM Security
Anthropic
1001-5000
USD
320000
-
485000
United States
Full-time
Remote
false
About Anthropic

Anthropic's mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

About the Role

At Anthropic, we're building frontier AI systems that require unprecedented levels of security and isolation. We're seeking a Staff Security Engineer specializing in container and VM security to help us design and implement robust sandboxing solutions that protect our AI infrastructure from untrusted workloads while maintaining performance and usability.

In this role, you'll be at the forefront of securing our compute infrastructure, working with cutting-edge virtualization and containerization technologies. You'll architect secure-by-default systems that leverage Linux kernel isolation mechanisms, design threat models for complex distributed systems, and build defenses that can withstand sophisticated attacks. Your work will be critical in ensuring that our systems remain secure as we scale to support increasingly powerful models and diverse use cases.

Responsibilities
- Design and implement secure sandboxing architectures using virtualization (KVM, Xen, Firecracker, Cloud Hypervisor) and container technologies (OCI containers, gVisor, Kata Containers) to isolate untrusted workloads
- Develop deep expertise in Linux kernel isolation mechanisms, including namespaces, cgroups, seccomp, capabilities, and LSMs (SELinux/AppArmor), to build defense-in-depth strategies
- Create comprehensive threat models for our sandboxing infrastructure, identifying attack vectors and designing mitigations for container escapes, VM breakouts, and side-channel attacks
- Build and maintain security policies and configurations for multi-tenant cloud environments, ensuring strong isolation between different workloads
- Partner with infrastructure teams to implement secure-by-default patterns for deploying and managing containerized and virtualized workloads at scale
- Develop monitoring and detection capabilities to identify potential security breaches or anomalous behavior within our sandboxed environments
- Lead security reviews of new sandboxing technologies and provide guidance on their adoption within our infrastructure
- Mentor other engineers on secure coding practices and sandboxing best practices
- Contribute to our security incident response efforts, particularly for isolation-related security events
- Collaborate with research teams to understand the unique security requirements of AI workloads and develop appropriate isolation strategies

You may be a good fit if you:
- Have 8+ years of experience in systems security, with deep expertise in virtualization and containerization security
- Possess expert-level knowledge of Linux kernel isolation mechanisms and have experience implementing them in production environments
- Have a proven track record of securing untrusted workloads in cloud settings, including both public cloud and private infrastructure
- Are proficient in multiple programming languages (e.g., Go, Rust, C/C++, Python), with experience in systems programming
- Have hands-on experience with container runtimes (Docker, containerd, CRI-O) and orchestration platforms (Kubernetes)
- Understand hypervisor internals and have experience with VM security (QEMU/KVM, Xen, VMware, Hyper-V)
- Can design and articulate complex threat models for distributed systems
- Have experience with cloud provider security models and their isolation guarantees
- Thrive in ambiguous environments and can balance security requirements with performance and usability needs
- Communicate effectively with both technical and non-technical stakeholders about security risks and mitigations

Strong candidates may also have:
- Experience with microVM technologies (Firecracker, Cloud Hypervisor) and their security properties
- Knowledge of hardware-based security features (Intel TDX, AMD SEV, SGX) and their application to confidential computing
- Contributions to open-source security projects related to containerization or virtualization
- Experience with eBPF for security monitoring and enforcement
- An understanding of AI/ML workload characteristics and their unique security requirements
- A track record of identifying and responsibly disclosing security vulnerabilities in virtualization or container platforms
- Experience building security tooling and automation for large-scale infrastructure
- A background in formal verification or security research

Representative projects:
- Designing a multi-layered sandboxing architecture that combines VMs and containers to safely execute untrusted AI-generated code
- Implementing runtime security policies using seccomp, AppArmor, and SELinux to minimize container attack surface
- Building a threat detection system that identifies potential container escape attempts using eBPF and kernel audit logs
- Creating secure defaults and guardrails for Kubernetes deployments to prevent privilege escalation and lateral movement
- Developing automated security testing for our sandboxing infrastructure to continuously validate isolation properties
- Architecting network isolation strategies using CNI plugins and cloud-native firewalling to segment workloads

Deadline to apply: None. Applications will be reviewed on a rolling basis.

The expected salary range for this position is:
Annual Salary: $320,000 - $485,000 USD

Logistics

Education requirements: We require at least a Bachelor's degree in a related field or equivalent experience.

Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.

Visa sponsorship: We do sponsor visas! However, we aren't able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.

We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you're interested in this work. We think AI systems like the ones we're building have enormous social and ethical implications. We think this makes representation even more important, and we strive to include a range of diverse perspectives on our team.

How we're different

We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact, advancing our long-term goals of steerable, trustworthy AI, rather than work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. We're an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time. As such, we greatly value communication skills. The easiest way to understand our research directions is to read our recent research. This research continues many of the directions our team worked on prior to Anthropic, including: GPT-3, Circuit-Based Interpretability, Multimodal Neurons, Scaling Laws, AI & Compute, Concrete Problems in AI Safety, and Learning from Human Preferences.

Come work with us! Anthropic is a public benefit corporation headquartered in San Francisco. We offer competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a lovely office space in which to collaborate with colleagues.

Guidance on Candidates' AI Usage: Learn about our policy for using AI in our application process
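To make one of the representative projects above concrete: runtime seccomp policies of the kind described are usually expressed as a JSON profile that the container runtime enforces. The sketch below is an illustration, not Anthropic's actual tooling, and the allowlist is deliberately tiny; it builds a default-deny profile in the OCI/Docker seccomp format:

```python
import json

def build_seccomp_profile(allowed_syscalls):
    """Build a minimal default-deny seccomp profile in the JSON format
    consumed by OCI runtimes such as Docker: every syscall fails with
    EPERM (SCMP_ACT_ERRNO) unless it is explicitly allowlisted."""
    return {
        "defaultAction": "SCMP_ACT_ERRNO",
        "architectures": ["SCMP_ARCH_X86_64"],
        "syscalls": [
            {"names": sorted(allowed_syscalls), "action": "SCMP_ACT_ALLOW"},
        ],
    }

# Illustrative allowlist: barely enough for a trivial static binary.
profile = build_seccomp_profile(["read", "write", "brk", "mmap", "exit_group"])
print(json.dumps(profile, indent=2))
```

A profile like this is attached with `docker run --security-opt seccomp=profile.json ...`; real workloads need a far larger allowlist, which is exactly why minimizing the per-workload attack surface is a project in its own right.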
MLOps / DevOps Engineer
Data Science & Analytics
Software Engineer
Software Engineering
Apply
July 17, 2025
Security Engineer
Lovable
201-500
-
Sweden
Full-time
Remote
false
TL;DR: We're looking for a Security Engineer to raise the bar for how fast-moving AI teams stay secure. You'll work with infra, ML, and product teams to protect our stack without slowing us down. This is a high-leverage role for someone who wants to build security into the core of everything we ship.

Why Lovable?

Lovable lets anyone and everyone build software with plain English. From solopreneurs to Fortune 100 teams, millions of people use Lovable to transform raw ideas into real products, fast. We are at the forefront of a foundational shift in software creation, which means you have an unprecedented opportunity to change the way the digital world works. Over 2 million people in 200+ countries already use Lovable to launch businesses, automate work, and bring their ideas to life. And we're just getting started.

We're a small, talent-dense team building a generation-defining company from Stockholm. We value extreme ownership, high velocity, and low-ego collaboration. We seek out people who care deeply, ship fast, and are eager to make a dent in the world.

What we're looking for
- 5+ years of experience securing modern cloud-native environments, ideally at product-focused tech companies, early-stage startups, or top-tier AI labs
- Deep knowledge of securing engineering infrastructure: CI/CD pipelines, secrets, service-to-service auth, containerized environments, and public cloud
- A strong systems mindset: you're comfortable diving into infra code, writing tools, and contributing directly to engineering workflows
- A track record of, and intuition for, designing pragmatic security controls that don't slow teams down
- Bonus: you've shipped internal security tools or open-sourced relevant infra

What you'll do

In one sentence: help us build and ship AI products at extremely high velocity without compromising on security.
- Identify high-leverage surfaces across Lovable's engineering stack, from CI/CD and auth to product endpoints
- Design and implement tooling that helps developers take the secure route by default
- Own detection, triage, and response for vulnerabilities and incidents, internal and external
- Collaborate with infra, ML, platform, and product engineers to embed security into everything we build
- Track emerging risks in AI infra, LLM pipelines, and third-party integrations, and close the gaps before they matter

Our tech stack

We're building with tools that both humans and AI love:
- Frontend: React for lightning-fast interfaces
- Backend: Golang and Rust for serious performance
- Cloud: Cloudflare, Google Cloud, AWS, Terraform
- DevOps & tooling: CI/CD pipelines, observability, infra-as-code
And we're always on the lookout for what's next!

How we hire
- Fill in a short form, then jump on an intro call with the team
- Complete the take-home exercise
- Show us how you approach problems during two technical interviews
- Join us for trial work lasting 2 days, preferably on-site. We'll see how you tick, and you get to meet the team and explore whether joining Lovable feels right for you

About your application
- Please submit your application in English; it's our company language, so you'll be speaking lots of it if you join
- We treat all candidates equally; if you're interested, please apply through our careers portal
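As a flavor of the secure-by-default tooling this role describes, here is a minimal sketch of a CI secret scanner. The patterns are illustrative examples only; production scanners such as gitleaks ship much larger, vendor-specific rulesets:

```python
import re

# Illustrative patterns only; real rulesets are far more extensive.
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    "generic_api_key": re.compile(
        r"(?i)\bapi[_-]?key\s*[:=]\s*['\"][A-Za-z0-9_\-]{20,}['\"]"
    ),
}

def scan_text(text):
    """Return a list of (rule_name, line_number) hits for likely secrets."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((name, lineno))
    return hits
```

Wired into a pre-commit hook or CI step, a scanner like this fails the build whenever `scan_text` returns hits, stopping a credential before it ever lands in the repository.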
MLOps / DevOps Engineer
Data Science & Analytics
Software Engineer
Software Engineering
Apply
July 17, 2025
Platform Engineer, Model Shaping
Together AI
201-500
USD
200000
-
290000
United States
Full-time
Remote
false
About Model Shaping

The Model Shaping team at Together AI works on products and research for tailoring open foundation models to downstream applications. We build services that allow machine learning developers to choose the best models for their tasks and further improve these models using domain-specific data. In addition, we develop new methods for more efficient model training and evaluation, drawing inspiration from a broad spectrum of ideas across machine learning, natural language processing, and ML systems.

About the Role

As a Platform Engineer at Model Shaping, you will work on the foundational layers of Together's platform for model customization and evaluation. You will design the infrastructure and backend services that will allow us to sustainably and reliably scale the systems powering production workflows launched by our users, as well as internal research experiments. You will operate in a cross-functional environment, collaborating with other engineers and researchers in the team to improve the infrastructure based on the needs of the projects they work on. You will also interact with other engineering teams at Together (such as Commerce, Data Engineering, and Cloud Infrastructure) to integrate the services developed by Model Shaping with systems developed by those teams.

Responsibilities
- Design and build Together's systems and infrastructure for model customization, including user-facing features and internal improvements
- Contribute to reliability improvements for the platform, participating in an on-call rotation and improving processes for incident response
- Create and improve internal tooling for deployment, continuous integration, and observability
- Build a job orchestration platform spanning multiple data centers and supporting a highly heterogeneous hardware landscape
- Partner with teams developing internal services, co-designing these services and incorporating them into systems built by Model Shaping

Requirements
- 3+ years of experience building infrastructure or backend components of production services
- Comfort with the fundamentals of Linux environments and modern container/orchestration stacks (e.g., Docker and Kubernetes)
- A strong software engineering background in Python or Go
- Experience with infrastructure automation tools (Terraform, Ansible), monitoring/observability stacks (Prometheus, Grafana), and CI/CD pipelines (GitHub Actions, ArgoCD)
- Skill in analyzing non-trivial issues in complex software systems and documenting your findings
- Cloud environment (e.g., AWS/GCP/Azure) administration experience, preferably in a hybrid bare-metal/cloud environment
- Strong communication skills, with a willingness to document systems and processes and to collaborate with peers of varying technical expertise

Stand-out experience
- Developing large-scale production systems with high reliability requirements
- Pipeline orchestration frameworks (e.g., Kubeflow, Argo Workflows, Flyte)
- Managing GPU workloads on HPC clusters, ideally with hands-on experience operating NVIDIA's networking stack (e.g., NCCL, Mellanox firmware, GPUDirect RDMA)
- Deploying services for AI training or inference
- Maintaining or contributing to open-source projects

About Together AI

Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society, and together we are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed leading open-source research, models, and datasets to advance the frontier of AI, and our team has been behind technological advancements such as FlashAttention, RedPajama, SWARM Parallelism, and SpecExec. We invite you to join a passionate group of researchers on our journey to build the next generation of AI infrastructure.

Compensation

We offer competitive compensation, startup equity, health insurance, and other benefits, as well as flexibility in terms of remote work. The US base salary range for this full-time position is $200,000 - $290,000. Our salary ranges are determined by location, level, and role. Individual compensation will be determined by experience, skills, and job-related knowledge.

Equal Opportunity

Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more. Please see our privacy policy at https://www.together.ai/privacy
MLOps / DevOps Engineer
Data Science & Analytics
Software Engineer
Software Engineering
Apply
July 16, 2025
Enterprise Security Engineer
OpenAI
5000+
USD
260000
-
325000
United States
Full-time
Remote
true
About the Team

Within the OpenAI Security organization, our IT team works to ensure our team of researchers, engineers, and staff have the tools they need to work comfortably, securely, and with minimal interruptions. As an Enterprise Security Engineer, you will work in a highly technical and employee-focused environment.

Our IT team is small and nimble, and you'll have the opportunity to dive into a wide breadth of areas and build from the ground up. We're well supported and well resourced, and we have a mandate to deliver a world-class enterprise security program to our teams.

About the Role

As an Enterprise Security Engineer, you will be responsible for implementing and managing the security of OpenAI's internal information systems' infrastructure and processes. You will work closely with our IT and Security teams to develop security capabilities, enforce security policies, and monitor internal systems for security threats.

This role is open to remote employees; relocation assistance to Seattle is also available.

In this role, you will:
- Develop and implement security measures to protect our company's information assets against unauthorized access, disclosure, or misuse
- Monitor internal and external systems for security threats and respond to alerts
- Contribute to and enforce our company's IT and Security policies and procedures
- Work closely with our IT department to harden our infrastructure using best practices in AzureAD, GSuite, GitHub, and other SaaS tooling
- Advise our employees on best practices for maintaining the security of their endpoints and of office AV and network infrastructure
- Devise novel sharing controls and associated monitoring to protect company data, including intelligent groups management, Data Loss Prevention (DLP), and other security controls as appropriate
- Employ forward-thinking models like "secure by default" and "zero trust" to create sustainably secure environments for knowledge workers and developers
- Identify and remediate vulnerabilities in our internal systems, adhering to best practices for data security
- Use our own AI-driven models to develop systems for improved security detection and response, data classification, and other security-related tasks
- Educate employees on the importance of data security, and advise them on best practices for maintaining a secure environment
- Contribute to OpenAI's endpoint and cloud security roadmaps by staying up to date with the latest security threats and making recommendations for improving our security posture

You might thrive in this role if you have:
- Experience protecting and managing macOS fleets
- Experience deploying and managing endpoint security solutions (e.g., management frameworks, EDR tools)
- Experience with public cloud service providers (e.g., Amazon AWS, Microsoft Azure)
- Experience with identity and access management frameworks and protocols, including SAML, OAuth, and SCIM
- Experience with e-mail security protocols (e.g., SPF, DKIM, DMARC) and controls
- Intermediate or advanced proficiency with a scripting language (e.g., Python, Bash, or similar)
- Knowledge of modern adversary tactics, techniques, and procedures
- The ability to empathize and collaborate with colleagues, independently manage and run projects, and prioritize efforts for risk reduction

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.

We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristic. For additional information, please see OpenAI's Affirmative Action and Equal Employment Opportunity Policy Statement.

Qualified applicants with arrest or conviction records will be considered for employment in accordance with applicable law, including the San Francisco Fair Chance Ordinance, the Los Angeles County Fair Chance Ordinance for Employers, and the California Fair Chance Act. For unincorporated Los Angeles County workers: we reasonably believe that criminal history may have a direct, adverse and negative relationship with the following job duties, potentially resulting in the withdrawal of a conditional offer of employment: protect computer hardware entrusted to you from theft, loss or damage; return all computer hardware in your possession (including the data contained therein) upon termination of employment or end of assignment; and maintain the confidentiality of proprietary, confidential, and non-public information. In addition, job duties require access to secure and protected information technology systems and related data security obligations.

We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.

OpenAI Global Applicant Privacy Policy

At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.
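Since the role above lists SPF, DKIM, and DMARC experience, a quick illustration of what one of these controls looks like in practice: a DMARC policy is a DNS TXT record of semicolon-separated `k=v` tags published at `_dmarc.<domain>`. The sketch below parses such a record; the record string is hard-coded for illustration, whereas in practice it would come from a DNS lookup:

```python
def parse_dmarc(record):
    """Parse a DMARC TXT record into a dict of tag -> value.
    Tags are semicolon-separated `k=v` pairs, e.g. `p=reject`."""
    tags = {}
    for part in record.split(";"):
        part = part.strip()
        if "=" in part:
            key, _, value = part.partition("=")
            tags[key.strip()] = value.strip()
    return tags

# Example record as it might be published at _dmarc.example.com:
record = "v=DMARC1; p=reject; rua=mailto:dmarc-reports@example.com; pct=100"
policy = parse_dmarc(record)
```

A monitoring job could fetch and parse each corporate domain's record this way and alert when the `p` policy is weaker than expected or the `rua` reporting address drifts.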
MLOps / DevOps Engineer
Data Science & Analytics
Security Engineer
Software Engineering
Apply
July 16, 2025
Enterprise Security Engineer
OpenAI
5000+
USD
260000
-
325000
United States
Full-time
Remote
true
About the TeamWithin the OpenAI Security organization, our IT team works to ensure our team of researchers, engineers, and staff have the tools they need to work comfortably, securely, and with minimal interruptions. As an Enterprise Security Engineer, you will work in a highly technical and employee-focused environment.Our IT team is a small and nimble team, where you’ll have the opportunity to dive into a wide breadth of areas and build from the ground up. We’re well supported and well resourced, and have a mandate to deliver a world-class enterprise security program to our teams.About the RoleAs an Enterprise Security Engineer, you will be responsible for implementing and managing the security of OpenAI's internal information systems’ infrastructure and processes. You will work closely with our IT and Security teams to develop security capabilities, enforce security policies, and monitor internal systems for security threats.This role is open to remote employees, or relocation assistance is available to San Francisco.In this role, you will:Develop and implement security measures to protect our company's information assets against unauthorized access, disclosure, or misuse.Monitor internal and external systems for security threats and respond to alerts.Contribute to and enforce our company's IT and Security policies and procedures.Work closely with our IT department to harden our infrastructure using best practices in AzureAD, GSuite, Github, and other SaaS tooling.Advise our employees on best practices for maintaining the security of their endpoints, and office AV and network infrastructure.Devise novel sharing controls and associated monitoring to protect company data, including intelligent groups management, Data Loss Prevention (DLP) and other security controls as appropriate.Employ forward-thinking models like “secure by default” and “zero trust” to create sustainably secure environments for knowledge workers and developers.Identify and remediate 
vulnerabilities in our internal systems, adhering to best practices for data security.Use our own AI-driven models to develop systems for improved security detection and response, data classification, and other security-related tasks.Educate employees on the importance of data security, and advise them on best practices for maintaining a secure environment.Contribute to OpenAI's endpoint and cloud security roadmaps by staying up to date with the latest security threats, and making recommendations for improving our security posture.You might thrive in this role if you have: Experience in protecting and managing macOS fleets.Experience deploying and managing endpoint security solutions (e.g. management frameworks, EDR tools).Experience with public cloud service providers (e.g. Amazon AWS, Microsoft Azure).Experience with identity and access management frameworks and protocols, including SAML, OAUTH, and SCIM.Experience with e-mail security protocols (e.g. SPF, DKIM, DMARC) and controls.Intermediate or advanced proficiency with a scripting language (e.g. Python, Bash, or similar).Knowledge of modern adversary tactics, techniques, and procedures.Ability to empathize and collaborate with colleagues, independently manage and run projects, and prioritize efforts for risk reduction..About OpenAIOpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity. 
We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristic. For additional information, please see OpenAI’s Affirmative Action and Equal Employment Opportunity Policy Statement.Qualified applicants with arrest or conviction records will be considered for employment in accordance with applicable law, including the San Francisco Fair Chance Ordinance, the Los Angeles County Fair Chance Ordinance for Employers, and the California Fair Chance Act. For unincorporated Los Angeles County workers: we reasonably believe that criminal history may have a direct, adverse and negative relationship with the following job duties, potentially resulting in the withdrawal of a conditional offer of employment: protect computer hardware entrusted to you from theft, loss or damage; return all computer hardware in your possession (including the data contained therein) upon termination of employment or end of assignment; and maintain the confidentiality of proprietary, confidential, and non-public information. In addition, job duties require access to secure and protected information technology systems and related data security obligations.We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.OpenAI Global Applicant Privacy PolicyAt OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.
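The Enterprise Security role above lists email security protocols (SPF, DKIM, DMARC) among the qualifications. As an illustration only, not part of the posting, here is a minimal sketch of parsing a DMARC TXT record with the Python standard library; the domain and record values are made up:

```python
def parse_dmarc(record: str) -> dict:
    """Parse a DMARC TXT record (as published at _dmarc.<domain>)
    into a tag/value dict, e.g. v=DMARC1; p=reject; pct=100."""
    tags = {}
    for part in record.split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition("=")
        tags[key.strip()] = value.strip()
    return tags


def enforcement_level(record: str) -> str:
    """Classify how strictly the domain enforces DMARC, based on the p= tag."""
    policy = parse_dmarc(record).get("p", "none")
    return {"reject": "strict", "quarantine": "moderate"}.get(policy, "monitoring only")


# Hypothetical record for an example domain:
record = "v=DMARC1; p=quarantine; rua=mailto:dmarc-reports@example.com; pct=50"
print(enforcement_level(record))  # moderate
```

A `p=none` policy only collects reports; moving to `quarantine` or `reject` is what actually stops spoofed mail.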
MLOps / DevOps Engineer
Data Science & Analytics
Apply
July 16, 2025
Site Reliability Engineer
X AI
5000+
-
United States
Remote
false
About xAI xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company’s mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All engineers and researchers are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates. About the Role
As a Data Center Site Reliability Engineer (SRE) at xAI, you will play a pivotal role in ensuring the reliability, scalability, and performance of our state-of-the-art data center infrastructure, including the Colossus supercluster in Memphis—the world's largest AI training cluster with over 100,000 liquid-cooled Nvidia GPUs and plans for expansion to 1 million. This infrastructure powers advanced AI workloads, massive-scale model training, and products like Grok, enabling breakthroughs in understanding the universe. You will collaborate with cross-functional teams to automate operations, enhance observability, and maintain high availability for large-scale distributed systems. This is a hands-on technical position in a dynamic environment, offering the opportunity to tackle complex challenges at the intersection of AI, data center operations, and software reliability. Key Responsibilities Maintain and improve the reliability and uptime of xAI’s on-premises and cloud-based data center environments, including high-density GPU clusters for AI training. Design, implement, and manage monitoring, logging, and alerting systems (e.g., Prometheus, Grafana, PagerDuty). Develop and maintain infrastructure-as-code (Pulumi, Terraform) and continuous deployment pipelines (Buildkite, ArgoCD). Participate in on-call rotations, respond to incidents, perform root cause analysis, and drive post-mortem processes. Analyze system performance, forecast capacity needs, and optimize resource utilization for massive AI/ML workloads. Collaborate with hardware, networking, and software engineering teams to design and implement resilient, scalable solutions, such as RDMA fabrics and liquid-cooling systems. Create and maintain documentation and standard operating procedures. Contribute to the efficiency of AI training pipelines by identifying and mitigating bottlenecks in compute, storage, and networking at unprecedented scales. 
Required Qualifications Bachelor’s degree in Computer Science, Engineering, or a related technical field (or equivalent experience). 5+ years in site reliability engineering, data center operations, or large-scale infrastructure management. Expert-level knowledge of Kubernetes (on-prem and cloud), infrastructure-as-code tools (Pulumi, Terraform), and CI/CD systems (Buildkite, ArgoCD). Proficiency in at least one systems programming language (Rust, C++, Go) and strong scripting/automation skills. Deep understanding of monitoring and observability technologies. Strong troubleshooting skills across hardware, networking, and distributed software systems. Proven experience with incident response, including on-call rotations, rapid incident resolution, root cause analysis, and implementation of preventative measures. Excellent communication and documentation skills, with the ability to share knowledge concisely and accurately. Preferred Qualifications Experience supporting AI/ML workloads or high-density compute environments, including large-scale GPU clusters and HPC systems. Familiarity with data center electrical, cooling, and network systems, such as liquid-cooling and high-bandwidth interconnects. Certifications in SRE, Kubernetes, or data center operations. Experience with both on-premises and cloud infrastructure at scale. xAI is an equal opportunity employer. California Consumer Privacy Act (CCPA) Notice
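The SRE responsibilities above center on monitoring, alerting, and uptime for large clusters. Purely as an illustrative aside (the SLO target and thresholds below are assumed, not from the posting), the error-budget arithmetic behind such alerting can be sketched as:

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Total allowed downtime (minutes) for an availability SLO over a window."""
    return (1.0 - slo) * window_days * 24 * 60


def burn_rate(bad_fraction: float, slo: float) -> float:
    """How many times faster than 'sustainable' the error budget is being spent.

    bad_fraction: fraction of requests failing over the lookback window.
    A burn rate around 14.4 over a 1-hour window is a common fast-burn
    paging threshold in multi-window alerting schemes.
    """
    return bad_fraction / (1.0 - slo)


budget = error_budget_minutes(0.999)              # a 99.9% SLO over 30 days
rate = burn_rate(bad_fraction=0.0144, slo=0.999)  # observed 1.44% failures
print(round(budget, 1), round(rate, 1))
```

Tools like Prometheus express the same ratio as a recording rule over request counters; the math is identical.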
MLOps / DevOps Engineer
Data Science & Analytics
Apply
July 15, 2025
Connectivity Systems Engineer
X AI
5000+
-
United States
Remote
false
About xAI xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company’s mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All engineers and researchers are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.Job Summary The Connectivity Systems Engineer will be responsible for designing, specifying, and implementing structured cabling and fiber optic systems for xAI’s hyperscale data center build-outs. This role requires expertise in high-density cabling solutions, optical network design, and collaboration with cross-functional teams to deliver scalable, high-performance infrastructure that supports xAI’s advanced AI and GPU cluster workloads. You will play a critical role in ensuring the reliability and efficiency of our data center connectivity to drive our mission forward. Responsibilities Design and develop structured cabling systems, including copper (Cat6/6A) and fiber optic solutions (OM4, OM5, single-mode), for xAI’s data center environments, ensuring compliance with industry standards such as TIA-942 and BICSI. Create detailed optical system architectures, including high-density fiber optic networks (MPO/MTP, LC, DWDM), to support high-bandwidth, low-latency connectivity for AI and GPU clusters. Produce comprehensive design documentation, including schematics, rack elevations, pathway plans, and bills of materials (BOMs), using tools like AutoCAD, Revit, or Visio. 
Collaborate with network, electrical, and mechanical engineering teams to integrate cabling and optical designs into overall data center layouts, optimizing for power, cooling, and space constraints. Conduct capacity planning to ensure cabling infrastructure supports current and future AI workload demands, including 400G/800G network deployments. Oversee installation, testing, and validation of cabling and optical systems, using tools like OTDR, power meters, and VFL to ensure performance and reliability. Provide technical expertise during procurement, evaluating vendors and selecting cabling components, optical transceivers, and connectivity hardware. Develop and maintain cabling standards, labeling schemes, and documentation processes to ensure consistency across multi-site data center deployments. Support rapid design iterations to meet aggressive project timelines, addressing challenges such as high-density rack configurations and advanced cooling systems (e.g., liquid cooling). Troubleshoot and resolve cabling-related issues during commissioning and operations to ensure minimal downtime for mission-critical systems. Contribute to process improvements and automation initiatives (e.g., Python scripting for design validation) to enhance cabling design workflows. Qualifications Basic Qualifications Bachelor’s degree in Electrical Engineering, Telecommunications Engineering, or a related field; or equivalent professional experience. 3+ years of experience in structured cabling and fiber optic system design, preferably in data center or hyperscale environments. Proficiency in design tools such as AutoCAD, Revit, or Visio for creating detailed schematics and layouts. Strong knowledge of fiber optic technologies, including single-mode, multi-mode, DWDM, and high-density connectivity solutions (MPO/MTP, LC). Familiarity with data center standards (e.g., TIA-942, Uptime Institute) and networking protocols (e.g., Ethernet, InfiniBand). 
Preferred Qualifications 5+ years of experience in data center cabling design and deployment for hyperscale or cloud provider environments. Experience with AI/HPC infrastructure, GPU cluster connectivity, or liquid-cooled data center designs. Proficiency in scripting or automation tools (e.g., Python, Bash) for design or testing workflows. Certifications such as RCDD (Registered Communications Distribution Designer), DCDC (Data Center Design Consultant), or CFOT (Certified Fiber Optic Technician). Knowledge of network hardware (e.g., NVIDIA, Cisco, Arista) and optical transceivers (QSFP-DD, OSFP). xAI is an equal opportunity employer. California Consumer Privacy Act (CCPA) Notice
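The Connectivity role above involves fiber optic design and validation with OTDRs and power meters. As a hedged illustration (the loss figures and transceiver numbers below are placeholder values; real designs take them from TIA-942-aligned standards and component datasheets), a link loss budget check looks like:

```python
def link_loss_db(length_km: float, attenuation_db_per_km: float,
                 connectors: int, splices: int,
                 connector_loss_db: float = 0.3,
                 splice_loss_db: float = 0.1) -> float:
    """Worst-case insertion loss of a fiber link: fiber attenuation
    plus per-connector and per-splice losses (illustrative defaults)."""
    return (length_km * attenuation_db_per_km
            + connectors * connector_loss_db
            + splices * splice_loss_db)


def passes_budget(loss_db: float, tx_power_dbm: float,
                  rx_sensitivity_dbm: float, margin_db: float = 3.0) -> bool:
    """Check link loss against the transceiver power budget with a safety margin."""
    return loss_db + margin_db <= tx_power_dbm - rx_sensitivity_dbm


# A 2 km single-mode run at 0.4 dB/km with 4 connectors and 2 splices:
loss = link_loss_db(2.0, 0.4, connectors=4, splices=2)
print(round(loss, 2), passes_budget(loss, tx_power_dbm=0.0, rx_sensitivity_dbm=-7.0))
```

The OTDR-measured loss of the installed link should come in at or under the design value computed this way.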
MLOps / DevOps Engineer
Data Science & Analytics
Software Engineer
Software Engineering
Apply
July 15, 2025
Site Reliability Engineer - Storage
X AI
5000+
USD
180000
-
440000
United States
Full-time
Remote
false
About xAI xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company’s mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All engineers and researchers are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.About the role As a Site Reliability Storage Engineer, you will play a pivotal role in designing, building, and operating exascale storage systems to manage our cutting-edge AI research data with unparalleled scalability and reliability across multiple regions. This role's core responsibility is to make sure our heterogeneous storage systems, spanning on-prem and cloud, are reliable and performant. We’re seeking engineers with expertise in exascale data management systems or distributed filesystems to join our mission-driven team. What you’ll do Develop and optimize software to manage exascale data, enabling efficient and reliable access for xAI researchers working on advanced AI models. Enhance the reliability, performance, and cost-effectiveness of xAI’s storage infrastructure to support large-scale AI research workloads. Collaborate closely with researchers to understand their data use cases and tailor storage solutions to meet their needs. Implement robust security measures to safeguard critical datasets, ensuring data integrity and confidentiality. 
Ideal Experience You’d be an exceptional candidate if you possess some (or all) of the following: Writing scalable, high-performance code in Rust or Go for storage-related applications or tooling. Managing storage infrastructure with IaC tools like Pulumi, Terraform, or Ansible. Past experience working with storage vendors, facilitating partnership alignment and integrating their tooling within xAI’s Infrastructure. Familiarity with Kubernetes storage primitives (e.g., Persistent Volumes, CSI drivers) and integrating storage with containerized workloads. Bonus: Experience with AI/ML data pipelines, including handling large datasets for training and inference. Tech Stack Kubernetes Pulumi Rust and Go Interview Process After submitting your application, the team reviews your CV and statement of exceptional work. If your application passes this stage, you will be invited to a 45-minute interview (“phone interview”) during which a member of our team will ask some basic questions. If you clear the initial phone interview, you will enter the main process, which consists of four technical interviews: Coding assessment in Python, Golang, or Rust Systems hands-on: Demonstrate practical skills in a live problem-solving session. Coding assessment or system design discussion based on the candidate's background. Project deep-dive: Present your past exceptional work to a small audience. Every application is reviewed by a member of our technical team. All interviews will be conducted via Google Meet. We do not condone usage of AI in interviews and have tools to detect AI usage. Benefits Base salary is just one part of our total rewards package at xAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short & long-term disability insurance, life insurance, and various other discounts and perks. Annual Salary Range $180,000 - $440,000 USD. xAI is an equal opportunity employer. 
California Consumer Privacy Act (CCPA) Notice
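The storage role above mentions Kubernetes storage primitives (Persistent Volumes, CSI drivers). As an illustrative aside only (the claim name, namespace, and storage class below are hypothetical, not xAI's), a PersistentVolumeClaim manifest can be built as plain data:

```python
import json


def pvc_manifest(name: str, namespace: str, size_gi: int,
                 storage_class: str = "example-csi-nvme") -> dict:
    """Build a PersistentVolumeClaim manifest as a plain dict.

    The storage class is a placeholder; it must name a StorageClass
    backed by the cluster's CSI driver for dynamic provisioning.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": f"{size_gi}Gi"}},
        },
    }


manifest = pvc_manifest("training-scratch", "research", 512)
print(json.dumps(manifest, indent=2))
```

Pods then mount the claim by name; the CSI driver attaches the matching volume on whichever node the pod lands.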
MLOps / DevOps Engineer
Data Science & Analytics
Software Engineer
Software Engineering
Apply
July 15, 2025
Staff Systems Engineer, Agents Infrastructure
Anthropic
1001-5000
USD
320000
-
485000
United States
Full-time
Remote
false
About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems. About the role: Anthropic is seeking a Linux OS and System Programming Subject Matter Expert to join our Infrastructure team. In this role, you'll lead efforts to accelerate and optimize our virtualization and VM workloads that power our AI infrastructure. Your deep expertise in low-level system programming, kernel optimization, and virtualization technologies will be crucial in ensuring Anthropic is able to scale our compute infrastructure efficiently and reliably for training and serving frontier AI models. Responsibilities: Lead optimization initiatives for our virtualization stack, improving performance, reliability, and efficiency of our VM environments Design and implement custom kernel modules, drivers, and system-level components to enhance our compute infrastructure Troubleshoot complex performance bottlenecks in virtualized environments and develop solutions Collaborate with cloud engineering teams to optimize interactions between our workloads and underlying hardware Develop tooling for monitoring and improving virtualization performance Work with our ML engineers to understand their computational needs and optimize our systems accordingly Contribute to the design and implementation of our next-generation compute infrastructure Mentor other engineers on low-level systems programming and Linux kernel internals Partner closely with cloud providers to influence hardware and platform features for AI workloads You may be a good fit if you: Have 5+ years of experience with Linux kernel development, system programming, or related low-level software engineering Possess deep understanding of virtualization technologies (KVM, Xen, 
QEMU, etc.) and their performance characteristics Have experience optimizing system performance for compute-intensive workloads Are familiar with modern CPU architectures and memory systems Have strong C/C++ programming skills and experience with systems languages like Rust Understand the intricacies of Linux resource management, scheduling, and memory management Have experience profiling and debugging complex system-level performance issues Are comfortable diving into unfamiliar codebases and technical domains Are results-oriented, with a bias towards practical solutions and measurable impact Care about the societal impacts of AI and are passionate about building safe, reliable systems Strong candidates may also have experience with: GPU virtualization and acceleration technologies Cloud infrastructure at scale (AWS, GCP) Container technologies and their underlying implementation (Docker, containerd, runc, OCI) eBPF programming and kernel tracing tools OS-level security hardening and isolation techniques Developing custom scheduling algorithms for specialized workloads Performance optimization for ML/AI specific workloads Network stack optimization and high-performance networking Experience with TPUs, custom ASICs, or other ML accelerators Representative projects: Optimizing kernel parameters and VM configurations to reduce inference latency for large language models Implementing custom memory management schemes for large-scale distributed training Developing specialized I/O schedulers to prioritize ML workloads Creating lightweight virtualization solutions tailored for AI inference Building monitoring and instrumentation tools to identify system-level bottlenecks Enhancing communication between VMs for distributed training workloads Deadline to apply: None. Applications will be reviewed on a rolling basis. 
The expected salary range for this position is: Annual Salary: $320,000 - $485,000 USD. Logistics: Education requirements: We require at least a Bachelor's degree in a related field or equivalent experience.
Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices. Visa sponsorship: We do sponsor visas! However, we aren't able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this. We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you're interested in this work. We think AI systems like the ones we're building have enormous social and ethical implications. We think this makes representation even more important, and we strive to include a range of diverse perspectives on our team. How we're different We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact — advancing our long-term goals of steerable, trustworthy AI — rather than work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. We're an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time. As such, we greatly value communication skills. The easiest way to understand our research directions is to read our recent research. 
This research continues many of the directions our team worked on prior to Anthropic, including: GPT-3, Circuit-Based Interpretability, Multimodal Neurons, Scaling Laws, AI & Compute, Concrete Problems in AI Safety, and Learning from Human Preferences. Come work with us! Anthropic is a public benefit corporation headquartered in San Francisco. We offer competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a lovely office space in which to collaborate with colleagues. Guidance on Candidates' AI Usage: Learn about our policy for using AI in our application process
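The Anthropic role above lists VM memory configuration and kernel tuning among its representative projects. As a small illustrative aside (the guest sizes are arbitrary examples), sizing a hugepage pool to back a VM's guest RAM is simple arithmetic:

```python
def hugepages_needed(guest_ram_gib: int, hugepage_size_mib: int = 2) -> int:
    """Number of hugepages required to back a VM's guest RAM.

    On x86-64 Linux the common sizes are 2 MiB and 1024 MiB (1 GiB);
    the pool is reserved via the vm.nr_hugepages sysctl or the
    hugepagesz=/hugepages= boot parameters.
    """
    total_mib = guest_ram_gib * 1024
    # Ceiling division so the pool always covers the full allocation.
    return -(-total_mib // hugepage_size_mib)


# Backing a 768 GiB guest with 1 GiB pages vs 2 MiB pages:
print(hugepages_needed(768, 1024))  # 768
print(hugepages_needed(768, 2))     # 393216
```

Larger pages mean fewer TLB entries per gigabyte of guest RAM, which is one of the levers for reducing virtualization overhead on memory-hungry inference workloads.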
MLOps / DevOps Engineer
Data Science & Analytics
Software Engineer
Software Engineering
Apply
July 15, 2025
Security Engineer, Agent Security
OpenAI
5000+
USD
0
325000
-
405000
United States
Full-time
Remote
false
About the Team
The team’s mission is to accelerate the secure evolution of agentic AI systems at OpenAI. To achieve this, the team designs, implements, and continuously refines security policies, frameworks, and controls that defend OpenAI’s most critical assets—including the user and customer data embedded within them—against the unique risks introduced by agentic AI.About the Role
As a Security Engineer on the Agent Security Team, you will be at the forefront of securing OpenAI’s cutting-edge agentic AI systems. Your role will involve designing and implementing robust security frameworks, policies, and controls to safeguard OpenAI’s critical assets and ensure the safe deployment of agentic systems. You will develop comprehensive threat models, partner tightly with our Agent Infrastructure group to fortify the platforms that power OpenAI’s most advanced agentic systems, and lead efforts to enhance safety monitoring pipelines at scale.We are looking for a versatile engineer who thrives in ambiguity and can make meaningful contributions from day one. You should be prepared to ship solutions quickly while maintaining a high standard of quality and security.We’re looking for people who can drive innovative solutions that will set the industry standard for agent security. You will need to bring your expertise in securing complex systems and designing robust isolation strategies for emerging AI technologies, all while being mindful of usability. You will communicate effectively across various teams and functions, ensuring your solutions are scalable and robust while working collaboratively in an innovative environment. In this fast-paced setting, you will have the opportunity to solve complex security challenges, influence OpenAI’s security strategy, and play a pivotal role in advancing the safe and responsible deployment of agentic AI systems.This role is based in San Francisco, CA. 
We use a hybrid work model of 3 days in the office per week and offer relocation assistance to new employees.You’ll be responsible for:Architecting security controls for agentic AI – design, implement, and iterate on identity, network, and runtime-level defenses (e.g., sandboxing, policy enforcement) that integrate directly with the Agent Infrastructure stack.Building production-grade security tooling – ship code that hardens safety monitoring pipelines across agent executions at scale.Collaborating cross-functionally – work daily with Agent Infrastructure, product, research, safety, and security teams to balance security, performance, and usability.Influencing strategy & standards – shape the long-term Agent Security roadmap, publish best practices internally and externally, and help define industry standards for securing autonomous AI.We’re looking for someone with:Strong software-engineering skills in Python or at least one systems language (Go, Rust, C/C++), plus a track record of shipping and operating secure, high-reliability services.Deep expertise in modern isolation techniques – experience with container security, kernel-level hardening, and other isolation methods.Hands-on network security experience – implementing identity-based controls, policy enforcement, and secure large-scale telemetry pipelines.Clear, concise communication that bridges engineering, research, and leadership audiences; comfort influencing roadmaps and driving consensus.Bias for action & ownership – you thrive in ambiguity, move quickly without sacrificing rigor, and elevate the security bar company-wide from day one.Cloud security depth on at least one major provider (Azure, AWS, GCP), including identity federation, workload IAM, and infrastructure-as-code best practices.Familiarity with AI/ML security challenges – experience addressing risks associated with advanced AI systems (nice-to-have but valuable).About OpenAIOpenAI is an AI research and deployment company dedicated to 
ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity. We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristic. For additional information, please see OpenAI’s Affirmative Action and Equal Employment Opportunity Policy Statement.Qualified applicants with arrest or conviction records will be considered for employment in accordance with applicable law, including the San Francisco Fair Chance Ordinance, the Los Angeles County Fair Chance Ordinance for Employers, and the California Fair Chance Act. For unincorporated Los Angeles County workers: we reasonably believe that criminal history may have a direct, adverse and negative relationship with the following job duties, potentially resulting in the withdrawal of a conditional offer of employment: protect computer hardware entrusted to you from theft, loss or damage; return all computer hardware in your possession (including the data contained therein) upon termination of employment or end of assignment; and maintain the confidentiality of proprietary, confidential, and non-public information. 
In addition, job duties require access to secure and protected information technology systems and related data security obligations.We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.OpenAI Global Applicant Privacy PolicyAt OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.
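The Agent Security role above involves network-level defenses and policy enforcement for sandboxed agents. As a toy illustration of the default-deny pattern only (the hostnames are invented, and this is not OpenAI's mechanism), an egress allowlist check might look like:

```python
from fnmatch import fnmatchcase

# Hypothetical allowlist; real policies live in network infrastructure,
# not in code the agent itself can modify.
ALLOWED_EGRESS = [
    "api.internal.example.com",
    "*.artifacts.example.com",
]


def egress_allowed(host: str, allowlist=None) -> bool:
    """Default-deny egress check for a sandboxed agent action.

    Anything not explicitly matched by a pattern is refused, which is
    the 'secure by default' posture: new destinations require an
    allowlist change rather than slipping through silently.
    """
    patterns = ALLOWED_EGRESS if allowlist is None else allowlist
    return any(fnmatchcase(host.lower(), pattern) for pattern in patterns)


print(egress_allowed("api.internal.example.com"))      # True
print(egress_allowed("models.artifacts.example.com"))  # True
print(egress_allowed("evil.example.net"))              # False
```

In production the equivalent check is enforced below the agent, for example at a proxy or network policy layer, so a compromised agent cannot bypass it.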
MLOps / DevOps Engineer
Data Science & Analytics
Software Engineer
Software Engineering
Apply
July 15, 2025
AI Infra Engineer
Perplexity
1001-5000
USD
190000
-
250000
United States
Full-time
Remote
false
Perplexity is an AI-powered answer engine founded in December 2022 and growing rapidly as one of the world’s leading AI platforms. Perplexity has raised over $1B in venture investment from some of the world’s most visionary and successful leaders, including Elad Gil, Daniel Gross, Jeff Bezos, Accel, IVP, NEA, NVIDIA, Samsung, and many more. Our objective is to build accurate, trustworthy AI that powers decision-making for people and assistive AI wherever decisions are being made. Throughout human history, change and innovation have always been driven by curious people. Today, curious people use Perplexity to answer more than 780 million queries every month–a number that’s growing rapidly for one simple reason: everyone can be curious. We are looking for an AI Infra engineer to join our growing team. We work with Kubernetes, Slurm, Python, C++, PyTorch, and primarily on AWS. As an AI Infrastructure Engineer, you will work in a hybrid SRE/Dev Engineering capacity, partnering closely with our Infrastructure and Research teams to build, deploy, and optimize our large-scale AI training and inference clusters. 
Responsibilities Design, deploy, and maintain scalable Kubernetes clusters for AI model inference and training workloads Manage and optimize Slurm-based HPC environments for distributed training of large language models Develop robust APIs and orchestration systems for both training pipelines and inference services Implement resource scheduling and job management systems across heterogeneous compute environments Benchmark system performance, diagnose bottlenecks, and implement improvements across both training and inference infrastructure Build monitoring, alerting, and observability solutions tailored to ML workloads running on Kubernetes and Slurm Respond swiftly to system outages and collaborate across teams to maintain high uptime for critical training runs and inference services Optimize cluster utilization and implement autoscaling strategies for dynamic workload demands Qualifications Strong expertise in Kubernetes administration, including custom resource definitions, operators, and cluster management Hands-on experience with Slurm workload management, including job scheduling, resource allocation, and cluster optimization Experience with deploying and managing distributed training systems at scale Deep understanding of container orchestration and distributed systems architecture High level familiarity with LLM architecture and training processes (Multi-Head Attention, Multi/Grouped-Query, distributed training strategies) Experience managing GPU clusters and optimizing compute resource utilization Required Skills Expert-level Kubernetes administration and YAML configuration management Proficiency with Slurm job scheduling, resource management, and cluster configuration Python and C++ programming with focus on systems and infrastructure automation Hands-on experience with ML frameworks such as PyTorch in distributed training contexts Strong understanding of networking, storage, and compute resource management for ML workloads Experience developing APIs and 
managing distributed systems for both batch and real-time workloads Solid debugging and monitoring skills with expertise in observability tools for containerized environments Preferred Skills Experience with Kubernetes operators and custom controllers for ML workloads Advanced Slurm administration including multi-cluster federation and advanced scheduling policies Familiarity with GPU cluster management and CUDA optimization Experience with other ML frameworks like TensorFlow or distributed training libraries Background in HPC environments, parallel computing, and high-performance networking Knowledge of infrastructure as code (Terraform, Ansible) and GitOps practices Experience with container registries, image optimization, and multi-stage builds for ML workloads Required Experience Demonstrated experience managing large-scale Kubernetes deployments in production environments Proven track record with Slurm cluster administration and HPC workload management Previous roles in SRE, DevOps, or Platform Engineering with focus on ML infrastructure Experience supporting both long-running training jobs and high-availability inference services Ideally, 3-5 years of relevant experience in ML systems deployment with specific focus on cluster orchestration and resource management The cash compensation range for this role is $190,000 - $250,000. Final offer amounts are determined by multiple factors, including experience and expertise, and may vary from the amounts listed above. Equity: In addition to the base salary, equity may be part of the total compensation package.
Benefits: Comprehensive health, dental, and vision insurance for you and your dependents. Includes a 401(k) plan.
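To illustrate the kind of Slurm job management this role covers, here is a minimal, hypothetical batch script for a multi-node distributed PyTorch training run. The job name, node counts, time limit, port, and `train.py` entry point are all invented for the sketch; real values depend on the cluster and workload.

```
#!/bin/bash
#SBATCH --job-name=llm-train        # hypothetical job name
#SBATCH --nodes=4                   # request 4 compute nodes
#SBATCH --gpus-per-node=8           # 8 GPUs on each node
#SBATCH --ntasks-per-node=1         # one launcher process per node
#SBATCH --time=48:00:00             # wall-clock limit

# Use the first allocated node as the rendezvous host for torchrun.
MASTER_ADDR=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)

# Launch one torchrun per node; torchrun spawns one worker per GPU.
srun torchrun \
  --nnodes="$SLURM_NNODES" \
  --nproc_per_node=8 \
  --rdzv_backend=c10d \
  --rdzv_endpoint="$MASTER_ADDR:29500" \
  train.py
```

Submitted with `sbatch`, a script like this is the unit of work that the scheduling, utilization, and autoscaling responsibilities above revolve around.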
MLOps / DevOps Engineer
Data Science & Analytics
Machine Learning Engineer
Data Science & Analytics
Software Engineer
Software Engineering
Apply
July 15, 2025
Senior Infrastructure Engineer
Evenup
501-1000
-
Canada
Full-time
Remote
false
EvenUp is one of the fastest-growing generative AI startups in history, on a mission to level the playing field for personal injury victims, whose cases range from motor vehicle accidents to child abuse. Our products empower law firms to secure faster settlements, higher payouts, and better outcomes for those who need it most.

EvenUp's Infrastructure (Infra) engineering team is seeking an Engineer to lead the development and enhancement of our tooling and workflows. In this role, you will play a critical part in delivering value to our customers and ensuring that personal injury victims receive the rightful compensation they deserve. This is an exciting opportunity for someone passionate about understanding user needs and addressing them directly with captivating user interfaces for AI-generated document creation. As part of our team, you'll drive future innovations that make a significant impact across the business.
What you'll do:
- 75% doing system design and contributing infrastructure, starting with shipping a solution within 2 weeks!
- 25% collaborating with stakeholders and mentoring, lunch and learns, and more
- Leverage a self-starter mindset by taking a product concept and building the feature end to end (whether it's a component of the system or a significant piece of functionality)
- Collaborate with the team to scale the tech stack based on our rapidly growing user base!

What we look for:
- 5+ years of infrastructure engineering/DevOps/SRE experience working with cloud-native environments
- Interest in making the world a fairer place (we don't get paid unless we're helping injured victims and/or their attorneys)
- Familiarity with relevant technology stacks a plus (e.g., AWS/GCP, Kubernetes, Docker, Terraform, Python, NodeJS, CI/CD, cloud security, monitoring, logging and alerting, MLOps, QA) (minimum 5 required)
- Understand the value of having a high-quality code-based infrastructure that is simple, understandable, and reusable
- Enjoy navigating technical challenges and delivering solutions that track with your estimates
- Can communicate technical ideas or issues in easy-to-understand and actionable terms
- Learn quickly and seek opportunities to work cross-functionally (including data engineering, DevOps…) and with a diverse group of people
- Enjoy working in cross-functional teams to understand their requirements and develop solutions that meet their needs

Nice to have:
- Familiarity with monitoring tools and frameworks such as Prometheus, Grafana, or TensorBoard
- Understanding and practical experience with data tooling, BI tools, and systems such as wandb, dbt, BigQuery, and Elasticsearch

Notice to Candidates: EvenUp has been made aware of fraudulent job postings and unaffiliated third parties posing as our recruiting team – please know that we have no affiliation or connection to these situations.
We only post open roles on our career page (https://jobs.ashbyhq.com/evenup) or reputable job boards like our official LinkedIn or Indeed pages, and all official EvenUp recruitment emails will come from the domains @evenuplaw.com, @evenup.ai, @ext-evenuplaw.com, or from no-reply@ashbyhq.com. If you receive communication from someone you believe is impersonating EvenUp, please report it to us by emailing talent-ops-team@evenuplaw.com. Examples of fraudulent email domains include “careers-evenuplaw.com” and “careers-evenuplaws.com”.

Benefits & Perks: Our goal is to empower every team member to contribute to our mission of fostering a more just world, regardless of their role, location, or level of experience. To that end, here is a preview of what we offer:
- Choice of medical, dental, and vision insurance plans for you and your family
- Flexible paid time off
- 10 US observed holidays, and Canadian statutory holidays by province
- A home office stipend
- 401(k) for US-based employees
- Paid parental leave
- Sabbatical program
- A meet-up program to get together in person with colleagues in your area
- Offices in San Francisco and Toronto

Please note the above benefits & perks are for full-time employees.

About EvenUp: EvenUp is on a mission to level the playing field in personal injury cases. EvenUp applies machine learning and its AI model, known as Piai™, to reduce manual effort and maximize case outcomes across the personal injury value chain, combining in-house human legal expertise with proprietary AI and software to analyze records. The Claims Intelligence Platform™ provides rich business insights, AI workflow automation, and best-in-class document creation for injury law firms. EvenUp is the trusted partner of personal injury law firms. Backed by top VCs, including Bessemer Venture Partners, Bain Capital Ventures (BCV), SignalFire, NFX, DCM, and more, EvenUp's customers range from top trial attorneys to America's largest personal injury firms.
EvenUp was founded in late 2019 and is headquartered in San Francisco. Learn more at www.evenuplaw.com.EvenUp is an equal opportunity employer. We are committed to diversity and inclusion in our company. We do not discriminate based on race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.
MLOps / DevOps Engineer
Data Science & Analytics
Apply
July 14, 2025
Senior Platform Engineer
Lovable
201-500
-
Sweden
Full-time
Remote
false
TL;DR – We're looking for an exceptional platform engineer to design and scale the infrastructure behind the future of AI software engineering. You'll own key parts of our backend, DevOps, and platform tooling, ensuring speed, reliability, and scalability across our stack.

Why Lovable? Lovable lets anyone and everyone build software with plain English. From solopreneurs to Fortune 100 teams, millions of people use Lovable to transform raw ideas into real products – fast. We are at the forefront of a foundational shift in software creation, which means you have an unprecedented opportunity to change the way the digital world works. Over 2 million people in 200+ countries already use Lovable to launch businesses, automate work, and bring their ideas to life. And we're just getting started. We're a small, talent-dense team building a generation-defining company from Stockholm. We value extreme ownership, high velocity, and low-ego collaboration. We seek out people who care deeply, ship fast, and are eager to make a dent in the world.

What we're looking for:
- 5–7+ years building production-grade infrastructure as a Platform/DevOps/Site Reliability Engineer
- Proven startup experience – you've scaled platforms at global tech startups and scale-ups
- Expertise in distributed systems, CI/CD pipelines, internal tooling, observability, and cloud infra
- Hands-on with Docker, Kubernetes, and modern infra practices
- Strong coding chops and a proven track record of boosting dev velocity and system reliability
- A proactive problem-solver who thrives in ambiguity and can ship high-leverage systems fast
- Balances security, stability, and speed – knowing when to optimize versus when to move quickly
- Based in Stockholm or ready to relocate – this is a 5-day on-site opportunity

What you'll do – in one sentence: own and scale the platform that makes AI engineering work for everyone.
- Build and maintain the systems that power our AI product – from backend performance to cloud deployments
- Improve developer experience: streamline our CI/CD pipelines, internal tools, and observability stack
- Harden our infrastructure against failures, downtime, and slowdowns
- Collaborate across teams – product, frontend, and backend – to ensure fast and reliable delivery
- Own your roadmap, architect systems, and carry them through to deployment

Our tech stack – we're building with tools that both humans and AI love:
- Frontend: React for lightning-fast interfaces
- Backend: Golang and Rust for serious performance
- Cloud: Cloudflare, Fly.io, Google Cloud Run, AWS, Terraform
- DevOps & tooling: CI/CD pipelines, observability, infra-as-code
- And we're always on the lookout for what's next!

How we hire:
- Fill in a short form, then jump on an intro call with the team
- Complete the take-home exercise
- Show us how you approach problems during two technical interviews
- Join us for trial work lasting 2 days, preferably on-site. We'll see how you tick, and you get to meet the team and explore whether joining Lovable feels right for you

About your application: Please submit your application in English – it's our company language, so you'll be speaking lots of it if you join. We treat all candidates equally – if you're interested, please apply through our careers portal.
MLOps / DevOps Engineer
Data Science & Analytics
Software Engineer
Software Engineering
Apply
July 13, 2025
Senior HPC Systems Architect
Lambda AI
501-1000
USD
0
255000
-
340000
United States
Full-time
Remote
false
Lambda is the #1 GPU Cloud for ML/AI teams training, fine-tuning and inferencing AI models, where engineers can easily, securely and affordably build, test and deploy AI products at scale. Lambda’s product portfolio includes on-prem GPU systems, hosted GPUs across public & private clouds and managed inference services – servicing government, researchers, startups and Enterprises world-wide.
If you'd like to build the world's best deep learning cloud, join us.
*Note: This position requires presence in our San Jose office location 4 days per week; Lambda’s designated work from home day is currently Tuesday.
Engineering at Lambda is responsible for building and scaling our cloud offering. Our scope includes the Lambda website, cloud APIs and systems as well as internal tooling for system deployment, management and maintenance.
We're looking for a Senior HPC Systems Architect with extensive experience designing, developing, and testing large-scale high-performance computing (HPC) infrastructures. This strategic role focuses on crafting cutting-edge liquid-cooled HPC solutions.

What You'll Do:
- Design and architect advanced HPC systems optimized for large-scale computational workloads and AI applications
- Collaborate with internal teams and stakeholders to define system requirements and performance goals
- Develop comprehensive testing frameworks to rigorously assess system performance, scalability, and reliability
- Evaluate emerging technologies and architectural approaches to continuously enhance infrastructure capabilities
- Create detailed architectural plans, documentation, and blueprints to guide implementation teams
- Provide technical leadership and mentoring to engineering teams, fostering best practices in HPC architecture

About You:
- 8+ years of experience designing and architecting large-scale HPC and distributed computing systems
- Expert-level knowledge of HPC hardware, including GPU clusters, compute nodes, high-speed networking (InfiniBand, Ethernet), and distributed storage
- Hands-on experience with direct-to-chip liquid cooling systems
- Proven expertise in creating robust performance benchmarks, capacity planning, and system validation
- Exceptional skills in system architecture, design documentation, and technical specifications
- Ability to work collaboratively across teams, ensuring alignment of technical solutions with business objectives
- Self-motivated, strategic thinker with strong analytical and problem-solving capabilities

Nice to Have:
- Prior experience with AI infrastructure design
- Familiarity with cloud computing environments and hybrid cloud HPC architectures
- Knowledge of automation and orchestration tools (Ansible, Terraform, Kubernetes)

Salary Range Information: Based on market data and other factors, the annual salary range for this position is $255,000–$340,000. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.

About Lambda:
- Founded in 2012; ~350 employees (2024) and growing fast
- We offer generous cash & equity compensation
- Our investors include Andra Capital, SGW, Andrej Karpathy, ARK Invest, Fincadia Advisors, G Squared, In-Q-Tel (IQT), KHK & Partners, NVIDIA, Pegatron, Supermicro, Wistron, Wiwynn, US Innovative Technology, Gradient Ventures, Mercato Partners, SVB, 1517, and Crescent Cove
- We are experiencing extremely high demand for our systems, with quarter-over-quarter, year-over-year profitability
- Our research papers have been accepted into top machine learning and graphics conferences, including NeurIPS, ICCV, SIGGRAPH, and TOG
- Health, dental, and vision coverage for you and your dependents
- Wellness and commuter stipends for select roles
- 401(k) plan with 2% company match (USA employees)
- Flexible Paid Time Off plan that we all actually use

A Final Note: You do not need to match all of the listed expectations to apply for this position. We are committed to building a team with a variety of backgrounds, experiences, and skills.

Equal Opportunity Employer: Lambda is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.
MLOps / DevOps Engineer
Data Science & Analytics
Software Engineer
Software Engineering
Apply
July 11, 2025
Application Security Engineer (Zürich)
Lakera AI
51-100
-
Switzerland
Full-time
Remote
true
Lakera is hiring its first dedicated Application Security Engineer to partner with Engineering and embed security into every stage of our SDLC. You will work closely with backend and infrastructure engineers, turn threat models into guardrails, harden our Python services, and make it easy for teams to ship secure code. Your main focus is proactive product security: secure-SDLC integration, automated testing pipelines, and hands-on code guidance. Because we are a lean team, you will also help with incident response and audit preparation when required. If you're excited by the idea of shaping security strategy at a company working on the frontier of AI security, this is your chance to make an outsized impact from day one.

About Lakera: Lakera is on a mission to ensure AI does what we want it to do. We are heading towards a future where AI agents run our businesses and personal lives. Here at Lakera, we're not just dreaming about the future; we're building the security foundation for it. We empower security teams and builders so that their businesses can adopt AI technologies and unleash the next phase of intelligent computing. We work with Fortune 500 companies, startups, and foundation model providers to protect them and their users from adversarial misalignment. We are also the company behind Gandalf, the world's most popular AI security game. Lakera has offices in San Francisco and Zurich. We move fast and work with intensity. We act as one team but expect everyone to take substantial ownership and accountability. We prioritize transparency at every level and are committed to always raising the bar in everything we do. We promote diversity of thought, as we believe that creates the best outcomes.

What You'll Do:
- Integrate security into the SDLC: integrate and maintain SAST, dependency scanning, and IaC checks in the CI pipeline; perform threat models and drive secure-by-design patterns with engineers; run secure code reviews and pair with developers to remediate findings
- Champion security: deliver just-in-time training, secure-coding guidelines, and short demos; build self-service security tooling and templates that reduce friction
- Cloud and infrastructure hardening: review AWS/Kubernetes configurations, IAM policies, and Cloudflare rules; support infrastructure teams with infrastructure-as-code guardrails
- Continuous improvement: track security metrics, drive post-incident reviews, and propose roadmap items; stay up to date with emerging threats, vulnerabilities, and industry best practices across SaaS, open-source, and cloud environments

What You'll Bring:
- At least three years in product or application security or a closely related DevSecOps role
- Hands-on experience securing Python, Node, Go, or similar web applications and APIs
- Ability to code: comfortable writing or improving small tools and infrastructure-as-code in Python and Terraform
- Ability to read and review code, spot vulnerabilities, and communicate fixes clearly
- Ability to implement SAST, DAST, and CI/CD security controls (GitHub Actions, GitLab CI, or similar)
- Working knowledge of AWS security fundamentals
- Strong collaboration skills and the ability to influence without authority in a fast-moving startup
- Excellent communication skills, both verbal and written, enabling clear and effective interactions with internal stakeholders, auditors, and customers

Nice to haves:
- Familiarity with Auth0, OAuth 2 / OIDC flows, and multi-tenant SaaS authentication patterns
- Exposure to compliance frameworks such as SOC 2 or ISO 27001, or to evidence-collection tooling
- Relevant certifications (for example, OSCP) or a degree in Computer Science or a related field

👉 Let's stay connected! Follow us on LinkedIn, Twitter & Instagram to learn more about what is happening at Lakera.
ℹ️ Join us on Momentum, the Slack community for everything AI safety and security.
❗ To remove your information from our recruitment database, please email privacy@lakera.ai.
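As a concrete sketch of the "SAST checks in the CI pipeline" work this role describes (not Lakera's actual pipeline), a minimal GitHub Actions job that runs Bandit, a SAST tool for Python, on every pull request could look like this; the workflow name, Python version, and `src/` path are illustrative assumptions:

```
name: security-scan
on: [pull_request]

jobs:
  sast:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      # Bandit scans Python source for common vulnerability patterns;
      # fail the build on findings of medium severity or higher.
      - run: pip install bandit
      - run: bandit -r src/ --severity-level medium
```

A real secure-SDLC setup would typically add dependency scanning and IaC checks as sibling jobs in the same workflow.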
MLOps / DevOps Engineer
Data Science & Analytics
Software Engineer
Software Engineering
Apply
July 11, 2025
Platform Engineer
E2B
11-50
-
United States
Full-time
Remote
false
About the role: You will be building the next cloud platform for running AI software – a cloud where AI apps are building other software apps.

Your job will be:
- Building the E2B backend
- Solving general infrastructure problems
- Working with Kubernetes or Nomad
- Collaborating with our Systems Engineers and Distributed Systems Engineers

We're looking for a skilled developer with a diabolical obsession with making things run fast and efficiently – and running A LOT of them at the same time. If this sounds exciting to you, we want to hear from you!

What we're looking for:
- 5+ years building infrastructure, especially distributed systems
- Experience building infrastructure at scale
- Excited to work in person from San Francisco on a devtool product
- Detail-oriented, with great taste
- Excited to work closely with our users
- Not afraid to take ownership of your part of the product

If you join E2B, you'll get a lot of freedom. We expect you to be proactive and take ownership. You'll be taking projects from 0 to 1 with the support of the rest of the team.

What it's like to work at E2B:
- Work on an early team at a fast-growing startup (we grow 20%–100% MoM)
- We ship fast but don't release junk
- We like hard work and hard problems. Challenges mean potential value.
- We have a long runway and can offer a competitive salary for a startup at our stage
- Work closely with other AI companies on the edge of what's possible today
- Dogfooding our own product on projects like Fragments
- No meetings, and a highly transparent, writing-first culture
- You're the decision maker day-to-day; important product and roadmap decisions are on Vasek (CEO) and Tomas (CTO)
- Spend 10–20% of the roadmap on highly experimental projects

Hiring process: We aim to have the whole process done in 7–10 days. We understand that it's important to move fast, and we try to follow up within 24 hours after each stage.
1. 30-minute call with Vasek (CEO). We'll go over your past work experience and what you're looking for, to make sure this would be a good fit for both of us.
2. First technical interview with Tomas (CTO). About a 1-hour call. You'll get asked thorough technical questions. Often these are questions about problems that we ourselves experienced while building E2B.
3. Second technical interview. Another 1–2 hour call. Expect live coding on this call. We'll ask you to solve specific problems (don't worry, it's not LeetCode) that are related to your role.
4. One day of in-person hacking at our office (paid). We invite you to our office to work on the product with us. This is a great opportunity for all of us to try how working together feels, and for you to meet the team.
5. Last call with Vasek. A final 30-minute call with the CEO to talk more about the role and answer any of your questions.
6. Decision and potential offer.
MLOps / DevOps Engineer
Data Science & Analytics
Software Engineer
Software Engineering
Apply
July 10, 2025
Cloud Engineer
TensorWave
51-100
-
United States
Full-time
Remote
false
At TensorWave, we're leading the charge in AI compute, building a versatile cloud platform that's driving the next generation of AI innovation. We're focused on creating a foundation that empowers cutting-edge advancements in intelligent computing, pushing the boundaries of what's possible in the AI landscape.

About the Role: As a Cloud Engineer on our Customer Experience team, you'll be the front line of a high-performance support model that bridges traditional L1/L2 roles with the technical rigor demanded by AI infrastructure customers. You'll help enterprise ML engineers, DevOps teams, and CTOs solve complex challenges involving GPU workloads, infrastructure orchestration, and deployment at scale. Your mission? Deliver timely, precise, and white-glove technical support that builds trust and accelerates adoption with the most exciting customer partners in the industry. TensorWave is growing fast, and this role is a rare opportunity to get in early, grow with the team, and carve out your path as we scale.

Responsibilities:
- Triaging & resolution: act as the first and second line of defense for technical issues (VMs, networking, API errors, orchestration tools, GPU utilization, etc.)
- Customer communications: manage tickets, live chat, and calls across Premium/Platinum support tiers; communicate clearly and empathetically to both technical and non-technical users
- Cross-functional collaboration: escalate critical bugs, provide logs and context to engineering, and contribute to product improvement feedback loops
- Documentation & enablement: write clear, technically accurate documentation and playbooks to improve support efficiency and self-service
- Tooling & automation: help us scale support by improving diagnostics tooling, chatbots, and macros to reduce MTTR
- Success partnership: support QBRs and onboarding sessions for top-tier customers alongside Customer Success Managers

Essential Skills & Qualifications:
- 2–4 years in technical support, cloud operations, or SRE environments
- Hands-on experience with:
  - Linux environments (logs, bash, process/thread management)
  - Cloud compute platforms (AWS/GCP/Azure or similar)
  - Containerization or orchestration tools (e.g., Kubernetes, Docker, Terraform, Slurm)
  - AI/ML workloads (inference, training, fine-tuning)
  - Troubleshooting APIs and REST/GraphQL calls
- Strong communicator with the ability to simplify complex technical issues
- Comfort in a fast-paced startup environment where priorities shift rapidly
- Familiarity with GPU-accelerated cloud environments

We're looking for resilient, adaptable people to join our team – folks who enjoy collaborating and tackling tough challenges. We're all about offering real opportunities for growth, letting you dive into complex problems and make a meaningful impact through creative solutions. If you're a driven contributor, we encourage you to explore opportunities to make an impact at TensorWave. Join us as we redefine the possibilities of intelligent computing.

What We Bring: In addition to a competitive salary, we offer a variety of benefits to support your needs, including:
- Stock options
- 100% paid medical, dental, and vision insurance
- Life and voluntary supplemental insurance
- Short-term disability insurance
- Flexible Spending Account
- 401(k)
- Flexible PTO
- Paid holidays
- Parental leave
- Mental health benefits through Spring Health
MLOps / DevOps Engineer
Data Science & Analytics
Software Engineer
Software Engineering
Apply
July 10, 2025
Capacity Engineer, Compute
Anthropic
1001-5000
USD
320000
-
405000
United States
Full-time
Remote
false
About Anthropic: Anthropic's mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

About the role: The mission of the Compute team is to provide input into our company-wide cloud infrastructure strategy and efficiency deliverables, advise on key decisions affecting budget, and provide capacity planning and performance expertise to stakeholders across finance and engineering leadership. As an early member of this team, you would work with engineering teams to ensure optimal operation and growth of our infrastructure from both a cost and technology perspective, and collaborate cross-functionally with finance and data science partners to analyze and forecast growth.

Responsibilities:
- Develop self-service tools and processes that enable Anthropic engineers to understand their capacity, efficiency, and costs
- Design, develop, and lead the automation needed to capacity plan for both near- and long-term outcomes
- Institute and design governance workflows to help manage approvals for additional capacity requests
- Investigate new capacity requests to ensure the best use of resources and that instances are sized appropriately
- Build and drive cost-to-serve analytics programs to guide engineering, finance, and leadership on the total cost of ownership (TCO) and infrastructure impact of our scaling factors
- Inform pricing conversations through customer-profile-sensitive gross margin analysis
- Act as technical lead with outside vendors to manage Anthropic's capacity needs
- Proactively identify infrastructure inefficiency opportunities, document proposals, and be a key contributor in driving a positive outcome
- Serve as an advisor to the engineering and finance functions and the executive team for one of our largest areas of expenditure
- Work closely with TPMs on special efficiency projects and help deliver committed outcomes

You may be a good fit if you have:
- 5+ years of experience in capacity engineering
- 5+ years of experience in a technical role
- Intermediate knowledge of various public cloud providers
- Experience with data modeling for public cloud
- Experience with budgeting, capacity planning, and cloud efficiency optimization workflows
- Experience in scripting and building automation tools
- Self-discipline, and the ability to thrive in fast-paced environments
- Excellent communication skills
- Familiarity with cloud compute, storage, network, and services
- Attention to detail and a passion for correctness

Deadline to apply: None. Applications will be reviewed on a rolling basis.

The expected salary range for this position is: Annual Salary: $320,000–$405,000 USD.

Logistics: Education requirements: We require at least a Bachelor's degree in a related field or equivalent experience.
Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices. Visa sponsorship: We do sponsor visas! However, we aren't able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this. We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you're interested in this work. We think AI systems like the ones we're building have enormous social and ethical implications. We think this makes representation even more important, and we strive to include a range of diverse perspectives on our team. How we're different We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact — advancing our long-term goals of steerable, trustworthy AI — rather than work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. We're an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time. As such, we greatly value communication skills. The easiest way to understand our research directions is to read our recent research. 
This research continues many of the directions our team worked on prior to Anthropic, including: GPT-3, Circuit-Based Interpretability, Multimodal Neurons, Scaling Laws, AI & Compute, Concrete Problems in AI Safety, and Learning from Human Preferences. Come work with us! Anthropic is a public benefit corporation headquartered in San Francisco. We offer competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a lovely office space in which to collaborate with colleagues. Guidance on Candidates' AI Usage: Learn about our policy for using AI in our application process
MLOps / DevOps Engineer
Data Science & Analytics
Software Engineer
Software Engineering
Apply
July 10, 2025
System Integrator (Communication Systems)
helsing
201-500
-
Germany
Full-time
Remote
false
Who we are Helsing is a defence AI company. Our mission is to protect our democracies. We aim to achieve technological leadership, so that open societies can continue to make sovereign decisions and control their ethical standards. As democracies, we believe we have a special responsibility to be thoughtful about the development and deployment of powerful technologies like AI. We take this responsibility seriously. We are an ambitious and committed team of engineers, AI specialists and customer-facing programme managers. We are looking for mission-driven people to join our European teams – and apply their skills to solve the most complex and impactful problems. We embrace an open and transparent culture that welcomes healthy debates on the use of technology in defence, its benefits, and its ethical implications. The role As a System Integrator (Communication), you will be responsible for implementing all aspects of communication systems within one of Helsing's programs, which are crucial to the success of the forces. In this role, you will collaborate with industry partners, procurement agencies, and land forces. Your work will be central to creating complex systems that address the challenges of future battlefields. This will involve a deep understanding of state-of-the-art communication systems, with a focus on land defense systems. We have assembled a distinctive partnerships-and-programmes team across various fields of expertise and backgrounds. Together we lift ambitions and shape the thinking of our industry partners and customers on software and AI in defence, national security, and intelligence. You will work with and learn from leading experts to build lasting industry and customer relationships. 
The day-to-day:
- Determine the performance of Radio Frequency (RF) communication links and develop link budgets using analytical methods or simulation tools
- Advise stakeholders and recommend solutions to complex problems by leveraging expertise in communication technologies
- Propose innovative, scalable, and adaptive communication design solutions
- Provide expertise in integrating communication systems within a multidisciplinary environment
- Plan and execute the integration of communication systems into both new and existing systems
- Develop and manage Interface Control Documents (ICDs)
- Suggest communications performance measures
- Recommend and conduct studies on communication systems

You should apply if you:
- Have 3 years of experience in communication system integration or a comparable field
- Hold a relevant degree, such as a Bachelor's or Master's in Engineering, Science, Technology, Math, Computer Science, Electrical Engineering, Physics, Communications, or a similar discipline
- Possess experience in communications systems engineering and development, particularly in military communications, or have equivalent operational experience using Commercial Off-The-Shelf (COTS) hardware components
- Have experience in requirements development, system design, problem analysis and resolution, and/or integration and testing
- Have hands-on experience with mobile networking components, wireless technologies, or cellular networks
- Can work effectively in a team environment and actively contribute to team efforts
- Are able to translate customer operational requirements into technical requirements necessary for defining mobile converged IP network solutions

Note: We operate in an industry where women, as well as other minority groups, are systematically under-represented. We encourage you to apply even if you don't meet all the listed qualifications; ability and impact cannot be summarised in a few bullet points.
Join Helsing and work with world-leading experts in their fields
- Helsing's work is important. You'll be directly contributing to the protection of democratic countries while balancing both ethical and geopolitical concerns
- The work is unique. We operate in a domain that has highly unusual technical requirements and constraints, and where robustness, safety, and ethical considerations are vital. You will face unique engineering and AI challenges that make a meaningful impact in the world
- Our work frequently takes us right up to the state of the art in technical innovation, be it reinforcement learning, distributed systems, generative AI, or deployment infrastructure. The defence industry is entering the most exciting phase of the technological development curve. Advances in our field of work are not incremental: Helsing is part of, and often leading, historic leaps forward
- In our domain, success is a matter of order-of-magnitude improvements and novel capabilities. This means we take bets, aim high, and focus on big opportunities. Despite being a relatively young company, Helsing has already been selected for multiple significant government contracts
- We actively encourage healthy, proactive, and diverse debate internally about what we do and how we choose to do it. Teams and individual engineers are trusted (and encouraged) to practise responsible autonomy and critical thinking, and to focus on outcomes, not conformity. At Helsing you will have a say in how we (and you!) work, the opportunity to engage on what does and doesn't work, and to take ownership of aspects of our culture that you care deeply about
What we offer
- A focus on outcomes, not time-tracking
- Competitive compensation and stock options
- Relocation support
- Social and education allowances
- Regular company events and all-hands to bring together employees as one team across Europe
Helsing is an equal opportunities employer.
We are committed to equal employment opportunity regardless of race, religion, sexual orientation, age, marital status, disability or gender identity. Please do not submit personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, data concerning your health, or data concerning your sexual orientation. Helsing's Candidate Privacy and Confidentiality Regime can be found here.
MLOps / DevOps Engineer
Data Science & Analytics
Solutions Architect
Software Engineering
July 10, 2025