Top MLOps / DevOps Engineer Job Openings in 2025

Looking for MLOps / DevOps Engineer opportunities? This curated list features the latest MLOps / DevOps Engineer job openings from AI-native companies. Whether you're an experienced professional or just entering the field, find roles that match your expertise, from startups to global tech leaders. Updated every day.


Senior Machine Learning/MLOps Engineer

Anduril
Salary: USD 146,000 - 194,000
Location: United States
Employment type: Full-time
Remote: No
Anduril Industries is a defense technology company with a mission to transform U.S. and allied military capabilities with advanced technology. By bringing the expertise, technology, and business model of the 21st century’s most innovative companies to the defense industry, Anduril is changing how military systems are designed, built and sold. Anduril’s family of systems is powered by Lattice OS, an AI-powered operating system that turns thousands of data streams into a realtime, 3D command and control center. As the world enters an era of strategic competition, Anduril is committed to bringing cutting-edge autonomy, AI, computer vision, sensor fusion, and networking technology to the military in months, not years.

ABOUT THE TEAM

The Corp Tech Acquisition team scopes and manages the implementation of Anduril’s acquired companies. We help enable the new acquisitions to build, ship, and deploy products at scale with Anduril’s systems and processes. As we continue to acquire companies and expand our capabilities, we are seeking a highly skilled Acquisition Program Manager specializing in Mergers & Acquisitions (M&A). This role will lead and coordinate the acquisition process, work with leadership and cross-functional teams to ensure a smooth integration, and manage all aspects of program planning and execution.

ABOUT THE JOB

- Oversee the acquisition program lifecycle from due diligence, integration, and adoption to completion across multiple acquisitions
- Work closely with cross-functional stakeholders (IT, Legal, HR, Supply Chain, Manufacturing, Mission Operations, Finance, Product, Deployments) to root cause problems and scope key requirements, milestones, and dependencies for acquisition implementation success
- Own building the program management foundation for the acquisition team
- Own defining, managing, and improving program management processes for all acquisition implementations
- Help implement risk management strategies, identifying potential issues and developing contingency plans
- Manage the program timeline across all related acquisitions, ensuring milestones are met and programs stay on track
- Define program scope, goals, and deliverables in collaboration with stakeholders and senior management
- Facilitate communication and collaboration across cross-functional teams and departments
- Provide regular updates and/or risks to the appropriate management channels and escalate issues, as necessary, according to each acquisition's integration plan
- Analyze each program status and, when necessary, revise the scope, schedule, or resources to ensure that program requirements can be met
- Establish and maintain relationships with relevant stakeholders, providing day-to-day contact on program status and changes

REQUIRED QUALIFICATIONS

- 50%+ travel required
- Insanely high execution bar, and will see all programs through from conception to tactical completion to move Anduril forward
- 5+ years of program management experience, preferably with managing complex systems and operations implementations
- 5+ years of experience with managing executive communication, board of director goals or driving cross company initiatives
- Excellent written and verbal communication skills and strong presentation skills, able to clearly articulate needs to leadership team and a wide variety of cross-functional stakeholders
- Collaborate across teams, strategizing how to bridge different parts of the organization to achieve cross-functional outcomes
- Ability to observe and anticipate potential risks across programs, milestones, timelines, etc.
- You are incredibly organized, detail-oriented, and excel in strategic planning
- You have both high ownership and low ego, approaching everything with strong outcome orientation and high humility
- You’re discerning and an incredibly fast learner
- U.S. Persons status is required as this position needs to access export-controlled data

US Salary Range: $146,000 - $194,000 USD

The salary range for this role is an estimate based on a wide range of compensation factors, inclusive of base salary only. Actual salary offer may vary based on (but not limited to) work experience, education and/or training, critical skills, and/or business considerations. Highly competitive equity grants are included in the majority of full time offers; and are considered part of Anduril's total compensation package. Additionally, Anduril offers top-tier benefits for full-time employees, including:

Healthcare Benefits
- US Roles: Comprehensive medical, dental, and vision plans at little to no cost to you.
- UK & AUS Roles: We cover full cost of medical insurance premiums for you and your dependents.
- IE Roles: We offer an annual contribution toward your private health insurance for you and your dependents.

Additional Benefits
- Income Protection: Anduril covers life and disability insurance for all employees.
- Generous time off: Highly competitive PTO plans with a holiday hiatus in December. Caregiver & Wellness Leave is available to care for family members, bond with a new baby, or address your own medical needs.
- Family Planning & Parenting Support: Coverage for fertility treatments (e.g., IVF, preservation), adoption, and gestational carriers, along with resources to support you and your partner from planning to parenting.
- Mental Health Resources: Access free mental health resources 24/7, including therapy and life coaching. Additional work-life services, such as legal and financial support, are also available.
- Professional Development: Annual reimbursement for professional development
- Commuter Benefits: Company-funded commuter benefits based on your region.
- Relocation Assistance: Available depending on role eligibility.

Retirement Savings Plan
- US Roles: Traditional 401(k), Roth, and after-tax (mega backdoor Roth) options.
- UK & IE Roles: Pension plan with employer match.
- AUS Roles: Superannuation plan.

The recruiter assigned to this role can share more information about the specific compensation and benefit details associated with this role during the hiring process.

To view Anduril's candidate data privacy policy, please visit https://anduril.com/applicant-privacy-notice/.
MLOps / DevOps Engineer
Data Science & Analytics
Machine Learning Engineer
Data Science & Analytics

Senior Software Engineer, Windows/Desktop Applications

Speechify
Salary: USD 140,000 - 200,000
Location: United States
Employment type: Full-time
Remote: Yes
PLEASE APPLY THROUGH THIS LINK: https://job-boards.greenhouse.io/speechify/jobs/5287658004
DO NOT APPLY BELOW

The mission of Speechify is to make sure that reading is never a barrier to learning. Over 50 million people use Speechify’s text-to-speech products to turn whatever they’re reading – PDFs, books, Google Docs, news articles, websites – into audio, so they can read faster, read more, and remember more. Speechify’s text-to-speech reading products include its iOS app, Android App, Mac App, Chrome Extension, and Web App. Google recently named Speechify the Chrome Extension of the Year and Apple named Speechify its App of the Day.

Today, nearly 200 people around the globe work on Speechify in a 100% distributed setting – Speechify has no office. These include frontend and backend engineers, AI research scientists, and others from Amazon, Microsoft, and Google, leading PhD programs like Stanford, high growth startups like Stripe, Vercel, Bolt, and many founders of their own companies.

This is a key role and ideal for someone who thinks strategically, enjoys fast-paced environments, is passionate about making product decisions, and has experience building great user experiences that delight users. We are a flat organization that allows anyone to become a leader by showing excellent technical skills and delivering results consistently and fast. Work ethic, solid communication skills, and obsession with winning are paramount.

Our interview process involves several technical interviews and we aim to complete them within 1 week.

What You’ll Do
- Work alongside machine learning researchers, engineers, and product managers to bring our AI Voices to their customers for a diverse range of use cases
- Deploy and operate the core ML inference workloads for our AI Voices serving pipeline
- Introduce new techniques, tools, and architecture that improve the performance, latency, throughput, and efficiency of our deployed models
- Build tools to give us visibility into our bottlenecks and sources of instability and then design and implement solutions to address the highest priority issues

An Ideal Candidate Should Have
- Experience shipping Python-based services
- Experience being responsible for the successful operation of a critical production service
- Experience with public cloud environments, GCP preferred
- Experience with Infrastructure as Code, Docker, and containerized deployments
- Preferred: Experience deploying high-availability applications on Kubernetes
- Preferred: Experience deploying ML models to production

What We Offer
- A dynamic environment where your contributions shape the company and its products
- A team that values innovation, intuition, and drive
- Autonomy, fostering focus and creativity
- The opportunity to have a significant impact in a revolutionary industry
- Competitive compensation, a welcoming atmosphere, and a commitment to an exceptional asynchronous work culture
- The privilege of working on a product that changes lives, particularly for those with learning differences like dyslexia, ADD, and more
- An active role at the intersection of artificial intelligence and audio – a rapidly evolving tech domain

Salary
The United States base salary range for this full-time position is $140,000-$200,000 + bonus + equity depending on experience

Think you’re a good fit for this job?
Tell us more about yourself and why you're interested in the role when you apply. And don’t forget to include links to your portfolio and LinkedIn.

Not looking but know someone who would make a great fit?
Refer them!

Speechify is committed to a diverse and inclusive workplace. Speechify does not discriminate on the basis of race, national origin, gender, gender identity, sexual orientation, protected veteran status, disability, age, or other legally protected status.
MLOps / DevOps Engineer
Data Science & Analytics
Machine Learning Engineer
Data Science & Analytics
Software Engineer
Software Engineering

Data Center Substation/Utility Engineering - Electrical

Lambda AI
Salary: USD 185,000 - 327,000
Location: United States
Employment type: Full-time
Remote: No
Lambda, The Superintelligence Cloud, builds Gigawatt-scale AI Factories for Training and Inference. Lambda’s mission is to make compute as ubiquitous as electricity and give every person access to artificial intelligence. One person, one GPU. If you'd like to build the world's best deep learning cloud, join us.

Note: This position prefers presence in our Bay Area office locations, but is open to remote presence for the right candidate.

About the Job

Lambda is seeking a Data Center Substation/Utility Electrical Engineer who provides strategic, technical, and executive leadership for Lambda’s high-voltage substation, transmission, and power-generation programs across North America. This role oversees multi-site, multi-state programs involving 69kV–500kV substation development, transmission line interconnections, grid integration, protection and control systems, and utility interface management critical to Lambda’s AI/Cloud power infrastructure.

This key role works heavily with the Energy team and sets program strategy, defines engineering and construction standards, ensures regulatory and utility compliance, and drives the delivery of complex power infrastructure safely, on time, and within budget. This position requires a deep command of HV electrical systems, utility coordination, EPC oversight, and lifecycle management for mission-critical substations supporting Lambda or developer data centers.

What You’ll Do

Technical Leadership in HV Substation & Transmission Systems
- Provide technical oversight and governance for all HV substation designs (69kV–500kV), including bus configurations, breaker schemes, transformer sizing, reactive compensation, protection systems, SCADA integration, and grounding.
- Also provide guidance with large-scale diesel and natural gas generator plants (microgrid).
- Possess a strong understanding of Battery Energy Storage Systems (BESS) and renewable resource integration (solar, fuel cells, etc.).
- Supporting site selection with powered land, as well as self-build infrastructure.
- Oversee the design and performance criteria for:
  - Generator control systems
  - Load sharing, load shedding, and black start capabilities
  - Generator transient response and dynamic stability
  - Emissions systems (SCR, DEF, CO/NOx compliance)
  - Utility parallel operation
  - Demand response and curtailment strategies
- Determine technical standards for:
  - Transmission interconnections and line tap arrangements
  - Relay protection and control philosophies
  - Metering schemes and revenue-quality instrumentation
  - Insulation coordination and equipment BIL requirements
  - Substation communication protocols (IEC 61850, DNP3, Modbus)
  - Arc-flash, fault current, and short-circuit design considerations
- Lead the review and approval of all major engineering deliverables, such as:
  - One-line diagrams
  - Physical, Civil, and Protection & control schematics
  - Relay settings and coordination studies
  - HV switching, grounding, and lightning protection plans
- Guide technical investigations and root-cause analysis for power system events, outages, and equipment failures.

Transmission & Utility Interconnection Strategy
- Establish and maintain relationships with utilities, ISOs/RTOs, and transmission owners; lead interconnection negotiations and technical discussions.
- Oversee all aspects of utility interconnection from feasibility through energization, including:
  - Load flow and stability studies
  - Short circuit and protection coordination
  - Transmission planning requirements
  - Interconnection application strategy and milestone tracking
- Ensure that program decisions align with utility standards, NERC/FERC requirements, and state regulatory frameworks.

Program Planning, Delivery & Execution
- Direct the execution of multiple HV substation and transmission projects, ensuring engineering integrity, equipment standardization, and construction quality.
- Oversee:
  - EPC contractor selection (behind the meter or traditional utility interconnection)
  - Factory acceptance testing (FAT) for HV equipment
  - Field acceptance testing (FAT/SAT) for protection and control systems
  - Commissioning procedures and energization plans
- Ensure long-lead equipment procurement strategies for transformers, breakers, relays, controls, GIS/AIS gear, and transmission structures.

Construction & Field Oversight
- Provide executive and technical oversight for construction sequencing, clearance planning, switching coordination, and commissioning safety.
- Ensure adherence to construction standards related to:
  - High-voltage safety and switching procedures
  - Transmission structure erection and conductor installation
  - Ground grid installation, testing, and validation
  - Relay testing (end-to-end, point-to-point, functional)
  - Permitting
- Resolve complex field issues with EPCs, utilities, and commissioning teams.

Senior/Executive-Level Duties
- Lead substation program standards, templates, modular designs, and equipment specifications to ensure repeatability and scale across core markets.
- Drive decision-making related to transformer sizing, redundancy (N-1, N+1), load growth, and grid capacity planning.
- Approve acceptance criteria for protective relay settings, verifying alignment with utility and internal standards.
- Orchestrate system modeling and analytical studies for AI load support (ETAP, SKM, ASPEN, PSLF, PSS/E).
- Provide executive reporting on technical risk, system reliability, NERC compliance impacts, and substation performance KPIs.

You
- Bachelor’s degree in Electrical Engineering.
- 8+ years in HV substation, transmission engineering, EPC leadership, or power delivery program management. Power generation experience is a plus.
- Extensive knowledge of HV electrical systems.
- Understanding of utility interconnection processes, NERC requirements, RTO/ISO rules, and state regulatory protocols.
- Demonstrated experience delivering large-scale, multi-site, mission-critical HV infrastructure projects.
- Prior responsibility for technical approval of designs, engineering packages, relay settings, FAT/SAT, and energization.

Nice to Have
- Ability to translate highly technical concepts for executive audiences.
- Strong commercial acumen with experience negotiating EPC agreements, equipment contracts, and utility service arrangements.
- Excellent leadership, risk-management, and cross-functional communication skills.
- Proficient in reading, analyzing, and interpreting technical specifications, financial reports, and legal documents
- Combination of PE, MS, PMP or PgMP preferred.

Salary Range Information

The annual salary range for this position has been set based on market data and other factors. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.

About Lambda
- Founded in 2012, ~400 employees (2025) and growing fast
- We offer generous cash & equity compensation
- Our investors include Andra Capital, SGW, Andrej Karpathy, ARK Invest, Fincadia Advisors, G Squared, In-Q-Tel (IQT), KHK & Partners, NVIDIA, Pegatron, Supermicro, Wistron, Wiwynn, US Innovative Technology, Gradient Ventures, Mercato Partners, SVB, 1517, Crescent Cove.
- We are experiencing extremely high demand for our systems, with quarter over quarter, year over year profitability
- Our research papers have been accepted into top machine learning and graphics conferences, including NeurIPS, ICCV, SIGGRAPH, and TOG
- Health, dental, and vision coverage for you and your dependents
- Wellness and Commuter stipends for select roles
- 401k Plan with 2% company match (USA employees)
- Flexible Paid Time Off Plan that we all actually use

A Final Note:

You do not need to match all of the listed expectations to apply for this position. We are committed to building a team with a variety of backgrounds, experiences, and skills.

Equal Opportunity Employer

Lambda is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.
MLOps / DevOps Engineer
Data Science & Analytics

SOC Engineer

Replit
Salary: USD 180,000 - 250,000
Location: United States
Employment type: Full-time
Remote: No
Replit is the agentic software creation platform that enables anyone to build applications using natural language. With millions of users worldwide and over 500,000 business users, Replit is democratizing software development by removing traditional barriers to application creation.

We are looking for a SOC Engineer to join our Security Operations team and help defend a fast-moving, cloud-native AI vibe-coding platform. In this role, you will stay on top of emerging threats (from 0-days and active exploitation campaigns to bug bounty findings and customer-reported issues) and rapidly determine their relevance and potential impact to Replit. You will conduct investigations, analyze signals across our environment, and collaborate with Security, SRE, and Engineering teams to develop and drive effective containment and mitigation strategies.

This is a hands-on, investigative role requiring strong technical depth, understanding of modern software engineering and CI/CD systems, familiarity with cloud-native infrastructure (especially GCP), and the ability to work across multiple teams in a fast-paced environment.

Responsibilities

Threat Awareness & Rapid Assessment
- Continuously monitor emerging threats, including bad actor activity, 0-day vulnerabilities, public exploitation campaigns, bug bounty reports, and customer-reported security issues
- Quickly assess the applicability of these threats to Replit’s cloud infrastructure, SaaS services, internal tooling, and platform components.

Investigation & Impact Analysis
- Conduct targeted investigations to determine whether Replit is already impacted by a newly discovered threat, vulnerability, or exploit.
- Analyze logs, telemetry, and system behaviors using SIEM, metrics, Cloud Logging, and related tools.
- Identify gaps or weaknesses in existing detection or visibility and propose improvements.

Containment, Mitigation & Cross-Team Collaboration
- Research potential impact paths and develop mitigation strategies for confirmed or applicable threats.
- Partner closely with Security, SRE, and Engineering teams to coordinate and implement containment, patches, configuration updates, or code-level fixes.
- Document findings, mitigations, and follow-up actions clearly for internal teams.

Required Skills & Experience
- Strong understanding of software engineering fundamentals, including code structure, build systems, dependencies, and package ecosystems, enabling effective partnership with Engineering teams.
- Understanding of CI/CD pipelines and DevOps workflows, enabling collaboration with Infrastructure and DevOps teams.
- Solid knowledge of cloud architecture, especially Google Cloud Platform (GCP) services used in modern cloud-native deployments.
- Familiarity with SaaS architectures, identity systems, and integration patterns for effective collaboration with Cloud Security teams.
- Hands-on experience with SIEM, Cloud Logging, and log-based investigation workflows.
- Ability to perform investigations using log data, behavioral indicators, and threat intelligence.
- General understanding of vulnerability lifecycles, exploitability analysis, and common attack vectors.

Preferred Qualifications
- Experience with threat intelligence, security research, or vulnerability analysis.
- Familiarity with Kubernetes, containers, serverless infrastructure, or modern distributed systems.
- Ability to write scripts or small tools for investigation or automation (Python, Go, Bash).
- Experience working with bug bounty programs or coordinated vulnerability disclosure workflows.
- Experience in fast-paced, cloud-native, or AI/ML-driven environments.

What We Value
- Curiosity & initiative: Strong desire to understand attacker behaviors, emerging threats, and how they apply to real-world systems.
- Speed & analytical rigor: Ability to quickly assess high-risk vulnerabilities with clear, evidence-based reasoning.
- Collaboration: Comfort working across cross-functional teams spanning Security, SRE, Engineering, and Infrastructure.
- Clear communication: Ability to explain findings, risks, and mitigation strategies to stakeholders at all levels.
- Ownership mindset: Takes initiative to drive investigations, improvements, and remediations to completion
- Continuous learning: Passion for staying up to date on new vulnerabilities, exploit trends, and cloud-native security best practices.

This is a full-time role that can be held from our Foster City, CA office. The role has an in-office requirement of Monday, Wednesday, and Friday.

Full-Time Employee Benefits Include:
- 💰 Competitive Salary & Equity
- 💹 401(k) Program
- ⚕️ Health, Dental, Vision and Life Insurance
- 🩼 Short Term and Long Term Disability
- 🚼 Paid Parental, Medical, Caregiver Leave
- 🚗 Commuter Benefits
- 📱 Monthly Wellness Stipend
- 🧑‍💻 Autonomous Work Environment
- 🖥 In Office Set-Up Reimbursement
- 🏝 Flexible Time Off (FTO) + Holidays
- 🚀 Quarterly Team Gatherings
- ☕ In Office Amenities

Want to learn more about what we are up to?
- Meet the Replit Agent
- Replit: Make an app for that
- Replit Blog
- Amjad TED Talk
- Interviewing + Culture at Replit
- Operating Principles
- Reasons not to work at Replit

To achieve our mission of making programming more accessible around the world, we need our team to be representative of the world. We welcome your unique perspective and experiences in shaping this product. We encourage people from all kinds of backgrounds to apply, including and especially candidates from underrepresented and non-traditional backgrounds.
MLOps / DevOps Engineer
Data Science & Analytics
Software Engineer
Software Engineering

Security Operations Lead

Replit
Salary: USD 220,000 - 325,000
Location: United States
Employment type: Full-time
Remote: No
Replit is the agentic software creation platform that enables anyone to build applications using natural language. With millions of users worldwide and over 500,000 business users, Replit is democratizing software development by removing traditional barriers to application creation.

We are looking for a Security Operations Lead (SOC Lead) to build, mature, and operate our 24/7 detection and response capabilities across a modern cloud-native and AI-driven environment. This role leads the global SOC function (monitoring, SIEM ownership, detection engineering, alert triage, and operational readiness) while also evaluating and integrating emerging AI-based SOC products and autonomous response platforms.

You will oversee monitoring across multi-cloud environments (GCP primary, AWS/Azure secondary), Kubernetes, SaaS services, endpoints, developer tools, and AI workloads. You’ll collaborate closely with Cloud Security, Compliance/GRC, SRE, Platform Engineering, IT/Endpoint teams, and AI Infrastructure to ensure our detection strategy scales and stays ahead of evolving threats.

This is a hands-on leadership role perfect for someone who wants to shape the SOC of the future while solving complex challenges in a high-scale AI setting.

What You’ll Do

SOC Leadership & 24/7 Monitoring
- Lead, mentor, and scale a global SOC team responsible for 24/7 monitoring, alert intake, triage, correlation, and escalation.
- Build operational rigor: processes, runbooks, SLAs, metrics, and quality standards for high-scale environments.
- Cover monitoring across:
  - Cloud infrastructure (GCP, AWS, Azure)
  - Kubernetes/GKE/EKS/AKS clusters
  - SaaS platforms (Google Workspace, GitHub, Slack, Okta, etc.)
  - Endpoints (macOS, Linux, Windows) including EDR/XDR telemetry
  - Developer platforms + CI/CD pipelines
  - AI/ML systems and model-serving workflows

AI-Based SOC Integration & Innovation
- Evaluate, adopt, and integrate AI-native SOC technologies for triaging, detection, and correlation
- Identify opportunities to automate triage, investigations, enrichment, and reporting.
- Serve as the internal expert on the capabilities and limitations of AI-based SOC tooling.

SIEM & Telemetry Ownership
- Own the entire SIEM ecosystem: ingestion, normalization, correlation, enrichment, tuning, dashboards, and metrics.
- Expand telemetry across:
  - Cloud logs, API logs, system events
  - SaaS audit logs and admin events
  - Identity providers (Okta, Google, Azure AD)
  - Endpoint EDR/XDR event streams
- Standardize data schemas and improve detection signal quality across sources.

Detection Engineering
- Develop high-fidelity detections for:
  - Cloud-native attacks
  - Identity threats and lateral movement
  - SaaS misconfigurations and privilege abuse
  - Endpoint malware/behavior anomalies
  - Insider threats and account takeover patterns
- Use MITRE ATT&CK, MITRE Cloud Matrix, and threat intel to drive detection coverage.
- Collaborate with Engineering, Cloud Security, and SRE to ensure telemetry supports detection use cases.

Triage, Threat Analysis & Escalation
- Lead day-to-day triage and threat analysis activities, ensuring accurate categorization and prioritization.
- Drive complex investigations involving correlated events across cloud, SaaS, endpoints, and developer platforms.
- Guide root cause analysis and work with owners to drive remediation and architectural improvements.
- Continuously refine logic, reduce false positives, and improve signal quality.

Cross-Functional Collaboration
- Partner with Cloud Security on cloud posture and preventative controls.
- Work with Compliance/GRC to support SOC 2, ISO 27001, and audit readiness.
- Collaborate with SRE and Engineering to instrument new services with structured logs and detection hooks.
- Coordinate with IT / Endpoint teams to ensure full endpoint telemetry and EDR response readiness.
- Communicate threats, gaps, and trends to leadership and engineering stakeholders.

Required Skills & Experience
- 7+ years of experience in Security Operations, with 3+ years in a senior or lead capacity.
- Experience leading or collaborating with 24/7 SOC environments (internal, hybrid, or MSSP).
- Strong experience with SIEM platforms (Chronicle, Splunk, Elastic, Sentinel, Panther, etc.).
- Deep understanding of:
  - Cloud security monitoring (GCP required; AWS/Azure preferred)
  - SaaS security monitoring (Okta, Google Workspace, GitHub, Slack, etc.)
  - Endpoint security telemetry (EDR/XDR tools such as CrowdStrike, SentinelOne, or Defender)
  - Kubernetes and container detection
- Hands-on detection engineering skills, event correlation, threat hunting, and log analysis.
- Familiarity with AI-based SOC platforms and LLM-driven detection/triage tools.
- Strong understanding of identity security, OAuth/OIDC, and API telemetry patterns.
- Experience with SOAR and scripting (Python, Go, Bash).
- Knowledge of MITRE ATT&CK, cloud kill chains, behavioral detections, and detection lifecycle management.

Preferred Qualifications
- Experience with UBA/UEBA, ML-driven anomaly detection, or autonomous remediation systems.
- Previous experience at a high-growth tech company.
- Security certifications (GCIH, GCIA, GCTI, GCDA, GCFA, etc.).

What We Value
- Operational excellence: Building reliable, scalable SOC systems.
- Analytical rigor: Capable of making sense of large, complex, multi-source telemetry.
- Leadership: Mentorship and guidance of analysts and engineers.
- Adaptability: Comfortable evaluating and integrating next-gen AI-based SOC tools.
- Clear communication: Able to articulate risk, incidents, and recommendations to both technical and executive audiences.
- Automation mindset: Focused on reducing manual toil via SOAR, scripting, and AI augmentation.
- Curiosity: Passion for learning, experimenting, and staying ahead of evolving threats, especially those targeting cloud-native and AI systems.

This is a full-time role that can be held from our Foster City, CA office. The role has an in-office requirement of Monday, Wednesday, and Friday.

Full-Time Employee Benefits Include:
- 💰 Competitive Salary & Equity
- 💹 401(k) Program
- ⚕️ Health, Dental, Vision and Life Insurance
- 🩼 Short Term and Long Term Disability
- 🚼 Paid Parental, Medical, Caregiver Leave
- 🚗 Commuter Benefits
- 📱 Monthly Wellness Stipend
- 🧑‍💻 Autonomous Work Environment
- 🖥 In Office Set-Up Reimbursement
- 🏝 Flexible Time Off (FTO) + Holidays
- 🚀 Quarterly Team Gatherings
- ☕ In Office Amenities

Want to learn more about what we are up to?
- Meet the Replit Agent
- Replit: Make an app for that
- Replit Blog
- Amjad TED Talk
- Interviewing + Culture at Replit
- Operating Principles
- Reasons not to work at Replit

To achieve our mission of making programming more accessible around the world, we need our team to be representative of the world. We welcome your unique perspective and experiences in shaping this product. We encourage people from all kinds of backgrounds to apply, including and especially candidates from underrepresented and non-traditional backgrounds.
MLOps / DevOps Engineer
Data Science & Analytics

Cloud Security Lead

Replit
USD
0
220000
-
325000
US.svg
United States
Full-time
Remote
false
Replit is the agentic software creation platform that enables anyone to build applications using natural language. With millions of users worldwide and over 500,000 business users, Replit is democratizing software development by removing traditional barriers to application creation.

Join us at the forefront of AI and cloud-native security as we work to secure one of the most innovative developer platforms in the world. As the Cloud Security Lead, you will shape the cloud and infrastructure security program that protects millions of developers, enables safe AI-assisted development, and ensures organizations can confidently bring our platform into enterprise environments.

In this role, you will own cloud security across GCP (primary) and supplemental environments in AWS and Azure, as well as containerized systems, SaaS platforms, and our multi-tenant AI infrastructure. You’ll improve our security posture through strong architecture, posture management, secure-by-default development practices, and close partnership with Engineering, Compliance, Security Architecture, and Platform teams.

This is a highly impactful, hands-on leadership role, perfect for someone who wants to solve complex security challenges at scale while influencing product, engineering, and go-to-market teams.

What You’ll Do:

Cloud Security Engineering
- Lead configuration hardening across GCP, with additional oversight of workloads and integrations running in AWS and Azure.
- Own and optimize CSPM platforms across multi-cloud environments, establishing configuration baselines, guardrails, and remediation workflows.
- Secure critical SaaS platforms, ensuring proper configurations, access controls, and engineering integrations.
- Lead infrastructure vulnerability management across multi-cloud systems, containers, registries, and platform services.
- Enhance security across containerized and Kubernetes (GKE/EKS/AKS) workloads, including runtime protections, network policies, and workload identity.
- Assess secure logging configurations across cloud/SaaS providers, ensuring audit logs, retention, and routing meet monitoring and architecture needs.

Secure Development & Architecture Enablement
- Partner with engineering teams to make services secure by default, embedding security into development workflows, CI/CD pipelines, and cloud-native deployments.

Cross-Functional Responsibilities
- Collaborate with Security Monitoring, Compliance/GRC, Architecture, DevOps, Platform Engineering, and ML Infrastructure.
- Participate in communicating security advisories, best practices, and updates to Replit’s customers.
- Support incident investigations as a cloud security subject-matter expert.

Required Skills & Experience:
- 7+ years of experience in cloud engineering, with 3+ years in a senior or lead role.
- Hands-on experience with CSPM tools (Wiz, Lacework, Prisma, Orca, SCC, etc.).
- Deep expertise in GCP security (IAM, VPC, KMS, GKE, Cloud Logging).
- Experience securing and governing SaaS platforms and identity integrations.
- Operational experience with infrastructure vulnerability management across cloud and container environments.
- Working knowledge of AWS and/or Azure security services and configurations.
- Experience with container and Kubernetes security across GKE, EKS, or AKS.
- Strong IaC security experience with Terraform, Pulumi, or similar tooling.
- Familiarity with compliance standards (SOC 2, ISO 27001, PCI DSS).

Preferred Qualifications:
- Experience supporting engineering teams in building secure-first, cloud-native or PaaS environments.
- Background securing AI/ML pipelines, model-serving infrastructure, or developer platform services.
- Experience in high-growth technology or cloud-native product companies.
- Experience with securing AI/agentic systems and sensitive data pipelines.
- Automation/scripting with Python.
- Relevant certifications (e.g., GCP Professional Cloud Security Engineer, AWS/Azure security certs).

What We Value:
- Problem-solving mindset: Ability to break down complex security and operational challenges into
clear engineering solutions. Autonomy — Comfortable leading initiatives, collaborating effectively, and driving outcomes with minimal oversight. Communication excellence — Able to translate deep technical concepts for engineers, executives, and enterprise customers. Continuous learning — Passion for staying current with AI security, cloud-native advances, and emerging threats. Automation-first approach — Belief in reducing operational toil and building scalable, self-healing systems. This is a full-time role that can be held from our Foster City, CA office. The role has an in-office requirement of Monday, Wednesday, and Friday. Full-Time Employee Benefits Include: 💰 Competitive Salary & Equity 💹 401(k) Program ⚕️ Health, Dental, Vision and Life Insurance 🩼 Short Term and Long Term Disability 🚼 Paid Parental, Medical, Caregiver Leave 🚗 Commuter Benefits 📱 Monthly Wellness Stipend 🧑‍💻 Autonomous Work Environment 🖥 In Office Set-Up Reimbursement 🏝 Flexible Time Off (FTO) + Holidays 🚀 Quarterly Team Gatherings ☕ In Office Amenities. Want to learn more about what we are up to? Meet the Replit Agent; Replit: Make an app for that; Replit Blog; Amjad TED Talk; Interviewing + Culture at Replit; Operating Principles; Reasons not to work at Replit. To achieve our mission of making programming more accessible around the world, we need our team to be representative of the world. We welcome your unique perspective and experiences in shaping this product. We encourage people from all kinds of backgrounds to apply, including and especially candidates from underrepresented and non-traditional backgrounds.
MLOps / DevOps Engineer
Data Science & Analytics
Software Engineer
Software Engineering
Apply
Symbolica AI.jpg

Talent Sourcer – AI & ML Research

Symbolica AI
-
US.svg
United States
Full-time
Remote
false
About Us Symbolica is an AI research lab pioneering the application of category theory to enable logical reasoning in machines. We’re a well-resourced, nimble team of experts on a mission to bridge the gap between theoretical mathematics and cutting-edge technologies, creating symbolic reasoning models that think like humans – precise, logical, and interpretable. While others focus on scaling data-hungry neural networks, we’re building AI that understands the structures of thought, not just patterns in data. Our approach combines rigorous research with fast-paced, results-driven execution. We’re reimagining the very foundations of intelligence while simultaneously developing product-focused machine learning models in a tight feedback loop, where research fuels application. Founded in 2022, we’ve raised over $30M from leading Silicon Valley investors, including Khosla Ventures, General Catalyst, Abstract Ventures, and Day One Ventures, to push the boundaries of applying formal mathematics and logic to machine learning. Our vision is to create AI systems that transform industries, empowering machines to solve humanity’s most complex challenges with precision and insight. Join us to redefine the future of AI by turning groundbreaking ideas into reality.About the Role As a DevOps Engineering Lead working closely with our Head of ML Engineering, you will lead the design, build, and optimize the infrastructure and tools that enable us to take our research and development efforts from the lab into a highly reliable, performant and secure software stack in production. You'll help accelerate the processes involved in going from research prototypes into production and enterprise ready platforms with security, availability and reliability in mind. Your work will be at the intersection of research and engineering, ensuring our R&D team has the robust platform they need to push the boundaries of AI, working with our GPU vendors, cloud providers, and on-prem servers. 
📍 This is an onsite role that is based in our SF office (345 California St.) Key Responsibilities - Focus on improving the reliability and performance of our Lambda cluster and model training pipeline. - Assist in managing multiple Kubernetes environments across cloud providers - Maintain and build the internal observability platform across all environments, covering everything from GPUs, AI applications and distributed backend systems. - Take ownership of our model training and deployment systems, bringing them to a more scalable, production-ready state. - Aid in building comprehensive CI tests for GitOps repositories and promotion systems - Build and maintain different environments for research and client facing products according to best practices About You - 5+ years of experience in DevOps, or infrastructure roles, with at least 2 years in machine learning infrastructure or MLOps. It would be a benefit if you have either built, maintained, or managed ML infrastructure using DevOps practices in the past. - Proficient in cloud-native architectures, with the ability to make the right tradeoffs where necessary - Experienced with Linux, containers, GPU management, Nix, Kubernetes and an interest in making sure the infrastructure behind our models is secure by design. - Exceptional problem-solving skills with the ability to nimbly solve edge-cases with minimum disruption. - Solid software engineering skills in Rust, Golang or Python What We Offer Competitive salary and early-stage equity package. A high-trust, execution-first culture with minimal bureaucracy. Direct ownership of meaningful projects with real business impact. A rare opportunity to sit at the interface between deep research and real-world productization. 
Read more about Symbolica: https://fortune.com/2024/04/09/vinod-khosla-former-tesla-autopilot-engineer-ai-models/ https://venturebeat.com/ai/move-over-deep-learning-symbolicas-structured-approach-could-transform-ai/ Symbolica is an equal opportunities employer. We celebrate diversity and are committed to creating an inclusive environment for all employees, regardless of race, gender, age, religion, disability, or sexual orientation.
MLOps / DevOps Engineer
Data Science & Analytics
Apply
Figure.jpg

Legal Intern [Summer 2026]

Figure AI
USD
150000
-
350000
US.svg
United States
Full-time
Remote
false
Figure is an AI robotics company developing autonomous general-purpose humanoid robots. The goal of the company is to ship humanoid robots with human level intelligence. Its robots are engineered to perform a variety of tasks in the home and commercial markets. Figure is headquartered in San Jose, CA. Figure’s vision is to deploy autonomous humanoids at a global scale. Our Helix team is looking for an experienced Training Infrastructure Engineer, to take our infrastructure to the next level. This role is focused on managing the training cluster, implementing distributed training algorithms, data loaders, and developer tools for AI researchers. The ideal candidate has experience building tools and infrastructure for a large-scale deep learning system. Responsibilities Design, deploy, and maintain Figure's training clusters Architect and maintain scalable deep learning frameworks for training on massive robot datasets Work together with AI researchers to implement training of new model architectures at a large scale Implement distributed training and parallelization strategies to reduce model development cycles Implement tooling for data processing, model experimentation, and continuous integration Requirements Strong software engineering fundamentals Bachelor's or Master's degree in Computer Science, Robotics, Engineering, or a related field Experience with Python and PyTorch Experience managing HPC clusters for deep neural network training Minimum of 4 years of professional, full-time experience building reliable backend systems Bonus Qualifications Experience managing cloud infrastructure (AWS, Azure, GCP) Experience with job scheduling / orchestration tools (SLURM, Kubernetes, LSF, etc.) Experience with configuration management tools (Ansible, Terraform, Puppet, Chef, etc.) The US base salary range for this full-time position is between $150,000 - $350,000 annually. 
The pay offered for this position may vary based on several individual factors, including job-related knowledge, skills, and experience. The total compensation package may also include additional components/benefits depending on the specific role. This information will be shared if an employment offer is extended.
MLOps / DevOps Engineer
Data Science & Analytics
Apply
webAI.jpg

DevSecOps Engineer

webAI
-
US.svg
United States
Full-time
Remote
false
About Us:webAI is pioneering the future of artificial intelligence by establishing the first distributed AI infrastructure dedicated to personalized AI. We recognize the evolving demands of a data-driven society for scalability and flexibility, and we firmly believe that the future of AI lies in distributed processing at the edge, bringing computation closer to the source of data generation. Our mission is to build a future where a company's valuable data and intellectual property remain entirely private, enabling the deployment of large-scale AI models directly on standard consumer hardware without compromising the information embedded within those models. We are developing an end-to-end platform that is secure, scalable, and fully under the control of our users, empowering enterprises with AI that understands their unique business. We are a team driven by truth, ownership, tenacity, and humility, and we seek individuals who resonate with these core values and are passionate about shaping the next generation of AI.About the Role:We are seeking a DevOps/Compliance Engineer to support our Public Sector initiatives by building, securing, and maintaining compliant infrastructure environments for deploying AI models within government and regulated systems. This role bridges modern DevOps practices with the strict compliance and security standards required for federal engagements. 
You will play a critical role in designing infrastructure automation, ensuring FedRAMP and NIST compliance, and helping deliver secure, auditable, containerized AI solutions to our public sector partners.Responsibilities:Design, implement, and maintain scalable, secure cloud and edge infrastructure for AI workloads in government environments.Manage containerization and orchestration technologies such as Docker and Kubernetes, optimizing for performance, isolation, and compliance.Develop and maintain Infrastructure as Code (IaC) using Terraform, Ansible, or Pulumi to automate secure, compliant infrastructure provisioning.Implement and manage CI/CD pipelines with integrated security controls, encryption, and vulnerability scanning.Ensure compliance with federal security frameworks such as NIST SP 800-53, FedRAMP, and DISA STIGs.Collaborate with Security, Legal, and Public Sector teams to maintain continuous compliance posture and generate audit-ready evidence.Package and deliver software artifacts (containers, binaries, configurations) for deployment in restricted or air-gapped environments.Configure and maintain monitoring, logging, and observability tools to ensure system reliability and compliance visibility.Support MLOps workflows to productionize AI models with consistent, secure automation.Contribute to documentation and knowledge sharing on infrastructure and compliance best practices.Qualifications:Active US Security clearance or eligibility and willingness to obtain a US Security clearance5+ years of experience in DevOps, Site Reliability, or Infrastructure Engineering.Proficiency with Docker, Kubernetes, and cloud-native deployment tools.Strong experience implementing Infrastructure as Code with Terraform, Ansible, or Pulumi.Deep understanding of security and compliance frameworks such as NIST SP 800-53, FedRAMP, and DISA STIGs.Experience with MLOps tools and practices for automating and scaling model deployments.Proficiency in Python, Bash, or Go for 
automation and scripting.Experience integrating security controls into CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins, etc.).Familiarity with observability tools such as Prometheus, Grafana, ELK, or CloudWatch.Preferred SkillsExperience working within FedRAMP Moderate or High environments.Familiarity with AI model deployment pipelines and model governance best practices.Knowledge of Zero Trust architecture and secure identity management.Strong collaboration skills and ability to work cross-functionally across technical, security, and compliance teams.Excellent written and verbal communication skills for interfacing with government stakeholders and auditors. We at webAI are committed to living out the core values we have put in place as the foundation on which we operate as a team. We seek individuals who exemplify the following:Truth - Emphasizing transparency and honesty in every interaction and decision.Ownership - Taking full responsibility for one’s actions and decisions, demonstrating commitment to the success of our clients. 
Tenacity - Persisting in the face of challenges and setbacks, continually striving for excellence and improvement. Humility - Maintaining a respectful and learning-oriented mindset, acknowledging the strengths and contributions of others. Benefits: Competitive salary and performance-based incentives. Comprehensive health, dental, and vision benefits package. 401k Match (US-based only). $200/month Health and Wellness Stipend. $400/year Continuing Education Credit. $500/year Function Health subscription (US-based only). Free parking for in-office employees. Unlimited Approved PTO. Parental Leave for Eligible Employees. Supplemental Life Insurance. webAI is an Equal Opportunity Employer and does not discriminate against any employee or applicant on the basis of age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. We adhere to these principles in all aspects of employment, including recruitment, hiring, training, compensation, promotion, benefits, social and recreational programs, and discipline. In addition, it is the policy of webAI to provide reasonable accommodation to qualified employees who have protected disabilities to the extent required by applicable laws, regulations and ordinances where a particular employee works.
MLOps / DevOps Engineer
Data Science & Analytics
Apply
Figure.jpg

Robot Network Software Engineer

Figure AI
USD
150000
-
350000
US.svg
United States
Full-time
Remote
false
Figure is an AI robotics company developing autonomous general-purpose humanoid robots. The goal of the company is to ship humanoid robots with human level intelligence. Its robots are engineered to perform a variety of tasks in the home and commercial markets. Figure is headquartered in San Jose, CA. Figure’s vision is to deploy autonomous humanoids at a global scale. Our Helix team is looking for an experienced Training Infrastructure Engineer, to take our infrastructure to the next level. This role is focused on managing the training cluster, implementing distributed training algorithms, data loaders, and developer tools for AI researchers. The ideal candidate has experience building tools and infrastructure for a large-scale deep learning system. Responsibilities Design, deploy, and maintain Figure's training clusters Architect and maintain scalable deep learning frameworks for training on massive robot datasets Work together with AI researchers to implement training of new model architectures at a large scale Implement distributed training and parallelization strategies to reduce model development cycles Implement tooling for data processing, model experimentation, and continuous integration Requirements Strong software engineering fundamentals Bachelor's or Master's degree in Computer Science, Robotics, Engineering, or a related field Experience with Python and PyTorch Experience managing HPC clusters for deep neural network training Minimum of 4 years of professional, full-time experience building reliable backend systems Bonus Qualifications Experience managing cloud infrastructure (AWS, Azure, GCP) Experience with job scheduling / orchestration tools (SLURM, Kubernetes, LSF, etc.) Experience with configuration management tools (Ansible, Terraform, Puppet, Chef, etc.) The US base salary range for this full-time position is between $150,000 - $350,000 annually. 
The pay offered for this position may vary based on several individual factors, including job-related knowledge, skills, and experience. The total compensation package may also include additional components/benefits depending on the specific role. This information will be shared if an employment offer is extended.
MLOps / DevOps Engineer
Data Science & Analytics
Software Engineer
Software Engineering
Apply
Figure.jpg

Security Engineer, Application Security

Figure AI
USD
150000
-
350000
US.svg
United States
Full-time
Remote
false
Figure is an AI robotics company developing autonomous general-purpose humanoid robots. The goal of the company is to ship humanoid robots with human level intelligence. Its robots are engineered to perform a variety of tasks in the home and commercial markets. Figure is headquartered in San Jose, CA. Figure’s vision is to deploy autonomous humanoids at a global scale. Our Helix team is looking for an experienced Training Infrastructure Engineer, to take our infrastructure to the next level. This role is focused on managing the training cluster, implementing distributed training algorithms, data loaders, and developer tools for AI researchers. The ideal candidate has experience building tools and infrastructure for a large-scale deep learning system. Responsibilities Design, deploy, and maintain Figure's training clusters Architect and maintain scalable deep learning frameworks for training on massive robot datasets Work together with AI researchers to implement training of new model architectures at a large scale Implement distributed training and parallelization strategies to reduce model development cycles Implement tooling for data processing, model experimentation, and continuous integration Requirements Strong software engineering fundamentals Bachelor's or Master's degree in Computer Science, Robotics, Engineering, or a related field Experience with Python and PyTorch Experience managing HPC clusters for deep neural network training Minimum of 4 years of professional, full-time experience building reliable backend systems Bonus Qualifications Experience managing cloud infrastructure (AWS, Azure, GCP) Experience with job scheduling / orchestration tools (SLURM, Kubernetes, LSF, etc.) Experience with configuration management tools (Ansible, Terraform, Puppet, Chef, etc.) The US base salary range for this full-time position is between $150,000 - $350,000 annually. 
The pay offered for this position may vary based on several individual factors, including job-related knowledge, skills, and experience. The total compensation package may also include additional components/benefits depending on the specific role. This information will be shared if an employment offer is extended.
MLOps / DevOps Engineer
Data Science & Analytics
Software Engineer
Software Engineering
Apply
Speechify.jpg

Tech Lead, Web Core Product & Chrome Extension - Fayetteville, USA

Speechify
USD
140000
-
200000
US.svg
United States
Full-time
Remote
true
PLEASE APPLY THROUGH THIS LINK: https://job-boards.greenhouse.io/speechify/jobs/5287658004  DO NOT APPLY BELOW The mission of Speechify is to make sure that reading is never a barrier to learning. Over 50 million people use Speechify’s text-to-speech products to turn whatever they’re reading – PDFs, books, Google Docs, news articles, websites – into audio, so they can read faster, read more, and remember more. Speechify’s text-to-speech reading products include its iOS app, Android App, Mac App, Chrome Extension, and Web App. Google recently named Speechify the Chrome Extension of the Year and Apple named Speechify its App of the Day. Today, nearly 200 people around the globe work on Speechify in a 100% distributed setting – Speechify has no office. These include frontend and backend engineers, AI research scientists, and others from Amazon, Microsoft, and Google, leading PhD programs like Stanford, high growth startups like Stripe, Vercel, Bolt, and many founders of their own companies. This is a key role and ideal for someone who thinks strategically, enjoys fast-paced environments, passionate about making product decisions, and has experience building great user experiences that delight users. We are a flat organization that allows anyone to become a leader by showing excellent technical skills and delivering results consistently and fast. Work ethic, solid communication skills, and obsession with winning are paramount.  Our interview process involves several technical interviews and we aim to complete them within 1 week.  
What You’ll Do: Work alongside machine learning researchers, engineers, and product managers to bring our AI Voices to their customers for a diverse range of use cases. Deploy and operate the core ML inference workloads for our AI Voices serving pipeline. Introduce new techniques, tools, and architecture that improve the performance, latency, throughput, and efficiency of our deployed models. Build tools to give us visibility into our bottlenecks and sources of instability and then design and implement solutions to address the highest priority issues. An Ideal Candidate Should Have: Experience shipping Python-based services. Experience being responsible for the successful operation of a critical production service. Experience with public cloud environments, GCP preferred. Experience with Infrastructure as Code, Docker, and containerized deployments. Preferred: Experience deploying high-availability applications on Kubernetes. Preferred: Experience deploying ML models to production. What We Offer: A dynamic environment where your contributions shape the company and its products. A team that values innovation, intuition, and drive. Autonomy, fostering focus and creativity. The opportunity to have a significant impact in a revolutionary industry. Competitive compensation, a welcoming atmosphere, and a commitment to an exceptional asynchronous work culture. The privilege of working on a product that changes lives, particularly for those with learning differences like dyslexia, ADD, and more. An active role at the intersection of artificial intelligence and audio – a rapidly evolving tech domain. Salary: The United States base salary range for this full-time position is $140,000-$200,000 + bonus + equity depending on experience. Think you’re a good fit for this job? Tell us more about yourself and why you're interested in the role when you apply. And don’t forget to include links to your portfolio and LinkedIn. Not looking but know someone who would make a great fit? Refer them!
Speechify is committed to a diverse and inclusive workplace.  Speechify does not discriminate on the basis of race, national origin, gender, gender identity, sexual orientation, protected veteran status, disability, age, or other legally protected status.
MLOps / DevOps Engineer
Data Science & Analytics
Machine Learning Engineer
Data Science & Analytics
Apply
Speechify.jpg

Tech Lead, Web Core Product & Chrome Extension - Glendale, USA

Speechify
USD
140000
-
200000
US.svg
United States
Full-time
Remote
true
PLEASE APPLY THROUGH THIS LINK: https://job-boards.greenhouse.io/speechify/jobs/5287658004  DO NOT APPLY BELOW The mission of Speechify is to make sure that reading is never a barrier to learning. Over 50 million people use Speechify’s text-to-speech products to turn whatever they’re reading – PDFs, books, Google Docs, news articles, websites – into audio, so they can read faster, read more, and remember more. Speechify’s text-to-speech reading products include its iOS app, Android App, Mac App, Chrome Extension, and Web App. Google recently named Speechify the Chrome Extension of the Year and Apple named Speechify its App of the Day. Today, nearly 200 people around the globe work on Speechify in a 100% distributed setting – Speechify has no office. These include frontend and backend engineers, AI research scientists, and others from Amazon, Microsoft, and Google, leading PhD programs like Stanford, high growth startups like Stripe, Vercel, Bolt, and many founders of their own companies. This is a key role and ideal for someone who thinks strategically, enjoys fast-paced environments, passionate about making product decisions, and has experience building great user experiences that delight users. We are a flat organization that allows anyone to become a leader by showing excellent technical skills and delivering results consistently and fast. Work ethic, solid communication skills, and obsession with winning are paramount.  Our interview process involves several technical interviews and we aim to complete them within 1 week.  
What You’ll Do: Work alongside machine learning researchers, engineers, and product managers to bring our AI Voices to their customers for a diverse range of use cases. Deploy and operate the core ML inference workloads for our AI Voices serving pipeline. Introduce new techniques, tools, and architecture that improve the performance, latency, throughput, and efficiency of our deployed models. Build tools to give us visibility into our bottlenecks and sources of instability and then design and implement solutions to address the highest priority issues. An Ideal Candidate Should Have: Experience shipping Python-based services. Experience being responsible for the successful operation of a critical production service. Experience with public cloud environments, GCP preferred. Experience with Infrastructure as Code, Docker, and containerized deployments. Preferred: Experience deploying high-availability applications on Kubernetes. Preferred: Experience deploying ML models to production. What We Offer: A dynamic environment where your contributions shape the company and its products. A team that values innovation, intuition, and drive. Autonomy, fostering focus and creativity. The opportunity to have a significant impact in a revolutionary industry. Competitive compensation, a welcoming atmosphere, and a commitment to an exceptional asynchronous work culture. The privilege of working on a product that changes lives, particularly for those with learning differences like dyslexia, ADD, and more. An active role at the intersection of artificial intelligence and audio – a rapidly evolving tech domain. Salary: The United States base salary range for this full-time position is $140,000-$200,000 + bonus + equity depending on experience. Think you’re a good fit for this job? Tell us more about yourself and why you're interested in the role when you apply. And don’t forget to include links to your portfolio and LinkedIn. Not looking but know someone who would make a great fit? Refer them!
Speechify is committed to a diverse and inclusive workplace.  Speechify does not discriminate on the basis of race, national origin, gender, gender identity, sexual orientation, protected veteran status, disability, age, or other legally protected status.
MLOps / DevOps Engineer
Data Science & Analytics
Machine Learning Engineer
Data Science & Analytics
Apply
Shield AI.jpg

Staff Engineer, Systems Test (R4151)

Shield AI
USD
140000
-
210000
US.svg
United States
Full-time
Remote
false
Founded in 2015, Shield AI is a venture-backed deep-tech company with the mission of protecting service members and civilians with intelligent systems. Its products include the V-BAT and X-BAT aircraft, Hivemind Enterprise, and the Hivemind Vision product lines. With nine offices and facilities across the U.S., Europe, the Middle East, and the Asia-Pacific, Shield AI’s technology actively supports operations worldwide. For more information, visit www.shield.ai. Follow Shield AI on LinkedIn, X, Instagram, and YouTube. Job Description:We’re seeking a Staff Integration & Test Engineer to lead advanced integration and test activities for Shield AI’s Hivemind autonomy systems in Frisco, TX. You’ll define and execute test strategies that span Hivemind Software integration, simulation, hardware-in-the-loop, vehicle-in-the-loop, and live flight operation, ensuring robust system performance and reliability in real-world mission environments.  As a senior technical leader, you’ll architect test infrastructure, collaborate with cross-functional teams, mentor teammates and drive continuous improvement in validation methodologies and test automation. This role is deeply hands-on and highly collaborative, working across software, hardware, systems, and flight test disciplines to ensure seamless integration of Hivemind autonomy on platforms such as VBAT.  Shield AI is scaling and growing rapidly, the ideal candidate will demonstrate adaptability, a growth mindset, and a willingness to learn new technologies and methodologies quickly in a fast-paced, evolving environment. This is an opportunity to grow alongside a company that is changing the world, building something insanely great with a mission-driven culture, a sense of urgency, and an unwavering commitment to protecting those who serve. 
What you'll do:
Lead system-level integration, test planning, and validation for advanced autonomous aircraft systems.
Define and implement test architectures, methodologies, and strategies across simulation, HIL, VIL, and flight environments.
Own and manage comprehensive test plans, defining objectives, success criteria, procedures, and resource needs.
Architect and evolve test infrastructure and automation frameworks that enable scalable and repeatable validation.
Define Hivemind Software test release processes and quality release gates.
Collaborate closely with software, hardware, and systems engineering teams to ensure robust integration and system readiness.
Conduct hands-on debugging and validation of autonomy, avionics, and control systems.
Lead flight test preparation, system configuration, and real-time troubleshooting during live events.
Develop tools and utilities in Python (and optionally C++) to support automation, data analysis, and telemetry validation.
Establish test documentation standards, ensuring traceability, repeatability, and knowledge sharing across teams.
Mentor and provide technical direction to junior and senior engineers, fostering a culture of technical rigor and continuous improvement.
Partner with program and mission teams to communicate test readiness, progress, and system performance effectively.
Required qualifications:
Bachelor’s or Master’s degree in Engineering, Computer Science, Robotics, Aerospace Engineering, or related technical discipline.
8+ years of experience in system integration, test planning, and validation of complex systems, ideally within robotics, aerospace, or autonomy.
Proven expertise in test planning, including test plan creation, test case design, validation tracking, and software quality release processes.
Deep understanding of test infrastructure, automation, and validation methodologies.
Strong proficiency in Python for scripting, automation, and analysis; working knowledge of C++ preferred.
Experience architecting and maintaining Hardware-in-the-Loop (HIL), Vehicle-in-the-Loop (VIL), or similar real-time test systems.
Proven ability to troubleshoot complex, multidisciplinary systems involving software, hardware, and controls.
Demonstrated success leading technical projects, mentoring engineers, and defining test strategies for multi-system programs.
Excellent communication and cross-functional collaboration skills.
Adaptability, growth mindset, and willingness to learn new technologies quickly in a scaling, fast-paced environment.
Self-starter with strong sense of urgency, initiative, and comfort operating in ambiguity.
U.S. Citizenship and ability to obtain and maintain a SECRET clearance.
Preferred qualifications:
Experience testing or integrating autonomous air, ground or sea vehicles.
Background in defense, aerospace, or mission-critical robotics systems.
Experience developing test infrastructure and automation frameworks at organizational scale.
Familiarity with simulation and modeling tools for system-level validation.
Knowledge of configuration management, verification processes, and data analytics for test reporting.
Experience supporting flight test operations, including safety, instrumentation, and post-flight analysis.
140,000 - 210,000 a year
Full-time regular employee offer package: Pay within range listed + Bonus + Benefits + Equity
Temporary employee offer package: Pay within range listed above + temporary benefits package (applicable after 60 days of employment)
Salary compensation is influenced by a wide array of factors including but not limited to skill set, level of experience, licenses and certifications, and specific work location. All offers are contingent on a cleared background and possible reference check. Military fellows and part-time employees are not eligible for benefits. Please speak to your talent acquisition representative for more information.
Shield AI is proud to be an equal opportunity workplace and is an affirmative action employer. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, marital status, disability, gender identity or Veteran status. If you have a disability or special need that requires accommodation, please let us know.
MLOps / DevOps Engineer
Data Science & Analytics
Software Engineer
Software Engineering
Robotics Engineer
Software Engineering

Systems Architect - Active Safety

Figure AI
USD 150,000 - 350,000
United States
Full-time
Remote: false
Figure is an AI robotics company developing autonomous general-purpose humanoid robots. The goal of the company is to ship humanoid robots with human-level intelligence. Its robots are engineered to perform a variety of tasks in the home and commercial markets. Figure is headquartered in San Jose, CA. Figure’s vision is to deploy autonomous humanoids at a global scale.
Our Helix team is looking for an experienced Training Infrastructure Engineer to take our infrastructure to the next level. This role is focused on managing the training cluster, implementing distributed training algorithms, data loaders, and developer tools for AI researchers. The ideal candidate has experience building tools and infrastructure for a large-scale deep learning system.
Responsibilities
Design, deploy, and maintain Figure's training clusters
Architect and maintain scalable deep learning frameworks for training on massive robot datasets
Work together with AI researchers to implement training of new model architectures at a large scale
Implement distributed training and parallelization strategies to reduce model development cycles
Implement tooling for data processing, model experimentation, and continuous integration
Requirements
Strong software engineering fundamentals
Bachelor's or Master's degree in Computer Science, Robotics, Engineering, or a related field
Experience with Python and PyTorch
Experience managing HPC clusters for deep neural network training
Minimum of 4 years of professional, full-time experience building reliable backend systems
Bonus Qualifications
Experience managing cloud infrastructure (AWS, Azure, GCP)
Experience with job scheduling / orchestration tools (SLURM, Kubernetes, LSF, etc.)
Experience with configuration management tools (Ansible, Terraform, Puppet, Chef, etc.)
The US base salary range for this full-time position is between $150,000 and $350,000 annually.
The pay offered for this position may vary based on several individual factors, including job-related knowledge, skills, and experience. The total compensation package may also include additional components/benefits depending on the specific role. This information will be shared if an employment offer is extended.
MLOps / DevOps Engineer
Data Science & Analytics
Software Engineer
Software Engineering

Staff Software Engineer, GPU Infrastructure (HPC)

Cohere
Canada
Full-time
Remote: true
Who are we?
Our mission is to scale intelligence to serve humanity. We’re training and deploying frontier models for developers and enterprises who are building AI systems to power magical experiences like content generation, semantic search, RAG, and agents. We believe that our work is instrumental to the widespread adoption of AI.
We obsess over what we build. Each one of us is responsible for contributing to increasing the capabilities of our models and the value they drive for our customers. We like to work hard and move fast to do what’s best for our customers.
Cohere is a team of researchers, engineers, designers, and more, who are passionate about their craft. Each person is one of the best in the world at what they do. We believe that a diverse range of perspectives is a requirement for building great products.
Join us on our mission and shape the future!
Why this team?
The internal infrastructure team is responsible for building world-class infrastructure and tools used to train, evaluate and serve Cohere's foundational models. By joining our team, you will work in close collaboration with AI researchers to support their AI workload needs on the cutting edge, with a strong focus on stability, scalability, and observability. You will be responsible for building and operating superclusters across multiple clouds. Your work will directly accelerate the development of industry-leading AI models that power Cohere's platform North. We’re hiring software engineers at multiple levels. Whether you’re early in your career or a seasoned staff engineer, you’ll find opportunities to grow and make an impact here.
Please Note: All of our infrastructure roles require participating in a 24x7 on-call rotation, where you are compensated for your on-call schedule.
As a Staff Software Engineer, you will:
Build and scale ML-optimized HPC infrastructure: Deploy and manage Kubernetes-based GPU/TPU superclusters across multiple clouds, ensuring high throughput and low-latency performance for AI workloads.
Optimize for AI/ML training: Collaborate with cloud providers to fine-tune infrastructure for cost efficiency, reliability, and performance, leveraging technologies like RDMA, NCCL, and high-speed interconnects.
Troubleshoot and resolve complex issues: Proactively identify and resolve infrastructure bottlenecks, performance degradation, and system failures to ensure minimal disruption to AI/ML workflows.
Enable researchers with self-service tools: Design intuitive interfaces and workflows that allow researchers to monitor, debug, and optimize their training jobs independently.
Drive innovation in ML infrastructure: Work closely with AI researchers to understand emerging needs (e.g., JAX, PyTorch, distributed training) and translate them into robust, scalable infrastructure solutions.
Champion best practices: Advocate for observability, automation, and infrastructure-as-code (IaC) across the organization, ensuring systems are maintainable and resilient.
Mentorship and collaboration: Share expertise through code reviews, documentation, and cross-team collaboration, fostering a culture of knowledge transfer and engineering excellence.
You may be a good fit if you have:
Deep expertise in ML/HPC infrastructure: Experience with GPU/TPU clusters, distributed training frameworks (JAX, PyTorch, TensorFlow), and high-performance computing (HPC) environments.
Kubernetes at scale: Proven ability to deploy, manage, and troubleshoot cloud-native Kubernetes clusters for AI workloads.
Strong programming skills: Proficiency in Python (for ML tooling) and Go (for systems engineering), with a preference for open-source contributions over reinventing solutions.
Low-level systems knowledge: Familiarity with Linux internals, RDMA networking, and performance optimization for ML workloads.
Research collaboration experience: A track record of working closely with AI researchers or ML engineers to solve infrastructure challenges.
Self-directed problem-solving: The ability to identify bottlenecks, propose solutions, and drive impact in a fast-paced environment.
If some of the above doesn’t line up perfectly with your experience, we still encourage you to apply! We value and celebrate diversity and strive to create an inclusive work environment for all. We welcome applicants from all backgrounds and are committed to providing equal opportunities. Should you require any accommodations during the recruitment process, please submit an Accommodations Request Form, and we will work together to meet your needs.
Full-Time Employees at Cohere enjoy these Perks:
🤝 An open and inclusive culture and work environment
🧑‍💻 Work closely with a team on the cutting edge of AI research
🍽 Weekly lunch stipend, in-office lunches & snacks
🦷 Full health and dental benefits, including a separate budget to take care of your mental health
🐣 100% Parental Leave top-up for up to 6 months
🎨 Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
🏙 Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
✈️ 6 weeks of vacation (30 working days!)
MLOps / DevOps Engineer
Data Science & Analytics
Software Engineer
Software Engineering
Machine Learning Engineer
Data Science & Analytics

Manager - Security Architecture

Lambda AI
USD 297,000 - 495,000
United States
Full-time
Remote: false
Lambda, The Superintelligence Cloud, builds Gigawatt-scale AI Factories for Training and Inference. Lambda’s mission is to make compute as ubiquitous as electricity and give every person access to artificial intelligence. One person, one GPU. If you'd like to build the world's best deep learning cloud, join us.
*Note: This position requires presence in our San Francisco, San Jose, or Seattle office location 4 days per week; Lambda’s designated work from home day is currently Tuesday.
About the Role
Lambda Security protects some of the world's most valuable digital assets: invaluable training data, model weights representing immense computational investments, and the sensitive inputs required to leverage best of breed AI models. We're responsible for securing every byte that powers breakthrough artificial intelligence.
Reporting to the Senior Manager of Security, your team serves dual functions: building security for the business and demonstrating that work directly to customers. As security advisors to Product Engineering, Platform Engineering, and IT teams, your team will establish security policies and architecture standards, conduct threat modeling and design reviews for critical systems, and create implementation guidance that engineering teams can adopt. In support of our customers, your team will develop customer-facing security documentation and participate directly in enterprise security discussions. This work ensures the right security decisions get made across Lambda's AI infrastructure while protecting customer data, enabling hypergrowth velocity, and building the trust that closes enterprise deals.
As Manager of the Security Architecture team, you'll build and lead a team of 4-5 security engineers with expertise across application security, infrastructure security, and corporate security. You'll hire strong specialists, coach them through complex security problems, set team priorities and architectural direction, and create a culture where security judgment accelerates business velocity rather than creating friction.
Your success is measured by the security decisions your team enables across the business: engineering teams building secure-by-default systems, compliance frameworks mapped to technical controls, and customers trusting Lambda's infrastructure with their most valuable AI workloads. Your team will balance proactive architecture work (defining what "good" looks like) with reactive consultation (reviewing designs and answering complex security questions).
Your immediate focus will be building your team, establishing processes for design reviews and architecture guidance that scale with Lambda's growth, and developing a 6-12 month roadmap aligned with Lambda's 2026 security strategic plan including compliance initiatives like ISO 27001.
We're looking for engineering managers who pair strong people leadership with enough security depth to coach specialists, set architectural direction, and translate security decisions into business value. If you're energized by building high-performing teams, enabling security at scale through excellent judgment rather than brute force, and helping enterprise customers trust their most valuable AI workloads to Lambda's infrastructure, we'd love to talk.
We value diverse backgrounds, experiences, and skills, and we are excited to hear from candidates who can bring unique perspectives to our team. If you do not exactly meet this description but believe you may be a good fit, please still apply and help us understand your readiness for this role. Your application is not a waste of our time.
What You'll Do
Team Leadership & Development
Build, hire, and develop a high-performing team of 4-5 security engineers with deep expertise across application security, infrastructure security, and corporate security.
Foster a culture where security judgment accelerates business velocity, creating an environment where specialists thrive through clear expectations, regular coaching, and opportunities for growth.
Conduct regular one-on-ones and provide constructive feedback that helps your engineers advance their technical depth and expand their cross-functional impact.
Set team priorities and architectural direction, ensuring your team focuses on the highest-impact security decisions across Lambda's AI infrastructure.
Strategic Architecture & Program Management
Own your team's 6-12 month roadmap, balancing proactive architecture work (defining security standards and patterns) with reactive consultation (design reviews and complex security questions).
Establish security policies and architecture standards that enable Product Engineering, Platform Engineering, and IT teams to build secure-by-default systems.
Define measurable success criteria for your team's work, translating security architecture decisions into business impact that stakeholders understand.
Proactively guide the evolution of Lambda's security architecture program as the company matures, ensuring architecture decisions align with compliance commitments and evolving customer security requirements.
Cross-Functional Collaboration & Customer Enablement
Partner deeply with Product Engineering, Platform Engineering, and IT teams to integrate security architecture guidance at optimal moments in their development cycles.
Conduct and oversee threat modeling and design reviews for critical systems, ensuring your team provides actionable recommendations that balance security rigor with development velocity.
Enable your team to create implementation guidance and architecture patterns that engineering teams voluntarily adopt because they make secure development easier.
Support enterprise sales by developing customer-facing security documentation and coaching your team through direct security discussions with prospective customers evaluating Lambda's infrastructure.
Collaborate with peer security teams (Detection & Response, Platform, Program Coordination) to ensure cohesive security architecture across all security functions.
What We Think a Candidate Needs to Demonstrate to Succeed
5+ years of security engineering or security architecture experience with 3+ years leading technical teams, demonstrating ability to build and develop high-performing security specialists.
Proven track record building team cultures where specialists thrive through clear expectations, effective coaching, and career development that expands both technical depth and cross-functional impact.
Strong technical background in security architecture, threat modeling, and secure design principles with enough depth to guide team decisions, evaluate complex tradeoffs, and coach engineers through difficult security problems.
Experience working across application security, infrastructure security, or corporate security domains, with demonstrated ability to set architectural direction and security standards that engineering teams adopt.
Excellent collaboration skills working with highly technical engineering teams both with and without authority, building relationships that enable security architecture guidance at optimal moments in development cycles.
Skilled communicator who translates security architecture decisions into business value, helping stakeholders understand how technical security work protects customer data and enables business velocity.
Ability to thrive in high-speed, high-ambiguity startup environments where you balance building team capability and security architecture foundations while executing at a fast pace.
Nice to Have
Prior experience in AI/ML infrastructure companies or cloud service providers where you've navigated the unique security challenges of multi-tenant systems and customer data isolation at scale.
Hands-on experience driving compliance audits (SOC 2, ISO 27001, PCI-DSS, HIPAA/HITECH, or FedRAMP) including evidence collection, control mapping, and managing auditor relationships.
Deep familiarity with bare metal infrastructure security in addition to cloud platforms, understanding physical security considerations and hardware-level security controls.
Experience creating security architecture patterns that were adopted widely across multiple teams or organizations, demonstrating ability to build reusable solutions that scale beyond a single use case.
Experience managing security engineers through significant career transitions, such as promoting ICs to lead roles or helping specialists successfully pivot between security domains.
Enthusiasm about leveraging Lambda's access to state-of-the-art LLMs to pioneer AI-powered security architecture capabilities: imagine automated threat modeling, intelligent design review assistance, and architecture validation at scale only possible when you host the AI infrastructure yourself.
Salary Range Information
The annual salary range for this position has been set based on market data and other factors. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.
About Lambda
Founded in 2012, ~400 employees (2025) and growing fast
We offer generous cash & equity compensation
Our investors include Andra Capital, SGW, Andrej Karpathy, ARK Invest, Fincadia Advisors, G Squared, In-Q-Tel (IQT), KHK & Partners, NVIDIA, Pegatron, Supermicro, Wistron, Wiwynn, US Innovative Technology, Gradient Ventures, Mercato Partners, SVB, 1517, Crescent Cove.
We are experiencing extremely high demand for our systems, with quarter over quarter, year over year profitability
Our research papers have been accepted into top machine learning and graphics conferences, including NeurIPS, ICCV, SIGGRAPH, and TOG
Health, dental, and vision coverage for you and your dependents
Wellness and Commuter stipends for select roles
401k Plan with 2% company match (USA employees)
Flexible Paid Time Off Plan that we all actually use
A Final Note:
You do not need to match all of the listed expectations to apply for this position. We are committed to building a team with a variety of backgrounds, experiences, and skills.
Equal Opportunity Employer
Lambda is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.
MLOps / DevOps Engineer
Data Science & Analytics
Software Engineer
Software Engineering

Firmware Intern [Summer 2026]

Figure AI
USD 150,000 - 350,000
United States
Full-time
Remote: false
Figure is an AI robotics company developing autonomous general-purpose humanoid robots. The goal of the company is to ship humanoid robots with human-level intelligence. Its robots are engineered to perform a variety of tasks in the home and commercial markets. Figure is headquartered in San Jose, CA. Figure’s vision is to deploy autonomous humanoids at a global scale.
Our Helix team is looking for an experienced Training Infrastructure Engineer to take our infrastructure to the next level. This role is focused on managing the training cluster, implementing distributed training algorithms, data loaders, and developer tools for AI researchers. The ideal candidate has experience building tools and infrastructure for a large-scale deep learning system.
Responsibilities
Design, deploy, and maintain Figure's training clusters
Architect and maintain scalable deep learning frameworks for training on massive robot datasets
Work together with AI researchers to implement training of new model architectures at a large scale
Implement distributed training and parallelization strategies to reduce model development cycles
Implement tooling for data processing, model experimentation, and continuous integration
Requirements
Strong software engineering fundamentals
Bachelor's or Master's degree in Computer Science, Robotics, Engineering, or a related field
Experience with Python and PyTorch
Experience managing HPC clusters for deep neural network training
Minimum of 4 years of professional, full-time experience building reliable backend systems
Bonus Qualifications
Experience managing cloud infrastructure (AWS, Azure, GCP)
Experience with job scheduling / orchestration tools (SLURM, Kubernetes, LSF, etc.)
Experience with configuration management tools (Ansible, Terraform, Puppet, Chef, etc.)
The US base salary range for this full-time position is between $150,000 and $350,000 annually.
The pay offered for this position may vary based on several individual factors, including job-related knowledge, skills, and experience. The total compensation package may also include additional components/benefits depending on the specific role. This information will be shared if an employment offer is extended.
MLOps / DevOps Engineer
Data Science & Analytics

Member of technical staff (Infrastructure)

H Company
France
United Kingdom
Full-time
Remote: false
About H: H exists to push the boundaries of superintelligence with agentic AI. By automating complex, multi-step tasks typically performed by humans, AI agents will help unlock full human potential.
H is hiring the world’s best AI talent, seeking those who are dedicated as much to building safely and responsibly as to advancing disruptive agentic capabilities. We promote a mindset of openness, learning, and collaboration, where everyone has something to contribute.
About the Team: The Infrastructure team aims to make it seamless for our researchers and engineers to access and use the infrastructure they need to do their job. The team also ensures the underlying infrastructure for our public services is robust, reliable and scalable. Members of the Infra team are uniquely positioned to impact all areas of H, from building our foundational models to our agents, all the way to our public services.
Key Responsibilities:
Designing and managing the infrastructure to support:
Research efforts in Model and Agent development incl. training infrastructure, data pipelines and inference.
Product Engineering efforts on H Company’s agent platform including client-facing APIs and agent runtimes within various deployment scenarios (multi-tenant and on-prem).
Set up and maintain observability and monitoring strategies.
Requirements:
MUST HAVE
Observability and monitoring (Datadog, Prometheus, Grafana, …)
Good knowledge of a modern programming language (ideally Python or JS/TypeScript)
NICE TO HAVE
ML Ops or Data Engineering
Experience architecting and deploying distributed systems on public cloud (AWS, Azure, GCP)
Containerization and orchestration tools (Docker, Kubernetes, …)
Infrastructure as code (CDK, Terraform, ...)
CI/CD management experience (GitHub Actions, GitLab CI, TeamCity, ...).
Location:
Paris or London.
This role is hybrid, and you are expected to be in the office 3 days a week on average.
What We Offer:
Join the exciting journey of shaping the future of AI, and be part of the early days of one of the hottest AI startups
Collaborate with a fun, dynamic and multicultural team, working alongside world-class AI talent in a highly collaborative environment
Enjoy a competitive salary
Unlock opportunities for professional growth, continuous learning, and career development
If you want to change the status quo in AI, join us.
MLOps / DevOps Engineer
Data Science & Analytics
Data Engineer
Data Science & Analytics

Senior Member of technical staff (Infrastructure)

H Company
France
United Kingdom
Full-time
Remote: false
About H: H exists to push the boundaries of superintelligence with agentic AI. By automating complex, multi-step tasks typically performed by humans, AI agents will help unlock full human potential.
H is hiring the world’s best AI talent, seeking those who are dedicated as much to building safely and responsibly as to advancing disruptive agentic capabilities. We promote a mindset of openness, learning, and collaboration, where everyone has something to contribute.
About the Team: The Infrastructure team aims to make it seamless for our researchers and engineers to access and use the infrastructure they need to do their job. The team also ensures the underlying infrastructure for our public services is robust, reliable and scalable. Members of the Infra team are uniquely positioned to impact all areas of H, from building our foundational models to our agents, all the way to our public services.
Key Responsibilities:
Designing and managing the infrastructure to support:
Research efforts in Model and Agent development incl. training infrastructure, data pipelines and inference.
Product Engineering efforts on H Company’s agent platform including client-facing APIs and agent runtimes within various deployment scenarios (multi-tenant and on-prem).
Set up and maintain observability and monitoring strategies.
Mentor and grow other engineers in infrastructure-related topics as well as general engineering practices.
Requirements:
MUST HAVE
ML Ops or Data Engineering relevant experience
Experience architecting and deploying distributed systems on public cloud (AWS, Azure, GCP)
Observability and monitoring (Datadog, Prometheus, Grafana, …)
Good knowledge of a modern programming language (ideally Python or JS/TypeScript)
NICE TO HAVE
Containerization and orchestration tools (Docker, Kubernetes, …)
Infrastructure as code (CDK, Terraform, ...)
CI/CD management experience (GitHub Actions, GitLab CI, TeamCity, ...).
Location:
Paris or London.
This role is hybrid, and you are expected to be in the office 3 days a week on average.
What We Offer:
Join the exciting journey of shaping the future of AI, and be part of the early days of one of the hottest AI startups
Collaborate with a fun, dynamic and multicultural team, working alongside world-class AI talent in a highly collaborative environment
Enjoy a competitive salary
Unlock opportunities for professional growth, continuous learning, and career development
If you want to change the status quo in AI, join us.
MLOps / DevOps Engineer
Data Science & Analytics
Data Engineer
Data Science & Analytics