Senior Software Engineer, Applied AI
Partner with internal stakeholders to understand their daily workflows, surfacing and documenting those that can be automated with AI. Build autonomous and semi-autonomous agent workflows that interact with browsers, codebases, and APIs to complete complex content operations. Learn and contribute to prompt engineering strategies, with guidance on building agentic workflows, memory systems, and multi-step reasoning. Build and operate MCP servers that can be called by AI agents. Design and build proprietary agent libraries, prompting strategies, and benchmarking frameworks. Build and operationalize AI solutions with retrieval techniques (e.g., RAG, vector databases) for context-aware applications. Prototype, test, and optimize AI-powered applications, including retrieval-augmented generation, workflow automation, and agentic experiences. Collaborate with cross-functional teams to align AI features with user needs and business goals, gaining exposure to product and platform thinking. Stay current on advancements in AI technologies, frameworks, and best practices, and evangelize AI capabilities internally and externally. Provide technical support, documentation, and training to facilitate adoption and effective use of AI solutions. Participate in technical discussions, architecture reviews, and sprint planning, and contribute to knowledge sharing and technical documentation.
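The retrieval-augmented generation work mentioned above can be sketched minimally: retrieve the documents most similar to a query, then prepend them as context for the model. This is a toy illustration — the bag-of-words "embedding" and the example documents are stand-ins for a real embedding model and vector DB.

```python
# Minimal RAG retrieval sketch, assuming a toy term-frequency "embedding"
# in place of a real embedding model and vector database.
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy embedding: term-frequency vector over lowercase tokens.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # Retrieved context is prepended so the model can answer with grounding.
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "Invoices are processed nightly by the billing workflow.",
    "The style guide requires sentence case in headlines.",
    "Agents call MCP servers to execute tools.",
]
print(build_prompt("How are invoices processed?", docs))
```

A production system would swap `embed` for a learned embedding and `retrieve` for a vector-DB query, but the retrieve-then-prompt shape stays the same.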
Engineering Manager - Engine and Platform
The Engineering Manager for the Engine and Platform leads the team responsible for building, maintaining, and deploying the runtime that customers use to run, manage, secure, and understand AI tools, enabling advanced agentic use-cases. This role involves scaling the team that owns the development of the platform and services, which includes distributed systems engineers and authorization/identity experts developing features like MCP gateways, roles and permissions, and platform-as-a-service capabilities for tool execution. The manager ensures the team is unblocked, aligns the team's work with the product organization, and stays technically engaged through code reviews, critical contributions, and occasional hands-on coding. Responsibilities include owning deliverables, stability, and uptime, shaping product vision and architecture, owning technical direction and prioritization, hiring and mentoring engineers, defining and delivering platform features, and ensuring reliability, security, and enterprise readiness. The manager also focuses on building leverage into systems through automation and agents to improve efficiency, and is expected to navigate ambiguity and evolving standards in AI tools.
Engineering Manager - Tool Development and Developer Experience
As the Engineering Manager for Tool Development & Developer Experience, you will lead the team responsible for the MCP framework, tool catalog, and systems enabling customers to build tools. You will be ultimately responsible for the team's deliverables, stability, and uptime while aligning the team’s work with the product organization and shaping the team's and company’s roadmap. You will hire and mentor engineers, define and deliver new MCP servers, ship high-impact features ensuring reliability, security, and enterprise readiness, and build leverage into the system by automating tasks. While primarily leading people, product, and operations, you are expected to stay technically engaged through reviews, critical-path contributions, and occasional coding to unblock the team. The role involves navigating ambiguity, evolving AI tool standards, and managing scaling challenges.
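MCP servers, which both manager roles above center on, are essentially registries of named tools that agents can discover and invoke. The sketch below shows that tool-registration-and-dispatch pattern in plain Python; it is not the real MCP SDK (which speaks JSON-RPC over stdio or HTTP), and the request shape and function names here are assumptions for illustration.

```python
# Illustrative tool-server pattern behind an MCP-style server: a registry of
# named tools and a dispatcher. Plain Python, not the real MCP SDK; the
# request format {"tool": ..., "arguments": {...}} is an assumption.
import json
from typing import Callable

TOOLS: dict[str, dict] = {}

def tool(name: str, description: str) -> Callable:
    # Decorator that registers a function as a callable tool.
    def register(fn: Callable) -> Callable:
        TOOLS[name] = {"description": description, "handler": fn}
        return fn
    return register

@tool("word_count", "Count the words in a piece of text")
def word_count(text: str) -> int:
    return len(text.split())

def handle_request(raw: str) -> str:
    # A real MCP server would validate against the tool's schema and speak
    # JSON-RPC; here we just look up the handler and return a JSON result.
    req = json.loads(raw)
    handler = TOOLS[req["tool"]]["handler"]
    return json.dumps({"result": handler(**req["arguments"])})

print(handle_request('{"tool": "word_count", "arguments": {"text": "ship high-impact features"}}'))
```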
Site Reliability Engineer, Managed AI
The Site Reliability Engineer is responsible for designing and operating reliable managed AI services focused on serving and scaling large language model workloads. They build automation and reliability tooling to support distributed AI pipelines and inference services, define, measure, and improve SLIs/SLOs across AI workloads to ensure performance and reliability, and collaborate with AI, platform, and infrastructure teams to optimize large-scale training and inference clusters. Additionally, they automate observability by building telemetry and performance tuning strategies for latency-sensitive AI services, investigate and resolve reliability issues in distributed AI systems using telemetry, logs, and profiling, and contribute to the architecture of next-generation distributed systems designed specifically for AI-first environments.
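Defining and measuring SLIs/SLOs, as this role requires, reduces to a small amount of arithmetic: an availability SLI is the fraction of successful requests in a window, and the error budget is how many failures the SLO still permits. The window size, target, and counts below are illustrative assumptions.

```python
# Minimal sketch of an availability SLI and error-budget calculation.
# The request counts and the 99.9% SLO target are assumed example values.
def availability_sli(total: int, failed: int) -> float:
    # SLI: fraction of requests that succeeded in the measurement window.
    return (total - failed) / total

def error_budget_remaining(total: int, failed: int, slo: float) -> int:
    # Budget = failures the SLO allows in this window, minus failures spent.
    allowed = round(total * (1 - slo))
    return allowed - failed

total, failed = 1_000_000, 300
sli = availability_sli(total, failed)
budget = error_budget_remaining(total, failed, 0.999)
print(f"SLI={sli:.4f}, error budget remaining={budget} requests")
```

In practice these numbers come from telemetry (request logs or metrics counters), and burn-rate alerts fire when the budget is being spent faster than the window allows.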
Staff Software Engineer, GPU Infrastructure (HPC)
As a Staff Software Engineer, you will build and scale ML-optimized HPC infrastructure by deploying and managing Kubernetes-based GPU/TPU superclusters across multiple clouds, ensuring high throughput and low-latency performance for AI workloads. You will optimize for AI/ML training by collaborating with cloud providers to fine-tune infrastructure for cost efficiency, reliability, and performance, using technologies like RDMA, NCCL, and high-speed interconnects. You will troubleshoot and resolve complex issues by identifying and resolving infrastructure bottlenecks, performance degradation, and system failures to minimize disruption to AI/ML workflows. You will enable researchers with self-service tools by designing intuitive interfaces and workflows that allow researchers to monitor, debug, and optimize their training jobs independently. You will drive innovation in ML infrastructure by working closely with AI researchers to understand emerging needs, such as support for JAX, PyTorch, and distributed training, and translating them into robust, scalable infrastructure solutions. You will champion best practices by advocating for observability, automation, and infrastructure-as-code (IaC) across the organization to ensure systems are maintainable and resilient. Additionally, you will provide mentorship and collaborate through code reviews, documentation, and cross-team efforts to foster a culture of knowledge transfer and engineering excellence.
Speech Software Engineer
Lead the design and implementation of a scalable, high-availability voice infrastructure that replaces legacy systems. Build and refine multi-threaded server frameworks capable of handling thousands of concurrent, real-time audio streams with minimal jitter and latency. Deploy robust ASR → LLM → TTS pipelines that process thousands of calls concurrently. Develop robust logic for handling media streams, ensuring seamless audio data flow between clients and machine learning models. Build advanced monitoring and load-testing tools specifically designed to simulate high-concurrency voice traffic. Partner with Speech Scientists and Research Engineers to integrate state-of-the-art models into a production-ready environment.
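The ASR → LLM → TTS pipeline described above is structurally a chain of queue-connected stages, one worker per stage, so calls flow through concurrently. This is a schematic only: the three lambdas are stand-ins for real speech-recognition, language, and synthesis models, and a production system would use dedicated threads or processes per stream with backpressure.

```python
# Schematic ASR -> LLM -> TTS pipeline as queue-connected stages, one thread
# per stage. The stage bodies are stand-ins for real models.
import queue
import threading

def stage(fn, inbox: queue.Queue, outbox: queue.Queue) -> None:
    while (item := inbox.get()) is not None:  # None signals shutdown
        outbox.put(fn(item))
    outbox.put(None)  # propagate shutdown downstream

# Stand-ins for the real speech-recognition, language, and synthesis models.
asr = lambda audio: f"transcript({audio})"
llm = lambda text: f"reply({text})"
tts = lambda text: f"audio({text})"

q_in, q1, q2, q_out = (queue.Queue() for _ in range(4))
for fn, src, dst in [(asr, q_in, q1), (llm, q1, q2), (tts, q2, q_out)]:
    threading.Thread(target=stage, args=(fn, src, dst), daemon=True).start()

for call in ["call-1", "call-2"]:
    q_in.put(call)
q_in.put(None)  # no more calls

results = []
while (item := q_out.get()) is not None:
    results.append(item)
print(results)
```

The staged design lets each model run at its own pace while in-flight calls overlap, which is the same property the real system needs to keep per-call latency and jitter low under load.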
Senior Staff Systems Engineer
Drive the architectural vision for the GenerativeAgent product by designing and building a highly scalable, multi-agent platform for real-time voice and text customer service experiences across various industries. Act as a technical authority and advisor for multiple engineering teams, develop system design and technical roadmaps, and define communication, state management, and orchestration patterns for multi-agent systems. Design and implement scalable, multi-tenant deployment architectures, own and define system-level SLOs/SLIs focusing on latency, cost-efficiency, and fault tolerance, identify systemic risks with proactive mitigation strategies, partner with Security and Compliance teams to meet regulatory and security requirements, lead post-incident analysis and improvements, and collaborate cross-functionally with Product, Customer Engineering, Site Reliability Engineering, TPMs, and Research to translate business requirements into system designs and productionize ML research. Mentor senior engineers and communicate complex technical concepts to both technical and non-technical stakeholders.
Software Engineer, Backend
The backend developer will own major feature development and work directly with founders on product development from end to end. Responsibilities include working with a small interdisciplinary team across hardware, software, and design to build new products from scratch; building new features and designing new architecture to address challenging problems; building backend infrastructure to perform scalable training in the cloud; rethinking and refactoring existing codebases for scale; and continuously improving and maintaining code in production. The role involves full ownership throughout the entire product lifecycle, including idea generation, design, prototyping, execution, and shipping, contributing to multiple parts of the codebase in various programming languages.
Site Reliability Engineer, Inference Infrastructure
As a Site Reliability Engineer on the Model Serving team, you will build self-service systems that automate managing, deploying, and operating services, including custom Kubernetes operators supporting language model deployments. You will automate environment observability and resilience, enabling all developers to troubleshoot and resolve problems, and take steps to ensure defined SLOs are met, including participating in an on-call rotation. Additionally, you will build strong relationships with internal developers and influence the Infrastructure team’s roadmap based on their feedback, as well as develop the team through knowledge sharing and an active review process.
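At the heart of the custom Kubernetes operators this role builds is a reconcile loop: compare the desired state declared in a resource against the observed state of the cluster and compute the actions that converge them. The sketch below shows that loop's logic in plain Python; the resource fields (`replicas`, `model`) and action names are assumptions for illustration, not a real operator framework.

```python
# Sketch of an operator-style reconcile loop: diff desired vs. observed state
# and emit the actions needed to converge. Field names are illustrative.
def reconcile(desired: dict, observed: dict) -> list[str]:
    actions = []
    want, have = desired["replicas"], observed.get("replicas", 0)
    if have < want:
        actions.append(f"scale_up:{want - have}")
    elif have > want:
        actions.append(f"scale_down:{have - want}")
    if desired["model"] != observed.get("model"):
        actions.append(f"roll_out:{desired['model']}")
    return actions  # an empty list means the deployment has converged

desired = {"replicas": 4, "model": "llm-v2"}
observed = {"replicas": 2, "model": "llm-v1"}
print(reconcile(desired, observed))
```

A real operator runs this loop on every watch event and requeues until `reconcile` returns no work, which is what makes the system self-healing rather than fire-and-forget.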
Staff Software Engineer, Inference Infrastructure
The role involves building high-performance, scalable, and reliable machine learning systems, specifically working on the Model Serving team to develop, deploy, and operate the AI platform that delivers large language models through API endpoints. Responsibilities include working closely with multiple teams to deploy optimized NLP models to production environments characterized by low latency, high throughput, and high availability. The role also includes interfacing with customers and creating customized deployments to meet their specific needs.