Senior AI Engineer - San Mateo, CA
The role involves training, evaluating, and monitoring new and improved LLMs and other algorithmic models. The engineer will test and deploy content moderation models in production and iterate based on real-world performance metrics and feedback loops. They are expected to develop a medium- to long-term vision for content-understanding R&D in collaboration with management, product, policy & operations, and engineering teams. The position requires taking ownership of results delivered to customers, advocating for changes in approach where needed, and leading cross-functional execution.
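A minimal sketch of the feedback-loop evaluation this role describes: scoring a deployed moderation model against human-reviewed production samples. The `moderate` method and sample fields are illustrative assumptions, not a specific employer API.

```python
# Sketch: compare model verdicts with human review labels (1 = violating).
# `model.moderate` and the sample schema are hypothetical stand-ins.
from sklearn.metrics import precision_score, recall_score

def evaluate_moderation(model, reviewed_samples):
    """Score a moderation model against human-reviewed production data."""
    y_true = [s["human_label"] for s in reviewed_samples]
    y_pred = [model.moderate(s["content"]) for s in reviewed_samples]
    return {
        "precision": precision_score(y_true, y_pred),  # cost of over-blocking
        "recall": recall_score(y_true, y_pred),        # cost of missed violations
    }
```

Tracking these two numbers per model release is one simple way to ground the "iterate based on real-world performance metrics" loop the posting mentions.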
Forward Deployed Engineer
Serve as WRITER's embedded technical AI specialist onsite, building strong daily relationships with client engineering, AI COE, and business teams, and acting as the primary point of contact for all WRITER-related technical matters. Own the end-to-end release workflow from WRITER platform updates through production deployment across all client environments, including coordinating dual-track updates, making critical go/no-go decisions, and providing hands-on troubleshooting. Design, implement, and maintain the client's custom front-end setup, production retrieval, and agentic systems, rigorously testing, validating, and troubleshooting AI-specific functionality such as LLM applications, RAG retrieval quality, and prompt behavior. Navigate client security, risk, and compliance approval processes by working with Information Security teams to obtain necessary approvals, documenting compliance artifacts, and streamlining workflows to meet regulatory requirements. Reduce operational pain points, contribute to long-term strategies for migrating clients toward standard WRITER-managed environments, and gather real-world deployment feedback to strategically influence WRITER's product direction.
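A minimal sketch of the kind of RAG validation this role involves: a go/no-go retrieval regression check run before and after each platform update. The `client.retrieve` call and gold-case schema are hypothetical, not the WRITER API.

```python
# Sketch: hit rate of expected sources in top-k retrieval results.
# `client.retrieve(query, k)` is an assumed interface returning ranked doc IDs.
def retrieval_hit_rate(client, gold_cases, k=5):
    """Fraction of gold questions whose expected source appears in the top k."""
    hits = 0
    for case in gold_cases:
        doc_ids = client.retrieve(case["question"], k=k)
        if case["expected_doc_id"] in doc_ids:
            hits += 1
    return hits / len(gold_cases)

# Example go/no-go gate for a release:
# assert retrieval_hit_rate(client, gold_cases) >= baseline_hit_rate - 0.02
```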
Applied AI Engineer – Agentic Workflows
Work with enterprise customers and internal teams to turn business workflows into scalable, production-ready agentic AI systems. Design and build LLM-powered agents that reason, plan, and act across tools and data sources with enterprise-grade reliability. Balance rapid iteration with enterprise requirements, evolving prototypes into stable, reusable solutions. Define and apply evaluation and quality standards to measure success, failures, and regressions. Debug real-world agent behavior and systematically improve prompts, workflows, tools, and guardrails. Contribute to shared frameworks and patterns that enable consistent delivery across customers.
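A minimal sketch of the reason/plan/act loop such agents run, with a step budget as one of the guardrails the posting mentions. `call_llm` and the tool registry are stand-ins under assumed interfaces, not a specific vendor API.

```python
# Sketch: an agent that alternates between planning (LLM call) and acting
# (tool dispatch), with a hard step budget as a guardrail against loops.
import json

TOOLS = {
    "search_orders": lambda q: json.dumps({"orders": []}),  # illustrative tool
}

def run_agent(call_llm, task, max_steps=8):
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        # Assumed contract: returns {"tool": ..., "args": ...} or {"final": ...}.
        action = call_llm(messages)
        if "final" in action:
            return action["final"]
        result = TOOLS[action["tool"]](action["args"])          # act
        messages.append({"role": "tool", "content": result})    # observe, re-plan
    raise RuntimeError("agent exceeded step budget")
```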
Forward Deployed Engineer
Design, build, and deploy predictive AI features, including natural language detection, autosuggestions, and intelligent prompt recommendations. Leverage Warp’s extensive user-generated content and team data to continuously refine AI prediction and personalization. Drive substantial improvements in code generation quality, including code completions, diff applications, and SWE-bench performance. Implement and iterate specialized agents tailored for specific developer workflows and use cases. Optimize AI models through fine-tuning, advanced prompt engineering, and robust, data-driven feedback loops. Improve context retrieval systems, enabling Warp agents to retain and utilize memory effectively. Collaborate closely with product and engineering teams, rapidly shipping iterative improvements into production. Continuously elevate the user experience by refining interactions between developers and Warp AI.
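A minimal sketch of the data-driven feedback loop described above: scoring competing prompt variants on a fixed completion test set before promoting one. `complete` and the test-set schema are hypothetical stand-ins, not Warp internals.

```python
# Sketch: mean exact-match accuracy per prompt variant on a fixed test set.
# `complete(prompt, input)` is an assumed model-call wrapper.
def score_prompt_variants(complete, variants, test_set):
    """Return accuracy per prompt variant; promote the winner."""
    scores = {}
    for name, prompt in variants.items():
        correct = sum(
            complete(prompt, case["input"]).strip() == case["expected"].strip()
            for case in test_set
        )
        scores[name] = correct / len(test_set)
    return scores  # losing variants stay as regression baselines
```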
MCP & Tools Python Developer - Agent Evaluation Infrastructure
Developing and maintaining MCP-compatible evaluation servers; implementing logic to check agent actions against scenario definitions; creating or extending tools that writers and QAs use to test agents; working closely with infrastructure engineers to ensure compatibility; occasionally helping with test writing or debug sessions when needed.
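A minimal sketch of an MCP-compatible evaluation server built with the FastMCP helper from the official MCP Python SDK; the scenario format and check logic here are illustrative assumptions, not an existing spec.

```python
# Sketch: an MCP server exposing one tool that checks an agent action
# against a gold-standard scenario definition.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("agent-eval")

# Hypothetical scenario definitions: ordered gold-standard actions per id.
SCENARIOS = {
    "refund-flow": ["lookup_order", "verify_eligibility", "issue_refund"],
}

@mcp.tool()
def check_action(scenario_id: str, step: int, action: str) -> bool:
    """Return True if the agent's action matches the gold-standard step."""
    expected = SCENARIOS[scenario_id]
    return step < len(expected) and action == expected[step]

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```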
Evaluation Scenario Writer - AI Agent Testing Specialist
Design realistic, structured evaluation scenarios for LLM-based agents: create test cases that simulate complex human workflows, define gold-standard behavior and scoring logic to evaluate agent actions against, analyze agent logs, failure modes, and decision paths, work with code repositories and test frameworks to validate scenarios, iterate on prompts, instructions, and test cases to improve clarity and difficulty, and ensure that scenarios are production-ready, easy to run, and reusable.
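A minimal sketch of what a structured scenario with gold-standard behavior and scoring logic might look like; the field names and scoring rule are illustrative assumptions, not a prescribed format.

```python
# Sketch: a scenario that scores an agent's action trace against
# gold-standard ordered actions, zeroing out runs that take forbidden steps.
from dataclasses import dataclass, field

@dataclass
class Scenario:
    name: str
    task_prompt: str                  # what the agent is asked to do
    gold_actions: list[str]           # expected tool calls, in order
    forbidden_actions: list[str] = field(default_factory=list)

    def score(self, agent_actions: list[str]) -> float:
        """1.0 for a perfect ordered match; 0.0 if any forbidden call appears."""
        if any(a in self.forbidden_actions for a in agent_actions):
            return 0.0
        hits = sum(1 for g, a in zip(self.gold_actions, agent_actions) if g == a)
        return hits / len(self.gold_actions)
```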