
We envision a future where deploying generative AI is reliable, transparent, and accessible across teams beyond just developers. At Gentrace, we are building the foundational infrastructure to transform how AI products are tested, monitored, and improved, so that teams can innovate in AI with confidence and accountability.
By creating a developer platform that bridges the gap between technical and non-technical stakeholders, we empower cross-functional collaboration that raises the standard for AI reliability and safety. Our technology allows users to trace errors and evaluate AI models at scale, making complex generative AI systems understandable and manageable.
Driven by expertise in both DevOps and AI, Gentrace is pioneering the tools that will shape the future of AI development—enabling organizations to build more robust, ethical, and powerful AI applications that serve the needs of users worldwide.
Our Review
We've been watching Gentrace since their 2023 launch, and what strikes us most isn't just their AI testing platform—it's how they're democratizing something that's been locked away in engineering teams for too long. Most companies building with LLMs are flying blind when it comes to testing, and Gentrace is handing them a proper flight control system.
The Problem They Actually Solved
Here's what caught our attention: while everyone's rushing to ship AI features, most teams are still testing generative AI like it's 2019 software. Gentrace gets that LLMs need a completely different approach to evaluation and monitoring.
Their platform tackles the messy reality of AI development—those moments when your chatbot suddenly starts giving weird responses, or your AI summarizer decides to get creative with facts. Instead of crossing your fingers and hoping for the best, you can actually catch these issues before they reach users.
What Impressed Us Most
The Quizlet case study is genuinely impressive. Going from hours of manual testing to under a minute, with 40x more frequent testing? That's not just an improvement—that's a fundamental shift in how fast you can iterate on AI products.
But what we really love is their approach to collaboration. Making AI testing accessible to product managers and subject matter experts isn't just nice-to-have—it's essential. The best AI products come from cross-functional teams, not just engineers in isolation.
The Founders Know Their Stuff
Doug Safreno, Vivek Nair, and Daniel Liem bring serious credibility to this space. Their backgrounds building test infrastructure at Uber and Dropbox show they understand what it takes to run reliable systems at scale.
The $8 million Series A from Matrix Partners validates what we're seeing—this isn't just another AI tool, it's infrastructure that companies like Webflow are building their AI strategies around. When your platform becomes part of someone's "AI engineering stack," you know you're solving a real problem.
Key Capabilities
Developer platform for testing, evaluating, and monitoring generative AI applications
Unit testing and large dataset evaluations for AI models (see the sketch after this list)
Automated evaluation and tuning of retrieval systems
Prompt editing for AI performance improvement
Flexible dataset management with built-in tools
Collaboration support for non-technical users like product managers and coaches
Multimodal output support and experiment tracking
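To make the unit-testing and evaluation idea above concrete, here is a minimal, self-contained Python sketch of the kind of loop such a platform automates: a suite of prompt test cases is run against a model and each output is scored by a simple evaluator. This is an illustration under our own assumptions, not Gentrace's SDK or API; the fake_model function, the TestCase fields, and the pass threshold are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TestCase:
    name: str
    prompt: str
    expected_keywords: List[str]  # facts the output should mention


def fake_model(prompt: str) -> str:
    # Stand-in for a real LLM call; swap in your provider's client here.
    return "Paris is the capital of France."


def keyword_evaluator(output: str, case: TestCase) -> float:
    # Score = fraction of expected keywords found in the output (0.0-1.0).
    if not case.expected_keywords:
        return 1.0
    hits = sum(1 for kw in case.expected_keywords if kw.lower() in output.lower())
    return hits / len(case.expected_keywords)


def run_suite(model: Callable[[str], str], cases: List[TestCase], threshold: float = 0.8) -> None:
    # Run every case through the model, score it, and report pass/fail.
    failures = 0
    for case in cases:
        output = model(case.prompt)
        score = keyword_evaluator(output, case)
        status = "PASS" if score >= threshold else "FAIL"
        if status == "FAIL":
            failures += 1
        print(f"{status}  {case.name}  score={score:.2f}")
    print(f"{len(cases) - failures}/{len(cases)} cases passed")


if __name__ == "__main__":
    suite = [
        TestCase("capital-fr", "What is the capital of France?", ["Paris"]),
        TestCase("capital-jp", "What is the capital of Japan?", ["Tokyo"]),
    ]
    run_suite(fake_model, suite)
```

In a real pipeline the model call would hit an actual LLM and the evaluators would include semantic or model-graded checks, but the shape stays the same: test cases in, scores and pass/fail results out, run automatically on every change instead of by hand.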






