At Sanity.io, we’re building the future of AI-powered Content Operations. Our AI Content Operating System gives teams the freedom to model, create, and automate content the way their business works, accelerating digital development and supercharging content operations efficiency. Companies like SKIMS, Figma, Riot Games, Anthropic, COMPLEX, Nordstrom, and Morningbrew are using Sanity to power and automate their content operations.
As part of our new venture, your work will center on addressing one of AI’s toughest problems: how to help machines truly understand and use human-created content. You’ll build systems that structure and enrich large volumes of information to enable AI agents and LLMs to access the right context at the right time. This means designing and developing tools and pipelines that shape, structure, and connect information and content in innovative ways, and creating new methods to ensure AIs reflect the most accurate, authentic, and up-to-date representation of a business, its brand, products, and knowledge base.
As a Senior Data Engineer, you'll architect and optimize the data infrastructure that powers our next generation of AI capabilities. You'll be the engine behind our AI systems, building scalable, efficient data pipelines that process massive volumes of content while maintaining low latency and managing costs intelligently. Your work will directly enable AI agents and LLMs to access the right data at the right time. You'll join a small, cross-functional team where your expertise in data engineering and ML infrastructure will be critical to turning ambitious AI concepts into production-ready systems. If you're passionate about building robust data systems that power cutting-edge AI, obsess over performance optimization, and love solving complex scaling challenges, we'd love to have you on the team.
What you will do:
Design, build, and optimize scalable data pipelines for AI and ML workloads, handling large volumes of structured and unstructured content data.
Architect data processing systems that transform, enrich, and prepare content for LLM consumption, with a focus on latency optimization and cost efficiency.
Build ETL/ELT workflows that extract, transform, and load data from diverse sources to support real-time and batch AI operations.
Implement data quality monitoring and observability systems to ensure pipeline reliability and data accuracy for AI models.
Collaborate with engineers and product teams to understand data requirements and design optimal data architectures that support AI features.
Optimize data storage strategies across data lakes, warehouses, and vector databases to balance performance, cost, and scalability.
Build automated data validation and testing frameworks to maintain data integrity throughout the pipeline.
Stay at the forefront of LLM research, understanding model behaviors, limitations, and capabilities to inform system design decisions.
Monitor and optimize pipeline performance, identifying bottlenecks and implementing solutions to improve throughput and reduce latency.
Create clear documentation of data architectures, pipeline logic, and operational procedures.
About you:
Based in the San Francisco Bay Area and able to work at least 2 days per week in our San Francisco office.
5+ years of data engineering experience, with at least 2 years focused on AI/ML data pipelines or supporting machine learning workloads.
High level of proficiency in Python and SQL.
Strong experience with distributed data processing frameworks like Apache Spark, Dask, or Ray.
Proficiency with GCP and its data services.
Experience with real-time data streaming technologies like Kafka, Redpanda, or NATS.
Familiarity with vector databases (e.g., Milvus, Elasticsearch, Vespa) and their role in AI applications.
Experience with data modeling, schema design, and working with both relational and NoSQL databases (PostgreSQL, MongoDB, Cassandra).
Strong focus on performance optimization, cost management, and building systems that scale efficiently.
Experience implementing data observability and monitoring solutions (e.g., Prometheus, ClickHouse).
Ability to write clean, well-documented, maintainable code with proper testing practices.
Excellent problem-solving skills and a data-driven approach to decision making.
Strong communication skills and ability to collaborate effectively with cross-functional teams.
Comfortable with ambiguity and excited about working on undefined problems that require creative solutions.
Familiarity with data pipeline orchestration tools such as Airflow, Dagster, Prefect, or similar frameworks is a nice-to-have.
What we can offer:
A highly skilled, inspiring, and supportive team.
A positive, flexible, and trust-based work environment that encourages long-term professional and personal growth.
A global, multicultural team of colleagues and customers.
Comprehensive health plans and perks.
A healthy work-life balance that accommodates individual and family needs.
Competitive salary and stock options program.
Base Salary Range: $210,000 - $265,000 annually. Final compensation within this range will be determined based on the candidate’s experience and skill set.
Who we are:
Sanity.io is a modern, flexible content operating system that replaces rigid legacy content management systems. One of our big differentiators is treating content as data so that it can be stored in a single source of truth, but seamlessly adapted and personalized for any channel without extra effort. Forward-thinking companies choose Sanity because they can create tailored content authoring experiences, customized workflows, and content models that reflect their business.
Sanity recently raised an $85M Series C led by GP Bullhound and is also backed by leading investors like ICONIQ Growth, Threshold Ventures, Heavybit and Shopify, as well as founders of companies like Vercel, WPEngine, Twitter, Mux, Netlify and Heroku. This funding round has put Sanity in a strong position for accelerated growth in the coming years.
You can only build a great company with a great culture. Sanity is a 200+ person company with highly committed and ambitious people. We are pioneers, we exist for our customers, we are hel ved, and we love type two fun! Read more about our values here!
Sanity.io pledges to be an organization that reflects the globally diverse audience that our product serves. We believe that in addition to hiring the best talent, a diversity of perspectives, ideas, and cultures leads to the creation of better products and services. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, or gender identity.





