Role Overview
We are looking for a high-caliber Data Engineer who can architect and scale the data systems that power our AI workflows. You’ll be responsible for building reliable data pipelines, integrating external APIs, maintaining clean and structured data models, and enabling the product and ML teams to iterate quickly.
You should thrive in ambiguous environments, enjoy wearing multiple hats, and be comfortable designing end-to-end data solutions with minimal direction.
What You’ll Own
Design, build, and maintain scalable data pipelines that process and transform large volumes of structured and unstructured data.
Manage ingestion from third-party APIs, internal systems, and customer datasets.
Develop and maintain data models, data schemas, and storage systems optimized for ML and product performance.
Collaborate with ML engineers to prepare model-ready datasets, embeddings, feature stores, and evaluation data.
Implement data quality monitoring, validation, and observability.
Work closely with product engineers to support new features that rely on complex data flows.
Optimize systems for performance, cost, and reliability.
Contribute to early architecture decisions, infrastructure design, and best practices for data governance.
Build tooling that enables the entire team to access clean, well-structured data.
Who You Are
Builder Mentality
You’re a hands-on engineer who thrives in a fast-paced environment, enjoys autonomy, and takes ownership of problems from start to finish.
Strong Communication
You translate technical complexity into clarity. You work well with ML, product, and go-to-market (GTM) partners.
Practical, Not Academic
You can design elegant systems but default to shipping solutions that work and can be iterated on.
Detail-Oriented & Reliable
You care about clean pipelines, reproducibility, and data correctness.
What You Bring
3+ years of experience as a Data Engineer, ML Engineer, Backend Engineer, or similar.
Proficiency in Python, SQL, and modern data tooling (dbt, Airflow, Dagster, or similar).
Experience designing and operating ETL/ELT pipelines in production.
Experience with cloud platforms (AWS, GCP, or Azure).
Familiarity with data lakes, warehouses, and vector databases.
Experience integrating APIs and working with semi-structured data (JSON, logs, event streams).
Strong understanding of data modeling and performance optimization.
Bonus: experience supporting LLMs, embeddings, or ML training pipelines.
Bonus: startup experience or comfort working in fast, ambiguous environments.
What Success Looks Like
Stable, documented, testable pipelines powering ML and product features.
High-quality data consistently available for analytics, modeling, and core product workflows.
Faster iteration cycles for the engineering and ML teams due to improved tooling.
Clear visibility into data quality and reliability.
Strong cross-functional collaboration and communication.
Why Artisan
Build core systems at the heart of a fast-growing AI company.
High autonomy, high impact, zero bureaucracy.
Work with a talented, ambitious team solving meaningful problems.
Shape the data platform from the ground up.