The opportunity
Data is core to Radical AI's mission, and as a Data Engineer you'll collaborate with cross-functional teams (software, ML, materials science) to build robust data solutions.
You will be directly involved in designing, building, and maintaining infrastructure to support agentic workflows for our AI materials design engine, as well as maturing the existing data infrastructure around our autonomous lab. In this role you'll create a best-in-class system that accelerates Radical AI's ability to drive the next evolution of scientific progress, working at the intersection of cutting-edge models, autonomous experimentation, and materials science.
As a foundational member of the team, you'll have the opportunity to shape the standards, culture, and values of the data engineering team.
Mission
- Work side by side with AI researchers and engineers, shaping the next generation of autonomous systems for materials discovery.
- Own the end-to-end planning, execution, and delivery of large-scale data initiatives.
- Drive a continuous-improvement culture through retrospectives, process streamlining, and lean execution experiments.
- Help define the roadmap for agentic AI automation, enabling intelligent workflows, process automation, and AI-driven decision-making within scientific discovery.
- Design and implement data pipelines connecting our autonomous lab with operational and product systems, including our AI-driven discovery engine (a minimal pipeline sketch follows this list).
- Build and support scalable, auditable architecture.
- Define and enforce best practices for data modeling, lineage, observability, and reconciliation across our core data domains.
- Ensure data systems are AI-ready and capable of supporting predictive analytics, autonomous agent workflows, and large-scale automation.
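To make the pipeline point above concrete, here is a minimal sketch of how two orchestrated assets might land autonomous-lab run data in an analytical store. It assumes Dagster for orchestration and DuckDB for local storage; the asset, table, and column names are hypothetical and not a description of Radical AI's actual systems.

```python
# Hypothetical sketch: two Dagster assets landing autonomous-lab run data in DuckDB.
from dagster import Definitions, asset, materialize
import duckdb

DB_PATH = "lab.duckdb"  # assumption: a local analytical store, stand-in for a warehouse


@asset
def raw_lab_runs() -> list[dict]:
    # In practice this would pull from instrument APIs or object storage (e.g., S3).
    return [
        {"run_id": "r-001", "material": "sample-A", "yield_pct": 71.4},
        {"run_id": "r-002", "material": "sample-B", "yield_pct": 64.9},
    ]


@asset
def curated_lab_runs(raw_lab_runs: list[dict]) -> None:
    # Persist curated rows so downstream analytics or a discovery engine can query them.
    con = duckdb.connect(DB_PATH)
    con.execute(
        "CREATE TABLE IF NOT EXISTS lab_runs (run_id TEXT, material TEXT, yield_pct DOUBLE)"
    )
    con.executemany(
        "INSERT INTO lab_runs VALUES (?, ?, ?)",
        [(r["run_id"], r["material"], r["yield_pct"]) for r in raw_lab_runs],
    )
    con.close()


defs = Definitions(assets=[raw_lab_runs, curated_lab_runs])

if __name__ == "__main__":
    materialize([raw_lab_runs, curated_lab_runs])
```

In a production setting the same asset graph would typically be scheduled, partitioned by run date, and pointed at a warehouse rather than a local file, but the ownership boundary (raw landing versus curated serving) stays the same.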
About you
- B.S. or M.S. in Computer Science, Information Security, Engineering, or a related field, or equivalent practical experience.
- 6+ years in data engineering, with proven experience building and managing enterprise-scale, auditable ETL pipelines and complex datasets.
- Strong background in computer science: algorithms, data structures, system design, and data modeling.
- Strong Python and/or Go skills and experience deploying production-grade code (not just notebooks).
- Experience with relational SQL/NoSQL databases and data lakehouse/warehouse frameworks (e.g., BigQuery, Snowflake, Redshift, Databricks).
- Experience with data pipeline frameworks (e.g., Beam, Spark, Kafka, Pulsar), modern data orchestration frameworks (e.g., Dagster, Prefect), and cloud-native storage (e.g., S3, ADLS).
- Experience with large-scale distributed system design, operation, and optimization.
- Experience with lightweight, embedded analytical databases (e.g., DuckDB, SQLite).
- Proficient at scaling up and managing high-volume document ingestion pipelines (a small ingestion sketch follows this list).
- An experimental mindset: comfortable testing hypotheses, learning from failures, and iterating quickly; doesn't shy away from problems with no right answer.
- Proven ability to scope ambiguous problems, develop end-to-end solutions, and communicate outcomes effectively.
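As an illustration of the document-ingestion point, the following is a small, hypothetical sketch of a batched, idempotent ingestion step in plain Python: it hashes each document's content to get a stable key, normalizes the records, and writes them out in batches. The directory names, batch size, and JSONL sink are assumptions for the sketch, not a statement about any particular pipeline.

```python
# Hypothetical sketch: batched, idempotent ingestion of raw text documents to JSONL.
import hashlib
import json
from pathlib import Path
from typing import Iterable, Iterator

BATCH_SIZE = 500  # assumption: tune to the throughput of the downstream sink


def batched(items: Iterable[dict], size: int) -> Iterator[list[dict]]:
    batch: list[dict] = []
    for item in items:
        batch.append(item)
        if len(batch) >= size:
            yield batch
            batch = []
    if batch:
        yield batch


def normalize(path: Path) -> dict:
    text = path.read_text(errors="replace")
    return {
        # Content hash makes re-runs idempotent: the same document maps to the same key.
        "doc_id": hashlib.sha256(text.encode()).hexdigest(),
        "source": str(path),
        "text": text,
    }


def ingest(source_dir: Path, sink: Path) -> None:
    docs = (normalize(p) for p in sorted(source_dir.glob("*.txt")))
    with sink.open("w") as out:
        for batch in batched(docs, BATCH_SIZE):
            for doc in batch:
                out.write(json.dumps(doc) + "\n")


if __name__ == "__main__":
    ingest(Path("raw_docs"), Path("documents.jsonl"))
```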
Pluses
- Experience with vector databases (e.g., Pinecone, pgvector, Qdrant, TurboPuffer); a minimal similarity-search sketch appears after this list.
- Knowledge of or experience with agentic processes and/or frameworks (e.g., LangChain, CrewAI, PydanticAI, AutoGen).
- Demonstrated experience with LLMOps platforms (e.g., Google Cloud Vertex AI, Amazon SageMaker, Azure Machine Learning).
- Experience with full-stack prototyping.
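As a small illustration of the vector-database plus, here is a hedged sketch of similarity search with pgvector on Postgres via psycopg2. The connection string, table name, and 4-dimensional embeddings are hypothetical; in practice embeddings would come from an embedding model rather than hard-coded lists.

```python
# Hypothetical sketch: nearest-neighbour search over document embeddings with pgvector.
import psycopg2

conn = psycopg2.connect("dbname=materials host=localhost user=postgres")  # assumption
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
cur.execute(
    """
    CREATE TABLE IF NOT EXISTS material_docs (
        id        bigserial PRIMARY KEY,
        content   text,
        embedding vector(4)  -- real systems use the model's dimension, e.g. 768 or 1536
    )
    """
)
cur.execute(
    "INSERT INTO material_docs (content, embedding) VALUES (%s, %s::vector)",
    ("perovskite synthesis notes", "[0.10, 0.20, 0.30, 0.40]"),
)
conn.commit()

# '<=>' is pgvector's cosine-distance operator; smaller means more similar.
query_embedding = "[0.12, 0.18, 0.31, 0.38]"
cur.execute(
    "SELECT content FROM material_docs ORDER BY embedding <=> %s::vector LIMIT 5",
    (query_embedding,),
)
for (content,) in cur.fetchall():
    print(content)

cur.close()
conn.close()
```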