About Sesame
Sesame believes in a future where computers are lifelike, with the ability to see, hear, and collaborate with us in ways that feel natural and human. With this vision, we're designing a new kind of computer, focused on making voice companions part of our daily lives. Our team brings together founders from Oculus and Ubiquity6, alongside proven leaders from Meta, Google, and Apple, with deep expertise spanning hardware and software. Join us in shaping a future where computers truly come alive.
About the Role
Vision understanding is a critical addition to conversational AI, bridging the gap between speech and the physical world. We’re looking for a skilled engineer or researcher to build high-value synthetic data pipelines that accelerate vision model development. The ideal candidate will be fluent in classical computer vision techniques while also comfortable leveraging modern machine learning tools across the stack: from neural rendering and diffusion-based image synthesis to transfer learning, domain adaptation, and data-centric evaluation. You’ll collaborate with research, hardware, and product teams to build capture, generation, and rendering systems that combine physical accuracy with visual realism—delivering datasets and simulators that measurably improve downstream computer vision tasks.
Responsibilities:
Build and maintain synthetic data generation pipelines (e.g., neural rendering, diffusion/score-based models, controllable generative priors, procedural assets) with levers for pose, expression, illumination, materials, and sensor characteristics.
Apply transfer learning and domain adaptation (self-supervised pretraining, style/appearance transfer, sim-to-real) to bridge distribution gaps between synthetic and real data.
Integrate off-the-shelf and open-source components where practical; fine-tune or distill models to meet latency, memory, and quality targets on target hardware.
Stand up end-to-end systems—from capture and calibration to generation, data curation, quality gates, rendering/evaluation suites, and deployment.
Define dataset and model evaluation frameworks (coverage, bias, sim-to-real gap, task-level KPIs such as gaze error) and iterate based on quantitative results.
Survey literature across graphics, vision, and generative ML; prototype, adapt, and, where needed, invent new approaches that push facial reconstruction, appearance modeling, and synthetic data quality forward.
Required Qualifications:
Demonstrated experience with 3D reconstruction, photorealistic rendering, appearance modeling, or synthetic data generation for vision tasks.
Ability to navigate and deliver results in high-ambiguity, open-ended problem spaces.
Familiarity with large-scale, multi-camera datasets and the practicalities of curation, annotation, and evaluation.
Excellent communication skills and the ability to work collaboratively across disciplines.
Bachelor’s degree or higher in computer graphics, vision, imaging, machine learning, or a related field.
Preferred Qualifications:
Master’s or Ph.D. in a relevant discipline.
Hands-on experience training or adapting neural rendering models (e.g., NeRF/3DGS variants, relighting, inverse rendering) and modern generative models (e.g., diffusion/latent diffusion, controllable text-to-image/video, inpainting/outpainting).
Proficiency in PyTorch, JAX, or other modern ML frameworks.
Sesame is committed to a workplace where everyone feels valued, respected, and empowered. We welcome all qualified applicants, embracing diversity in race, gender, identity, orientation, ability, and more. We provide reasonable accommodations for applicants with disabilities—contact careers@sesame.com for assistance.
Full-time Employee Benefits:
401k matching
100% employer-paid health, vision, and dental benefits
Unlimited PTO and sick time
Flexible spending account matching (medical FSA)
Benefits do not apply to contingent/contract workers