Research Engineer - Performance Optimization

Location
Palo Alto, United States
Salary
(Yearly)
$180,000 - $250,000 (California pay range; see Compensation)
Category
Research Scientist
Date posted
February 20, 2025
Job type
Full-time
Experience level
Mid level

Job Description

We are looking for engineers with significant problem-solving experience in PyTorch, CUDA and distributed systems. You will work with Research Scientists to build & train cutting-edge foundation models on thousands of GPUs.

Responsibilities

  • Ensure efficient implementation of models & systems for data processing, training, inference and deployment

  • Identify and implement optimization techniques for massively parallel and distributed systems

  • Identify and remedy efficiency bottlenecks (memory, speed, utilization) by profiling and implementing high-performance CUDA, Triton, C++ and PyTorch code

  • Work closely with the research team to ensure systems are designed to be as efficient as possible from start to finish

  • Build tools to visualize, evaluate and filter datasets

  • Implement cutting-edge product prototypes based on multimodal generative AI

Experience

  • Experience training large models using Python & PyTorch, including practical experience working with the entire development pipeline, from data processing, preparation & data loading to training and inference.

  • Experience optimizing and deploying inference workloads for throughput and latency across the stack (inputs, model inference, outputs, parallel processing etc.)

  • Experience profiling CPU & GPU code in PyTorch, including with NVIDIA Nsight or similar tools.

  • Experience writing & improving highly parallel & distributed PyTorch code, with familiarity with DDP, FSDP, Tensor Parallel, etc.

  • Experience writing high-performance parallel C++. Bonus if done within an ML context with PyTorch, such as data loading, data processing, or inference code.

  • Experience with high-performance Triton / CUDA and writing custom PyTorch kernels. Top candidates will be able to utilize tensor cores, optimize performance through careful use of CUDA memory, and apply other similar skills.

  • Good to have experience working with deep learning concepts such as Transformers and multimodal generative models, including diffusion models and GANs.

  • Good to have experience building inference / demo prototype code (incl. Gradio, Docker etc.)


Compensation

  • The pay range for this position in California is $180,000 - $250,000/yr; however, base pay offered may vary depending on job-related knowledge, skills, candidate location, and experience. We also offer competitive equity packages in the form of stock options and a comprehensive benefits plan.

Your applications are reviewed by real people.

Luma AI is hiring a Research Engineer - Performance Optimization. Apply through Homebase and make the next move in your career!
Company size
101-200 employees
Founded in
2021
Headquarters
San Francisco, CA, United States
Country
United States
Industry
Software Development
