Member of technical staff (Inference)

Location
Paris, France
France, United Kingdom
Salary
Undisclosed
Date posted
October 21, 2025
Job type
Full-time
Experience level
Mid level

Job Description

About H:
H exists to push the boundaries of superintelligence with agentic AI. By automating complex, multi-step tasks typically performed by humans, AI agents will help unlock full human potential.

H is hiring the world’s best AI talent, seeking those who are dedicated as much to building safely and responsibly as to advancing disruptive agentic capabilities. We promote a mindset of openness, learning, and collaboration, where everyone has something to contribute.


About the Team:

The Inference team develops and enhances the inference stack for serving the H-models that power our agent technology. The team focuses on optimizing hardware utilization to achieve high throughput, low latency, and cost efficiency, in order to deliver a seamless user experience.

Key Responsibilities:

  • Develop scalable, low-latency, and cost-effective inference pipelines

  • Optimize model performance (memory usage, throughput, and latency) using advanced techniques such as distributed computing, model compression, quantization, and caching

  • Develop specialized GPU kernels for performance-critical operations such as attention and matrix multiplication

  • Collaborate with H research teams on model architectures to improve inference efficiency

  • Review state-of-the-art papers to improve memory usage, throughput, and latency (FlashAttention, PagedAttention, continuous batching, etc.)

  • Prioritize and implement state-of-the-art inference techniques

Requirements:

  • Technical skills:

    • MS or PhD in Computer Science, Machine Learning, or a related field

    • Proficient in at least one of the following programming languages: Python, Rust or C/C++

    • Experience with GPU programming (e.g., CUDA, OpenAI Triton, Metal)

    • Experience in model compression and quantization techniques

  • Soft skills:

    • Collaborative mindset, thriving in dynamic, multidisciplinary teams

    • Strong communication and presentation skills

    • Eager to explore new challenges

  • Bonuses:

    • Experience with LLM serving frameworks such as vLLM, TensorRT-LLM, SGLang, llama.cpp, etc.

    • Experience with CUDA kernel programming and NCCL

    • Experience with deep learning inference frameworks (PyTorch/ExecuTorch, ONNX Runtime, GGML, etc.)

Location:

  • Paris or London.

  • This role is hybrid, and you are expected to be in the office 3 days a week on average.

  • The final decision on this lies with the hiring manager for each individual role.

What We Offer:

  • Join the exciting journey of shaping the future of AI, and be part of the early days of one of the hottest AI startups

  • Collaborate with a fun, dynamic and multicultural team, working alongside world-class AI talent in a highly collaborative environment

  • Enjoy a competitive salary

  • Unlock opportunities for professional growth, continuous learning, and career development

If you want to change the status quo in AI, join us.

H Company is hiring a Member of technical staff (Inference). Apply through Homebase and make the next move in your career!
Company size
201-500 employees
Industry
Computer Software

Similar AI jobs

Here are other jobs you might want to apply for.

Spain

Freelance Mechanical/Automotive Engineering Consultant - QA/ AI Trainer

Part-time
Machine Learning Engineer
Australia

Freelance Automotive/Mechanical Engineering - QA / AI Trainer

Part-time
Machine Learning Engineer
Brazil

Freelance Automotive/Mechanical Engineering - QA / AI Trainer

Part-time
Machine Learning Engineer
United Kingdom

Freelance Automotive/Mechanical Engineering - QA / AI Trainer

Part-time
Machine Learning Engineer
Singapore

Freelance Automotive/Mechanical Engineering - QA / AI Trainer

Part-time
Machine Learning Engineer
France

Freelance Automotive/Mechanical Engineering - QA / AI Trainer

Part-time
Machine Learning Engineer