AI Researcher & Engineer - Multimodal (Real-time Audio and Video)

Location
Palo Alto, United States
Salary (Yearly)
$180,000 - $440,000 USD
Category
Machine Learning Engineer
Date posted
July 25, 2025
Job type
Full-time
Experience level
Mid level

Job Description

About xAI

xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company’s mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All engineers and researchers are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.

About the Role

The Reasoning team at xAI creates magical AI experiences beyond text, enabling the understanding and generation of content across various modalities, including image, video, and audio. Our team is pushing the frontier of multimodal intelligence through Grok Voice, our advanced multimodal AI assistant that can listen, see, and respond to you in real time. We actively develop novel audio and video understanding capabilities that solve user problems in both the physical and digital worlds.

As a Researcher & Engineer on the Reasoning team specializing in real-time audio and video, you'll lead the advancement of multimodal capabilities across data, modeling, serving infrastructure, and product integration. Collaborating closely with pre-training, post-training, and product teams, you'll drive innovations that expand the boundaries of model performance and elevate end-to-end user experiences. Ideal candidates thrive at the intersection of cutting-edge research and engineering.

What You'll Do

  • Research, design, and implement algorithms to enhance audio and video understanding and generation, whether through developing new models, systems, or tools.
  • Collaborate closely with product and engineering teams to carry multimodal capabilities from initial concept through production deployment, proactively monitoring and addressing issues along the way.
  • Improve data quality by curating robust datasets, developing data filtering and generation techniques, building scalable data pipelines, and analyzing user interactions to inform product improvements.
  • Create evaluation frameworks, internal benchmarks, and metrics to systematically measure and improve real-world model performance, proactively identifying and resolving user-facing challenges.
  • Manage the complete experimental lifecycle: from designing experiments and training models to deployment and iterative refinement based on feedback and data.

Ideal Experience

You're an exceptional candidate if you have some (or all) of the following:

  • A proven track record of leading research or engineering efforts that have significantly enhanced neural network capabilities and performance.
  • Hands-on experience building and deploying large-scale distributed machine learning systems and backend services.
  • Expertise in reinforcement learning, agentic models, or real-world multimodal AI applications.
  • Strong engineering skills combined with the ability and enthusiasm to rapidly navigate and master complex, unfamiliar codebases.
  • Demonstrated excellence in systematic experiment design, model debugging, performance analysis, and iterative improvements.
  • A pragmatic, execution-oriented approach: you proactively solve problems and prioritize getting things done efficiently.

Location

  • The role is based in Palo Alto. Our team usually works from the office 5 days a week but allows work-from-home days when required. Candidates are expected to be located near Palo Alto or be open to relocation.

Interview Process

After submitting your application, the team reviews your CV and statement of exceptional work. If your application passes this stage, you will be invited to a 15-minute interview ("phone interview") during which a member of our team will ask some basic questions. If you clear the initial phone interview, you will enter the main process, which consists of four technical interviews:

  • One-on-one research discussion & coding interviews (three meetings total)
  • Project deep-dive: Present your past exceptional work and your vision with xAI to a small audience.

Every application is reviewed by a member of our technical team. All interviews will be conducted via Google Meet.

Annual Salary Range

$180,000 - $440,000 USD

xAI is an equal opportunity employer.

California Consumer Privacy Act (CCPA) Notice
