Figure is an AI robotics company developing autonomous general-purpose humanoid robots. The goal of the company is to ship humanoid robots with human-level intelligence. Its robots are engineered to perform a variety of tasks in the home and commercial markets. Figure is headquartered in San Jose, CA.
We are looking for a Helix AI Intern, Speech, for Winter 2026 to contribute to the design and optimization of the real-time speech pipeline that powers natural voice interaction with our humanoid robot. This role offers hands-on experience at the intersection of audio systems, AI, and robotics, working on challenges such as low-latency audio streaming, speech enhancement, and real-time speech understanding.
This internship is designed for students in their final year of an undergraduate or master’s program who are on track to complete their degree by the end of 2026 or the following year, as well as recent graduates.
Responsibilities:
- Support the development and testing of real-time audio and speech streaming pipelines
- Contribute to the integration of low-latency, full-duplex audio systems using WebRTC or similar frameworks
- Assist in evaluating or deploying AI-based components that improve speech quality, intelligibility, or responsiveness
- Collaborate with AI, audio, and robotics engineers to enhance the reliability and performance of speech systems
- Help build tools for monitoring, debugging, and visualizing live audio and speech pipeline performance
Requirements:
- Undergraduate student (Senior) or recent graduate in Computer Science, Electrical Engineering, or a related field
- Minimum 10-week internship; 1 to 2 terms preferred
- Strong programming skills in Python or C++
- Familiarity with real-time communication frameworks (WebRTC, gRPC, or WebSockets)
- Understanding of digital audio fundamentals (sampling, latency, buffering, SNR, AEC)
- Basic knowledge of machine learning concepts and experience deploying or using pre-trained models
- Strong verbal and written communication skills
Bonus Qualifications:
- Experience with audio ML frameworks (PyTorch, torchaudio, ONNX Runtime)
- Familiarity with speech enhancement or ASR/TTS systems
- Knowledge of asynchronous or multithreaded programming (asyncio, coroutines, or similar)
- Exposure to cloud or edge-based audio processing systems
- Interest in humanoid robots and real-time human–robot communication
The US hourly range for this internship position is $40 to $50 per hour.
The pay offered for this position may vary based on several individual factors, including job-related knowledge, skills, and experience. The total compensation package may also include additional components/benefits depending on the specific role. This information will be shared if an employment offer is extended.