Software Development Engineer in Test Intern
Advance inference efficiency end-to-end by designing and prototyping algorithms, architectures, and scheduling strategies for low-latency, high-throughput inference. Implement and maintain changes in high-performance inference engines such as SGLang- or vLLM-style systems and Together's inference stack, including kernel backends, speculative decoding (e.g., ATLAS), and quantization. Profile and optimize performance across GPU, networking, and memory layers to improve latency, throughput, and cost. Design and operate reinforcement learning (RL) and post-training pipelines (including RLHF, RLAIF, GRPO, DPO-style methods, and reward modeling) where most of the cost is inference, jointly optimizing algorithms and systems. Make RL and post-training workloads more efficient with inference-aware training loops such as asynchronous RL rollouts and speculative decoding. Use these pipelines to train, evaluate, and iterate on frontier models built on the inference stack. Co-design algorithms and infrastructure to tightly couple objectives, rollout collection, and evaluation with efficient inference, identifying bottlenecks in training engines, inference engines, data pipelines, and user-facing layers. Run ablations and scale-up experiments to study trade-offs between model quality, latency, throughput, and cost, and integrate findings into model, RL, and system design. Profile, debug, and optimize inference and post-training services under production workloads. Drive roadmap items requiring real engine modifications, including changes to kernels, memory layouts, scheduling logic, and APIs. Establish metrics, benchmarks, and experimentation frameworks to rigorously validate improvements. Provide technical leadership by setting direction for cross-team efforts at the intersection of inference, RL, and post-training. Mentor engineers and researchers on full-stack ML systems work and performance engineering.
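The speculative decoding mentioned above follows a draft-then-verify loop: a cheap draft model proposes several tokens, and the expensive target model verifies them. A minimal sketch of that control flow; `draft_next` and `target_next` are hypothetical stand-ins for real models, and the greedy token-match acceptance test simplifies the probabilistic rule real engines use.

```python
import random

random.seed(0)
VOCAB = list(range(8))  # toy vocabulary so acceptances actually occur

def draft_next(prefix):
    # stand-in for a small, cheap draft model (hypothetical)
    return random.choice(VOCAB)

def target_next(prefix):
    # stand-in for the large target model (hypothetical)
    return random.choice(VOCAB)

def speculative_step(prefix, k=4):
    """One draft-then-verify step: the draft model proposes k tokens,
    the target model keeps the longest matching prefix, then emits one
    token of its own. A real engine verifies all k proposals in a
    single batched target forward pass."""
    proposed, p = [], list(prefix)
    for _ in range(k):
        t = draft_next(p)
        proposed.append(t)
        p.append(t)
    out = list(prefix)
    for t in proposed:
        if target_next(out) == t:  # greedy acceptance test (simplified)
            out.append(t)
        else:
            break
    out.append(target_next(out))   # target model supplies the next token
    return out

print(speculative_step([1, 2, 3]))
```

The speedup comes from amortization: when the draft model's proposals are accepted, several tokens are produced per expensive target pass instead of one.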
Machine Learning Operations Engineer
Optimize orchestration processes to ensure efficient deployment and management of AI models. Implement cost-saving strategies that minimize infrastructure expenses while maximizing performance. Improve throughput to enhance the scalability and responsiveness of AI systems. Collaborate with cross-functional teams to identify bottlenecks and implement solutions that improve workflow efficiency. Ship new features and updates rapidly while maintaining high quality and reliability. Deploy and monitor machine learning models produced by deep learning engineers. Design, deploy, and maintain performant, scalable processes for data acquisition and manipulation to enhance dataset accessibility. Participate actively in the team's software development process, including design reviews, code reviews, and brainstorming sessions. Maintain accurate, up-to-date software development documentation.
Software Engineering Manager, Autonomous
As an Engineering Manager on the Autonomous team, you will lead and scale a high-calibre team of engineers dedicated to defining the future of AI agent development and advancing AI and backend systems. You will oversee the technical roadmap for the team by translating architectural complexity into clear product strategies, mentor and support the professional growth of a diverse group of engineers, and partner closely with Product and Design to ensure the agent-building tools remain intuitive and technically robust. You will champion a "show > tell" culture to ensure rapid shipping while maintaining high technical stability and user experience standards, and clear technical and operational roadblocks to enable the team to operate with high agency and clarity. You will act as the bridge between product vision and technical execution.
Manager, Information Security
You will be responsible for defining operational domains and evaluating the reliability of the AI capabilities developed in-house. You will develop and extend the state of the art in uncertainty quantification and uncertainty calibration. This involves understanding the AI systems being built, interfacing with them, and evaluating their robustness in real-world and adversarial scenarios. You will contribute to impactful projects and collaborate with people across several teams and backgrounds.
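Calibration work of this kind is commonly summarized with expected calibration error (ECE), which bins predictions by stated confidence and compares average confidence to observed accuracy in each bin. A minimal sketch on synthetic data; the confidence/correctness pairs below are fabricated for illustration.

```python
import random

random.seed(1)
# each pair: (model confidence, whether the prediction was correct);
# confidence is independent of correctness here, i.e. uncalibrated
preds = [(random.random(), random.random() < 0.7) for _ in range(1000)]

def ece(samples, bins=10):
    """Expected calibration error: bin by confidence, then take the
    sample-weighted average of |accuracy - mean confidence| per bin."""
    total, err = len(samples), 0.0
    for b in range(bins):
        lo, hi = b / bins, (b + 1) / bins
        bucket = [(c, ok) for c, ok in samples if lo <= c < hi]
        if not bucket:
            continue
        conf = sum(c for c, _ in bucket) / len(bucket)
        acc = sum(ok for _, ok in bucket) / len(bucket)
        err += len(bucket) / total * abs(acc - conf)
    return err

print(f"ECE: {ece(preds):.3f}")
```

A well-calibrated model scores near zero; because the synthetic confidences above carry no information about correctness, the value comes out large.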
Lead AI/ML Engineer
Lead the design and implementation of scalable ML/AI systems focused on large language models, vector databases, and retrieval-based architectures. Integrate and apply foundation models from providers such as OpenAI, AWS Bedrock, and Anthropic for prototyping and production use cases. Adapt, evaluate, and optimize large language models for domain-specific enterprise applications. Build and maintain infrastructure for AI model experimentation, deployment, and monitoring in production. Improve model performance and inference workflows, addressing latency, cost, and reliability. Provide technical leadership by mentoring engineers and promoting best ML engineering practices. Partner with product and cross-functional stakeholders to translate requirements into scalable ML solutions. Contribute to the evolution of internal standards for AI experimentation, evaluation, and deployment. Lead the design and delivery of end-to-end voice AI solutions that combine large language models with speech technologies, including speech-to-text, text-to-speech, and real-time streaming audio pipelines; architect low-latency, highly reliable conversational voice systems and guide a team through ambiguity toward production excellence. Understand and apply the constraints of voice experiences, such as latency, turn-taking, interruption handling, streaming inference, and audio quality, to create scalable, enterprise-grade systems.
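The retrieval-based pattern referenced above has a standard shape: embed documents, index the vectors, retrieve nearest neighbors for a query, and assemble the hits into an LLM prompt. A minimal self-contained sketch; the hashed bag-of-words embedding is a toy stand-in for a provider embedding model, and the documents and names are illustrative.

```python
import math
from collections import Counter

DIM = 64

def embed(text):
    # toy hashed bag-of-words embedding (assumption, not a vendor API)
    v = [0.0] * DIM
    for tok, n in Counter(text.lower().split()).items():
        v[hash(tok) % DIM] += n
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))

docs = ["reset your password from settings",
        "invoices are emailed monthly",
        "latency budgets for voice agents"]
index = [(d, embed(d)) for d in docs]  # in-memory "vector database"

def retrieve(query, k=2):
    q = embed(query)
    return sorted(index, key=lambda p: -cosine(q, p[1]))[:k]

hits = [d for d, _ in retrieve("how do I reset my password")]
prompt = "Answer using this context:\n" + "\n".join(hits)
print(prompt)
```

A production system would swap `embed` for a hosted embedding endpoint and the in-memory list for a real vector database, but the query path keeps this structure.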
Software Engineer, Inference Platform
The Software Engineer for the Inference Platform at Fluidstack will own inference deployments end-to-end, including initial configuration, performance tuning, production SLA maintenance, and incident response. They will drive measurable improvements in throughput, time-to-first-token (TTFT), and cost-per-token across diverse model families and customer workload patterns. Responsibilities include building and operating key-value (KV) cache and scheduling infrastructure to maximize utilization across concurrent requests, implementing and validating disaggregated prefill/decode pipelines, and managing Kubernetes-based orchestration at scale. The role requires profiling and resolving bottlenecks at the compute, memory, and communication layers and instrumenting deployments for end-to-end observability. It also involves partnering with customers to translate model architectures, access patterns, and latency requirements into deployment configurations, and contributing to the inference platform architecture and roadmap, with a focus on reducing deployment complexity, improving hardware utilization, and expanding support for new model classes and accelerators. The role also includes participating in an on-call rotation (up to one week per month) to maintain the reliability and SLA commitments of production deployments.
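TTFT and throughput improvements of this kind are typically validated against a streaming endpoint. A minimal measurement sketch; `stream_tokens` and its per-token delay are hypothetical stand-ins for a real streaming inference client.

```python
import time

def stream_tokens(prompt, n=32):
    # stand-in for a streaming inference client (hypothetical);
    # the sleep simulates per-token decode latency
    for i in range(n):
        time.sleep(0.01)
        yield f"tok{i}"

def measure(prompt):
    """Record time-to-first-token and steady-state token throughput
    for a single streaming request."""
    start = time.perf_counter()
    ttft, count = None, 0
    for _ in stream_tokens(prompt):
        if ttft is None:
            ttft = time.perf_counter() - start  # time to first token
        count += 1
    total = time.perf_counter() - start
    return {"ttft_s": ttft, "tokens_per_s": count / total}

print(measure("hello"))
```

TTFT mostly reflects queueing and prefill, while tokens per second reflects decode throughput, which is why the two are tracked as separate metrics.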
Scientist/Sr Scientist, Display Technology (Contract)
The role calls for industry experience as a research engineer at an AI-related company and an eagerness to work, learn, and teach within a collaborative team on challenging problems.
Forward Deployed Engineer - ML
As a Forward Deployed ML Engineer at Modal, you will work hands-on with companies like Suno, Lovable, Cognition, and Meta to architect and optimize production AI workloads on Modal. You will contribute to open-source projects, publish technical content demonstrating Modal's capabilities across the AI stack, and collaborate with Modal's product and sales teams as both an engineer and a product stakeholder. Additionally, you will build trusted relationships with technical leaders at companies doing frontier AI work and conduct technical demos, experiments, and proofs of concept that highlight Modal's performance advantages.
Research Product Manager — Structured AI Systems
The Research Product Manager is responsible for advancing foundational work in tabular data learning; structured and relational representation learning; compression-aware AI; hybrid symbolic, relational, and neural systems; and large-scale systems, linking these research efforts to real production systems managing petabytes of data. The role involves productionizing structured AI models by collaborating with Research and Systems teams to design training on Parquet/Iceberg/Delta data and to define training infrastructure requirements, inference architectures, and maintenance loops, while understanding storage and compute trade-offs, data layout, compute scheduling, model lifecycle, infrastructure bottlenecks, and evaluation pipelines. The role also involves defining how economic value is extracted: identifying buyers, sources of value, and quantification methods, and converting research advances into revenue and platform advantages, which requires strong intuition for enterprise infrastructure economics. Additionally, the Research Product Manager identifies modeling advances that are viable for production, terminates non-viable research directions, defines integration paths into enterprise workloads, and works with the Chief Research Scientist to prioritize the research agenda. The position requires a deep understanding of training, deploying, and maintaining large AI models in production systems, as well as translating foundational modeling advances into economically valuable infrastructure, shaping both technical execution and economic strategy.
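Training over columnar lake formats like Parquet usually means streaming record batches rather than materializing whole tables. A minimal sketch using pyarrow; the file name, schema, and batch size are illustrative, and table formats like Iceberg or Delta would layer on top of this same batch-iteration pattern.

```python
import pyarrow as pa
import pyarrow.parquet as pq

# fabricated example table standing in for production data
table = pa.table({"user_id": [1, 2, 3, 4], "score": [0.1, 0.9, 0.4, 0.7]})
pq.write_table(table, "example.parquet")

# stream record batches without loading the whole file into memory,
# the usual pattern when feeding a training loop from columnar storage
pf = pq.ParquetFile("example.parquet")
for batch in pf.iter_batches(batch_size=2):
    rows = batch.to_pylist()
    print(rows)  # a real pipeline would convert these rows to tensors
```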
Global Hardware Sourcing & Supply Manager
Advance inference efficiency end-to-end by designing and prototyping algorithms, architectures, and scheduling strategies for low-latency, high-throughput inference. Implement and maintain changes in high-performance inference engines, including kernel backends, speculative decoding, and quantization. Profile and optimize performance across GPU, networking, and memory layers to improve latency, throughput, and cost. Design and operate RL and post-training pipelines where most of the cost is inference, jointly optimizing algorithms and systems. Make RL and post-training workloads more efficient with inference-aware training loops, async RL rollouts, speculative decoding, and other techniques that reduce rollout-collection and evaluation costs. Use these pipelines to train, evaluate, and iterate on frontier models. Co-design algorithms and infrastructure to tightly couple objectives, rollout collection, and evaluation with efficient inference, and identify bottlenecks across the training engine, inference engine, data pipeline, and user-facing layers. Run experiments to understand trade-offs between model quality, latency, throughput, and cost, feeding insights back into design. Profile, debug, and optimize inference and post-training services under production workloads. Drive roadmap items requiring engine modifications such as kernel, memory-layout, scheduling-logic, and API changes. Establish metrics, benchmarks, and experimentation frameworks for rigorous validation of improvements. Provide technical leadership by setting direction for cross-team efforts in inference, RL, and post-training; mentor engineers and researchers on full-stack ML systems and performance engineering.