PyTorch AI Jobs

Discover the latest remote and onsite PyTorch AI roles at top AI companies that are actively hiring. Updated hourly.

Check out 362 new PyTorch AI roles posted on The Homebase

Software Development in Test Intern

New
Top rated
Together AI
Full-time
Posted

Advance inference efficiency end-to-end by designing and prototyping algorithms, architectures, and scheduling strategies for low-latency, high-throughput inference. Implement and maintain changes in high-performance inference engines such as SGLang- or vLLM-style systems and Together's inference stack, including kernel backends, speculative decoding (e.g., ATLAS), and quantization. Profile and optimize performance across GPU, networking, and memory layers to improve latency, throughput, and cost.

Design and operate reinforcement learning (RL) and post-training pipelines (including RLHF, RLAIF, GRPO, DPO-style methods, and reward modeling) where most of the cost is inference, jointly optimizing algorithms and systems. Make RL and post-training workloads more efficient with inference-aware training loops such as asynchronous RL rollouts and speculative decoding techniques. Use these pipelines to train, evaluate, and iterate on frontier models based on the inference stack.

Co-design algorithms and infrastructure to tightly couple objectives, rollout collection, and evaluation with efficient inference, identifying bottlenecks in training engines, inference engines, data pipelines, and user-facing layers. Run ablations and scale-up experiments to study trade-offs between model quality, latency, throughput, and cost, and integrate findings into model, RL, and system design.

Profile, debug, and optimize inference and post-training services under production workloads. Drive roadmap items requiring real engine modifications, including changing kernels, memory layouts, scheduling logic, and APIs. Establish metrics, benchmarks, and experimentation frameworks to rigorously validate improvements. Provide technical leadership by setting technical direction for cross-team efforts at the intersection of inference, RL, and post-training. Mentor engineers and researchers on full-stack ML systems work and performance engineering.
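Symmetric int8 weight quantization, one of the techniques this role touches, can be sketched in a few lines. This is a toy pure-Python illustration; real inference engines like those named above use fused GPU kernels and per-channel scales:

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: map floats onto [-127, 127] with one scale."""
    scale = max(abs(w) for w in weights) / 127.0
    if scale == 0:
        return [0] * len(weights), 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [x * scale for x in q]

w = [0.5, -1.27, 0.003, 1.0]
q, s = quantize_int8(w)
w_hat = dequantize_int8(q, s)
# per-weight reconstruction error is bounded by scale / 2
```

The single shared scale keeps the example short; production quantizers trade accuracy against memory by choosing scale granularity (per tensor, per channel, or per group).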

$200,000 – $280,000 per year (USD)

San Francisco
Maybe global
Onsite
Python
PyTorch
TensorFlow
Reinforcement Learning
MLOps

Machine Learning Operations Engineer

New
Top rated
Haydenai
Full-time
Posted

Optimize orchestration processes to ensure efficient deployment and management of AI models. Implement cost-saving strategies to minimize infrastructure expenses while maximizing performance. Upgrade throughput to enhance scalability and responsiveness of AI systems. Collaborate with cross-functional teams to identify bottlenecks and implement solutions to improve workflow efficiency. Ship new features and updates rapidly while maintaining high levels of quality and reliability. Deploy and monitor machine learning models produced by deep learning engineers. Design, deploy, and maintain performant and scalable processes for data acquisition and manipulation to enhance dataset accessibility. Participate actively in the team's software development process, including design reviews, code reviews, and brainstorming sessions. Maintain accurate and updated software development documentation.

$135,699 – $190,000 per year (USD)

San Francisco, United States
Maybe global
Remote
Python
C++
PyTorch
TensorFlow
MLflow

Software Engineering Manager, Autonomous

New
Top rated
Magical
Full-time
Posted

As an Engineering Manager on the Autonomous team, you will lead and scale a high-calibre team of engineers dedicated to defining the future of AI agent development and advancing AI and backend systems. You will oversee the technical roadmap for the team by translating architectural complexity into clear product strategies, mentor and support the professional growth of a diverse group of engineers, and partner closely with Product and Design to ensure the agent-building tools remain intuitive and technically robust. You will champion a "show > tell" culture to ensure rapid shipping while maintaining high technical stability and user experience standards, and clear technical and operational roadblocks to enable the team to operate with high agency and clarity. You will act as the bridge between product vision and technical execution.

Undisclosed

Toronto, Canada
Maybe global
Hybrid
Python
Docker
Kubernetes
AWS
CI/CD

Manager Information Security

New
Top rated
Helsing
Full-time
Posted

You will be responsible for defining operational domains and evaluating the reliability of the AI capabilities developed in-house. You will develop and extend the state-of-the-art in uncertainty quantification and uncertainty calibration. This involves understanding the AI systems built, interfacing with them, and evaluating their robustness in real-world and adversarial scenarios. You will contribute to impactful projects and collaborate with people across several teams and backgrounds.

Undisclosed

Munich
Maybe global
Onsite
Python
Java
C++
PyTorch
TensorFlow

Data Engineer - Foundational

New
Top rated
Harmattan AI
Full-time
Posted

As a Data Engineer on the Foundational team, you will build ETL/ELT pipelines to extract, decode, and store raw Electro-Optical (EO) and Infrared (IR) video into optimised formats like WebDataset, TFRecords, or Parquet. You will develop algorithms to synchronise EO and IR frames temporally and spatially for model training inputs. Architect storage-to-GPU pipelines to ensure multi-node training clusters maintain over 90% GPU utilisation without I/O bottlenecks. Write and optimise distributed data processing jobs using Apache Spark, Ray, or Apache Beam to handle thousands of hours of tactical video logs. Implement automated quality checks to filter corrupted or blank frames and maintain reproducible training runs with versioning and lineage tracking. Evaluate and implement advanced storage solutions such as MinIO or S3 tiering to manage datasets while optimising cost and latency.
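The temporal EO/IR synchronisation described above is, at its core, nearest-timestamp matching within a tolerance. A minimal sketch in pure Python; the timestamps and the 20 ms tolerance are invented for illustration:

```python
import bisect

def sync_frames(eo_ts, ir_ts, tolerance=0.02):
    """Pair each EO frame with the nearest IR frame by timestamp.

    eo_ts, ir_ts: sorted lists of capture times in seconds.
    tolerance: maximum allowed offset; unmatched frames are dropped.
    Returns a list of (eo_index, ir_index) pairs.
    """
    pairs = []
    for i, t in enumerate(eo_ts):
        j = bisect.bisect_left(ir_ts, t)
        # candidates: the IR frames immediately before and after t
        best = None
        for k in (j - 1, j):
            if 0 <= k < len(ir_ts):
                if best is None or abs(ir_ts[k] - t) < abs(ir_ts[best] - t):
                    best = k
        if best is not None and abs(ir_ts[best] - t) <= tolerance:
            pairs.append((i, best))
    return pairs

eo = [0.00, 0.033, 0.066, 0.100]   # ~30 fps EO stream
ir = [0.005, 0.038, 0.071, 0.500]  # IR stream with one large gap
sync_frames(eo, ir)  # pairs the first three frames; the last EO frame is dropped
```

Spatial alignment (registration of the two sensors' pixel grids) is a separate step and is not shown here.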

Undisclosed

Paris, France
Maybe global
Onsite
Python
PyTorch
JAX
Data Pipelines
ETL

Lead AI/ML Engineer

New
Top rated
ASAPP
Full-time
Posted

Lead the design and implementation of scalable ML/AI systems focused on large language models, vector databases, and retrieval-based architectures. Integrate and apply foundation models from providers like OpenAI, AWS Bedrock, and Anthropic for prototyping and production use cases. Adapt, evaluate, and optimize large language models for domain-specific enterprise applications. Build and maintain infrastructure for AI model experimentation, deployment, and monitoring in production. Improve model performance and inference workflows addressing latency, cost, and reliability. Provide technical leadership by mentoring engineers and promoting best ML engineering practices. Partner with product and cross-functional stakeholders to translate requirements into scalable ML solutions. Contribute to the evolution of internal standards for AI experimentation, evaluation, and deployment. Lead the design and delivery of end-to-end voice AI solutions combining large language models with speech technologies including speech-to-text, text-to-speech, and real-time streaming audio pipelines, architecting low-latency, highly reliable conversational voice systems and guiding a team through ambiguity toward production excellence. Understand and apply constraints of voice experiences such as latency, turn-taking, interruption handling, streaming inference, and audio quality to create scalable, enterprise-grade systems.
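Retrieval-based architectures like those mentioned rank stored embeddings by similarity to a query embedding. A toy sketch using cosine similarity; the document names and vectors are invented, and a production system would use a vector database and learned embeddings rather than hand-written lists:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query, docs, k=2):
    """Return the k document ids most similar to the query embedding."""
    scored = [(cosine(query, vec), doc_id) for doc_id, vec in docs.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

docs = {
    "refund_policy": [0.9, 0.1, 0.0],
    "shipping_times": [0.1, 0.9, 0.2],
    "warranty_terms": [0.8, 0.2, 0.1],
}
top_k([1.0, 0.0, 0.0], docs, k=2)  # → ["refund_policy", "warranty_terms"]
```

In a retrieval-augmented setup, the retrieved documents are then placed into the language model's prompt as grounding context.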

$170,000 – $190,000 per year (USD)

New York or Mountain View, United States
Maybe global
Hybrid
Python
PyTorch
TensorFlow
OpenAI API
RAG

Software Engineer, Inference Platform

New
Top rated
Fluidstack
Full-time
Posted

The Software Engineer for the Inference Platform at Fluidstack will own inference deployments end-to-end, including initial configuration, performance tuning, production SLA maintenance, and incident response. They will drive measurable improvements in throughput, time-to-first-token (TTFT), and cost-per-token across diverse model families and customer workload patterns. Responsibilities include building and operating key-value (KV) cache and scheduling infrastructure to maximize utilization across concurrent requests, implementing and validating disaggregated prefill/decode pipelines, and managing Kubernetes-based orchestration at scale. The role requires profiling and resolving bottlenecks at compute, memory, and communication layers, instrumenting deployments for end-to-end observability, partnering with customers to translate model architectures, access patterns, and latency requirements into deployment configurations, and contributing to the inference platform architecture and roadmap focused on reducing deployment complexity, improving hardware utilization, and expanding support for new model classes and accelerators. Additionally, participation in an on-call rotation (up to one week per month) to maintain reliability and SLA commitments of production deployments is required.
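Metrics such as TTFT, throughput, and cost-per-token can be derived from per-request logs. A minimal sketch, assuming each record carries arrival time, first-token time, completion time, token count, and cost; all field names and numbers are invented for illustration:

```python
def inference_metrics(requests):
    """Compute TTFT, aggregate throughput, and cost-per-token.

    Each record: (arrival_s, first_token_s, done_s, tokens, cost_usd).
    """
    ttft = sorted(r[1] - r[0] for r in requests)
    total_tokens = sum(r[3] for r in requests)
    wall = max(r[2] for r in requests) - min(r[0] for r in requests)
    total_cost = sum(r[4] for r in requests)
    return {
        "ttft_p50_s": ttft[len(ttft) // 2],        # upper median
        "throughput_tok_s": total_tokens / wall,    # tokens per wall-clock second
        "cost_per_token_usd": total_cost / total_tokens,
    }

reqs = [
    (0.0, 0.12, 2.0, 400, 0.004),
    (0.5, 0.68, 3.0, 600, 0.006),
]
inference_metrics(reqs)
```

Instrumenting deployments to emit exactly these timestamps per request is what makes regressions in TTFT or cost-per-token visible before customers notice them.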

$165,000 – $500,000 per year (USD)

San Francisco, United States
Maybe global
Onsite
Python
Go
PyTorch
JAX
Kubernetes

AI Researcher

New
Top rated
Maincode
Full-time
Posted

The AI Researcher will work across the model development loop including designing and testing architecture changes and training regimes for large language models, running controlled experiments at scale to isolate causal effects, studying failure modes in reasoning, generalisation, robustness, and representation, shaping objectives, data mixtures, and optimisation choices that influence model behaviour, building and refining evaluations that measure capability and reliability, analysing training dynamics using logs, metrics, and model outputs, collaborating with ML systems engineers on distributed training and training operations, and writing clear internal notes to translate experimental results into design decisions. The role requires substantial time spent in code, training runs, logs, and evaluation outputs aiming for clarity about what improves the model and why.
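Running controlled experiments to isolate causal effects hinges on holding random seeds fixed while varying exactly one factor. A toy illustration; the "training run" below is a stand-in arithmetic loop, not a real model:

```python
import random

def run_trial(seed, lr):
    """Toy 'training run': same seed + same config → identical final loss."""
    rng = random.Random(seed)
    loss = 1.0
    for _ in range(100):
        # multiplicative decay scaled by lr and a seeded random draw
        loss -= lr * rng.random() * loss
    return loss

# controlled experiment: vary lr only, with the same seeds in both arms
a = [run_trial(s, lr=0.1) for s in range(3)]
b = [run_trial(s, lr=0.2) for s in range(3)]
# paired per-seed comparison attributes the difference to lr, not noise
```

Pairing runs by seed removes run-to-run randomness from the comparison, the same logic behind seeded ablations in large-scale training.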

A$150,000 – A$180,000 per year (AUD)

Melbourne, Australia
Maybe global
Onsite
Python
PyTorch
JAX
Transformers
Model Evaluation

AI Software Engineer (Model Training)

New
Top rated
Maincode
Full-time
Posted

You will build and maintain the systems that support large scale model training, including designing and maintaining distributed training pipelines for large language models, building data ingestion and preprocessing systems for large training datasets, developing tooling for experiment management, checkpointing, and reproducibility, monitoring and debugging long running training jobs across clusters, improving reliability and observability across the training stack, optimizing training throughput across compute, memory, and data pipelines, working closely with researchers to translate experimental ideas into training runs, and diagnosing failures across infrastructure, training loops, and data pipelines. The work requires spending time inside code, logs, dashboards, and experiment outputs to make large scale training reliable.
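The checkpointing and reproducibility tooling described above follows a simple save/resume pattern. A toy sketch: real pipelines persist `model.state_dict()` with `torch.save` to durable storage, while pickle-to-buffer here keeps the example self-contained:

```python
import io
import pickle

def save_checkpoint(state, buf):
    """Serialize training state (step, weights, hyperparameters) to a buffer."""
    pickle.dump(state, buf)

def load_checkpoint(buf):
    """Restore training state so a crashed job can resume mid-run."""
    buf.seek(0)
    return pickle.load(buf)

# simulate a crash/resume cycle for a long-running job
state = {"step": 12000, "weights": [0.1, -0.3], "lr": 3e-4}
buf = io.BytesIO()
save_checkpoint(state, buf)
resumed = load_checkpoint(buf)
assert resumed == state  # training continues from step 12000, not step 0
```

In practice the checkpoint also captures optimizer state, the data-loader position, and RNG states, since skipping any of these silently breaks reproducibility on resume.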

Undisclosed

Melbourne, Australia
Maybe global
Onsite
Python
PyTorch
JAX
MLOps
Distributed Systems

Scientist/Sr Scientist, Display Technology (Contract)

New
Top rated
Xaira
Contractor
Posted

This role calls for industry experience as a research engineer at an AI-focused company and enthusiasm for working, learning, and teaching within a collaborative team on challenging problems.

Undisclosed

London, United Kingdom
Maybe global
Hybrid
Python
PyTorch
TensorFlow
Distributed Training
Model Evaluation

Want to see more AI Engineer jobs?

View all jobs

Access all 4,256 remote & onsite AI jobs.

Join our private AI community to unlock full job access, and connect with founders, hiring managers, and top AI professionals.
(Yes, it’s still free—your best contributions are the price of admission.)

Frequently Asked Questions

Need help with something? Here are our most frequently asked questions.


What are PyTorch AI jobs?

PyTorch AI jobs focus on building, training, and deploying deep learning models for applications like computer vision, natural language processing, and generative AI. These positions involve creating custom neural networks, research prototyping with dynamic computation graphs, and transitioning models to production using tools like TorchScript and TorchServe. These roles typically exist in research labs, tech companies, and AI-driven startups.

What roles commonly require PyTorch skills?

Roles that commonly require PyTorch skills include AI researchers, machine learning engineers, data scientists, and deep learning specialists. These professionals develop custom neural networks, implement computer vision solutions, create NLP models, and design predictive analytics systems. They often work on research prototyping and transitioning models to production environments through REST APIs or cloud platforms.

What skills are typically required alongside PyTorch?

Python programming is essential, as the framework is deeply integrated with the language. Professionals also need strong foundations in deep learning concepts, familiarity with neural network architectures like CNNs and RNNs, and experience with NumPy. Additional valuable skills include GPU programming with CUDA, distributed training techniques, cloud platform integration, and knowledge of deployment tools like TorchServe and ONNX Runtime.

What experience level do PyTorch AI jobs usually require?

PyTorch AI jobs span from entry-level to senior positions. Entry roles typically require fundamental Python and deep learning knowledge. Mid-level positions demand practical experience building and deploying models using the framework. Senior roles require extensive experience with complex architectures, distributed training, production deployment, and often specialization in areas like computer vision or NLP.

What is the salary range for PyTorch AI jobs?

Salaries for PyTorch AI jobs vary based on location, experience level, industry, and specific role. Machine learning engineers and AI researchers using this framework typically earn competitive compensation reflecting their specialized skills. Roles involving advanced model development for computer vision, NLP, or generative AI, especially in major tech hubs, command premium compensation packages.

Are PyTorch AI jobs in demand?

PyTorch AI jobs are in high demand across both academia and industry. The framework has gained widespread adoption for cutting-edge research and commercial applications. Many companies seek specialists who can prototype and deploy deep learning models using its dynamic computation graphs. Major cloud providers like Azure, AWS, and Google Cloud have integrated support, further increasing demand for these skills in production environments.

What is the difference between PyTorch and TensorFlow in AI roles?

PyTorch uses dynamic computation graphs, allowing flexible, iterative development and easier debugging, which makes it popular in research. TensorFlow traditionally used static graphs optimized for production deployment. AI roles focused on research prototyping often prefer PyTorch for its pythonic interface, while production-focused teams might use TensorFlow. However, both frameworks now support both dynamic and static approaches, with the gap narrowing as they evolve.