AWS AI Jobs

Discover the latest remote and onsite AWS AI roles at top AI companies. Updated hourly.

Check out 352 new AWS AI role opportunities posted on The Homebase

2026 New Grad | Software Engineer, Full-Stack

New
Top rated
Loop
Full-time
Posted

Ship critical infrastructure managing real-world logistics and financial data for large enterprises. Own the "why" by building deep context through customer calls and understanding Loop's value to customers, pushing back on requirements when better solutions exist. Work full-stack across system boundaries, including frontend UX, LLM agents, database schemas, and event infrastructure. Leverage AI tools to handle routine tasks, freeing focus for quality, architecture, and product taste. Constantly optimize development loops, refactor legacy patterns, automate workflows, and fix broken processes to raise velocity.

$150,000 per year (USD)

San Francisco, Chicago, or NYC, United States
Maybe global
Hybrid
Python
JavaScript
TypeScript
PyTorch
TensorFlow

Software Engineer, Platform Systems

New
Top rated
OpenAI
Full-time
Posted

Design and build distributed failure detection, tracing, and profiling systems for large-scale AI training jobs. Develop tooling to identify slow, faulty, or misbehaving nodes and provide actionable visibility into system behavior. Improve observability, reliability, and performance across OpenAI's training platform. Debug and resolve issues in complex, high-throughput distributed systems. Collaborate with systems, infrastructure, and research teams to evolve platform capabilities. Extend and adapt failure detection systems or tracing systems to support new training paradigms and workloads.

Undisclosed

London, United Kingdom
Maybe global
Onsite
Python
C++
Docker
Kubernetes
CI/CD

Software Engineer, Platform Systems

New
Top rated
OpenAI
Full-time
Posted

Design and build distributed failure detection, tracing, and profiling systems for large-scale AI training jobs. Develop tooling to identify slow, faulty, or misbehaving nodes and provide actionable visibility into system behavior. Improve observability, reliability, and performance across OpenAI's training platform. Debug and resolve issues in complex, high-throughput distributed systems. Collaborate with systems, infrastructure, and research teams to evolve platform capabilities. Extend and adapt failure detection systems or tracing systems to support new training paradigms and workloads.

$310,000 – $460,000 per year (USD)

San Francisco, United States
Maybe global
Onsite
Python
C++
Docker
Kubernetes
CI/CD

Software Engineer

New
Top rated
AIFund
Full-time
Posted

Design, develop, and maintain web applications and backend services that integrate ML-powered features. Collaborate closely with Machine Learning Engineers and Product Managers to understand ML system requirements and translate them into robust software solutions. Build reliable, scalable, and low-latency services that support ML inference, data workflows, and AI-driven user experiences. Use LLMs to build scalable and reliable AI agents. Own the full software development lifecycle: design, implementation, testing, deployment, monitoring, and maintenance. Ensure high standards for code quality, testing, observability, and operational excellence. Troubleshoot production issues and participate in on-call or support rotations when needed. Mentor junior engineers and contribute to technical best practices across teams. Act as a strong cross-functional partner between product, engineering, and ML teams.

Undisclosed

San Francisco Bay Area, United States
Maybe global
Hybrid
Python
Docker
Kubernetes
AWS
GCP

Evaluations - Platform Engineer

New
Top rated
Antimetal
Full-time
Posted

Own the evaluation stack by building online and offline evaluation pipelines that measure agent quality across ephemeral, high-volume MELT data, code, and unstructured documents, and set the metrics that define the experience. Define quality at scale by designing evaluations that capture trajectory quality in production incidents spanning hundreds of services, where ground truth is ephemeral, high-volume, and approximate, ensuring metrics predict real outcomes. Build platform abstractions for agents by designing core agent architectures and extending internal frameworks such as sub-agents, MCPs, and middleware to enable confident iteration and faster shipping with product, platform, and research teams. Productionize these systems by owning latency, observability, and uptime.

$225,000 – $325,000 per year (USD)

New York, United States
Maybe global
Onsite
Python
TypeScript
MLOps
MLflow
Docker

Evaluation Engineer

New
Top rated
Elicit
Full-time
Posted

The Evaluation Engineer will own the technical foundation of the auto-evaluation systems by building a comprehensive system that runs fast, is easy to use, and supports quickly building new evaluations. Responsibilities include improving the speed of the core evals infrastructure so it adds minimal latency, designing interfaces suitable for ML engineers, product managers, and customers, and ensuring the system architecture lets team members easily add examples and run evaluations. The role also involves ensuring evaluations are accurate and reliable by encoding knowledge about how pharma customers make decisions and providing appropriate statistical tests and confidence intervals for trustworthy results. Additionally, the engineer is expected to spend most of their time on the core eval platform, collaborate with the evals team on specific evals, mentor an evals engineering intern, and learn how users interact with the evaluation system in order to improve it.

$140,000 – $200,000 per year (USD)

Oakland, United States
Maybe global
Hybrid
Python
TypeScript
Docker
CI/CD
AWS
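The confidence intervals this listing mentions for eval results can be made concrete. As an illustrative sketch only (not Elicit's actual method), the Wilson score interval is a common choice for reporting a binomial pass rate, such as the fraction of eval examples an agent answers correctly:

```python
import math

def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95% coverage)."""
    if trials == 0:
        return (0.0, 1.0)  # no data: the rate could be anything
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2))
    return (max(0.0, center - half), min(1.0, center + half))
```

For example, 90 passes out of 100 trials yields an interval of roughly (0.83, 0.94), which behaves better near 0% and 100% than the naive normal approximation.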

AI Deployment Engineer

New
Top rated
OpenAI
Full-time
Posted

The AI Deployment Engineer serves as the primary technical subject matter expert post-sale for a portfolio of customers, embedding deeply with them to design and deploy Generative AI solutions. They engage with senior business and technical stakeholders to identify, prioritize, and validate the highest-value GenAI applications in customers' roadmaps. The role accelerates customer time to value by providing architectural guidance, building hands-on prototypes, and advising on best practices for scaling solutions in production. The engineer maintains strong relationships with leadership and technical teams to drive adoption, expansion, and successful outcomes. They contribute to open-source resources and enterprise-facing technical documentation to scale best practices across customers. The engineer shares learnings and collaborates with internal teams to inform product development and improve customer outcomes. Additionally, they codify knowledge and operationalize technical success practices to help the Solutions Architecture team scale impact across industries and customer types.

$220,000 – $280,000 per year (USD)

San Francisco, United States
Maybe global
Hybrid
Python
JavaScript
OpenAI API
Prompt Engineering
Model Evaluation

AI Deployment Engineer

New
Top rated
OpenAI
Full-time
Posted

The AI Deployment Engineer is responsible for serving as the primary technical subject matter expert post-sale for a portfolio of customers, embedding deeply with them to design and deploy Generative AI solutions. They engage with senior business and technical stakeholders to identify, prioritize, and validate high-value GenAI applications in the customers' roadmaps. The role involves accelerating customer time to value by providing architectural guidance, building hands-on prototypes, and advising on best practices for scaling solutions in production. The engineer maintains strong relationships with leadership and technical teams to drive adoption, expansion, and successful outcomes. They contribute to open-source resources and enterprise-facing technical documentation to scale best practices, share learnings, and collaborate with internal teams to inform product development and improve customer outcomes. Additionally, they codify knowledge and operationalize technical success practices to help the Solutions Architecture team scale impact across industries and customer types.

$220,000 – $280,000 per year (USD)

Seattle, United States
Maybe global
Hybrid
Python
JavaScript
OpenAI API
Prompt Engineering
Model Evaluation

ML Systems Engineer (Platform & Biometrics Data Infrastructure)

New
Top rated
Eight Sleep
Full-time
Posted

Build and operate high-throughput pipelines for sensor and event data (batch and streaming) ensuring quality, lineage, and reliability. Create scalable dataset curation and labeling workflows including sampling, slice definitions, weak supervision, gold-set management, and evaluation set integrity. Develop ML platform components such as feature pipelines, training orchestration, model registry, reproducible experiment tracking, and automated evaluation. Implement monitoring and observability for production ML systems covering data drift, performance regression, alerting, and automated failure detection. Standardize schemas and interfaces across studies and product telemetry to enable reusable, consistent analytics and model development. Collaborate cross-functionally with ML engineers, data science, firmware, and backend teams to support new studies and product launches, ensuring data architecture meets evolving research and product needs.

Undisclosed

San Francisco, United States
Maybe global
Onsite
Python
SQL
MLOps
Docker
Kubernetes

Machine Learning Engineer (Foundation Models & Personalization)

New
Top rated
Eight Sleep
Full-time
Posted

The Machine Learning Engineer is responsible for building and deploying machine learning models that enhance sleep experiences through personalization, prediction, and behavior understanding, including readiness forecasting, event detection, and individualized recommendations. They will apply and adapt foundation-model capabilities to product workflows, develop user behavior models connecting longitudinal signals to actionable interventions, and design evaluation strategies for offline metrics, slice-based analysis, calibration, reliability, and fairness. The role involves partnering with Product teams to run high-quality online experiments, productionizing models via scalable training and inference pipelines, model monitoring, drift detection, alerting, and continuous improvement loops. Collaboration with cross-functional partners such as Product, Mobile, Backend, and Clinical teams is essential to scope requirements and deliver high-impact features.

Undisclosed

San Francisco, United States
Maybe global
Onsite
Python
PyTorch
TensorFlow
JAX
MLOps

Want to see more AI Engineer jobs?

View all jobs

Access all 4,256 remote & onsite AI jobs.

Join our private AI community to unlock full job access, and connect with founders, hiring managers, and top AI professionals.
(Yes, it’s still free—your best contributions are the price of admission.)

Frequently Asked Questions

Need help with something? Here are our most frequently asked questions.

What are AWS AI jobs?

AWS AI jobs involve building, training, and deploying generative AI applications using specialized cloud services. These roles work with tools like SageMaker for custom model development, Bedrock for foundation models, and Lake Formation for data governance. Professionals in these positions create AI-driven applications, implement RAG systems with Kendra, and orchestrate machine learning pipelines using Step Functions and Lambda.

What roles commonly require AWS skills?

Common roles requiring AWS skills include machine learning engineers, data scientists, software engineers, architects, and platform engineers. These professionals work on generative AI applications and AI-assisted development lifecycles. They implement end-to-end ML pipelines in SageMaker, design LLM-powered applications with Bedrock, create agentic workflows, and build AI-enhanced developer tools using Amazon Q Developer.

What skills are typically required alongside AWS?

Alongside AWS expertise, professionals typically need experience with JupyterLab, Git, and IDE integrations like VS Code. Knowledge of LangChain for LLM orchestration, machine learning concepts, and data engineering practices is also valuable. Familiarity with generative AI patterns like retrieval-augmented generation, prompt engineering, and AI application development workflows helps create effective solutions within the AWS ecosystem.

What experience level do AWS AI jobs usually require?

AWS AI jobs typically require mid to senior-level experience with cloud infrastructure and AI development patterns. Employers look for professionals familiar with JupyterLab environments, ML workflows in SageMaker, and foundation model deployment via Bedrock. Experience building end-to-end machine learning pipelines, implementing RAG systems, and orchestrating AI workflows using Step Functions and Lambda is highly valued.

What is the salary range for AWS AI jobs?

AWS AI job salaries vary based on experience, location, and specific role. Machine learning engineers and data scientists implementing SageMaker solutions generally command premium compensation. Platform engineers orchestrating AI infrastructure and architects designing generative AI applications often receive higher salaries. Software engineers using Amazon Q for AI-assisted development are increasingly valued for their productivity enhancements.

Are AWS AI jobs in demand?

AWS AI jobs are experiencing strong demand as organizations adopt generative AI technologies. Companies are actively hiring professionals who can implement AI-driven development lifecycles using tools like Amazon Q Developer. There's particular demand for engineers who can work with Bedrock for foundation models, build RAG systems with Kendra, and design agentic workflows for business process automation.

What is the difference between AWS and Azure in AI roles?

The key difference in AI roles is that AWS emphasizes fully managed services like Bedrock for foundation models and SageMaker for end-to-end ML workflows, while Azure offers a different ecosystem through Azure AI services. AWS positions focus more on serverless orchestration and agentic capabilities unique to their toolchain. The platforms have distinct approaches to generative AI implementation, with different service integrations and developer experiences.
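As a rough illustration of the Bedrock workflow described above, here is a minimal sketch of invoking a foundation model through the `bedrock-runtime` API with boto3. The model ID, region, and prompt are placeholders; a real deployment needs AWS credentials and model access enabled in the account, and the request body below assumes Anthropic's messages format:

```python
import json

def build_claude_request(prompt: str, max_tokens: int = 256) -> str:
    """Build a Bedrock request body in the Anthropic messages format."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    })

def invoke(prompt: str) -> str:
    """Send the prompt to a Claude model on Bedrock (requires AWS credentials)."""
    import boto3  # assumes boto3 is installed and credentials are configured
    client = boto3.client("bedrock-runtime", region_name="us-east-1")
    response = client.invoke_model(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",  # placeholder model ID
        body=build_claude_request(prompt),
    )
    result = json.loads(response["body"].read())
    return result["content"][0]["text"]
```

Roles that go beyond single calls typically wrap invocations like this in Step Functions or Lambda for orchestration, which is the pattern several of the answers above refer to.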