AI Data Engineer Jobs

Discover the latest remote and onsite AI Data Engineer roles at leading AI companies. Updated hourly.

Check out 207 new AI Data Engineer opportunities posted on The Homebase

Senior ML Platform Engineer (Autonomous Driving)

New
Top rated
42dot
Full-time

Set technical strategy and oversee development of a high-scale, reliable data platform to manage, visualize, and serve large-scale datasets for machine learning model training and validation. Build the data lakehouse for autonomous driving scene datasets, including sensor, calibration, and annotation data. Drive development of the Autonomous Driving Data SDK, including scene data search, dataset preparation, and dataset loading. Investigate and resolve performance bottlenecks throughout the data processing pipelines, including data processing latency, data search latency, and test procedure coverage. Bootstrap and maintain infrastructure for Data Platform components such as the data processing pipeline, database, data lakehouse, and data serving layer. Collaborate with cross-functional teams, including ML algorithm, ML application, and Cloud Infrastructure, to align ML platforms with the overall autonomous driving system architecture.

Undisclosed

Sunnyvale or San Francisco, United States
Maybe global
Onsite

Senior AI Data Pipeline Engineer

New
Top rated
42dot
Full-time

Design and build high-performance, scalable data pipelines to support diverse AI and Machine Learning initiatives across the organization. Architect and implement multi-region data infrastructure to ensure global data availability and seamless synchronization. Develop flexible pipeline architectures that allow for complex branching and logic isolation to support multiple concurrent AI projects. Optimize large-scale data processing workloads using Databricks and Spark to maximize throughput and minimize processing costs. Maintain and evolve the containerized data environment on Kubernetes, ensuring robust and reliable execution of data workloads. Collaborate with AI researchers and platform teams to streamline the flow of high-quality data into training and evaluation pipelines.

Undisclosed

Pangyo, South Korea
Maybe global
Remote

Senior Data Engineer

New
Top rated
Multiverse
Full-time

The Senior Data Engineer on the Foundations team will build the technical foundations, including infrastructure, tools, and APIs, that enable the entire company to access product data safely and efficiently. Responsibilities include defining schemas for new entities and refactoring existing models for better performance and clarity; moving legacy data scripts into robust, version-controlled services; designing and developing domain-driven services with reusable APIs; and creating a universal data layer with APIs and connectors such as Data Warehouse APIs (GraphQL). The role also covers building features in the Internal Developer Platform (IDP) to simplify AI model deployment and management, building GenAI infrastructure such as vector databases or Model Context Protocol (MCP) integrations, automating security and compliance checks to protect data privacy and safety, replacing manual approval gates with automated checks to maintain speed without compromising safety, and creating a high-fidelity data layer that lets non-technical stakeholders generate reports without understanding raw tables. The role requires close collaboration with Infrastructure, DevEx, and Security engineers, as well as internal Tech, Data Science, ML, and Business teams, to enable self-serve data usage across the company.

Undisclosed

London, United Kingdom
Maybe global
Remote

Data Engineer | Power

New
Top rated
Gecko Robotics
Full-time

As a Data Engineer, you will build and evolve the data backbone of an AI-first product including document intelligence, time-series IoT data, and agentic AI systems. You will design, implement, and operate data systems across the full lifecycle from raw ingestion to AI-driven outputs used by customers. You will work directly with customers and internal stakeholders to understand problems and translate them into technical solutions, iterating quickly. Responsibilities include building pipelines that support document processing, sensor data, and ML workflows, contributing to feature engineering and model experimentation when needed, and owning systems in production. You will make architectural decisions, improve system reliability over time, and help define best practices as the team and product scale.

$154,000 – $204,000 USD per year

New York, United States
Maybe global
Onsite

Senior Data Engineer, People Analytics

New
Top rated
Crusoe
Full-time

Build and maintain resilient ETL pipelines to centralize data from core HCM and ATS systems into Google Cloud Platform, BigQuery, and other people analytics products. Architect a semantic data layer using dbt that translates raw database schemas into business-friendly logic, enabling non-technical leaders to ask natural-language questions and get accurate answers. Leverage AI and LLMs to extract insights from unstructured data and build predictive models for attrition and headcount planning. Design data products that solve operational problems, such as automating HR workflows, building custom apps for internal mobility, or redesigning organizational structure. Partner with Talent, Finance, and People leaders to translate business questions into data inquiries and consult on analytics possibilities. Design and deploy Sigma workbooks that guide executives through complex narratives and drive data-driven action.

$165,000 – $200,000 USD per year

Denver, United States
Maybe global
Onsite

ML Systems Engineer (Platform & Biometrics Data Infrastructure)

New
Top rated
Eight Sleep
Full-time

Build and operate high-throughput pipelines for sensor and event data (batch and streaming) ensuring quality, lineage, and reliability. Create scalable dataset curation and labeling workflows including sampling, slice definitions, weak supervision, gold-set management, and evaluation set integrity. Develop ML platform components such as feature pipelines, training orchestration, model registry, reproducible experiment tracking, and automated evaluation. Implement monitoring and observability for production ML systems covering data drift, performance regression, alerting, and automated failure detection. Standardize schemas and interfaces across studies and product telemetry to enable reusable, consistent analytics and model development. Collaborate cross-functionally with ML engineers, data science, firmware, and backend teams to support new studies and product launches, ensuring data architecture meets evolving research and product needs.

Undisclosed

San Francisco, United States
Maybe global
Onsite

Data Integration Lead | R&D

New
Top rated
nexos.ai
Full-time

Lead a team of 5-6 engineers to ensure that all services, platforms, and teams within the SaaS product have accurate, up-to-date scorecards that provide actionable insights and support continuous improvement. Build and maintain reliable processes to keep platform and service inventory data accurate and current. Provide technical leadership and mentorship to engineers through complex system design decisions and drive technical excellence. Architect and implement a self-improving, self-healing data processing model, large-scale reliable data pipelines, and AI/ML-enabled systems that ensure high performance, scalability, and data quality. Lead the development of data warehousing and analytics solutions on platforms such as BigQuery or Snowflake. Collaborate with stakeholders to translate business needs into scalable data and ML solutions while operating effectively in a fast-paced environment. Promote scorecard usage across the organization through workshops, manuals, and internal communications.

€5,500 – €8,200 per month

Vilnius, Lithuania
Maybe global
Onsite

AI Data Engineer | Mid-Senior | Python | R&D

New
Top rated
nexos.ai
Full-time

Build and shape tomorrow’s AI-based applications and ensure scalable data and AI pipelines. Prototype and experiment with cutting-edge AI technologies, including LLMs (APIs, RAG, embeddings, etc.), to improve accuracy, insight quality, and product impact. Prepare, transform, and manage the datasets that power models, features, and production systems. Collaborate closely with product, data science, and engineering teams to identify AI opportunities and put them into real use. Fine-tune and optimize existing LLMs, prompts, workflows, and model interactions for performance and reliability. Ensure quality, scalability, observability, robustness, and smooth operation across distributed data and AI systems.

€4,500 – €7,100 per month

Vilnius, Lithuania
Maybe global
Onsite

Member of Technical Staff - Data Ingestion Engineer

New
Top rated
Reflection
Full-time

The role involves building and operating large-scale data ingestion systems for pre-training, including web crawling, extraction, and dataset delivery. The engineer will run experiments to evaluate crawling strategies, extraction methods, and ingestion tradeoffs. They will analyze ingested data to identify gaps, redundancy, and areas for improvement. Responsibilities also include building ingestion pipelines that scale reliably across large data campaigns, developing specialized crawlers for high-priority data sources, reviewing code, debugging production issues, and continuously improving the ingestion infrastructure. The role requires close collaboration with pre-training and data quality teams and working directly with researchers to link data collection to model performance.

Undisclosed

San Francisco, United States
Maybe global
Onsite

Member of Technical Staff (All Levels) - Agent Data

New
Top rated
Basis AI
Full-time

As an Agent Data engineer at Basis, you will own projects completely from scoping to delivery and be the Responsible Party for the systems you design, deciding how to build them, how to measure success, and when to ship. You will manage yourself, plan your own projects, work closely with your pod, and take full responsibility for execution and quality. Your tasks include building and standardizing the data platform: designing data pipelines that ingest, validate, and transform accounting data into reliable datasets; defining schemas and data contracts; building validation, lineage tracking, and drift detection into every pipeline; and creating interfaces for data discovery, computation, and observation. You will model the domain as a system by translating accounting concepts into well-structured ontologies, creating abstractions that help AI systems reason about real-world constraints, and designing for clarity through schema, code, and documentation. Additionally, you will lead through clarity and technical excellence by owning the architectural vision for your area, running effective design reviews, mentoring engineers on systems thinking (including load testing, schema design, and observability patterns), and simplifying systems by removing accidental complexity and enforcing clean, stable abstractions.

$100,000 – $300,000 USD per year

New York, United States
Maybe global
Onsite

Want to see more AI Data Engineer jobs?

Access all 4,256 remote & onsite AI jobs.

Join our private AI community to unlock full job access, and connect with founders, hiring managers, and top AI professionals.
(Yes, it’s still free—your best contributions are the price of admission.)

Frequently Asked Questions

Have questions about roles, locations, or requirements for AI Data Engineer jobs?

What does an AI Data Engineer do?

AI Data Engineers build and manage data pipelines specifically for AI and machine learning models. They design architectures that process diverse data types, such as text, images, and video, for model consumption. Their daily work includes implementing data validation systems, ensuring data quality, and integrating large-scale datasets from multiple sources. They create real-time data workflows, work with vector search tools such as FAISS or Milvus, and optimize the performance of AI data infrastructure. Using tools like Python, SQL, Apache Spark, and Airflow, they collaborate with data scientists and ML engineers to transform raw data into formats that support model training and deployment.

What skills are required for AI Data Engineer jobs?

Strong programming skills in Python and SQL form the foundation for AI Data Engineer roles. Proficiency with data engineering frameworks like Apache Spark, Airflow, and Ray is essential for building robust pipelines. Experience with cloud platforms (AWS, GCP, Azure) and vector databases lets engineers handle AI-specific data needs. Skills in data quality assurance, monitoring, and error handling keep AI systems reliable. Engineers should understand embedding techniques for processing unstructured data and have experience running ETL processes at scale. Soft skills such as cross-functional collaboration are valuable, because these roles bridge technical teams, AI scientists, and business stakeholders.

What qualifications are needed for AI Data Engineer jobs?

Most AI Data Engineer positions require a bachelor's degree in computer science, data engineering, or a related technical field, and many employers prefer a master's degree for senior roles. Hands-on experience building data pipelines for machine learning applications is crucial. Employers look for demonstrated expertise with cloud data services like Redshift, BigQuery, or Snowflake, and familiarity with MLOps practices. Knowledge of preprocessing techniques for unstructured data (text, images, video) sets strong candidates apart. Professional certifications in cloud platforms or data technologies can strengthen a candidate's profile, especially when combined with proven experience integrating large-scale datasets into AI workflows.

What is the salary range for AI Data Engineer jobs?

Compensation for AI Data Engineers varies with several factors. Location has a significant effect, with tech hubs like San Francisco and New York paying more than smaller markets. Experience level matters too: senior engineers earn substantially more than entry-level engineers. Specialized skills in emerging AI tools, vector databases, and specific cloud platforms can raise earning potential. Company size also plays a role, since large tech companies and well-funded AI startups often pay premium rates. Because preparing data for AI applications is specialized work, these roles typically pay more than traditional data engineering positions at similar experience levels. Among the listings on this page that disclose pay, US ranges run from $100,000 to $300,000 per year, and the Vilnius roles list €4,500 to €8,200 per month.

How long does it take to get hired as an AI Data Engineer?

The hiring timeline for AI Data Engineers typically spans 4 to 8 weeks from application to offer. The process usually starts with a resume screen, followed by a technical phone interview covering Python, SQL, and data pipeline concepts. Candidates then face 1 to 3 rounds of technical interviews focusing on data engineering problems, system design for AI workflows, and coding exercises. Some companies add a take-home assignment that asks candidates to build a pipeline for AI data. Final rounds often include conversations with potential teammates and hiring managers. Specialized skills in AI data preprocessing and experience with vector databases can speed up the process, especially for candidates with proven experience in similar roles.

Are AI Data Engineer jobs in demand?

AI Data Engineer positions are in strong demand as organizations build infrastructure for AI initiatives. The role bridges traditional data engineering and AI needs, with postings appearing at major institutions like Stanford and companies like OpenAI. It is increasingly recognized as essential to successful AI implementation, particularly as companies scale their machine learning operations. Demand stems from the unique requirements of AI data pipelines, which differ significantly from traditional analytics infrastructure. Organizations need engineers who understand the data preprocessing needs of machine learning models and can build robust pipelines for diverse data types, including text, images, and video.

What is the difference between AI Data Engineer and Data Engineer?

Both roles build data pipelines, but AI Data Engineers focus on preparing data for machine learning and AI systems rather than business analytics. They work extensively with unstructured data (text, images, video) and implement specialized preprocessing techniques that traditional Data Engineers rarely handle. AI Data Engineers commonly use vector search tools like FAISS and embedding libraries that are not typical in standard data engineering. They must understand the data requirements of model training and build infrastructure that supports model deployment. Traditional Data Engineers concentrate on structured data flows, data warehousing, and analytics support, while AI Data Engineers build pipelines optimized for machine learning, with features like data versioning, lineage tracking, and real-time delivery of AI-ready data.