Data Engineer | Power
As a Data Engineer, you will build and evolve the data backbone of an AI-first product including document intelligence, time-series IoT data, and agentic AI systems. You will design, implement, and operate data systems across the full lifecycle from raw ingestion to AI-driven outputs used by customers. You will work directly with customers and internal stakeholders to understand problems and translate them into technical solutions, iterating quickly. Responsibilities include building pipelines that support document processing, sensor data, and ML workflows, contributing to feature engineering and model experimentation when needed, and owning systems in production. You will make architectural decisions, improve system reliability over time, and help define best practices as the team and product scale.
Senior Data Engineer, People Analytics
Build and maintain resilient ETL pipelines to centralize data from core HCM and ATS systems into Google Cloud Platform, BigQuery, and other people analytics products. Architect a semantic data layer using dbt to translate raw database schemas into business-friendly logic, enabling non-technical leaders to ask natural language questions and get accurate answers. Leverage AI and LLMs to extract insights from unstructured data and build predictive models for attrition and headcount planning. Design data products that solve operational problems by automating HR workflows, building custom apps for internal mobility, or redesigning organizational structure. Partner with Talent, Finance, and People leaders to translate business questions into data inquiries and consult on analytics possibilities. Design and deploy Sigma workbooks that guide executives through complex narratives and drive data-driven action.
ML Systems Engineer (Platform & Biometrics Data Infrastructure)
Build and operate high-throughput pipelines for sensor and event data (batch and streaming) ensuring quality, lineage, and reliability. Create scalable dataset curation and labeling workflows including sampling, slice definitions, weak supervision, gold-set management, and evaluation set integrity. Develop ML platform components such as feature pipelines, training orchestration, model registry, reproducible experiment tracking, and automated evaluation. Implement monitoring and observability for production ML systems covering data drift, performance regression, alerting, and automated failure detection. Standardize schemas and interfaces across studies and product telemetry to enable reusable, consistent analytics and model development. Collaborate cross-functionally with ML engineers, data science, firmware, and backend teams to support new studies and product launches, ensuring data architecture meets evolving research and product needs.
Data Integration Lead | R&D
Lead a team of 5-6 engineers to ensure all services, platforms, and teams within the SaaS product have accurate and up-to-date scorecards that provide actionable insights and support continuous improvement. Build and maintain reliable processes to keep platform and service inventory data accurate and current. Provide technical leadership and mentorship to engineers through complex system design decisions and drive technical excellence. Architect and implement a self-improving, self-healing data processing model, along with large-scale, reliable data pipelines and AI/ML-enabled systems, ensuring high performance, scalability, and data quality. Lead the development of data warehousing and analytics solutions on platforms such as BigQuery or Snowflake. Collaborate with stakeholders to translate business needs into scalable data and ML solutions while operating effectively in a fast-paced environment. Promote scorecard usage across the organization via workshops, manuals, and internal communications.
AI Data Engineer | Mid-Senior | Python | R&D
Build and shape tomorrow’s AI-based application and ensure scalable data and AI pipelines. Prototype and experiment with cutting-edge AI technologies including LLMs (APIs, RAG, embeddings, etc.) to improve accuracy, insight quality, and product impact. Prepare, transform, and manage datasets that power models, features, and production systems. Collaborate closely with product, data science, and engineering teams to identify AI opportunities and put them into real use. Fine-tune and optimize existing LLMs, prompts, workflows, and model interactions for performance and reliability. Ensure quality, scalability, observability, robustness, and smooth operation across distributed data and AI systems.
Member of Technical Staff - Data Ingestion Engineer
The role involves building and operating large-scale data ingestion systems for pre-training, including web crawling, extraction, and dataset delivery. The engineer will run experiments to evaluate crawling strategies, extraction methods, and ingestion tradeoffs. They will analyze ingested data to identify gaps, redundancy, and areas for improvement. Responsibilities also include building ingestion pipelines that scale reliably across large data campaigns, developing specialized crawlers for high-priority data sources, reviewing code, debugging production issues, and continuously improving the ingestion infrastructure. The role requires close collaboration with pre-training and data quality teams and working directly with researchers to link data collection to model performance.
Member of Technical Staff (All Levels) - Agent Data
As an Agent Data engineer at Basis, you will own projects completely from scoping to delivery and be the Responsible Party for the systems you design, deciding how to build them, how to measure success, and when to ship. You will manage yourself, plan your own projects, work closely with your pod, and take full responsibility for execution and quality. Your tasks include building and standardizing the data platform by designing data pipelines that ingest, validate, and transform accounting data into reliable datasets, defining schemas and data contracts, building validation, lineage tracking, and drift detection into every pipeline, and creating interfaces for data discovery, computation, and observation. You will model the domain as a system by translating accounting concepts into well-structured ontologies, creating abstractions to help AI systems reason about real-world constraints, and designing for clarity through schema, code, and documentation. Additionally, you will lead through clarity and technical excellence by owning the architectural vision for your area, running effective design reviews, mentoring engineers on system thinking including load testing, schema design, and observability patterns, and simplifying systems by removing accidental complexity and enforcing clean, stable abstractions.
Software Engineer, Distributed Data Systems
As a Data Engineer, you will architect and build the data infrastructure that powers all company operations, including crawling billions of pages, training embedding models, and serving real-time search. You will have autonomy in designing systems that scale to hundreds of petabytes. Responsibilities include designing lakehouse architectures, building and operating large-scale distributed data processing pipelines, creating streaming pipelines for real-time indexing, architecting data layers for embedding training infrastructure, and scaling deployments to handle analytical queries across petabytes of data.
[UMOS ONE] Data & AI Engineering Lead
The responsibilities include developing AI models and integrating Agentic AI for routing, dispatching, and prediction, specifically using features extracted from knowledge graphs to develop AI-based optimal routing, dispatching technologies, demand prediction, ETA prediction, and improving analytic prediction models. The role also involves designing and implementing the integration architecture with Agentic AI systems. Additionally, responsibilities cover the design and development of mobility and logistics-specific ontologies, building knowledge graph-based data models, integrating and refining large heterogeneous data, and managing relationships among service entities to enhance data intelligence. Furthermore, the position requires designing, building, and operating large-scale data pipelines (ETL/ELT) for UMOS platforms, establishing and automating MLOps pipelines for stable model operation, and developing and integrating efficient API interfaces with service backend systems.
Data Engineer – Spark Specialist
Help users discover and master the Dataiku platform through user training, office hours, demos, and ongoing consultative support. Analyse and investigate various kinds of data and machine learning applications across industries and use cases. Provide strategic input to the customer and account teams that help make customers successful. Scope and co-develop production-level data science projects with customers. Mentor and help educate data scientists and other customer team members to aid in career development and growth.