AI Data Engineer | Mid-Senior | Python | R&D
Build and shape tomorrow’s AI-based applications and ensure that data and AI pipelines scale. Prototype and experiment with cutting-edge AI technologies, including LLMs (APIs, RAG, embeddings, etc.), to improve accuracy, insight quality, and product impact. Prepare, transform, and manage the datasets that power models, features, and production systems. Collaborate closely with product, data science, and engineering teams to identify AI opportunities and put them into real use. Fine-tune and optimize existing LLMs, prompts, workflows, and model interactions for performance and reliability. Ensure quality, scalability, observability, robustness, and smooth operation across distributed data and AI systems.
Member of Technical Staff - Data Ingestion Engineer
The role involves building and operating large-scale data ingestion systems for pre-training, including web crawling, extraction, and dataset delivery. The engineer will run experiments to evaluate crawling strategies, extraction methods, and ingestion tradeoffs. They will analyze ingested data to identify gaps, redundancy, and areas for improvement. Responsibilities also include building ingestion pipelines that scale reliably across large data campaigns, developing specialized crawlers for high-priority data sources, reviewing code, debugging production issues, and continuously improving the ingestion infrastructure. The role requires close collaboration with pre-training and data quality teams and working directly with researchers to link data collection to model performance.
Software Engineer, Distributed Data Systems
As a Data Engineer, you will architect and build the data infrastructure that powers all company operations, including crawling billions of pages, training embedding models, and serving real-time search. You will have autonomy in designing systems that scale to hundreds of petabytes. Responsibilities include designing lakehouse architectures, building and operating large-scale distributed data processing pipelines, creating streaming pipelines for real-time indexing, architecting data layers for embedding training infrastructure, and scaling deployments to handle analytical queries across petabytes of data.
Data Engineer – Spark Specialist
Help users discover and master the Dataiku platform through user training, office hours, demos, and ongoing consultative support. Analyse and investigate various kinds of data and machine learning applications across industries and use cases. Provide strategic input to customer and account teams to help make customers successful. Scope and co-develop production-level data science projects with customers. Mentor and help educate data scientists and other customer team members to support their career development and growth.
Data Engineer
The Data Engineer will design, build, and maintain data pipelines, manage data ingestion, and develop reliable data models to support AI and ML workflows. The role also involves close collaboration with ML and product teams to ensure clean, structured, and high-quality data delivery for analytics and product features.
AI Pilot Vibe Coding Assistant (Freelance)
AI Pilot Vibe Coding Assistants collaborate with AI-driven systems to generate, refine, and submit accurate, well-structured outputs based on complex prompts. They handle coding, automation, data processing, troubleshooting technical issues, and improving AI output quality across diverse domains.
Data Engineer
The Data Engineer will design, build, and maintain scalable data pipelines to support analytics and data-driven decision making at Replit. They will collaborate across teams to deliver ETL/ELT workflows, ensure data quality, and build unified data models for in-depth analysis.
Member of Technical Staff, Data Engineering
As a Data Engineer specializing in pretraining data, you will be responsible for developing and maintaining data pipelines that support Cohere's advanced language models. You will manage the entire lifecycle of training data, including ingestion, cleaning, optimization, and modeling for optimal model performance, while collaborating with cross-functional teams to ensure the quality and efficiency of data curation.
Data Operations Manager
Build and scale data and financial operations to support deployment and growth of AI agents for major institutional clients. Take ownership of billing, collections, data infrastructure, dashboards, and cross-functional operations to provide actionable, real-time visibility to business leaders.
Sr. Data Engineer (Poland)
You will build and optimize data pipelines, extract and model diverse datasets, and design maintainable software systems. The role also involves setting data strategies, incorporating best practices, and leveraging AI-powered tools to accelerate development.