Top Data Engineer Job Openings in 2025

Looking for opportunities as a Data Engineer? This curated list features the latest Data Engineer job openings from AI-native companies. Whether you're an experienced professional or just entering the field, find roles that match your expertise, from startups to global tech leaders. Updated every day.


Member of Technical Staff (All Levels) - Agent Data

Basis AI
USD 100,000 – 300,000
United States
Full-time
Remote: No
About Basis
Basis equips accountants with a team of AI agents to take on real workflows. We have hit product-market fit, have more demand than we can meet, and just raised $34M to scale at a speed that meets this moment. Built in New York City. Read more about Basis here.

About the Team
The Agent Data team ensures Basis agents have accurate data at low latency to do their jobs.

About the Role
As an Agent Data engineer at Basis you'll own projects completely, from scoping to delivery. You'll be the Responsible Party (RP) for the systems you design. That means you decide how to build them, how to measure success, and when they're ready to ship. We trust you to manage yourself. You'll plan your own projects, work closely with your pod, and take full responsibility for execution and quality. You'll build systems that serve every part of Basis: AI, product, and internal agents. And you'll make those systems fast, reliable, and easy to understand.

What you'll be doing:

Build and standardize our data platform
- Design data pipelines that ingest, validate, and transform accounting data into clean, reliable datasets.
- Define schemas and data contracts that balance flexibility with correctness (see the sketch below).
- Build validation, lineage tracking, and drift detection into every pipeline.
- Create interfaces that make data discoverable, computable, and observable throughout the system.

Model the domain as a system
- Translate accounting concepts into well-structured ontologies: entities, relationships, and rules.
- Create abstractions that help AI systems reason safely about real-world constraints.
- Design for clarity: make complex workflows understandable through schema, code, and documentation.

Lead through clarity and technical excellence
- Own the architectural vision for your area and keep it consistent over time.
- Run effective design reviews that challenge assumptions and drive alignment.
- Mentor engineers on how to think about systems: from load testing to schema design to observability patterns.
- Simplify aggressively, removing accidental complexity and enforcing clean, stable abstractions.

📍 Location: NYC, Flatiron office. In-person team.
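As a small illustration of the "schemas and data contracts" bullet above, here is a minimal Python sketch of a contract check; the LedgerEntry schema and its rules are hypothetical, not Basis's actual contract.

```python
# Minimal sketch of a data-contract check of the kind this role describes.
# The LedgerEntry schema and its fields are hypothetical.
from dataclasses import dataclass
from datetime import date

@dataclass
class LedgerEntry:
    account_id: str
    amount_cents: int
    posted_on: date

def validate_entry(raw: dict) -> LedgerEntry:
    """Reject records that violate the contract before they enter the pipeline."""
    entry = LedgerEntry(
        account_id=str(raw["account_id"]),
        amount_cents=int(raw["amount_cents"]),
        posted_on=date.fromisoformat(raw["posted_on"]),
    )
    if not entry.account_id:
        raise ValueError("account_id must be non-empty")
    if entry.posted_on > date.today():
        raise ValueError("posted_on cannot be in the future")
    return entry

# Usage:
validate_entry({"account_id": "A-1", "amount_cents": 1250, "posted_on": "2025-01-31"})
```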
Apply

Software Engineer, Distributed Data Systems

Exa
USD 150,000 – 300,000
United States
Full-time
Remote: No
Exa is building a search engine from scratch to serve every AI application. We build massive-scale infrastructure to crawl the web, train state-of-the-art embedding models to index it, and develop high-performance vector databases in Rust to search over it. We also own a $5M H200 GPU cluster that regularly lights up tens of thousands of machines.

As a Data Engineer, you'll architect and build the data infrastructure that powers everything we do, from crawling billions of pages to training our embedding models to serving real-time search. You'll have enormous autonomy in designing systems that scale to hundreds of petabytes. If you've ever wanted to build data pipelines at a scale that most companies only dream about, this is your chance.

Desired Experience
- Deep understanding of lakehouse architectures (Delta Lake, Iceberg, Hudi) and when to use them
- Experience building and operating large-scale distributed data processing pipelines
- Hands-on experience with streaming data systems (Kafka, Flink, or similar)
- Familiarity with Ray, Spark, or ClickHouse at production scale
- An obsessive focus on reliability and building systems that don't page you at 3am

Bonus Points
- Experience with Lance or other vector-native storage formats
- Background in GPU-accelerated data processing (RAPIDS, cuDF)

Example Projects
- Design a lakehouse architecture that handles 100+ PB of web crawl data
- Build streaming pipelines that process billions of documents per day for real-time indexing (see the sketch below)
- Architect the data layer for our embedding training infrastructure on Ray
- Scale our ClickHouse deployment to handle analytical queries across petabytes of search logs

This is an in-person opportunity in San Francisco. We're happy to sponsor international candidates (e.g., STEM OPT, OPT, H1B, O1, E3).
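As a small illustration of the streaming-pipeline work described above, here is a minimal PySpark Structured Streaming sketch that reads documents from Kafka and appends them to a lakehouse path; the broker address, topic name, schema, and paths are all hypothetical.

```python
# Minimal sketch: stream crawl documents from Kafka into a lakehouse table.
# Broker, topic, schema, and paths are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.appName("doc-stream").getOrCreate()

schema = StructType([
    StructField("url", StringType()),
    StructField("text", StringType()),
])

# Read raw crawl documents from a Kafka topic as an unbounded stream.
docs = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
    .option("subscribe", "crawl-docs")                 # hypothetical topic
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("doc"))
    .select("doc.*")
    .where(col("text").isNotNull())
)

# Append parsed documents for downstream indexing.
query = (
    docs.writeStream.format("parquet")
    .option("path", "/lake/crawl_docs")           # hypothetical output path
    .option("checkpointLocation", "/lake/_ckpt")  # required for recovery
    .start()
)
```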
Apply

Don't See Your Role? Apply Here!

LMArena
United States
Full-time
Remote: No
About LMArena
LMArena is the open platform for evaluating how AI models perform in the real world. Created by researchers from UC Berkeley's SkyLab, our mission is to measure and advance the frontier of AI for real-world use. Millions of people use LMArena each month to explore how frontier systems perform, and we use our community's feedback to build transparent, rigorous, and human-centered model evaluations. Leading enterprises and AI labs rely on our evaluations to understand real-world reliability, alignment, and impact. Our leaderboards are the gold standard for AI performance, trusted by leaders across the AI community and shaping the global conversation on model reliability and progress.

We're a team of researchers, engineers, academics, and builders from places like UC Berkeley, Google, Stanford, DeepMind, and Discord. We seek truth, move fast, and value craftsmanship, curiosity, and impact over hierarchy. We're building a company where thoughtful, curious people from all backgrounds can do their best work. Everyone on our team is a deep expert in their field; our office radiates excellence, energy, and focus.

At LMArena, we're always on the lookout for exceptional people, even if there's no open role that's a perfect fit right now. If you believe your skills and experience align with our mission and you're excited about what we're building, we'd love to hear from you. Here are some of the roles we're actively exploring (or perhaps you're something entirely unique we didn't know we needed yet):
- Engineering Manager
- Software Engineer (API Product / Product Platform)
- Software Engineer (Data Infrastructure)
- Data Scientist / Research Engineer (Human Data & Evaluation)
- Design Engineer
- GTM Engineer
- Backend AI Engineer

Or maybe you're a passionate contributor eager to join a fast-growing team solving meaningful problems. Either way, don't hesitate to introduce yourself.

Who is LMArena?
Created by researchers from UC Berkeley's SkyLab, LMArena is an open platform where everyone can easily access, explore, and interact with the world's leading AI models. By comparing them side by side and casting votes for the better response, the community helps shape a public leaderboard, making AI progress more transparent and grounded in real-world usage.

Why Join Us?
Trusted by organizations like Google, OpenAI, Meta, xAI, and more, LMArena is rapidly becoming essential infrastructure for transparent, human-centered AI evaluation at scale. With over one million monthly users and growing developer adoption, our impact is helping guide the next generation of safe, aligned AI systems, grounded in open access and collective feedback. Our work is regularly referenced by industry leaders pushing the frontier of safe and reliable AI, including Sundar Pichai, Jeff Dean, Elon Musk, and Sam Altman.
- High Impact: Your work will be used daily by the world's most advanced AI labs.
- Global Reach: Develop data infrastructure powering millions of real-world evaluations, influencing AI reliability across industries.
- Exceptional Team: We are a small team of top talent from Google, DeepMind, Discord, Vercel, UC Berkeley, and Stanford.

What we offer
- Competitive compensation and meaningful equity, aligned to the markets where our team members are based; base salary will depend on the candidate's permanent work location.
- Comprehensive health and wellness benefits, including medical, dental, and vision coverage, plus additional support programs.
- The opportunity to work on cutting-edge AI with a small, mission-driven team.
- A culture that values transparency, trust, and community impact.

Come help build the space where anyone can explore and help shape the future of AI!

LMArena provides equal employment opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, sex, national origin, age, disability, genetics, sexual orientation, gender identity, or gender expression. We are committed to a diverse and inclusive workforce and welcome people from all backgrounds, experiences, perspectives, and abilities.
Apply

[UMOS ONE] Data & AI Engineering Lead

42dot
South Korea
Full-time
Remote: No
We are looking for the best
UMOS ONE is a subsidiary of 42dot, Hyundai Motor Group's global software (SW) center, and operates the UMOS (Urban Mobility Operating System) business. Under the vision of "a world where everything moves and connects on its own," we are developing an integrated platform that covers the entire lifecycle of future urban mobility services. Our key solutions include the mobility service "TAP!", the fleet control and operations system "Pleos Fleet", and the AI-based transportation management system "Capora". We aim to lead the commercialization of SDV (Software Defined Vehicle) technology and stand at the center of future mobility services. * Note: this position is with the subsidiary UMOS ONE, not 42dot.

Why join UMOS ONE
We're waiting for someone whose speed of execution can change the game. From the "TAP!" mobility service to the Pleos Fleet vehicle operations system and the AI-based transportation management system Capora, we are turning SDV-based technology into reality. In a rapidly changing market, we need colleagues with strong execution who can analyze complex problems sharply and push through with precise solutions.

A team that grows together
UMOS ONE's culture goes beyond simple collaboration; it is built on teamwork that genuinely supports each other's growth. Our goal is a team that excels together rather than individuals who excel alone. In an intense but warm atmosphere, we're looking for colleagues who will create synergy with growth-hungry people and keep taking on genuinely fun challenges.

Responsibilities

AI model development and Agentic AI integration (Routing, Dispatching, Prediction)
- Develop AI-based optimal routing and dispatching technology using features extracted from the knowledge graph (optimization algorithms, reinforcement learning, etc.; see the sketch below).
- Develop and improve analytics prediction models such as demand prediction and estimated time of arrival (ETA).
- Design and implement the integration architecture connecting these intelligent models to an Agentic AI system (an autonomous decision-making system).

Knowledge Graph & Ontology
- Design and develop ontologies specialized for the mobility and logistics domains.
- Build knowledge-graph-based data models; integrate and clean large-scale heterogeneous data.
- Define and manage relationships between service entities (vehicles, drivers, orders, routes, etc.) to raise the level of data intelligence.

Large-scale data engineering and MLOps
- Design, build, and operate large-scale data pipelines (ETL/ELT) for data collected on the UMOS platform.
- Build and automate MLOps pipelines (training, deployment, monitoring) for stable model operations.
- Develop and integrate efficient interfaces (APIs) with service backend systems.

Qualifications
- 8+ years of development experience in data engineering or machine learning engineering.
- Proficiency in data processing and AI model development using a primary language such as Python.
- Experience with distributed processing technologies (Spark, Hadoop, etc.) for large-scale data.
- Experience building data pipelines and MLOps in cloud environments (AWS, GCP, Azure, etc.).
- Experience cleaning and analyzing data based on a deep understanding of mobility and logistics domain data.

Preferred Qualifications
- Research or development experience with Knowledge Graph, Ontology, or Semantic Web technologies (RDF, SPARQL, etc.).
- Experience with graph databases (Neo4j, AWS Neptune, etc.).
- Experience developing routing/dispatching models based on optimization theory and algorithms.
- Development experience with Agentic AI or Multi-Agent Systems.
- Experience building real-time serving systems on Kubernetes and Docker.
- Experience with real-time data processing using stream processing technologies such as Kafka and Flink.

Interview Process
Resume screening - coding test - first interview (about 1 hour) - second interview (about 3 hours) - final offer. The process may differ by role and is subject to change depending on scheduling and circumstances. Schedules and results will be communicated individually via the email registered with your application.

Additional Information
- Work location: near Sinnonhyeon Station, Seocho-gu, Seoul.
- Employment for this position is with the subsidiary UMOS ONE.
- UMOS ONE provides its own Employee Engagement Program.
- When submitting your resume, please exclude information prohibited under the Fair Hiring Procedure Act, such as resident registration number, family relations, marital status, salary, photo, physical attributes, and region of origin.
- Please upload all files as PDFs under 30MB. (If you have trouble uploading your resume, please send it to recruit@42dot.ai along with the URL of the position you are applying for.)
- Reference checks may be conducted after the interview process with the candidate's consent.
- Veterans and persons eligible for employment protection receive preferential treatment under applicable law.
- Holders of a disability registration certificate receive preferential treatment under the Act on Employment Promotion and Vocational Rehabilitation of Persons with Disabilities.
- UMOS ONE does not accept unsolicited resumes from search firms and will not pay fees for unsolicited resumes.
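As a small illustration of the routing primitive this role builds on, here is a minimal Python sketch using networkx to pick the fastest route over a graph of service entities; the graph, node names, and edge weights are hypothetical, and production dispatching would layer optimization or reinforcement learning on top of such a primitive.

```python
# Minimal sketch of dispatch-style routing over a graph of service entities.
# Node names and travel-time weights (minutes) are hypothetical.
import networkx as nx

G = nx.DiGraph()
G.add_weighted_edges_from([
    ("depot", "zone_a", 7), ("depot", "zone_b", 12),
    ("zone_a", "customer_1", 5), ("zone_b", "customer_1", 3),
])

# Shortest path by travel time: the kind of primitive an optimizer or
# RL-based dispatcher would build on.
route = nx.shortest_path(G, "depot", "customer_1", weight="weight")
eta = nx.shortest_path_length(G, "depot", "customer_1", weight="weight")
print(route, eta)  # ['depot', 'zone_a', 'customer_1'] 12
```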
Apply

Data Engineer – Spark Specialist

Dataiku
France
Germany
Netherlands
Remote: No
Dataiku is The Universal AI Platform™, giving organizations control over their AI talent, processes, and technologies to unleash the creation of analytics, models, and agents. Providing no-, low-, and full-code capabilities, Dataiku meets teams where they are today, allowing them to begin building with AI using their existing skills and knowledge.

Dataiku's promise to our customers is to provide them with the software and support needed to accelerate their Data Science and Machine Learning maturity. Dataiku's Data Science team is responsible for delivering on that promise. As an AI Solution Architect at Dataiku, you will have the opportunity to participate in our customers' journeys, from supporting their discovery of the platform to coaching users and co-developing data science applications from design to deployment. You will work mostly with our customers in the financial services and insurance industries. You will get hands-on experience coding in multiple languages (mostly Python, occasionally R, SQL, PySpark, JavaScript, etc.) and applying the latest big data technologies to business use cases. Our ideal candidate is comfortable learning new languages, technologies, and modelling techniques while being able to explain their work to other data scientists and clients.

Key Areas of Responsibility (What You'll Do)
- Help users discover and master the Dataiku platform through user training, office hours, demos, and ongoing consultative support.
- Analyse and investigate various kinds of data and machine learning applications across industries and use cases.
- Provide strategic input to the customer and account teams that helps make our customers successful.
- Scope and co-develop production-level data science projects with our customers.
- Mentor and help educate data scientists and other customer team members to aid in career development and growth.

Experience (What We're Looking For)
- French and English: fluent.
- Curiosity and a desire to learn new technical skills.
- Empathy and an eagerness to share your knowledge with your colleagues, Dataiku's customers, and the general public.
- Ability to clearly explain complex topics to technical as well as non-technical audiences.
- Over 5 years of experience with coding (Python, R, SQL).
- Over 5 years of experience building ML models.
- Understanding of underlying data systems and platform mechanics such as cloud architectures, K8s, Spark, and SQL.

Bonus points for any of these
- Experience with consulting and/or customer-facing data science roles.
- Experience in the banking & insurance or manufacturing industries.
- Experience with Spark, SAS, Data Engineering, or MLOps.
- Experience developing web apps in JavaScript, RShiny, or Dash.
- Experience building APIs.
- Experience using enterprise data science tools.
- Passion for teaching or public speaking.

#LI-Hybrid

What are you waiting for? At Dataiku, you'll be part of a journey to shape the ever-evolving world of AI. We're not just building a product; we're crafting the future of AI. If you're ready to make a significant impact in a company that values innovation, collaboration, and your personal growth, we can't wait to welcome you to Dataiku! And if you'd like to learn even more about working here, you can visit our Dataiku LinkedIn page.

Our practices are rooted in the idea that everyone should be treated with dignity, decency, and fairness. Dataiku also believes that a diverse identity is a source of strength and allows us to optimize across the many dimensions that are needed for our success. Therefore, we are proud to be an equal opportunity employer. All employment practices are based on business needs, without regard to race, ethnicity, gender identity or expression, sexual orientation, religion, age, neurodiversity, disability status, citizenship, veteran status, or any other aspect which makes an individual unique or protected by laws and regulations in the locations where we operate. This applies to all policies and procedures related to recruitment and hiring, compensation, benefits, performance, promotion, and termination, and all other conditions and terms of employment. If you need assistance or an accommodation, please contact us at: reasonable-accommodations@dataiku.com

Protect yourself from fraudulent recruitment activity
Dataiku will never ask you for payment of any type during the interview or hiring process. Other than our video-conference application, Zoom, we will also never ask you to make purchases or download third-party applications during the process. If you experience something out of the ordinary or suspect fraudulent activity, please review our page on identifying and reporting fraudulent activity here.
Data Engineer
Data Science & Analytics
Apply

Data Engineer

Artisan AI
United States
Full-time
Remote: No
Role Overview
We are looking for a high-caliber Data Engineer who can architect and scale the data systems that power our AI workflows. You'll be responsible for building reliable data pipelines, integrating external APIs, maintaining clean and structured data models, and enabling the product and ML teams to iterate quickly. You should thrive in ambiguous environments, enjoy wearing multiple hats, and be comfortable designing end-to-end data solutions with minimal direction.

What You'll Own
- Design, build, and maintain scalable data pipelines that process and transform large volumes of structured and unstructured data.
- Manage ingestion from third-party APIs, internal systems, and customer datasets.
- Develop and maintain data models, data schemas, and storage systems optimized for ML and product performance.
- Collaborate with ML engineers to prepare model-ready datasets, embeddings, feature stores, and evaluation data.
- Implement data quality monitoring, validation, and observability (see the sketch below).
- Work closely with product engineers to support new features that rely on complex data flows.
- Optimize systems for performance, cost, and reliability.
- Contribute to early architecture decisions, infrastructure design, and best practices for data governance.
- Build tooling that enables the entire team to access clean, well-structured data.

Who You Are
- Builder Mentality: You're a hands-on engineer who thrives in a fast-paced environment, enjoys autonomy, and takes ownership of problems from start to finish.
- Strong Communication: You translate technical complexity into clarity. You work well with ML, product, and GTM partners.
- Practical, Not Academic: You can design elegant systems but default to shipping solutions that work and can be iterated on.
- Detail-Oriented & Reliable: You care about clean pipelines, reproducibility, and data correctness.

What You Bring
- 3+ years of experience as a Data Engineer, ML Engineer, Backend Engineer, or similar.
- Proficiency in Python, SQL, and modern data tooling (dbt, Airflow, Dagster, or similar).
- Experience designing and operating ETL/ELT pipelines in production.
- Experience with cloud platforms (AWS, GCP, or Azure).
- Familiarity with data lakes, warehouses, and vector databases.
- Experience integrating APIs and working with semi-structured data (JSON, logs, event streams).
- Strong understanding of data modeling and optimization.
- Bonus: experience supporting LLMs, embeddings, or ML training pipelines.
- Bonus: startup experience or comfort working in fast, ambiguous environments.

What Success Looks Like
- Stable, documented, testable pipelines powering ML and product features.
- High-quality data consistently available for analytics, modeling, and core product workflows.
- Faster iteration cycles for the Engineering and ML teams due to improved tooling.
- Clear visibility into data quality and reliability.
- Strong cross-functional collaboration and communication.

Why Artisan
- Build core systems at the heart of a fast-growing AI company.
- High autonomy, high impact, zero bureaucracy.
- Work with a talented, ambitious team solving meaningful problems.
- Shape the data platform from the ground up.
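As a small illustration of the data-quality monitoring responsibility above, here is a minimal pandas sketch; the events table and its columns are hypothetical, and a real monitor would alert on thresholds over these metrics.

```python
# Minimal sketch of simple data-quality checks on a pandas DataFrame.
# The events table and its columns are hypothetical.
import pandas as pd

def quality_report(df: pd.DataFrame, key: str = "event_id") -> dict:
    """Compute simple health metrics a monitoring job could alert on."""
    return {
        "rows": len(df),
        "duplicate_keys": int(df[key].duplicated().sum()),
        "null_fraction": df.isna().mean().round(4).to_dict(),
    }

events = pd.DataFrame({
    "event_id": [1, 2, 2, 3],
    "payload": ["a", "b", None, "d"],
})
print(quality_report(events))
# {'rows': 4, 'duplicate_keys': 1, 'null_fraction': {'event_id': 0.0, 'payload': 0.25}}
```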
Data Engineer
Data Science & Analytics
Apply

AI Pilot Vibe Coding Assistant (Freelance)

Mindrift
Up to USD 10/hour
South Africa
Part-time
Remote: Yes
This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of English.

Mindrift is looking for passionate freelance contributors to join the Tendem project (https://tendem.ai/) and help shape the future of hybrid agents, where human expertise and Vibe Coding work together seamlessly. As a Vibe Coding Expert, you'll partner with systems that take on repetitive tasks, while you provide the nuance, judgment, and creativity needed to deliver outstanding results. In this role, you won't just refine what Vibe Coding generates; you'll actively collaborate with it, shaping and completing outputs so they are accurate, reliable, and ready for real-world use. Your day-to-day work may range from tackling complex challenges across different domains with the support of automation, to producing well-reasoned, precise, and clearly written outputs backed by credible sources. This flexible, part-time remote opportunity is ideal for professionals with technical expertise and hands-on experience in scripting, automation, or AI-driven tools. Your contributions will directly influence how Vibe Coding systems evolve, learn, and empower industries worldwide.

What we do
The Mindrift platform connects specialists with AI projects from major tech innovators. Our mission is to unlock the potential of Generative AI by tapping into real-world expertise from across the globe.

About the Role
This is a freelance role for a project. Your mission: deliver well-reasoned, accurate, and clearly written outputs backed by credible sources. Typical tasks may include:
- Solve complex tasks in different domains with the help of automations.
- Develop and submit precise answers based on complex prompts, including coding, automation, and data processing tasks.
- Write and optimize Python scripts for data analysis, automation, and verification.
- Work with large datasets efficiently, ensuring data is clean and well-structured.
- Utilize various LLMs to generate advanced prompts and improve AI output quality.
- Format outputs in required structures such as Markdown, JSON, tables, etc. (see the sketch below).
- Identify and troubleshoot non-trivial technical problems related to AI workflows and integrations.

How to get started
Simply apply to this post, qualify, and get the chance to contribute to projects that match your technical skills, on your own schedule. From coding and automation to fine-tuning AI outputs, you'll play a key role in advancing AI capabilities and real-world applications.

Requirements
- You hold a Bachelor's or Master's degree in Engineering, Applied Mathematics, Computer Science, or a related technical field.
- A minimum of 1 year of professional experience in AI automation, data engineering, or software development is desirable.
- Your level of English is upper-intermediate (B2) or above.
- Strong data analysis and automation skills, with experience in scripting (e.g., Python) for task efficiency.
- Proficient in working with large datasets and integrating data from multiple sources.
- Ability to develop, test, and optimize AI-driven workflows and tools.
- Detail-oriented mindset to ensure accuracy and quality in data processing and output.
- Hands-on experience with LLMs and AI frameworks to enhance automation and problem-solving.
- You are ready to learn new methods, able to switch between tasks and topics quickly, and sometimes work with challenging, complex guidelines.

Our freelance role is fully remote, so you just need a laptop, an internet connection, available time, and enthusiasm to take on a challenge.

Benefits
Why this freelance opportunity might be a great fit for you:
- Get paid for your expertise, with rates that can go up to $10/hour depending on your skills, experience, and project needs.
- Take part in a part-time, remote, freelance project that fits around your primary professional or academic commitments.
- Work on advanced AI projects and gain valuable experience that enhances your portfolio.
- Influence how future AI models understand and communicate in your field of expertise.
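Since the task list above includes formatting outputs as Markdown or JSON, here is a minimal Python sketch of what that can look like; the metric records are invented for illustration.

```python
# Minimal sketch: emit the same result as JSON (for machines) and as a
# Markdown table (for human review). The record fields are hypothetical.
import json

rows = [
    {"metric": "rows_processed", "value": 1200},
    {"metric": "errors", "value": 3},
]

# JSON output for machine consumption.
print(json.dumps(rows, indent=2))

# Markdown table output for human review.
header = "| metric | value |\n| --- | --- |"
body = "\n".join(f"| {r['metric']} | {r['value']} |" for r in rows)
print(f"{header}\n{body}")
```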
Data Engineer
Data Science & Analytics
Software Engineer
Apply

Data Engineer

Replit
USD 160,000 – 325,000
United States
Full-time
Remote: No
Replit is the agentic software creation platform that enables anyone to build applications using natural language. With millions of users worldwide and over 500,000 business users, Replit is democratizing software development by removing traditional barriers to application creation.

About the role
As a Data Engineer, your job is to facilitate data analytics and measurement at scale at Replit. You'll work with product and business teams to help build data pipelines and transformations that enable us to understand and measure product usage. You'll also work to make our data scientists and analysts, and the business decisions that depend on them, more powerful and efficient.

You will:
- Design, build, and maintain scalable data pipelines that power analytics and data-driven decision-making across Replit (e.g., tracking Repl deployments, AI agent usage, etc.).
- Develop ETL/ELT workflows using modern data stack tools and transform raw data into clean, reliable datasets that enable self-service analytics.
- Partner with teams across the company to understand data needs, deliver robust solutions, and implement data quality monitoring to ensure accuracy and reliability.

Examples of what you could do:
- Build unified data models combining product usage, billing, and customer data to enable cohort analysis and retention tracking (see the sketch below).
- Design real-time pipelines that surface key metrics and automated data quality checks to catch inconsistencies before they impact downstream users.
- Create dimensional models that enable flexible analysis of user behavior, feature adoption, and conversion funnels.

Required skills and experience:
- 5+ years of experience building production data pipelines, with strong SQL skills and experience designing data models.
- Experience with modern data transformation tools (dbt preferred), proficiency in Python, and hands-on experience with cloud data warehouses (BigQuery, Snowflake, Redshift).
- Understanding of data warehouse design principles and the ability to communicate effectively with both technical and non-technical stakeholders.

Preferred Qualifications:
- Experience with modern data stack tools (dbt, Fivetran, Segment, HEX, Databricks, Amplitude) and a background in high-growth SaaS or PLG companies.
- Familiarity with event-based analytics platforms, data visualization tools, and software engineering best practices.

Bonus Points:
- Experience with real-time data processing, reverse ETL tools, or developer tools and collaborative coding environments.
- Knowledge of data governance frameworks or machine learning pipelines and feature engineering.

This is a full-time role that can be held from our Foster City, CA office. The role has an in-office requirement of Monday, Wednesday, and Friday.

Full-Time Employee Benefits Include:
💰 Competitive Salary & Equity
💹 401(k) Program
⚕️ Health, Dental, Vision and Life Insurance
🩼 Short Term and Long Term Disability
🚼 Paid Parental, Medical, Caregiver Leave
🚗 Commuter Benefits
📱 Monthly Wellness Stipend
🧑‍💻 Autonomous Work Environment
🖥 In-Office Set-Up Reimbursement
🏝 Flexible Time Off (FTO) + Holidays
🚀 Quarterly Team Gatherings
☕ In-Office Amenities

Want to learn more about what we are up to?
- Meet the Replit Agent
- Replit: Make an app for that
- Replit Blog
- Amjad TED Talk
- Interviewing + Culture at Replit
- Operating Principles
- Reasons not to work at Replit

To achieve our mission of making programming more accessible around the world, we need our team to be representative of the world. We welcome your unique perspective and experiences in shaping this product. We encourage people from all kinds of backgrounds to apply, including and especially candidates from underrepresented and non-traditional backgrounds.
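To make the cohort/retention example above concrete, here is a minimal pandas sketch; the events table and its columns are hypothetical stand-ins for a unified usage model.

```python
# Minimal sketch of cohort retention over a unified usage table.
# The events DataFrame and its columns are hypothetical.
import pandas as pd

events = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "event_month": pd.to_datetime(
        ["2025-01-01", "2025-02-01", "2025-01-01", "2025-03-01", "2025-02-01"]
    ),
})

# Cohort = month of each user's first observed activity.
first = events.groupby("user_id")["event_month"].min().rename("cohort_month")
events = events.join(first, on="user_id")

# Count distinct active users per (cohort, activity month).
retention = (
    events.groupby(["cohort_month", "event_month"])["user_id"]
    .nunique()
    .unstack(fill_value=0)
)
print(retention)
```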
Data Engineer
Data Science & Analytics
Apply

Member of Technical Staff, Data Engineering

Cohere
Full-time
Remote: Yes
Who are we?
Our mission is to scale intelligence to serve humanity. We're training and deploying frontier models for developers and enterprises who are building AI systems to power magical experiences like content generation, semantic search, RAG, and agents. We believe that our work is instrumental to the widespread adoption of AI. We obsess over what we build. Each one of us is responsible for contributing to increasing the capabilities of our models and the value they drive for our customers. We like to work hard and move fast to do what's best for our customers. Cohere is a team of researchers, engineers, designers, and more, who are passionate about their craft. Each person is one of the best in the world at what they do. We believe that a diverse range of perspectives is a requirement for building great products. Join us on our mission and shape the future!

Why this role?
As a Data Engineer specializing in pretraining data, you will play a pivotal role in developing the data pipeline that underpins Cohere's advanced language models. Your responsibilities will encompass the end-to-end management of training data, including ingestion, cleaning, filtering, and optimization, as well as data modeling to ensure datasets are structured and formatted for optimal model performance. You will work with diverse data sources, such as web data, code data, and multilingual corpora, to ensure their quality, diversity, and reliability. By combining research and engineering, you will bridge the gap between raw data and cutting-edge AI models, directly contributing to improvements in critical training metrics like throughput and accelerator utilization. Your work will be essential to Cohere's mission of delivering efficient and reliable language understanding and generation capabilities, driving innovation in natural language processing. If you are passionate about transforming data into the foundation of AI systems, this role offers a unique opportunity to make a meaningful impact.

Please note: We have offices in London, Paris, Toronto, San Francisco, and New York, but we also embrace being remote-friendly! There are no restrictions on where you can be located for this role between the EST and EU time zones.

As a Member of Technical Staff, Data Engineering, you will:
- Design and build scalable data pipelines to ingest, parse, filter, and optimize diverse web datasets (see the sketch below).
- Conduct data ablations to assess data quality and experiment with data mixtures to enhance model performance.
- Develop robust data modeling techniques to ensure datasets are structured and formatted for optimal training efficiency.
- Research and implement innovative data curation methods, leveraging Cohere's infrastructure to drive advancements in natural language processing.
- Collaborate with cross-functional teams, including researchers and engineers, to ensure data pipelines meet the demands of cutting-edge language models.

You may be a good fit if you have:
- Strong software engineering skills, with proficiency in Python and experience building data pipelines.
- Familiarity with data processing frameworks such as Apache Spark, Apache Beam, Pandas, or similar tools.
- Experience working with large-scale web datasets like CommonCrawl.
- A passion for bridging research and engineering to solve complex data-related challenges in AI model training.
- Bonus: a paper at top-tier venues (such as NeurIPS, ICML, ICLR, AIStats, MLSys, JMLR, AAAI, Nature, COLING, ACL, EMNLP).

If some of the above doesn't line up perfectly with your experience, we still encourage you to apply!
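As a flavor of the ingestion, cleaning, and filtering work described above, here is a minimal Python sketch of one common pretraining-data step: exact deduplication by content hash plus a crude length filter. The thresholds and sample corpus are invented; production systems use far more sophisticated near-deduplication and quality models.

```python
# Minimal sketch of a pretraining-data cleaning step: exact dedup plus a
# crude length heuristic. Thresholds and the sample corpus are hypothetical.
import hashlib

def clean(docs: list[str], min_words: int = 20) -> list[str]:
    """Drop exact duplicates and documents too short to be useful for training."""
    seen: set[str] = set()
    kept = []
    for doc in docs:
        digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if digest in seen or len(doc.split()) < min_words:
            continue
        seen.add(digest)
        kept.append(doc)
    return kept

corpus = ["some long web document " * 10, "some long web document " * 10, "too short"]
print(len(clean(corpus)))  # 1: one exact duplicate and one short doc removed
```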
We value and celebrate diversity and strive to create an inclusive work environment for all. We welcome applicants from all backgrounds and are committed to providing equal opportunities. Should you require any accommodations during the recruitment process, please submit an Accommodations Request Form, and we will work together to meet your needs.

Full-Time Employees at Cohere enjoy these perks:
🤝 An open and inclusive culture and work environment
🧑‍💻 Work closely with a team on the cutting edge of AI research
🍽 Weekly lunch stipend, in-office lunches & snacks
🦷 Full health and dental benefits, including a separate budget to take care of your mental health
🐣 100% Parental Leave top-up for up to 6 months
🎨 Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
🏙 Remote-flexible, offices in Toronto, New York, San Francisco, London, and Paris, as well as a co-working stipend
✈️ 6 weeks of vacation (30 working days!)
Data Engineer
Data Science & Analytics
Apply

Data Operations Manager

Greenlite AI
USD 150,000 – 190,000
United States
Full-time
Remote: No
About the role
As Data Operations Manager at Greenlite, you'll build and scale the data and financial operations that keep our AI agents humming and our customers happy. You'll work directly with our biggest customers, institutions serving over a billion people, by ensuring our operations, billing, and data infrastructure can support their growth as they deploy our AI. Your work is informed by real customer needs and ships to production, so you need to think like a technical operator, work effectively with finance, engineering, GTM, and vendor teams, understand how financial systems and AI workloads scale, and adapt quickly to what the business needs most.

This is a core operations role on the Finance & Operations team. What makes someone exceptional here is the ability to take a high-level problem ("we can't see abnormal spend fast enough" or "we need tighter cash cycle time") and design systems and processes to solve it. You're comfortable working across finance, product, and infrastructure. You operate on a "most important thing" principle, constantly reprioritizing around what moves the needle most. You're not just running billing or watching dashboards; you're building the operational foundation that lets us scale from 35 to 90+ people while staying cash-efficient and keeping deep visibility into how our AI is being used.

We work in person Monday through Friday in our SF office.

What you'll do

Month 1
- Own billing and collections operations end-to-end: tighten invoicing processes, shorten cash cycle time, and establish baseline metrics so we know where we are.
- Audit our current data infrastructure: what do we have, what are the gaps, and what's the first thing that breaks as we scale?
- Design and implement a simple spend monitoring dashboard so the leadership team can see cash burn, usage patterns, and revenue against plan in real time.

Months 2–3
- Build automated data pipelines that give us clear, accurate visibility into usage, revenue, margins, and spend across customers and products.
- Set up alerts and tooling so abnormal spend patterns, failed collections, or off-nominal transactions are flagged automatically rather than caught in retrospect (see the sketch below).
- Partner with our incoming RevOps leader on billing and collections workflows; stand up systems that both teams can lean on.
- Partner with our leadership team and department leads on vendor management and treasury strategy; translate high-level strategy into concrete systems and guardrails.

Ongoing
- Collaborate with GTM, engineering, and data teams to ensure our operational and data infrastructure can support the next 5x growth.
- Take on high-impact, cross-functional projects as the business evolves: whatever moves the needle most.
- Maintain strong relationships with vendors and external partners; optimize costs and quality over time.
- Build dashboards and reports that give leadership early signals on business health and operational efficiency.

What we're looking for

Background
You have 5–8+ years in a mix of operations, analytics, finance, product management, or technical consulting. You've made 2–3 career pivots and are a strong generalist, not a narrow specialist. Startup experience is strongly preferred; you're comfortable in messy, 0→1 environments where things aren't yet built.

Technical and systems thinking
You're strong with data: SQL and spreadsheet magic at minimum, ideally Python or similar for light data engineering. You can reason from first principles about systems and constraints even if you're not a pure software engineer. You can take a vague operational problem and design tooling and process to fix it. You're comfortable working across finance and FP&A, GTM and revenue operations, engineering and data infrastructure, and vendor/legal/contract work. You love working with ambiguous requirements and creating structure.

Traits and mentality
You operate on a "most important thing" principle: you can prioritize ruthlessly around what moves the needle most. You have high ownership and low ego; you're happy to do anything from financial modeling to chasing down invoices. You're energized by fast-paced environments with shifting priorities and ambiguous requirements. You understand financial systems and have intuition for cash flow, burn, and unit economics. Experience in fintech or payments is a plus but not required; what matters more is that you can learn fast and care about getting operational detail right.

Comp and logistics
Salary: $150K–$190K, depending on experience and background. Plus equity. Health, dental, vision, and other benefits. Five days/week in person in downtown San Francisco.
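To illustrate the "flag abnormal spend automatically" idea from the Months 2–3 bullets, here is a minimal, hypothetical Python sketch using a simple 2-sigma rule; real monitoring would use more robust statistics and live billing data.

```python
# Minimal sketch of flagging abnormal daily spend with a 2-sigma z-score rule.
# The spend series and the threshold are hypothetical.
from statistics import mean, stdev

def abnormal_days(daily_spend: list[float], threshold: float = 2.0) -> list[int]:
    """Return indices of days whose spend deviates more than `threshold` sigmas."""
    mu, sigma = mean(daily_spend), stdev(daily_spend)
    if sigma == 0:
        return []
    return [i for i, x in enumerate(daily_spend) if abs(x - mu) / sigma > threshold]

spend = [1000, 1040, 980, 1010, 995, 5200, 1005]  # day 5 is the outlier
print(abnormal_days(spend))  # [5]
```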
Data Engineer
Data Science & Analytics
Product Manager
Product & Operations
Apply

Member of Technical Staff, Training Data Infrastructure

Mirage
USD 215,000 – 300,000
United States
Full-time
Remote: No
Mirage is the leading AI short-form video company. We're building full-stack foundation models and products that redefine video creation, production, and editing. Over 20 million creators and businesses use Mirage's products to reach their full creative and commercial potential. We are a rapidly growing team of ambitious, experienced, and devoted engineers, researchers, designers, marketers, and operators based in NYC. As an early member of our team, you'll have an opportunity to have an outsized impact on our products and our company's culture.

Our Products: Captions, Mirage Studio
Our Technology: AI Research @ Mirage, Mirage Model Announcement, Seeing Voices (white paper)
Press Coverage: TechCrunch, Lenny's Podcast, Forbes AI 50, Fast Company

Our Investors
We're very fortunate to have some of the best investors and entrepreneurs backing us, including Index Ventures, Kleiner Perkins, Sequoia Capital, Andreessen Horowitz, Uncommon Projects, Kevin Systrom, Mike Krieger, Lenny Rachitsky, Antoine Martin, Julie Zhuo, Ben Rubin, Jaren Glover, SVAngel, 20VC, Ludlow Ventures, Chapter One, and more.

** Please note that all of our roles require you to be in person at our NYC HQ (located in Union Square). We do not work with third-party recruiting agencies; please do not contact us. **

About the Role and Team
Captions seeks an exceptional Research Engineer (MOTS) to drive innovation in training data infrastructure. You'll conduct research on and develop sophisticated distributed training workflows and optimized data processing systems for massive video and multimodal datasets. Beyond pure performance, you'll develop deep insight into our data to maximize training effectiveness. As an early member of our ML Research team, you'll build foundational systems that directly impact our ability to train models powering video and multimodal creation for millions of users. You'll work directly alongside our research and engineering teams in our NYC office. We've intentionally built a culture where infrastructure and data work is highly valued; your success will be measured by the reliability and performance of our systems, not by your ability to navigate politics. We're a team that loves diving deep into technical problems and emerging with practical solutions.

Our team values:
- Quick iteration and practical solutions.
- Open discussion of technical approaches.
- Direct access to decision makers.
- Regular sharing of learnings, results, and iterative work.

Key Responsibilities

Infrastructure Development:
- Build performant pipelines for processing video and multimodal training data at scale.
- Design distributed systems that scale seamlessly with our rapidly growing video and multimodal datasets.
- Create efficient data loading systems optimized for GPU training throughput (a minimal data-loading sketch follows below).
- Implement comprehensive telemetry for video processing and training pipelines.

Core Systems Development:
- Create foundation data processing systems that intelligently cache and reuse expensive computations across the training pipeline.
- Build robust data validation and quality measurement systems for video and multimodal content.
- Design systems for data versioning and reproducing complex multimodal training runs.
- Develop efficient storage and compute patterns for high-dimensional data and learned representations.

System Optimization:
- Own and improve end-to-end training pipeline performance.
- Build systems for efficient storage and retrieval of video training data.
- Build frameworks for systematic data and model quality improvement.
- Develop infrastructure supporting fast research iteration cycles.
- Build tools and systems for deep understanding of our training data characteristics.

Research & Product Impact:
- Build infrastructure enabling rapid testing of research hypotheses.
- Create systems for incorporating user feedback into training workflows.
- Design measurement frameworks that connect model improvements to user outcomes.
- Enable systematic experimentation with direct user feedback loops.

Requirements

Technical Background:
- Bachelor's or Master's degree in Computer Science, Machine Learning, or a related field.
- 3+ years of experience in ML infrastructure development or large-scale data engineering.
- Strong programming skills, particularly in Python and distributed computing frameworks.
- Expertise in building and optimizing high-throughput data pipelines.
- Proven experience with video/image data pre-processing and feature engineering.
- Deep knowledge of machine learning workflows, including model training and data loading systems.

System Development:
- Track record in performance optimization and system scaling.
- Experience with cluster management and distributed computing.
- Background in MLOps and infrastructure monitoring.
- Demonstrated ability to build reliable, large-scale data processing systems.

Engineering Approach:
- Love tackling hard technical problems head-on.
- Take ownership while knowing when to loop in teammates.
- Get excited about improving system performance.
- Want to work directly with researchers and engineers who are equally passionate about building great systems.

Benefits:
- Comprehensive medical, dental, and vision plans
- 401(k) with employer match
- Commuter benefits
- Catered lunch multiple days per week
- Dinner stipend every night if you're working late and want a bite!
- Grubhub subscription
- Health & wellness perks (Talkspace, Kindbody, One Medical subscription, HealthAdvocate, Teladoc)
- Multiple team offsites per year, with team events every month
- Generous PTO policy

Captions provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state, or local laws. Please note benefits apply to full-time employees only.
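The data-loading responsibility above (efficient loading for GPU training throughput) can be illustrated with a minimal PyTorch sketch; the ClipDataset and its tensor shapes are hypothetical stand-ins for a real video decode path.

```python
# Minimal sketch of a DataLoader tuned for GPU training throughput.
# ClipDataset fabricates random clip tensors; a real pipeline would decode
# video from storage inside __getitem__.
import torch
from torch.utils.data import DataLoader, Dataset

class ClipDataset(Dataset):
    """Stand-in dataset yielding fixed-size video clip tensors (C, T, H, W)."""
    def __len__(self) -> int:
        return 1024

    def __getitem__(self, idx: int) -> torch.Tensor:
        return torch.randn(3, 16, 224, 224)

loader = DataLoader(
    ClipDataset(),
    batch_size=8,
    num_workers=4,        # decode clips in parallel worker processes
    pin_memory=True,      # page-locked host memory for faster H2D copies
    prefetch_factor=2,    # each worker keeps 2 batches ready ahead of the GPU
    persistent_workers=True,
)

for batch in loader:
    # Overlap the host-to-device copy with compute via non_blocking=True.
    # batch = batch.cuda(non_blocking=True)  # uncomment on a GPU machine
    break
```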
Data Engineer
Data Science & Analytics
DevOps Engineer
Data Science & Analytics
Machine Learning Engineer
Data Science & Analytics
Apply

Sr. Data Engineer (Poland)

Craft
Poland
Full-time
Remote: Yes
About Craft:
Craft is the leader in supplier risk intelligence, enabling enterprises to discover, evaluate, and continuously monitor their suppliers at scale. Our unique, proprietary data platform tracks real-time signals on millions of companies globally, delivering best-in-class monitoring and insight into global supply chains. Our customers include Fortune 500 companies, government agencies, SMEs, and global service platforms. Through our configurable Software-as-a-Service portal, our customers can monitor any company they work with and execute critical actions in real time. We've developed distribution partnerships with some of the largest integrators and software platforms globally. We are a post-Series B high-growth technology company backed by top-tier investors in Silicon Valley and Europe, headquartered in San Francisco with hubs in Seattle, London, and Warsaw. We support remote and hybrid work, with team members across North America and Europe. We're looking for innovative and driven people passionate about building the future of Enterprise Intelligence to join our growing team!

About the Role:
We're growing quickly and looking to hire several senior-level data engineers for multiple teams. Each team is responsible for a key product within the organization. As a core member of the team, you will have great say in how solutions are engineered and delivered. Craft gives engineers a lot of responsibility and authority, which is matched by our investment in their growth and development. Our data engineers carry a lot of software engineering responsibilities, so we're looking for engineers who have strong Python coding experience, Pandas expertise, and solid software engineering practices.

What You'll Do:
- Build and optimize data pipelines (batch and streaming).
- Extract, analyze, and model rich and diverse datasets of structured and unstructured data.
- Design software that is easily testable and maintainable.
- Support in setting data strategies and our vision.
- Keep track of emerging technologies and trends in the Data Engineering world, incorporating modern tooling and best practices at Craft.
- Work on extendable data processing systems that allow adding and scaling pipelines.
- Apply machine learning techniques such as anomaly detection, clustering, regression, classification, and summarization to extract value from our datasets.
- Leverage AI-powered development tools (e.g., Cursor) to accelerate development, refactoring, and code generation.

Who You Are:
- 4+ years of experience in Data Engineering.
- 4+ years of experience with Python.
- Experience in developing, maintaining, and ensuring the reliability, scalability, fault tolerance, and observability of data pipelines in a production environment.
- Fundamental knowledge of data engineering techniques: ETL/ELT, batch and streaming, DWH, data lakes, distributed processing.
- Strong knowledge of SDLC and solid software engineering practices.
- Familiar with the infrastructure-as-code approach.
- Demonstrated curiosity through asking questions, digging into new technologies, and always trying to grow.
- Strong problem solving and the ability to communicate ideas effectively.
- Self-starter, independent, likes to take initiative.
- Familiarity with at least some of the technologies in our current tech stack: Python, PySpark, Pandas, SQL (PostgreSQL), ElasticSearch, Airflow, Docker, Databricks, AWS (S3, Batch, Athena, RDS, DynamoDB, Glue, ECS, Amazon Neptune), CircleCI, GitHub, Terraform.
- Knowledge surrounding AI-assisted coding and experience with Cursor, Co-Pilot, or Codex.
- A strong track record of leveraging AI IDEs like Cursor to: rapidly scaffold components and APIs; refactor legacy codebases efficiently; reduce context-switching and accelerate documentation; and experiment and prototype with near-instant feedback.

What We Offer:
- Option to work as a B2B contractor or full-time employee.
- Competitive salary at a well-funded, fast-growing startup.
- PTO days so you can take the time you need to refresh! Full-time employees: 28 PTO days allotted + paid public holidays. B2B contractors: 15 PTO days allotted + paid public holidays.
- 100% remote work (or hybrid if you prefer! We have a coworking space in the center of Warsaw.)

A Note to Candidates:
We are an equal opportunity employer who values and encourages diversity, equity, and belonging at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, caste, or disability status. Don't meet every requirement? Studies have shown that women, communities of color, and historically underrepresented talent are less likely to apply to jobs unless they meet every single qualification. At Craft, we are dedicated to building a diverse, inclusive, and authentic workplace, so if you're excited about this role but your past experience doesn't align perfectly with every qualification in the job description, we strongly encourage you to apply. You may be just the right candidate for this or other roles!
Data Engineer
Data Science & Analytics
Machine Learning Engineer
Data Science & Analytics
Apply

Staff Data Engineer

Figure AI
USD 150,000 – 350,000
United States
Full-time
Remote: No
Figure is an AI robotics company developing autonomous general-purpose humanoid robots. The goal of the company is to ship humanoid robots with human-level intelligence. Its robots are engineered to perform a variety of tasks in the home and commercial markets. Figure is headquartered in San Jose, CA. Figure's vision is to deploy autonomous humanoids at a global scale.

Our Helix team is looking for an experienced Training Infrastructure Engineer to take our infrastructure to the next level. This role is focused on managing the training cluster and implementing distributed training algorithms, data loaders, and developer tools for AI researchers. The ideal candidate has experience building tools and infrastructure for a large-scale deep learning system.

Responsibilities
- Design, deploy, and maintain Figure's training clusters.
- Architect and maintain scalable deep learning frameworks for training on massive robot datasets.
- Work together with AI researchers to implement training of new model architectures at a large scale.
- Implement distributed training and parallelization strategies to reduce model development cycles (see the sketch below).
- Implement tooling for data processing, model experimentation, and continuous integration.

Requirements
- Strong software engineering fundamentals.
- Bachelor's or Master's degree in Computer Science, Robotics, Engineering, or a related field.
- Experience with Python and PyTorch.
- Experience managing HPC clusters for deep neural network training.
- Minimum of 4 years of professional, full-time experience building reliable backend systems.

Bonus Qualifications
- Experience managing cloud infrastructure (AWS, Azure, GCP).
- Experience with job scheduling / orchestration tools (SLURM, Kubernetes, LSF, etc.).
- Experience with configuration management tools (Ansible, Terraform, Puppet, Chef, etc.).

The US base salary range for this full-time position is between $150,000 and $350,000 annually. The pay offered for this position may vary based on several individual factors, including job-related knowledge, skills, and experience. The total compensation package may also include additional components/benefits depending on the specific role. This information will be shared if an employment offer is extended.
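As a flavor of the distributed-training work this role covers, here is a minimal PyTorch distributed data-parallel sketch; it assumes a `torchrun` launch (which sets the environment variables `init_process_group` reads), and the toy linear model and loss are placeholders.

```python
# Minimal sketch of PyTorch distributed data-parallel (DDP) training.
# Assumes launch via torchrun; the model and loss are toy placeholders.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main() -> None:
    dist.init_process_group(backend="gloo")  # use "nccl" on GPU nodes
    model = torch.nn.Linear(16, 4)
    ddp_model = DDP(model)  # gradients are all-reduced across ranks
    opt = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

    for _ in range(10):
        x = torch.randn(32, 16)
        loss = ddp_model(x).pow(2).mean()
        opt.zero_grad()
        loss.backward()  # synchronizes gradients across all processes
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()  # e.g. `torchrun --nproc_per_node=2 train.py`
```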
Data Engineer
Data Science & Analytics
Machine Learning Engineer
Data Science & Analytics
DevOps Engineer
Data Science & Analytics
Apply

Senior Data Engineer (Evergreen)

Demandbase
USD 190,000 – 284,000
United States
Full-time
Remote: Yes
Introduction to Demandbase
Demandbase is the leading account-based GTM platform for B2B enterprises to identify and target the right customers, at the right time, with the right message. With a unified view of intent data, AI-powered insights, and prescriptive actions, go-to-market teams can seamlessly align and execute with confidence. Thousands of businesses depend on Demandbase to maximize revenue, minimize waste, and consolidate their data and technology stacks, all in one platform. As a company, we're as committed to growing careers as we are to building world-class technology. We invest heavily in people, our culture, and the community around us. We have offices in the San Francisco Bay Area, Seattle, and India, as well as a team in the UK, and allow employees to work remotely. We have also been continuously recognized as one of the best places to work in the San Francisco Bay Area, including "Best Workplaces for Millennials" and "Best Workplaces for Parents"! We're committed to attracting, developing, retaining, and promoting a diverse workforce. By ensuring that every Demandbase employee is able to bring a diversity of talents to work, we're increasingly capable of living out our mission to transform how B2B goes to market. We encourage people from historically underrepresented backgrounds and all walks of life to apply. Come grow with us at Demandbase!

The base compensation range for this position for candidates in the SF Bay Area is $190,000 - $284,000. For all other locations, the base compensation range is based on the primary work location of the candidate, as our ranges are location-specific. Actual compensation packages are based on a wide array of factors unique to each candidate, including but not limited to skillset, years of experience, and depth of experience.

About this Pipeline Role:
This is a pipeline posting for potential future openings on our Data Engineering team. While we are not actively hiring for this position at this time, we are always looking to connect with talented Senior Data Engineers who are passionate about building and optimizing large-scale data systems. By joining our talent pipeline, you'll stay on our radar for future opportunities that align with your skills and interests as they become available. In this role, you would work on improving the core pipelines that power our Identification product and design new processes that enable our data science team to test and deploy new ML/AI models. The product delivered by this team is integrated into the core product stack and is a critical component of Demandbase's account intelligence platform. If you are an experienced engineer who is passionate about data and eager to make an impact, we'd love to stay connected.

What you'll be doing (in a future role):
- Lead initiatives to build, expand, and improve real-world entity identification datasets.
- Coordinate with downstream stakeholders with dependencies on identification datasets.
- Design and build new pipelines to increase identification coverage and detect errors.
- Collaborate with a skilled data science team to enable new ML/AI model development.
- Provide insights into optimizing existing pipelines for performance and cost-efficiency.
- Create and document descriptive plans for new feature implementation.

What we look for:
- Bachelor's degree in computer science, engineering, mathematics, or a related field.
- 8+ years of relevant experience.
- Progressive experience in the following areas:
  - Object-oriented / strongly typed programming (Scala, Java, etc.)
  - Productionizing and deploying Spark pipelines
  - Complex SQL
  - Apache Airflow or similar orchestration tools
  - Strong SDLC principles (CI/CD, unit testing, Git process, etc.)
  - Solid understanding of AWS services (IAM, EC2, S3)
  - An interest in data science

Even Better If You Have:
- Experience with Python, distributed computing, ad-targeting, or GenAI.
- Background in the ad-tech industry.
- Experience modeling and working with graph-based datasets.

Interested in joining our pipeline?
If this role sounds like a great fit for your background and career goals, we encourage you to join our talent network by submitting your application. We'll reach out when a relevant opportunity opens up!

Benefits:
We offer a comprehensive benefits package designed to support your health, well-being, and financial security. Our employees enjoy up to 100% paid premiums for Medical and Vision coverage, ensuring access to top-tier care for you and your loved ones. In addition, we provide a range of mental wellness resources, including access to Modern Health, to help support your emotional well-being. We believe in a healthy work-life harmony, which is why we offer a flexible PTO policy, 15 paid holidays in 2025 (including a three-day break around July 4th and a full week off for Thanksgiving), and No Internal Meetings Fridays to give you uninterrupted time to focus on what matters most. For your financial future, we offer a competitive 401(k) plan, short-term and long-term disability coverage, life insurance, and other valuable benefits to ensure your financial peace of mind.

Our Commitment to Diversity, Equity, and Inclusion at Demandbase:
At Demandbase, we believe in creating a workplace culture that values and celebrates diversity in all its forms. We recognize that everyone brings unique experiences, perspectives, and identities to the table, and we are committed to building a community where everyone feels valued, respected, and supported. Discrimination of any kind is not tolerated, and we strive to ensure that every individual has an equal opportunity to succeed and grow, regardless of their gender identity, sexual orientation, disability, race, ethnicity, background, marital status, genetic information, education level, veteran status, national origin, or any other protected status. We do not automatically disqualify applicants with criminal records and will consider each applicant on a case-by-case basis. We recognize that not all candidates will have every skill or qualification listed in this job description. If you feel you have the level of experience to be successful in the role, we encourage you to apply! We acknowledge that true diversity and inclusion requires ongoing effort, and we are committed to doing the work required to make our workplace a safe and equitable space for all. Join us in building a community where we can learn from each other, celebrate our differences, and work together.

Personal information that you submit will be used by Demandbase for recruiting and other business purposes. Our Privacy Policy explains how we collect and use personal information.
Data Engineer
Data Science & Analytics
Apply
Hidden link
Mirage.jpg

Software Engineer, ML Data Platform

Mirage
USD
0
185000
-
285000
US.svg
United States
Full-time
Remote
false
Mirage is the leading AI short-form video company. We're building full-stack foundation models and products that redefine video creation, production, and editing. Over 20 million creators and businesses use Mirage's products to reach their full creative and commercial potential. We are a rapidly growing team of ambitious, experienced, and devoted engineers, researchers, designers, marketers, and operators based in NYC. As an early member of our team, you'll have an opportunity to have an outsized impact on our products and our company's culture.

Our Products: Captions, Mirage Studio
Our Technology: AI Research @ Mirage, Mirage Model Announcement, Seeing Voices (white paper)
Press Coverage: TechCrunch, Lenny's Podcast, Forbes AI 50, Fast Company
Our Investors: We're very fortunate to have some of the best investors and entrepreneurs backing us, including Index Ventures, Kleiner Perkins, Sequoia Capital, Andreessen Horowitz, Uncommon Projects, Kevin Systrom, Mike Krieger, Lenny Rachitsky, Antoine Martin, Julie Zhuo, Ben Rubin, Jaren Glover, SVAngel, 20VC, Ludlow Ventures, Chapter One, and more.

** Please note that all of our roles will require you to be in person at our NYC HQ (located in Union Square). We do not work with third-party recruiting agencies; please do not contact us. **

About the Role
We're looking for a Software Engineer to help build and scale the data systems that power our machine learning products. This role sits at the intersection of data engineering and ML infrastructure: you'll design large-scale streaming pipelines, build tools that abstract infrastructure complexity for feature developers, and ensure that our feature data is reliable, discoverable, and performant across online and offline environments. If you're passionate about building foundational systems that enable machine learning at scale — and love solving complex distributed data problems — this is the role for you.

What You'll Do
- Design and scale feature pipelines: Build distributed data processing systems for feature extraction, orchestration, and serving — including real-time streaming, batch ingestion, and CDC workflows (a small streaming sketch follows this listing).
- Feature extraction: Design and implement reliable, reusable feature pipelines for ML models, ensuring features are accurate, scalable, and production-ready through well-designed SDKs and orchestration tools.
- Build and evolve storage infrastructure: Manage multi-tier data systems (e.g., Bigtable for online features/state, BigQuery for analytics and offline training), including schema evolution, versioning, and compatibility.
- Own orchestration and reliability: Lead workflow orchestration design (e.g., Pub/Sub, Busboy, Airflow/Temporal), monitoring, and alerting to ensure reliability at 100M+ video scale.
- Collaborate with ML teams: Partner with ML engineers on feature availability, dataset curation, and streaming pipelines for training and inference.
- Optimize for performance and cost: Tune GPU utilization, resource allocation, and data processing efficiency to maximize system throughput and minimize cost.
- Enable analytics and insights: Support downstream analytics and data science workflows by ensuring data accessibility, discoverability, and performance at scale.

Preferred Qualifications
- 4+ years building distributed data systems, feature platforms, or ML infrastructure at scale.
- Strong experience with streaming and batch pipelines (e.g., Pub/Sub, Kafka, Dataflow, Beam, Flink, Spark).
- Deep knowledge of cloud-native data stores (e.g., Bigtable, BigQuery, DynamoDB, Snowflake) and schema/versioning best practices.
- Proficiency in Python and experience building developer-facing libraries or SDKs.
- Experience with Kubernetes, containerized data infrastructure, and workflow orchestration tools (e.g., Airflow, Temporal).
- Familiarity with ML workflows and feature store design — enough to partner closely with ML teams.
- Bonus: Experience working with video, audio, or other unstructured media data in a production environment.

Benefits:
- Comprehensive medical, dental, and vision plans
- 401K with employer match
- Commuter benefits
- Catered lunch multiple days per week
- Dinner stipend every night if you're working late and want a bite! Grubhub subscription
- Health & wellness perks (Talkspace, Kindbody, One Medical subscription, HealthAdvocate, Teladoc)
- Multiple team offsites per year, with team events every month
- Generous PTO policy

Captions provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state, or local laws. Please note benefits apply to full-time employees only.
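As a flavor of the streaming feature work described above, here is a minimal Apache Beam sketch in Python that turns a raw event stream into a windowed feature. The topic names and event fields (video_id, watch_seconds) are hypothetical, and this is a sketch rather than Mirage's pipeline; a production version would add schemas, dead-lettering, and a real sink such as Bigtable.

    # Minimal sketch: hypothetical topics and event fields.
    import json

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    def to_feature(event: dict):
        # Derive a per-video engagement feature from a raw event.
        return (event["video_id"], float(event["watch_seconds"]))

    with beam.Pipeline(options=PipelineOptions(streaming=True)) as p:
        (
            p
            | "Read"    >> beam.io.ReadFromPubSub(topic="projects/example/topics/video-events")
            | "Parse"   >> beam.Map(json.loads)
            | "Feature" >> beam.Map(to_feature)
            | "Window"  >> beam.WindowInto(beam.window.FixedWindows(60))  # 1-minute windows
            | "Sum"     >> beam.CombinePerKey(sum)
            | "Encode"  >> beam.MapTuple(lambda vid, secs: json.dumps(
                  {"video_id": vid, "watch_seconds_1m": secs}).encode("utf-8"))
            | "Write"   >> beam.io.WriteToPubSub(topic="projects/example/topics/video-features")
        )

The same transform graph runs on Dataflow unchanged, which is one reason Beam shows up alongside Pub/Sub and BigQuery in stacks like this one.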
Data Engineer
Data Science & Analytics
DevOps Engineer
Data Science & Analytics
Machine Learning Engineer
Data Science & Analytics
Apply
Hidden link
Anduril Industries.jpg

Analytics Engineer

Anduril
USD
0
146000
-
194000
US.svg
United States
Full-time
Remote
false
Anduril Industries is a defense technology company with a mission to transform U.S. and allied military capabilities with advanced technology. By bringing the expertise, technology, and business model of the 21st century's most innovative companies to the defense industry, Anduril is changing how military systems are designed, built, and sold. Anduril's family of systems is powered by Lattice OS, an AI-powered operating system that turns thousands of data streams into a real-time, 3D command and control center. As the world enters an era of strategic competition, Anduril is committed to bringing cutting-edge autonomy, AI, computer vision, sensor fusion, and networking technology to the military in months, not years.

ABOUT THE TEAM
We build robots that find other robots and knock them out of the sky. At a time when air superiority can no longer be taken for granted, the Air Defense (AD) Team provides mission-critical capabilities to warfighters. From detection to tracking, identification, deterrence, and defeat, our family of networked sensors and effectors enables our customers to rapidly close the kill chain against a broad range of Unmanned Aerial System (UAS) threats. Working across product, engineering, sales, logistics, operations, and mission success, the Air Defense team develops, tests, deploys, and sustains the Anduril Air Defense Family of Systems (FoS) in challenging operational environments worldwide.

ABOUT THE ROLE
We are looking for a well-rounded Senior Analytics Engineer to support our Air Defense team. In this role, you will design and maintain data systems to ingest and transform structured, semi-structured, and unstructured data, creating robust and efficient data models that deliver actionable insights. You will collaborate with stakeholders to gather requirements and implement secure, reliable analytics solutions, focusing on supporting a variety of data applications. You will work with engineering and field operations teams to better understand how our family of systems and capabilities performs in production (both in and out of classified environments). Additionally, you will support operational teams by building out data products to support operational workflows. Ensuring data quality and performing root cause analysis on complex systems will be key (a small illustrative sketch follows this listing), as will deploying mission-critical analytics solutions, with occasional travel to build, test, and deploy capabilities in real-world scenarios.

WHAT YOU'LL DO
- Help lead the development and maintenance of a data systems architecture that enables high-quality, low-latency data ingestion across many different source systems, and transformation for downstream data products and operational workflows.
- Collaborate with stakeholders to collect requirements and write code for secure, timely, accurate, trusted, and extensible data models.
- Become a trusted partner to AD's leadership by creating reusable entities that generalize how our division operates and building both deployment- and business-related workflows, dashboards, and metrics that drive better and faster decision-making.
- Develop and deliver analytics solutions for evolving problems in network-isolated and classified environments.
- Help empower users to leverage the aforementioned data architecture to self-serve through enablement and knowledge transfer, as well as improving the developer/end-user experience.

REQUIRED QUALIFICATIONS
- 6+ years of experience in an analytics-focused or data-oriented role (e.g., data engineer, analytics engineer, data scientist, backend software engineer).
- Exceptional general problem-solving and critical-thinking skills, bringing technical solutions to bear on ambiguous and dynamic data problems.
- Proficiency designing scalable and adaptable logical and physical data models that facilitate both future product evolution and analytical requirements.
- Strong skills in performing data-driven root cause analysis on complex systems.
- Excellent written and verbal communication skills, especially when communicating with a technical audience; an affection for documentation.
- Ability to drive consensus across internal and external stakeholders, with demonstrated experience leading through influence.
- Comfort navigating a polyglot ecosystem and a proven ability to quickly understand established code bases and operate at different levels of abstraction.
- Procedural fluency in writing, debugging, profiling, testing, and maintaining performant Python and SQL code.
- Experience writing another language (e.g., JavaScript/TypeScript, Go, Java, Scala, Haskell, OCaml, Julia, etc.).
- Proficiency in building end-to-end, scalable data solutions (including the steps beyond implementation, such as enablement, support, integrations, etc.) in a cloud setting.
- Experience writing pipelines with common orchestration tooling (e.g., Flyte, Airflow, Dagster, etc.).
- Experience with DevOps and software deployment: containerization and container orchestration (e.g., Docker, Kubernetes, Helm, etc.), GitOps and CI/CD tooling (e.g., CircleCI, ArgoCD, etc.), observability/monitoring (e.g., DataDog, Grafana, etc.).
- Experience with infrastructure-as-code (e.g., Terraform) and core- and data-related cloud services (e.g., AWS, Azure).
- Experience writing and working with microservices architectures (e.g., gRPC + protocol buffers).
- Current US Person status (U.S. citizen or permanent resident) with the ability to obtain and maintain a U.S. Department of Defense (DoD) Secret Clearance or higher.

PREFERRED QUALIFICATIONS
- Ability to comprehend and appropriately modify software written in a systems or lower-level language such as C/C++, Zig, Rust, etc.
- Experience setting up and managing infrastructure to support analytical workloads on-premises or in resource-constrained environments.
- Experience delivering and maintaining systems that securely egress data from air-gapped and security-hardened networks.
- Experience working with data formats (e.g., MCAP, HDF5, etc.) relatively common to robotics.
- Experience with dbt, Palantir Foundry, Trino/Presto, Apache Spark, Apache Kafka, Apache Flink, and/or in-memory databases (e.g., DuckDB, Polars).
- Experience with the Nix (dependency management and system configuration) ecosystem.
- Strong Linux fundamentals.
- Exposure to the technical, programmatic, and operational challenges of developing and deploying autonomous weapon systems across command echelons.
- Deep intellectual interest in the intersection of analytics and the physical hardware world, motivated by Anduril's mission.
- Prior defense, aerospace, or intelligence domain experience.

US Salary Range: $146,000—$194,000 USD
The salary range for this role is an estimate based on a wide range of compensation factors, inclusive of base salary only. Actual salary offer may vary based on (but not limited to) work experience, education and/or training, critical skills, and/or business considerations.

Highly competitive equity grants are included in the majority of full-time offers and are considered part of Anduril's total compensation package. Additionally, Anduril offers top-tier benefits for full-time employees, including:

Healthcare Benefits
- US Roles: Comprehensive medical, dental, and vision plans at little to no cost to you.
- UK & AUS Roles: We cover the full cost of medical insurance premiums for you and your dependents.
- IE Roles: We offer an annual contribution toward your private health insurance for you and your dependents.

Additional Benefits
- Income Protection: Anduril covers life and disability insurance for all employees.
- Generous time off: Highly competitive PTO plans with a holiday hiatus in December. Caregiver & Wellness Leave is available to care for family members, bond with a new baby, or address your own medical needs.
- Family Planning & Parenting Support: Coverage for fertility treatments (e.g., IVF, preservation), adoption, and gestational carriers, along with resources to support you and your partner from planning to parenting.
- Mental Health Resources: Access free mental health resources 24/7, including therapy and life coaching. Additional work-life services, such as legal and financial support, are also available.
- Professional Development: Annual reimbursement for professional development.
- Commuter Benefits: Company-funded commuter benefits based on your region.
- Relocation Assistance: Available depending on role eligibility.

Retirement Savings Plan
- US Roles: Traditional 401(k), Roth, and after-tax (mega backdoor Roth) options.
- UK & IE Roles: Pension plan with employer match.
- AUS Roles: Superannuation plan.

The recruiter assigned to this role can share more information about the specific compensation and benefit details associated with this role during the hiring process. To view Anduril's candidate data privacy policy, please visit https://anduril.com/applicant-privacy-notice/.
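To make the data-quality and root-cause-analysis responsibilities above concrete, here is a minimal sketch using DuckDB, one of the tools named in the preferred qualifications. The telemetry file layout and the columns (sensor_id, value, event_time) are hypothetical; the point is only the shape of a scheduled quality check that flags gaps for investigation.

    # Minimal sketch, assuming hypothetical telemetry files and columns.
    import duckdb

    con = duckdb.connect()
    flagged = con.sql("""
        SELECT sensor_id,
               count(*)                              AS readings,
               count(*) FILTER (WHERE value IS NULL) AS null_readings,
               max(event_time)                       AS latest_reading
        FROM read_parquet('telemetry/*.parquet')
        GROUP BY sensor_id
        HAVING count(*) FILTER (WHERE value IS NULL) > 0
            OR max(event_time) < now() - INTERVAL 1 HOUR
    """).fetchall()

    for sensor_id, readings, null_readings, latest in flagged:
        # Each flagged sensor is a starting point for root-cause analysis.
        print(f"flag: {sensor_id} nulls={null_readings}/{readings} latest={latest}")

DuckDB is a natural fit for the on-premises and resource-constrained environments the listing mentions, since it runs in-process with no server to stand up.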
Data Engineer
Data Science & Analytics
Apply
Hidden link
Databricks.jpg

Big Data Solutions Consultant, Spark Expert

Databricks
0
0
-
0
NL.svg
Netherlands
Full-time
Remote
false
CSQ127R55

As a Big Data Solutions Consultant (Resident Solutions Architect) in our Professional Services team, you will work with clients on short- to medium-term customer engagements on their big data challenges using the Databricks platform. You will deliver data engineering, data science, and cloud technology projects that involve integrating with client systems, training, and other technical work to help customers get the most value out of their data. RSAs are billable and know how to complete projects according to specification with excellent customer service. You will report to the regional Manager/Lead.

The impact you will have:
- You will guide customers as they implement transformational big data projects, including end-to-end development and deployment of industry-leading big data and AI applications
- You will ensure that Databricks best practices are used within all projects and that our quality of service and implementation is strictly followed
- You will facilitate technical workshops, discovery and design sessions, customer requirements gathering, and scoping for new and existing strategic customers
- Assist the Professional Services leader and project managers with level-of-effort estimation and mitigation of risk within customer proposals and statements of work
- Architect, design, develop, deploy, operationalize, and document complex customer engagements, individually or as part of an extended team, as the technical lead and overall authority
- Transfer knowledge to, enable, and mentor other team members, customers, and partners, including developing reusable project artifacts
- Bring experience to the consulting team and share client-engagement best practices with other teams

What we look for:
- 4+ years of experience in data engineering, data platforms, and analytics
- Working knowledge of two or more common cloud ecosystems (AWS, Azure, GCP)
- Comfort with object-oriented and functional programming in Scala and Python
- Experience building scalable streaming and batch solutions using cloud-native components
- Strong knowledge of distributed computing with Apache Spark™ (a small illustrative example follows this listing)
- Travel to customers 30% of the time

Nice to have: Databricks Certification

About Databricks
Databricks is the data and AI company. More than 10,000 organizations worldwide — including Comcast, Condé Nast, Grammarly, and over 50% of the Fortune 500 — rely on the Databricks Data Intelligence Platform to unify and democratize data, analytics, and AI. Databricks is headquartered in San Francisco, with offices around the globe, and was founded by the original creators of Lakehouse, Apache Spark™, Delta Lake, and MLflow. To learn more, follow Databricks on Twitter, LinkedIn, and Facebook.

Benefits
At Databricks, we strive to provide comprehensive benefits and perks that meet the needs of all of our employees. For specific details on the benefits offered in your region, please visit https://www.mybenefitsnow.com/databricks.

Our Commitment to Diversity and Inclusion
At Databricks, we are committed to fostering a diverse and inclusive culture where everyone can excel. We take great care to ensure that our hiring practices are inclusive and meet equal employment opportunity standards. Individuals looking for employment at Databricks are considered without regard to age, color, disability, ethnicity, family or marital status, gender identity or expression, language, national origin, physical and mental ability, political affiliation, race, religion, sexual orientation, socio-economic status, veteran status, and other protected characteristics.

Compliance
If access to export-controlled technology or source code is required for the performance of job duties, it is within the Employer's discretion whether to apply for a U.S. government license for such positions, and the Employer may decline to proceed with an applicant on this basis alone.
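Since the role centers on Spark, here is a minimal PySpark Structured Streaming sketch of a bronze-to-silver cleanup step of the kind such engagements often involve. It is a generic illustration under stated assumptions, not a Databricks reference implementation: the table names (bronze.orders, silver.orders) and columns are hypothetical, and streaming table reads/writes assume Spark 3.1+.

    # Minimal sketch with hypothetical table and column names (Spark 3.1+).
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("orders-bronze-to-silver").getOrCreate()

    bronze = spark.readStream.table("bronze.orders")

    silver = (
        bronze
        .withColumn("order_ts", F.to_timestamp("order_ts"))
        .withWatermark("order_ts", "1 hour")        # bound deduplication state
        .dropDuplicates(["order_id", "order_ts"])
        .filter(F.col("amount") > 0)                # drop obviously bad rows
    )

    (silver.writeStream
           .option("checkpointLocation", "/tmp/checkpoints/orders_silver")
           .toTable("silver.orders"))

The watermark is the detail customers most often miss: without it, streaming deduplication accumulates state indefinitely.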
Data Engineer
Data Science & Analytics
Machine Learning Engineer
Data Science & Analytics
Apply
Hidden link
PathAI.jpg

Database Specialist - Contract

PathAI
-
US.svg
United States
Contractor
Remote
true
Who We Are
PathAI's mission is to improve patient outcomes with AI-powered pathology. Our platform promises substantial improvements to the accuracy of diagnosis and the efficacy of treatment of diseases like cancer, leveraging modern approaches in machine learning. Our team, comprising diverse employees with a wide range of backgrounds and experiences, is passionate about solving challenging problems and making a huge impact.

We are seeking an experienced Contract Database / Data Warehouse Specialist to enhance the scalability, performance, and maintainability of our ML data infrastructure. The ideal candidate will bring expertise in relational databases, ETL processes, and modern big data deployments. You will work closely with our MLOps and ML engineering teams to optimize storage usage, modernize ETL pipelines, deploy new technology, and build or enhance tools that support analytics and machine learning workflows.

Contract Duration: Minimum 6 months
Location: Remote (U.S.)

What You'll Do
- Analyze and optimize storage strategies for ML experiment data and metadata.
- Design and implement intelligent retention and expiration for large-scale datasets (a small illustrative sketch follows this listing).
- Modernize and refactor ETL pipelines to improve scalability and ease of maintenance.
- Build and enhance database-backed applications supporting ML R&D and production analytics.
- Collaborate with ML engineers, SREs, and platform teams.
- Provide knowledge transfer for long-term maintainers.

What You'll Need
- Proven expertise with relational databases (e.g., Postgres, Amazon RDS, Aurora), including schema design, query optimization, and performance tuning.
- Strong experience with ETL development and cloud data warehousing (e.g., Snowflake, Redshift).
- Familiarity with big data deployments and scalable architectures such as Spark and Hive.
- Experience with Apache Airflow for systems automation.
- Proficiency in Python for application development, data processing, and automation.
- Understanding of S3-based storage and large-scale data management strategies.
- Ability to write clear technical documentation and collaborate effectively across teams.
- Experience with query optimization, data partitioning strategies, and cost optimization in cloud environments.

Nice to Have
- Background in machine learning data pipelines or analytics-heavy environments.
- Knowledge of data governance, retention policies, or cost-optimization strategies in cloud environments.

We Want to Hear From You
At PathAI, we are looking for individuals who are team players, are willing to do the work no matter how big or small it may be, and who are passionate about everything they do. If this sounds like you, even if you may not match the job description to a tee, we encourage you to apply. You could be exactly what we're looking for.

PathAI is an equal opportunity employer, dedicated to creating a workplace that is free of harassment and discrimination. We base our employment decisions on business needs, job requirements, and qualifications — that's all. We do not discriminate based on race, gender, religion, health, personal beliefs, age, family or parental status, or any other status. We don't tolerate any kind of discrimination or bias, and we are looking for teammates who feel the same way.

#LI-Remote
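As a flavor of the retention-and-expiration work above, here is a minimal Airflow sketch. Everything in it is a hypothetical placeholder: the DAG id, the 90-day window, and the expiry logic itself (a real task would archive S3 objects and prune metadata rows rather than just log a cutoff). The schedule argument assumes Airflow 2.4+; older versions use schedule_interval.

    # Minimal sketch, assuming Airflow 2.4+ and a hypothetical retention policy.
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    RETENTION_DAYS = 90  # hypothetical retention window

    def expire_stale_artifacts():
        # Placeholder: a real implementation would move expired S3 objects
        # to cold storage and delete the corresponding metadata rows.
        cutoff = datetime.utcnow() - timedelta(days=RETENTION_DAYS)
        print(f"expiring ML experiment artifacts older than {cutoff:%Y-%m-%d}")

    with DAG(
        dag_id="ml_artifact_retention",
        start_date=datetime(2025, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        PythonOperator(
            task_id="expire_stale_artifacts",
            python_callable=expire_stale_artifacts,
        )

Keeping retention as a small, observable DAG (rather than ad hoc scripts) is what makes the "knowledge transfer for long-term maintainers" part of this contract tractable.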
Data Engineer
Data Science & Analytics
DevOps Engineer
Data Science & Analytics
Apply
Hidden link
Sanity.jpg

Head of Data & Analytics

Sanity
USD
0
280000
-
330000
US.svg
United States
CA.svg
Canada
earth.svg
Europe
Full-time
Remote
true
Build and lead our data function as we scale, turning insights into strategic decisions that drive our next phase of growth.

At Sanity.io, we're building the future of AI-powered content operations. Our AI Content Operating System gives teams the freedom to model, create, and automate content the way their business works, accelerating digital development and supercharging content-operations efficiency. Companies like SKIMS, Figma, Riot Games, Anthropic, COMPLEX, Nordstrom, and Morning Brew are using Sanity to power and automate their content operations.

As our Head of Data & Analytics, you'll build and lead our data function during a pivotal growth phase. You'll be both player and coach – rolling up your sleeves for hands-on technical work while scaling a team of data engineers, analytics engineers, and data analysts. This role will shape how we leverage data to drive strategic decisions and accelerate our next phase of growth.

What you would do:

Build and scale the data team
- Recruit and manage data engineers, analytics engineers, and data analysts
- Foster a high-performing, collaborative culture focused on delivering actionable insights
- Balance hands-on technical work with strategic leadership and team development
- Establish scalable processes and best practices as the team grows

Drive data strategy and infrastructure
- Own our data strategy and ensure alignment with business objectives
- Design scalable data pipelines, architecture, and governance using modern tools like BigQuery, dbt, Airflow, Amplitude, and Looker
- Partner with engineering to enhance product telemetry and data collection
- Implement data quality frameworks and monitoring systems

Enable data-driven decision making
- Work with product, sales, marketing, and leadership teams to deliver insights that drive business outcomes (a small illustrative query follows this listing)
- Build dashboards and self-service analytics capabilities
- Lead strategic analyses on customer behavior, product adoption, and business performance
- Translate complex data into clear recommendations for technical and non-technical stakeholders

About you:
- Based in: the East Coast of North America or Europe
- Experience: 5+ years in data roles, with 3+ years building and scaling data teams at a fast-growing SaaS startup
- Leadership: Drive to create change and the ability to define strategy for Data & Analytics requirements and priorities; proven ability to execute
- Business acumen: Deep understanding of SaaS metrics and how data drives business strategy
- Communication: Outstanding ability to influence stakeholders and translate technical concepts into business insights
- Ownership mindset: High accountability and a sense of urgency, with a bias toward action and problem-solving – able to do IC work when needed
- Technical expertise: Strong in SQL, Python, data modeling, and the modern data stack (we use BigQuery, dbt, Airflow, Looker)

What we can offer:
- A highly skilled, inspiring, and supportive team
- A positive, flexible, and trust-based work environment that encourages long-term professional and personal growth
- A global, multiculturally diverse group of colleagues and customers
- Comprehensive health plans and perks
- A healthy work-life balance that accommodates individual and family needs
- A competitive stock options program

Salary Range: $280k - $330k annually. Final compensation within this range will be determined based on the candidate's experience and skill set.

Who we are:
Sanity.io is a modern, flexible content operating system that replaces rigid legacy content management systems. One of our big differentiators is treating content as data so that it can be stored in a single source of truth, but seamlessly adapted and personalized for any channel without extra effort. Forward-thinking companies choose Sanity because they can create tailored content authoring experiences, customized workflows, and content models that reflect their business.

Sanity recently raised an $85M Series C led by GP Bullhound and is also backed by leading investors like ICONIQ Growth, Threshold Ventures, Heavybit, and Shopify, as well as founders of companies like Vercel, WPEngine, Twitter, Mux, Netlify, and Heroku. This funding round has put Sanity in a strong position for accelerated growth in the coming years.

You can only build a great company with a great culture. Sanity is a 200+ person company with highly committed and ambitious people. We are pioneers, we exist for our customers, we are "hel ved" (a Norwegian idiom for being solid through and through), and we love type-two fun! Read more about our values here!

Sanity.io pledges to be an organization that reflects the globally diverse audience that our product serves. We believe that in addition to hiring the best talent, a diversity of perspectives, ideas, and cultures leads to the creation of better products and services. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, marital status, disability, or gender identity.
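For the "insights that drive business outcomes" work above, here is a minimal sketch of a SaaS metric pulled from BigQuery in Python. The mart table (analytics.fct_mrr) and its columns are hypothetical stand-ins for whatever the dbt layer would actually expose; the client assumes application-default credentials.

    # Minimal sketch: month-over-month MRR growth from a hypothetical dbt model.
    from google.cloud import bigquery

    client = bigquery.Client()  # assumes application-default credentials

    sql = """
        SELECT
          invoice_month,
          SUM(mrr) AS total_mrr,
          SAFE_DIVIDE(SUM(mrr),
                      LAG(SUM(mrr)) OVER (ORDER BY invoice_month)) - 1 AS mom_growth
        FROM analytics.fct_mrr          -- hypothetical dbt model
        GROUP BY invoice_month
        ORDER BY invoice_month
    """
    for row in client.query(sql).result():
        print(row.invoice_month, row.total_mrr, row.mom_growth)

In practice a query like this would live in a dbt model or Looker explore rather than a script, so the definition of MRR stays consistent across every dashboard that cites it.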
Data Engineer
Data Science & Analytics
Data Analyst
Data Science & Analytics
Apply
Hidden link
Jasper.jpg

Data Engineer

Jasper
-
FR.svg
France
Full-time
Remote
false
Jasper is the leading AI marketing platform, enabling the world's most innovative companies to reimagine their end-to-end marketing workflows and drive higher ROI through increased brand consistency, efficiency, and personalization at scale. Jasper has been recognized as "one of the Top 15 Most Innovative AI Companies of 2024" by Fast Company and is trusted by nearly 20% of the Fortune 500 – including Prudential, Ulta Beauty, and Wayfair. Founded in 2021, Jasper is a remote-first organization with team members across the US, France, and Australia.

About The Role
Jasper Research is seeking an experienced Data Engineer to play a pivotal role in supporting our image research team, helping design, scale, and maintain our data infrastructure as well as the data processing pipelines powering the training of state-of-the-art multimodal models. In this role, you will work closely with our research scientists and research engineers to collect, clean, and process large-scale datasets from a variety of sources, ensuring that our models are built on the best possible data foundations.

This role is open to candidates located in France. It will be a hybrid setup, which requires you to come into the office when necessary. The office is based at Station F in Paris, the vibrant hub of the French startup ecosystem. Our efficient and lean team at Station F thrives on innovation and collaboration.

What you will do at Jasper
- Design and implement end-to-end scalable data pipelines to ingest, transform, and load data into our data warehouse.
- Analyze existing datasets and implement robust data validation, deduplication, and bias mitigation processes to ensure the highest quality and diversity of training data (a small deduplication sketch follows this listing).
- Create training sets from existing data, using classical computer vision algorithms, vision models, and LLMs.
- Optimize data loading, preprocessing, and augmentation workflows to eliminate bottlenecks and maximize training efficiency.
- Document all data processes, schemas, and transformations to ensure full reproducibility and transparency for the research team.
- Work hand-in-hand with research scientists and engineers to understand their data needs, provide actionable insights, and rapidly iterate on pipeline improvements.
- Source new multimodal data from public sources.

What you will bring to Jasper
- Bachelor's or Master's degree in Computer Science, Data Engineering, or a related field.
- Strong experience as a Data Engineer or in a similar data-focused role.
- Strong experience in image manipulation at scale and an understanding of computer vision.
- Hands-on experience with distributed computing frameworks and cloud platforms for distributed ML training.
- Familiarity with cloud-based data warehousing and storage solutions (e.g., BigQuery).
- Strong attention to detail, commitment to data quality, and a proactive approach to supporting research needs.

Preferred Qualifications
- Knowledge of data transformation and enrichment techniques, including clustering, deduplication, and synthetic data generation.
- Experience with vector databases for ML data.
- Proficiency in Python and SQL for data manipulation and analysis.
- Proficiency in at least one ML library (TensorFlow, PyTorch, JAX); PyTorch preferred.
- Contributions to open-source data tools or projects.
- Familiarity with data privacy and compliance regulations (GDPR, CCPA).

Benefits & Perks
- Mutuelle coverage for hospitalisation and mental health care provided through Alan
- Comprehensive healthcare plan
- Flexible PTO with a FlexExperience budget (€552 annually) to help you make the most of your time away from work
- FlexWellness program (€1,640 annually) to help support your personal health goals
- Generous budget for home office set up
- €1,375 annual learning and development stipend
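To illustrate the image-deduplication work mentioned above, here is a minimal Python sketch using perceptual hashing via the Pillow and imagehash libraries. The directory, file pattern, and distance threshold are hypothetical starting points, and a production pipeline over web-scale data would use a distributed, indexed approach rather than this O(n²) scan.

    # Minimal sketch: near-duplicate image filtering with perceptual hashes.
    from pathlib import Path

    import imagehash            # pip install imagehash pillow
    from PIL import Image

    def dedup_images(image_dir: str, max_distance: int = 4) -> list[Path]:
        kept_hashes: list[imagehash.ImageHash] = []
        kept_paths: list[Path] = []
        for path in sorted(Path(image_dir).glob("*.jpg")):
            h = imagehash.phash(Image.open(path))
            # Hamming distance between hashes; small distance = near-duplicate.
            if any(h - prev <= max_distance for prev in kept_hashes):
                continue
            kept_hashes.append(h)
            kept_paths.append(path)
        return kept_paths

    if __name__ == "__main__":
        print(len(dedup_images("training_images/")))  # hypothetical directory

Perceptual hashing is a classical-CV baseline; embedding-based deduplication with a vector database (also named in the preferred qualifications) catches semantic duplicates that hash distance misses.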
Data Engineer
Data Science & Analytics
Apply
Hidden link