Top Data Engineer Job Openings in 2025

Looking for Data Engineer opportunities? This curated list features the latest Data Engineer job openings from AI-native companies. Whether you're an experienced professional or just entering the field, find roles that match your expertise, from startups to global tech leaders. Updated every day.


Data Engineer

Bjak · Malaysia · Full-time · Hybrid
Transform Language Models into Real-World Applications

We're building AI systems for a global audience. We are living in an era of AI transition: this new project team will focus on building applications that enable greater real-world impact and the widest possible usage. This is a global role with a hybrid work arrangement, combining flexible remote work with in-office collaboration at our HQ. You'll work closely with regional teams across product, engineering, operations, infrastructure, and data to build and scale impactful AI solutions.

Why This Role Matters
You'll fine-tune state-of-the-art models, design evaluation frameworks, and bring AI features into production. Your work ensures our models are not only intelligent, but also safe, trustworthy, and impactful at scale.

What You'll Do
- Collect, clean, and preprocess user-generated text and image data for fine-tuning large models
- Design and manage scalable data labeling pipelines, leveraging both crowdsourcing and in-house labeling teams
- Build and maintain automated datasets for content moderation (e.g., safe vs. unsafe content)
- Collaborate with researchers and engineers to ensure datasets are high-quality, diverse, and aligned with model training needs

What It's Like
- You like ownership and independence
- You believe clarity comes from action: prototype, test, and iterate without waiting for perfect plans
- You stay calm and effective in startup chaos; shifting priorities and building from zero don't faze you
- You have a bias for speed: better to deliver something valuable now than a perfect version much later
- You see feedback and failure as part of growth; you're here to level up
- You possess humility, hunger, and hustle, and lift others up as you go

Requirements
- Proven experience preparing datasets for machine learning or fine-tuning large models
- Strong skills in data cleaning, preprocessing, and transformation for both text and image data
- Hands-on experience with data labeling workflows and quality assurance for labeled data
- Familiarity with building and maintaining moderation datasets (safety, compliance, and filtering)
- Proficiency in scripting (Python, SQL) and working with large-scale data pipelines

What You'll Get
- Flat structure and real ownership
- Full involvement in direction and consensus decision making
- Flexibility in work arrangement
- High-impact role with visibility across product, data, and engineering
- Top-of-market compensation and performance-based bonuses
- Global exposure to product development
- Lots of perks: housing rental subsidies, a quality company cafeteria, and overtime meals
- Health, dental & vision insurance
- Global travel insurance (for you & your dependents)
- Unlimited, flexible time off

Our Team & Culture
We're a dense, high-performance team focused on high-quality work and global impact. We behave like owners. We value speed, clarity, and relentless ownership. If you're hungry to grow and care deeply about excellence, join us.

About Bjak
BJAK is Southeast Asia's #1 insurance aggregator with 8M+ users, fully owned by its employees. Headquartered in Malaysia and operating in Thailand, Taiwan, and Japan, we help millions of users access transparent and affordable financial protection through Bjak.com. We simplify complex financial products through cutting-edge technologies, including APIs, automation, and AI, to build the next generation of intelligent financial systems. If you're excited to build real-world AI systems and grow fast in a high-impact environment, we'd love to hear from you.
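As a rough illustration of the dataset-preparation work described above (collecting, cleaning, and deduplicating user-generated text before fine-tuning), here is a minimal Python sketch. The function names and cleaning rules are ours, invented for illustration, not Bjak's actual pipeline:

```python
import re
import unicodedata

def clean_text(raw: str) -> str:
    """Normalize one user-generated string for a fine-tuning corpus."""
    text = unicodedata.normalize("NFKC", raw)  # unify unicode forms
    text = re.sub(r"\s+", " ", text).strip()   # collapse whitespace
    return text

def build_corpus(records):
    """Clean, drop empties, and deduplicate while preserving order."""
    seen, corpus = set(), []
    for raw in records:
        text = clean_text(raw)
        if text and text not in seen:
            seen.add(text)
            corpus.append(text)
    return corpus
```

A real pipeline would add language filtering, PII scrubbing, and safety checks on top of this skeleton.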
Data Engineer
Data Science & Analytics
Apply

データエンジニア (Data Engineer)

Bjak · Japan · Full-time · Hybrid
Transform Language Models into Real-World Applications

We're building AI systems for a global audience. We are living in an era of AI transition: this new project team will focus on building applications that enable greater real-world impact and the widest possible usage. This is a global role with a hybrid work arrangement, combining flexible remote work with in-office collaboration at our HQ. You'll work closely with regional teams across product, engineering, operations, infrastructure, and data to build and scale impactful AI solutions.

Why This Role Matters
You'll fine-tune state-of-the-art models, design evaluation frameworks, and bring AI features into production. Your work ensures our models are not only intelligent, but also safe, trustworthy, and impactful at scale.

What You'll Do
- Collect, clean, and preprocess user-generated text and image data for fine-tuning large models
- Design and manage scalable data labeling pipelines, leveraging both crowdsourcing and in-house labeling teams
- Build and maintain automated datasets for content moderation (e.g., safe vs. unsafe content)
- Collaborate with researchers and engineers to ensure datasets are high-quality, diverse, and aligned with model training needs

What It's Like
- You like ownership and independence
- You believe clarity comes from action: prototype, test, and iterate without waiting for perfect plans
- You stay calm and effective in startup chaos; shifting priorities and building from zero don't faze you
- You have a bias for speed: better to deliver something valuable now than a perfect version much later
- You see feedback and failure as part of growth; you're here to level up
- You possess humility, hunger, and hustle, and lift others up as you go

Requirements
- Proven experience preparing datasets for machine learning or fine-tuning large models
- Strong skills in data cleaning, preprocessing, and transformation for both text and image data
- Hands-on experience with data labeling workflows and quality assurance for labeled data
- Familiarity with building and maintaining moderation datasets (safety, compliance, and filtering)
- Proficiency in scripting (Python, SQL) and working with large-scale data pipelines

What You'll Get
- Flat structure and real ownership
- Full involvement in direction and consensus decision making
- Flexibility in work arrangement
- High-impact role with visibility across product, data, and engineering
- Top-of-market compensation and performance-based bonuses
- Global exposure to product development
- Lots of perks: housing rental subsidies, a quality company cafeteria, and overtime meals
- Health, dental & vision insurance
- Global travel insurance (for you & your dependents)
- Unlimited, flexible time off

Our Team & Culture
We're a dense, high-performance team focused on high-quality work and global impact. We behave like owners. We value speed, clarity, and relentless ownership. If you're hungry to grow and care deeply about excellence, join us.

About Bjak
BJAK is Southeast Asia's #1 insurance aggregator with 8M+ users, fully owned by its employees. Headquartered in Malaysia and operating in Thailand, Taiwan, and Japan, we help millions of users access transparent and affordable financial protection through Bjak.com. We simplify complex financial products through cutting-edge technologies, including APIs, automation, and AI, to build the next generation of intelligent financial systems. If you're excited to build real-world AI systems and grow fast in a high-impact environment, we'd love to hear from you.
Data Engineer
Data Science & Analytics
Apply

Data Engineer

Bjak · Remote (Indonesia) · Full-time
Transform Language Models into Real-World Applications

We're building AI systems for a global audience. We are living in an era of AI transition: this new project team will focus on building applications that enable greater real-world impact and the widest possible usage. This is a remote role based in Indonesia, working closely with our HQ in Malaysia and cross-functional regional teams. You'll operate across the stack, from backend logic and integration to frontend delivery, building intelligent systems that scale fast and matter deeply.

Why This Role Matters
You'll fine-tune state-of-the-art models, design evaluation frameworks, and bring AI features into production. Your work ensures our models are not only intelligent, but also safe, trustworthy, and impactful at scale.

What You'll Do
- Collect, clean, and preprocess user-generated text and image data for fine-tuning large models
- Design and manage scalable data labeling pipelines, leveraging both crowdsourcing and in-house labeling teams
- Build and maintain automated datasets for content moderation (e.g., safe vs. unsafe content)
- Collaborate with researchers and engineers to ensure datasets are high-quality, diverse, and aligned with model training needs

What It's Like
- You like ownership and independence
- You believe clarity comes from action: prototype, test, and iterate without waiting for perfect plans
- You stay calm and effective in startup chaos; shifting priorities and building from zero don't faze you
- You have a bias for speed: better to deliver something valuable now than a perfect version much later
- You see feedback and failure as part of growth; you're here to level up
- You possess humility, hunger, and hustle, and lift others up as you go

Requirements
- Proven experience preparing datasets for machine learning or fine-tuning large models
- Strong skills in data cleaning, preprocessing, and transformation for both text and image data
- Hands-on experience with data labeling workflows and quality assurance for labeled data
- Familiarity with building and maintaining moderation datasets (safety, compliance, and filtering)
- Proficiency in scripting (Python, SQL) and working with large-scale data pipelines

What You'll Get
- Flat structure and real ownership
- Full involvement in direction and consensus decision making
- Flexibility in work arrangement
- High-impact role with visibility across product, data, and engineering
- Top-of-market compensation and performance-based bonuses
- Global exposure to product development
- Lots of perks: housing rental subsidies, a quality company cafeteria, and overtime meals
- Health, dental & vision insurance
- Global travel insurance (for you & your dependents)
- Unlimited, flexible time off

Our Team & Culture
We're a dense, high-performance team focused on high-quality work and global impact. We behave like owners. We value speed, clarity, and relentless ownership. If you're hungry to grow and care deeply about excellence, join us.

About Bjak
BJAK is Southeast Asia's #1 insurance aggregator with 8M+ users, fully owned by its employees. Headquartered in Malaysia and operating in Thailand, Taiwan, and Japan, we help millions of users access transparent and affordable financial protection through Bjak.com. We simplify complex financial products through cutting-edge technologies, including APIs, automation, and AI, to build the next generation of intelligent financial systems. If you're excited to build real-world AI systems and grow fast in a high-impact environment, we'd love to hear from you.
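The crowdsourced labeling pipelines described above need quality assurance before labels reach training. One common approach, sketched minimally here (the function name, labels, and threshold are illustrative, not Bjak's actual workflow), is majority voting with an agreement floor:

```python
from collections import Counter

def aggregate_label(votes, min_agreement=0.6):
    """Majority-vote a crowd-labeled item; return None (route to expert
    review) when annotator agreement falls below the threshold."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    agreement = count / len(votes)
    return label if agreement >= min_agreement else None
```

Items that come back None would typically be escalated to an in-house labeling team rather than dropped.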
Data Engineer
Data Science & Analytics
Machine Learning Engineer
Data Science & Analytics
Apply

Tech Lead - Data Engineering

Basis AI · United States · Full-time · On-site (NYC)
USD 100,000–300,000
About Basis
Basis equips accountants with a team of AI agents to take on real workflows. We have hit product-market fit, have more demand than we can meet, and just raised $34M to scale at a speed that meets this moment. Built in New York City. Read more about Basis here.

About the Team
The Data Engineering team at Basis owns and builds the tooling that allows our agents to interact with data from outside of Basis. We care deeply about clarity: clean abstractions, simple mental models, and clear interfaces that help our AI and product teams move fast without breaking things.

About the Role
As a data engineer at Basis you'll own projects completely, from scoping to delivery. You'll be the Responsible Party (RP) for the systems you design: you decide how to build them, how to measure success, and when they're ready to ship. We trust you to manage yourself. You'll plan your own projects, work closely with your pod, and take full responsibility for execution and quality. You'll build systems that serve every part of Basis: AI, product, and internal agents. And you'll make those systems fast, reliable, and easy to understand.

What you'll be doing:

Build and standardize our data platform
- Design data pipelines that ingest, validate, and transform accounting data into clean, reliable datasets.
- Define schemas and data contracts that balance flexibility with correctness.
- Build validation, lineage tracking, and drift detection into every pipeline.
- Create interfaces that make data discoverable, computable, and observable throughout the system.

Model the domain as a system
- Translate accounting concepts into well-structured ontologies: entities, relationships, and rules.
- Create abstractions that help AI systems reason safely about real-world constraints.
- Design for clarity: make complex workflows understandable through schema, code, and documentation.

Lead through clarity and technical excellence
- Own the architectural vision for your area and keep it consistent over time.
- Run effective design reviews that challenge assumptions and drive alignment.
- Mentor engineers on how to think about systems: from load testing to schema design to observability patterns.
- Simplify aggressively, removing accidental complexity and enforcing clean, stable abstractions.

📍 Location: NYC, Flatiron office. In-person team.

In accordance with New York State regulations, the salary range for this position is $100,000–$300,000. This range represents our broad compensation philosophy and covers various responsibility and experience levels. Additionally, all employees are eligible to participate in our equity plan and benefits program. We are committed to meritocratic and competitive compensation.
Data Engineer
Data Science & Analytics
Apply

Tech Lead - Data Engineering

Basis AI · United States · Full-time · On-site (NYC)
USD 100,000–300,000
About Basis
Basis equips accountants with a team of AI agents to take on real workflows. We have hit product-market fit, have more demand than we can meet, and just raised $34M to scale at a speed that meets this moment. Built in New York City. Read more about Basis here.

About the Team
The Data Engineering team at Basis owns and builds the tooling that allows our agents to interact with data from outside of Basis. We care deeply about clarity: clean abstractions, simple mental models, and clear interfaces that help our AI and product teams move fast without breaking things.

About the Role
As a Tech Lead on the Data Engineering team, you'll own the technical vision for how agents interact with data from other systems. You'll design solid architectures, make trade-offs clear, and teach others how to think about distributed systems effectively. You'll ensure consistency across runtime, data, and schema layers so our systems scale predictably and stay understandable as we grow. You'll lead by example through your code, design reviews, and documented decisions, making sure the platform is both powerful and elegantly simple.

What you'll be doing:

Build and standardize our data platform
- Design data pipelines that ingest, validate, and transform accounting data into clean, reliable datasets.
- Define schemas and data contracts that balance flexibility with correctness.
- Build validation, lineage tracking, and drift detection into every pipeline.
- Create interfaces that make data discoverable, computable, and observable throughout the system.

Model the domain as a system
- Translate accounting concepts into well-structured ontologies: entities, relationships, and rules.
- Create abstractions that help AI systems reason safely about real-world constraints.
- Design for clarity: make complex workflows understandable through schema, code, and documentation.

Lead through clarity and technical excellence
- Own the architectural vision for your area and keep it consistent over time.
- Run effective design reviews that challenge assumptions and drive alignment.
- Mentor engineers on how to think about systems: from load testing to schema design to observability patterns.
- Simplify aggressively by removing unnecessary complexity and maintaining clean, stable abstractions.

📍 Location: NYC, Flatiron office. In-person team.

In accordance with New York State regulations, the salary range for this position is $100,000–$300,000. This range represents our broad compensation philosophy and covers various responsibility and experience levels. Additionally, all employees are eligible to participate in our equity plan and benefits program. We are committed to meritocratic and competitive compensation.
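The "drift detection into every pipeline" bullet above can be illustrated with a deliberately tiny sketch: flag a batch whose mean has moved too far from a baseline, scaled by the baseline's spread. The threshold and statistic are illustrative choices, not Basis's method; production systems typically use proper statistical tests per column:

```python
def mean(xs):
    return sum(xs) / len(xs)

def drifted(baseline, batch, max_shift=0.25):
    """Flag drift when the batch mean moves more than max_shift
    (as a fraction of the baseline's spread) from the baseline mean."""
    base_mean = mean(baseline)
    spread = max(baseline) - min(baseline) or 1.0  # guard constant columns
    return abs(mean(batch) - base_mean) / spread > max_shift
```

Wiring a check like this into each pipeline stage turns silent data changes into loud, actionable alerts.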
Data Engineer
Data Science & Analytics
Apply

Member of Technical Staff, Training Data Infrastructure

Mirage · United States · Full-time · On-site (NYC)
USD 215,000–300,000
Mirage is the leading AI short-form video company. We're building full-stack foundation models and products that redefine video creation, production, and editing. Over 20 million creators and businesses use Mirage's products to reach their full creative and commercial potential.

We are a rapidly growing team of ambitious, experienced, and devoted engineers, researchers, designers, marketers, and operators based in NYC. As an early member of our team, you'll have an opportunity to have an outsized impact on our products and our company's culture.

Our Products: Captions, Mirage Studio
Our Technology: AI Research @ Mirage, Mirage Model Announcement, Seeing Voices (white paper)
Press Coverage: TechCrunch, Lenny's Podcast, Forbes AI 50, Fast Company
Our Investors: We're very fortunate to have some of the best investors and entrepreneurs backing us, including Index Ventures, Kleiner Perkins, Sequoia Capital, Andreessen Horowitz, Uncommon Projects, Kevin Systrom, Mike Krieger, Lenny Rachitsky, Antoine Martin, Julie Zhuo, Ben Rubin, Jaren Glover, SVAngel, 20VC, Ludlow Ventures, Chapter One, and more.

Please note that all of our roles require you to be in person at our NYC HQ (located in Union Square). We do not work with third-party recruiting agencies; please do not contact us.

About the Role and Team:
Captions seeks an exceptional Research Engineer (MOTS) to drive innovation in training data infrastructure. You'll conduct research on and develop sophisticated distributed training workflows and optimized data processing systems for massive video and multimodal datasets. Beyond pure performance, you'll develop deep insight into our data to maximize training effectiveness. As an early member of our ML Research team, you'll build foundational systems that directly impact our ability to train models powering video and multimodal creation for millions of users.

You'll work directly alongside our research and engineering teams in our NYC office. We've intentionally built a culture where infrastructure and data work is highly valued: your success will be measured by the reliability and performance of our systems, not by your ability to navigate politics. We're a team that loves diving deep into technical problems and emerging with practical solutions.

Our team values:
- Quick iteration and practical solutions.
- Open discussion of technical approaches.
- Direct access to decision makers.
- Regular sharing of learnings, results, and iterative work.

Key Responsibilities:

Infrastructure Development:
- Build performant pipelines for processing video and multimodal training data at scale.
- Design distributed systems that scale seamlessly with our rapidly growing video and multimodal datasets.
- Create efficient data loading systems optimized for GPU training throughput.
- Implement comprehensive telemetry for video processing and training pipelines.

Core Systems Development:
- Create foundation data processing systems that intelligently cache and reuse expensive computations across the training pipeline.
- Build robust data validation and quality measurement systems for video and multimodal content.
- Design systems for data versioning and reproducing complex multimodal training runs.
- Develop efficient storage and compute patterns for high-dimensional data and learned representations.

System Optimization:
- Own and improve end-to-end training pipeline performance.
- Build systems for efficient storage and retrieval of video training data.
- Build frameworks for systematic data and model quality improvement.
- Develop infrastructure supporting fast research iteration cycles.
- Build tools and systems for deep understanding of our training data characteristics.

Research & Product Impact:
- Build infrastructure enabling rapid testing of research hypotheses.
- Create systems for incorporating user feedback into training workflows.
- Design measurement frameworks that connect model improvements to user outcomes.
- Enable systematic experimentation with direct user feedback loops.

Requirements:

Technical Background:
- Bachelor's or Master's degree in Computer Science, Machine Learning, or a related field.
- 3+ years of experience in ML infrastructure development or large-scale data engineering.
- Strong programming skills, particularly in Python and distributed computing frameworks.
- Expertise in building and optimizing high-throughput data pipelines.
- Proven experience with video/image data preprocessing and feature engineering.
- Deep knowledge of machine learning workflows, including model training and data loading systems.

System Development:
- Track record in performance optimization and system scaling.
- Experience with cluster management and distributed computing.
- Background in MLOps and infrastructure monitoring.
- Demonstrated ability to build reliable, large-scale data processing systems.

Engineering Approach:
- Love tackling hard technical problems head-on.
- Take ownership while knowing when to loop in teammates.
- Get excited about improving system performance.
- Want to work directly with researchers and engineers who are equally passionate about building great systems.

Benefits:
- Comprehensive medical, dental, and vision plans
- 401K with employer match
- Commuter benefits
- Catered lunch multiple days per week
- Dinner stipend every night if you're working late and want a bite! Grubhub subscription
- Health & wellness perks (Talkspace, Kindbody, One Medical subscription, HealthAdvocate, Teladoc)
- Multiple team offsites per year, with team events every month
- Generous PTO policy

Captions provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state, or local laws. Please note benefits apply to full-time employees only.
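The "cache and reuse expensive computations" responsibility above is, at its core, content-addressed memoization. A minimal in-memory sketch (class and method names are ours; a real system would back the store with object storage and persist across runs):

```python
import hashlib

class FeatureCache:
    """Cache expensive per-clip computations, keyed by a hash of the raw
    bytes, so re-running the pipeline reuses earlier work instead of
    recomputing features for unchanged inputs."""

    def __init__(self, compute):
        self.compute = compute  # the expensive function to memoize
        self.store = {}
        self.misses = 0

    def get(self, raw: bytes):
        key = hashlib.sha256(raw).hexdigest()  # content-addressed key
        if key not in self.store:
            self.misses += 1
            self.store[key] = self.compute(raw)
        return self.store[key]
```

Keying on content rather than filename means the cache stays correct even when files are moved or re-uploaded unchanged.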
Data Engineer
Data Science & Analytics
MLOps / DevOps Engineer
Data Science & Analytics
Machine Learning Engineer
Data Science & Analytics
Apply

Sr. Data Engineer (Poland)

Craft · Poland · Full-time · Remote
About Craft:
Craft is the leader in supplier risk intelligence, enabling enterprises to discover, evaluate, and continuously monitor their suppliers at scale. Our unique, proprietary data platform tracks real-time signals on millions of companies globally, delivering best-in-class monitoring and insight into global supply chains. Our customers include Fortune 500 companies, government agencies, SMEs, and global service platforms. Through our configurable Software-as-a-Service portal, our customers can monitor any company they work with and execute critical actions in real time. We've developed distribution partnerships with some of the largest integrators and software platforms globally.

We are a post-Series B, high-growth technology company backed by top-tier investors in Silicon Valley and Europe, headquartered in San Francisco with hubs in Seattle, London, and Warsaw. We support remote and hybrid work, with team members across North America and Europe. We're looking for innovative and driven people passionate about building the future of Enterprise Intelligence to join our growing team!

About the Role:
We're growing quickly and looking to hire several senior-level data engineers for multiple teams. Each team is responsible for a key product within the organization. As a core member of the team, you will have a great say in how solutions are engineered and delivered. Craft gives engineers a lot of responsibility and authority, which is matched by our investment in their growth and development. Our data engineers carry a lot of software engineering responsibilities, so we're looking for engineers who have strong Python coding experience, Pandas expertise, and solid software engineering practices.

What You'll Do:
- Build and optimize data pipelines (batch and streaming).
- Extract, analyze, and model rich and diverse datasets of structured and unstructured data.
- Design software that is easily testable and maintainable.
- Support in setting data strategies and our vision.
- Keep track of emerging technologies and trends in the data engineering world, incorporating modern tooling and best practices at Craft.
- Work on extendable data processing systems that allow adding and scaling pipelines.
- Apply machine learning techniques such as anomaly detection, clustering, regression, classification, and summarization to extract value from our datasets.
- Leverage AI-powered development tools (e.g., Cursor) to accelerate development, refactoring, and code generation.

Who You Are:
- 4+ years of experience in data engineering.
- 4+ years of experience with Python.
- Experience in developing, maintaining, and ensuring the reliability, scalability, fault tolerance, and observability of data pipelines in a production environment.
- Fundamental knowledge of data engineering techniques: ETL/ELT, batch and streaming, DWH, data lakes, distributed processing.
- Strong knowledge of the SDLC and solid software engineering practices.
- Familiarity with the infrastructure-as-code approach.
- Demonstrated curiosity: asking questions, digging into new technologies, and always trying to grow.
- Strong problem solving and the ability to communicate ideas effectively.
- Self-starter, independent, likes to take initiative.
- Familiarity with at least some of the technologies in our current tech stack: Python, PySpark, Pandas, SQL (PostgreSQL), ElasticSearch, Airflow, Docker; Databricks, AWS (S3, Batch, Athena, RDS, DynamoDB, Glue, ECS, Amazon Neptune); CircleCI, GitHub, Terraform.
- Knowledge of AI-assisted coding and experience with Cursor, Copilot, or Codex.
- A strong track record of leveraging AI IDEs like Cursor to: rapidly scaffold components and APIs; refactor legacy codebases efficiently; reduce context-switching and accelerate documentation; experiment and prototype with near-instant feedback.

What We Offer:
- Option to work as a B2B contractor or full-time employee
- Competitive salary at a well-funded, fast-growing startup
- PTO days so you can take the time you need to refresh! Full-time employees: 28 PTO days allotted + paid public holidays. B2B contractors: 15 PTO days allotted + paid public holidays.
- 100% remote work (or hybrid if you prefer! We have a coworking space in the center of Warsaw.)

A Note to Candidates:
We are an equal opportunity employer who values and encourages diversity, equity, and belonging at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, caste, or disability status. Don't meet every requirement? Studies have shown that women, communities of color, and historically underrepresented talent are less likely to apply to jobs unless they meet every single qualification. At Craft, we are dedicated to building a diverse, inclusive, and authentic workplace, so if you're excited about this role but your past experience doesn't align perfectly with every qualification in the job description, we strongly encourage you to apply. You may be just the right candidate for this or other roles!
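Since the role above calls for strong Pandas skills applied to batch pipelines, here is a minimal sketch of a single transform step in that style. The column names (company_id, name, risk_score) and the 0.8 risk threshold are invented for illustration, not Craft's actual schema:

```python
import pandas as pd

def transform(df: pd.DataFrame) -> pd.DataFrame:
    """One batch step: drop rows missing a company id, normalize names,
    and add a derived risk flag (columns are illustrative)."""
    out = df.dropna(subset=["company_id"]).copy()
    out["name"] = out["name"].str.strip().str.lower()
    out["high_risk"] = out["risk_score"] >= 0.8
    return out
```

In a real pipeline this step would sit between an extract (e.g., S3 read) and a load (e.g., warehouse write), with Airflow or similar orchestrating the batches.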
Data Engineer
Data Science & Analytics
Machine Learning Engineer
Data Science & Analytics
Apply

Staff Data Engineer

Figure AI · United States · Full-time · On-site
USD 150,000–350,000
Figure is an AI robotics company developing autonomous general-purpose humanoid robots. The goal of the company is to ship humanoid robots with human-level intelligence. Its robots are engineered to perform a variety of tasks in the home and commercial markets. Figure is headquartered in San Jose, CA. Figure's vision is to deploy autonomous humanoids at global scale.

Our Helix team is looking for an experienced Training Infrastructure Engineer to take our infrastructure to the next level. This role is focused on managing the training cluster and implementing distributed training algorithms, data loaders, and developer tools for AI researchers. The ideal candidate has experience building tools and infrastructure for a large-scale deep learning system.

Responsibilities
- Design, deploy, and maintain Figure's training clusters
- Architect and maintain scalable deep learning frameworks for training on massive robot datasets
- Work together with AI researchers to implement training of new model architectures at a large scale
- Implement distributed training and parallelization strategies to reduce model development cycles
- Implement tooling for data processing, model experimentation, and continuous integration

Requirements
- Strong software engineering fundamentals
- Bachelor's or Master's degree in Computer Science, Robotics, Engineering, or a related field
- Experience with Python and PyTorch
- Experience managing HPC clusters for deep neural network training
- Minimum of 4 years of professional, full-time experience building reliable backend systems

Bonus Qualifications
- Experience managing cloud infrastructure (AWS, Azure, GCP)
- Experience with job scheduling / orchestration tools (SLURM, Kubernetes, LSF, etc.)
- Experience with configuration management tools (Ansible, Terraform, Puppet, Chef, etc.)

The US base salary range for this full-time position is $150,000–$350,000 annually. The pay offered for this position may vary based on several individual factors, including job-related knowledge, skills, and experience. The total compensation package may also include additional components/benefits depending on the specific role. This information will be shared if an employment offer is extended.
Data Engineer
Data Science & Analytics
Machine Learning Engineer
Data Science & Analytics
MLOps / DevOps Engineer
Data Science & Analytics
Apply
Hidden link
Sanity.jpg

Sr. Data Engineer, New Venture

Sanity
USD
0
210000
-
265000
US.svg
United States
Full-time
Remote
false
At Sanity.io, we’re building the future of AI-powered Content Operations. Our AI Content Operating System gives teams the freedom to model, create, and automate content the way their business works, accelerating digital development and supercharging content operations efficiency. Companies like SKIMS, Figma, Riot Games, Anthropic, COMPLEX, Nordstrom, and Morningbrew are using Sanity to power and automate their content operations.

As part of our new venture, your work will center on one of AI’s toughest problems: helping machines truly understand and use human-created content. You’ll build systems that structure and enrich large volumes of information so that AI agents and LLMs can access the right context at the right time. This means designing and developing tools and pipelines that shape, structure, and connect information and content in innovative ways, and creating new methods to ensure AIs reflect the most accurate, authentic, and up-to-date representation of a business, its brand, products, and knowledge base.

As a Senior Data Engineer, you'll architect and optimize the data infrastructure that powers our next generation of AI capabilities. You'll be the engine behind our AI systems, building scalable, efficient data pipelines that process massive volumes of content while maintaining low latency and managing costs intelligently. Your work will directly enable AI agents and LLMs to access the right data at the right time. You'll join a small, cross-functional team where your expertise in data engineering and ML infrastructure will be critical to turning ambitious AI concepts into production-ready systems. If you're passionate about building robust data systems that power cutting-edge AI, obsess over performance optimization, and love solving complex scaling challenges, we'd love to have you on the team.

What you will do:
- Design, build, and optimize scalable data pipelines for AI and ML workloads, handling large volumes of structured and unstructured content data.
- Architect data processing systems that transform, enrich, and prepare content for LLM consumption, with a focus on latency optimization and cost efficiency.
- Build ETL/ELT workflows that extract, transform, and load data from diverse sources to support real-time and batch AI operations.
- Implement data quality monitoring and observability systems to ensure pipeline reliability and data accuracy for AI models.
- Collaborate with engineers and product teams to understand data requirements and design optimal data architectures that support AI features.
- Optimize data storage strategies across data lakes, warehouses, and vector databases to balance performance, cost, and scalability.
- Build automated data validation and testing frameworks to maintain data integrity throughout the pipeline.
- Stay at the forefront of LLM research, understanding model behaviors, limitations, and capabilities to inform system design decisions.
- Monitor and optimize pipeline performance, identifying bottlenecks and implementing solutions to improve throughput and reduce latency.
- Create clear documentation of data architectures, pipeline logic, and operational procedures.

About you:
- Based in the San Francisco Bay Area and able to work at least 2 days per week in our San Francisco office.
- 5+ years of data engineering experience, with at least 2 years focused on AI/ML data pipelines or supporting machine learning workloads.
- High level of proficiency in Python and SQL.
- Strong experience with distributed data processing frameworks like Apache Spark, Dask, or Ray.
- Proficiency with GCP and its data services.
- Experience with real-time data streaming technologies like Kafka, Redpanda, or NATS.
- Familiarity with vector databases (e.g., Milvus, ElasticSearch, Vespa) and their role in AI applications.
- Experience with data modeling, schema design, and working with both relational and NoSQL databases (PostgreSQL, MongoDB, Cassandra).
- Strong focus on performance optimization, cost management, and building systems that scale efficiently.
- Experience implementing data observability and monitoring solutions (e.g., Prometheus, ClickHouse).
- Ability to write clean, well-documented, maintainable code with proper testing practices.
- Excellent problem-solving skills and a data-driven approach to decision making.
- Strong communication skills and the ability to collaborate effectively with cross-functional teams.
- Comfortable with ambiguity and excited about working on undefined problems that require creative solutions.
- Familiarity with data pipeline orchestration tools such as Airflow, Dagster, Prefect, or similar frameworks is a nice-to-have.

What we can offer:
- A highly skilled, inspiring, and supportive team.
- A positive, flexible, and trust-based work environment that encourages long-term professional and personal growth.
- A global, multicultural team of colleagues and customers.
- Comprehensive health plans and perks.
- A healthy work-life balance that accommodates individual and family needs.
- Competitive salary and stock options program.

Base Salary Range: $210,000 - $265,000 annually. Final compensation within this range will be determined based on the candidate’s experience and skill set.

Who we are:
Sanity.io is a modern, flexible content operating system that replaces rigid legacy content management systems. One of our big differentiators is treating content as data so that it can be stored in a single source of truth, yet seamlessly adapted and personalized for any channel without extra effort. Forward-thinking companies choose Sanity because they can create tailored content authoring experiences, customized workflows, and content models that reflect their business.

Sanity recently raised an $85M Series C led by GP Bullhound and is also backed by leading investors like ICONIQ Growth, Threshold Ventures, Heavybit, and Shopify, as well as founders of companies like Vercel, WPEngine, Twitter, Mux, Netlify, and Heroku. This funding round has put Sanity in a strong position for accelerated growth in the coming years.

You can only build a great company with a great culture. Sanity is a 200+ person company with highly committed and ambitious people. We are pioneers, we exist for our customers, we are "hel ved", and we love type two fun! Read more about our values here!

Sanity.io pledges to be an organization that reflects the globally diverse audience that our product serves. We believe that in addition to hiring the best talent, a diversity of perspectives, ideas, and cultures leads to the creation of better products and services. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, or gender identity.
Data Engineer
Data Science & Analytics
Machine Learning Engineer
Data Science & Analytics
Apply
Hidden link
Demandbase.jpg

Senior Data Engineer (Evergreen)

Demandbase
USD
190000
-
284000
US.svg
United States
Full-time
Remote
true
Introduction to Demandbase: Demandbase is the leading account-based GTM platform for B2B enterprises to identify and target the right customers, at the right time, with the right message. With a unified view of intent data, AI-powered insights, and prescriptive actions, go-to-market teams can seamlessly align and execute with confidence. Thousands of businesses depend on Demandbase to maximize revenue, minimize waste, and consolidate their data and technology stacks - all in one platform.

As a company, we’re as committed to growing careers as we are to building world-class technology. We invest heavily in people, our culture, and the community around us. We have offices in the San Francisco Bay Area, Seattle, and India, as well as a team in the UK, and allow employees to work remotely. We have also been continuously recognized as one of the best places to work in the San Francisco Bay Area, including “Best Workplaces for Millennials” and “Best Workplaces for Parents”!

We're committed to attracting, developing, retaining, and promoting a diverse workforce. By ensuring that every Demandbase employee is able to bring a diversity of talents to work, we're increasingly capable of living out our mission to transform how B2B goes to market. We encourage people from historically underrepresented backgrounds and all walks of life to apply. Come grow with us at Demandbase!

The base compensation range for this position for candidates in the SF Bay Area is $190,000 - $284,000. For all other locations, the base compensation range is based on the primary work location of the candidate, as our ranges are location-specific. Actual compensation packages are based on a wide array of factors unique to each candidate, including but not limited to skill set, years of experience, and depth of experience.

About this Pipeline Role: This is a pipeline posting for potential future openings on our Data Engineering team. While we are not actively hiring for this position at this time, we are always looking to connect with talented Senior Data Engineers who are passionate about building and optimizing large-scale data systems. By joining our talent pipeline, you’ll stay on our radar for future opportunities that align with your skills and interests as they become available. In this role, you would work on improving the core pipelines that power our Identification product and design new processes that enable our data science team to test and deploy new ML/AI models. The product delivered by this team is integrated into the core product stack and is a critical component of Demandbase’s account intelligence platform. If you are an experienced engineer who is passionate about data and eager to make an impact, we’d love to stay connected.

What you’ll be doing (in a future role):
- Lead initiatives to build, expand, and improve real-world entity identification datasets
- Coordinate with downstream stakeholders with dependencies on identification datasets
- Design and build new pipelines to increase identification coverage and detect errors
- Collaborate with a skilled data science team to enable new ML/AI model development
- Provide insights into optimizing existing pipelines for performance and cost-efficiency
- Create and document descriptive plans for new feature implementation

What we look for:
- Bachelor’s degree in computer science, engineering, mathematics, or a related field
- 8+ years of relevant experience
- Progressive experience in the following areas:
  - Object-oriented / strongly typed programming (Scala, Java, etc.)
  - Productionizing and deploying Spark pipelines
  - Complex SQL
  - Apache Airflow or similar orchestration tools
  - Strong SDLC principles (CI/CD, unit testing, Git process, etc.)
  - Solid understanding of AWS services (IAM, EC2, S3)
  - An interest in data science

Even better if you have:
- Experience with Python, distributed computing, ad-targeting, or GenAI
- Background in the ad-tech industry
- Experience modeling and working with graph-based datasets

Interested in joining our pipeline? If this role sounds like a great fit for your background and career goals, we encourage you to join our talent network by submitting your application. We’ll reach out when a relevant opportunity opens up!

Benefits: We offer a comprehensive benefits package designed to support your health, well-being, and financial security. Our employees enjoy up to 100% paid premiums for Medical and Vision coverage, ensuring access to top-tier care for you and your loved ones. In addition, we provide a range of mental wellness resources, including access to Modern Health, to help support your emotional well-being. We believe in a healthy work-life harmony, which is why we offer a flexible PTO policy, 15 paid holidays in 2025 - including a three-day break around July 4th and a full week off for Thanksgiving - and No Internal Meetings Fridays to give you uninterrupted time to focus on what matters most. For your financial future, we offer a competitive 401(k) plan, short-term and long-term disability coverage, life insurance, and other valuable benefits to ensure your financial peace of mind.

Our Commitment to Diversity, Equity, and Inclusion at Demandbase: At Demandbase, we believe in creating a workplace culture that values and celebrates diversity in all its forms. We recognize that everyone brings unique experiences, perspectives, and identities to the table, and we are committed to building a community where everyone feels valued, respected, and supported. Discrimination of any kind is not tolerated, and we strive to ensure that every individual has an equal opportunity to succeed and grow, regardless of their gender identity, sexual orientation, disability, race, ethnicity, background, marital status, genetic information, education level, veteran status, national origin, or any other protected status. We do not automatically disqualify applicants with criminal records and will consider each applicant on a case-by-case basis. We recognize that not all candidates will have every skill or qualification listed in this job description. If you feel you have the level of experience to be successful in the role, we encourage you to apply! We acknowledge that true diversity and inclusion requires ongoing effort, and we are committed to doing the work required to make our workplace a safe and equitable space for all. Join us in building a community where we can learn from each other, celebrate our differences, and work together.

Personal information that you submit will be used by Demandbase for recruiting and other business purposes. Our Privacy Policy explains how we collect and use personal information.
Data Engineer
Data Science & Analytics
Apply
Hidden link
Mirage.jpg

Software Engineer, ML Data Platform

Mirage
USD
0
185000
-
285000
US.svg
United States
Full-time
Remote
false
Mirage is the leading AI short-form video company. We’re building full-stack foundation models and products that redefine video creation, production, and editing. Over 20 million creators and businesses use Mirage’s products to reach their full creative and commercial potential. We are a rapidly growing team of ambitious, experienced, and devoted engineers, researchers, designers, marketers, and operators based in NYC. As an early member of our team, you’ll have an opportunity to have an outsized impact on our products and our company's culture.

Our Products: Captions, Mirage Studio
Our Technology: AI Research @ Mirage, Mirage Model Announcement, Seeing Voices (white-paper)
Press Coverage: TechCrunch, Lenny’s Podcast, Forbes AI 50, Fast Company
Our Investors: We’re very fortunate to have some of the best investors and entrepreneurs backing us, including Index Ventures, Kleiner Perkins, Sequoia Capital, Andreessen Horowitz, Uncommon Projects, Kevin Systrom, Mike Krieger, Lenny Rachitsky, Antoine Martin, Julie Zhuo, Ben Rubin, Jaren Glover, SVAngel, 20VC, Ludlow Ventures, Chapter One, and more.

** Please note that all of our roles will require you to be in-person at our NYC HQ (located in Union Square). We do not work with third-party recruiting agencies; please do not contact us. **

About the Role: We’re looking for a Software Engineer to help build and scale the data systems that power our machine learning products. This role sits at the intersection of data engineering and ML infrastructure: you’ll design large-scale streaming pipelines, build tools that abstract infrastructure complexity for feature developers, and ensure that our feature data is reliable, discoverable, and performant across online and offline environments. If you’re passionate about building foundational systems that enable machine learning at scale - and love solving complex distributed data problems - this is the role for you.

What You’ll Do:
- Design and scale feature pipelines: Build distributed data processing systems for feature extraction, orchestration, and serving, including real-time streaming, batch ingestion, and CDC workflows.
- Feature extraction: Design and implement reliable, reusable feature pipelines for ML models, ensuring features are accurate, scalable, and production-ready through well-designed SDKs and orchestration tools.
- Build and evolve storage infrastructure: Manage multi-tier data systems (e.g., Bigtable for online features/state, BigQuery for analytics and offline training), including schema evolution, versioning, and compatibility.
- Own orchestration and reliability: Lead workflow orchestration design (e.g., Pub/Sub, Busboy, Airflow/Temporal), monitoring, and alerting to ensure reliability at 100M+ video scale.
- Collaborate with ML teams: Partner with ML engineers on feature availability, dataset curation, and streaming pipelines for training and inference.
- Optimize for performance and cost: Tune GPU utilization, resource allocation, and data processing efficiency to maximize system throughput and minimize cost.
- Enable analytics and insights: Support downstream analytics and data science workflows by ensuring data accessibility, discoverability, and performance at scale.

Preferred Qualifications:
- 4+ years building distributed data systems, feature platforms, or ML infrastructure at scale.
- Strong experience with streaming and batch pipelines (e.g., Pub/Sub, Kafka, Dataflow, Beam, Flink, Spark).
- Deep knowledge of cloud-native data stores (e.g., Bigtable, BigQuery, DynamoDB, Snowflake) and schema/versioning best practices.
- Proficiency in Python and experience building developer-facing libraries or SDKs.
- Experience with Kubernetes, containerized data infrastructure, and workflow orchestration tools (e.g., Airflow, Temporal).
- Familiarity with ML workflows and feature store design, enough to partner closely with ML teams.
- Bonus: Experience working with video, audio, or other unstructured media data in a production environment.

Benefits:
- Comprehensive medical, dental, and vision plans
- 401K with employer match
- Commuter benefits
- Catered lunch multiple days per week
- Dinner stipend every night if you're working late and want a bite! Grubhub subscription
- Health & wellness perks (Talkspace, Kindbody, One Medical subscription, HealthAdvocate, Teladoc)
- Multiple team offsites per year, with team events every month
- Generous PTO policy

Captions provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state, or local laws. Please note benefits apply to full-time employees only.
Data Engineer
Data Science & Analytics
MLOps / DevOps Engineer
Data Science & Analytics
Machine Learning Engineer
Data Science & Analytics
Apply
Hidden link
Anthropic.jpg

Analytics Data Engineer

Anthropic
USD
0
275000
-
355000
US.svg
United States
Full-time
Remote
false
About Anthropic: Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

About the Role: As an Analytics Engineer, you will be an early member of the Data Science & Analytics team building the foundation to scale analytics across our organization. You will collaborate with key stakeholders in Engineering, Product, GTM, and other areas to build scalable solutions that transform data into key metrics reporting and insights. You will be responsible for ensuring teams have access to reliable, accurate metrics that can scale with our company’s growth. You will also lead your own projects to enable self-serve insights that help teams make data-driven decisions.

Responsibilities:
- Understand the data needs of stakeholder teams in terms of key data models and reporting, and translate them into technical requirements
- Define, build, and manage key data pipelines in dbt that transform raw logs into canonical datasets
- Establish high data integrity standards and SLAs to ensure timely, accurate delivery of data
- Develop insightful and reliable dashboards to track performance of core metrics and deliver insights to the whole company
- Build foundational data products, dashboards, and tools to enable self-serve analytics to scale across the company
- Influence the future roadmap of Product and GTM teams from a data systems perspective
- Become an expert in our organization’s data models and the company's data architecture

You might be a good fit if you have:
- 5+ years of experience as an Analytics Data Engineer or in similar Data Science & Analytics roles, preferably partnering with GTM and Product leads to build and report on key company-wide metrics.
- A passion for the company's mission of building helpful, honest, and harmless AI.
- Expertise in building multi-step ETL jobs with tooling like dbt; proficiency with workflow management platforms like Airflow and version control through GitHub.
- Expertise in SQL and Python to transform data into accurate, clean data models.
- Experience building data reporting and dashboards in visualization tools like Hex to serve multiple cross-functional teams.
- A bias for action and urgency, not letting perfect be the enemy of the effective.
- A “full-stack mindset”, not hesitating to do what it takes to solve a problem end-to-end, even if it requires going outside the original job description.
- Experience building an Analytics Data Engineering (or similar) function at start-ups.
- A strong disposition to thrive in ambiguity, taking initiative to create clarity and forward progress.

The expected base compensation for this position is below. Our total compensation package for full-time employees includes equity, benefits, and may include incentive compensation.

Annual Salary: $275,000 - $355,000 USD

Logistics:
- Education requirements: We require at least a Bachelor's degree in a related field or equivalent experience.
- Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.
- Visa sponsorship: We do sponsor visas! However, we aren't able to successfully sponsor visas for every role and every candidate. But if we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.

We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you're interested in this work. We think AI systems like the ones we're building have enormous social and ethical implications. We think this makes representation even more important, and we strive to include a range of diverse perspectives on our team.

How we're different: We believe that the highest-impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large-scale research efforts. And we value impact - advancing our long-term goals of steerable, trustworthy AI - rather than work on smaller and more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. We're an extremely collaborative group, and we host frequent research discussions to ensure that we are pursuing the highest-impact work at any given time. As such, we greatly value communication skills. The easiest way to understand our research directions is to read our recent research. This research continues many of the directions our team worked on prior to Anthropic, including: GPT-3, Circuit-Based Interpretability, Multimodal Neurons, Scaling Laws, AI & Compute, Concrete Problems in AI Safety, and Learning from Human Preferences.

Come work with us! Anthropic is a public benefit corporation headquartered in San Francisco. We offer competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a lovely office space in which to collaborate with colleagues.

Guidance on Candidates' AI Usage: Learn about our policy for using AI in our application process
Data Engineer
Data Science & Analytics
Apply
Hidden link
Anduril Industries.jpg

Analytics Engineer

Anduril
USD
0
146000
-
194000
US.svg
United States
Full-time
Remote
false
Anduril Industries is a defense technology company with a mission to transform U.S. and allied military capabilities with advanced technology. By bringing the expertise, technology, and business model of the 21st century’s most innovative companies to the defense industry, Anduril is changing how military systems are designed, built and sold. Anduril’s family of systems is powered by Lattice OS, an AI-powered operating system that turns thousands of data streams into a realtime, 3D command and control center. As the world enters an era of strategic competition, Anduril is committed to bringing cutting-edge autonomy, AI, computer vision, sensor fusion, and networking technology to the military in months, not years.ABOUT THE TEAM  We build robots that find other robots and knock them out of the sky. At a time when air superiority can no longer be taken for granted, the Air Defense (AD) Team provides mission critical capabilities to warfighters. From detection to tracking, identification, deterrence, and defeat, our family of networked sensors and effectors enables our customers to rapidly close the kill chain against a broad range of Unmanned Aerial System (UAS) threats. Working across product, engineering, sales, logistics, operations, and mission success, the Air Defense team develops, tests, deploys, and sustains the Anduril Air Defense Family of Systems (FoS) in challenging operational environments worldwide. ABOUT THE ROLE We are looking for a well-rounded Senior Analytics Engineer to support our Air Defense team. In this role, you will design and maintain data systems to ingest and transform structured, semistructured, and unstructured data, creating robust and efficient data models that deliver actionable insights. You will collaborate with stakeholders to gather requirements and implement secure, reliable analytics solutions, focusing on supporting a variety of data applications.  
You will work with engineering and field operations teams to better understand how our family of systems and capabilities performs in production (both in and out of classified environments). Additionally, you will support operational teams by building out data products to support operational workflows. Ensuring data quality and performing root cause analysis on complex systems will be key, as will deploying mission-critical analytics solutions with occasional travel to build, test, and deploy capabilities in real-world scenarios. WHAT YOU'LL DO Help lead the development and maintenance of data systems architecture that enable high quality, low latency data ingestion across many different source systems and transformation for downstream data products and operational workflows. Collaborate with stakeholders to collect requirements and write code for secure, timely, accurate, trusted, and extensible data models. Become a trusted partner to AD’s leadership by creating reusable entities that generalize how our division operates and building both deployment- and business-related workflows, dashboards, and metrics that drive better and faster decision-making. Develop and deliver analytics solutions for evolving problems in network-isolated and classified environments. Help empower users to leverage the aforementioned data architecture to self-serve through enablement & knowledge-transfer, as well as improving the developer/end-user experience. REQUIRED QUALIFICATIONS 6+ years of experience in an analytics-focused or data-oriented role (e.g., data engineer, analytics engineer, data scientist, backend software engineer). Exceptional general problem-solving and critical-thinking skills, bringing to bear technical solutions to ambiguous and dynamic data problems. 
Proficiency designing scalable and adaptable logical and physical data models that facilitate both future product evolution and analytical requirements Strong skills in performing data-driven root cause analysis on complex systems. Excellent written and verbal communication skills, especially when communicating with a technical audience; an affection for documentation. Ability to drive consensus across internal and external stakeholders, with demonstrated experience leading through influence. Comfortable navigating a polyglot ecosystem and a proven ability to quickly understand established code bases and operate at different levels of abstraction. Procedural fluency in writing, debugging, profiling, testing, and maintaining performant Python and SQL code. Experience writing another language (e.g., JavaScript/TypeScript, Go, Java, Scala, Haskell, OCaml, Julia, etc.). Proficiency in building end-to-end, scalable data solutions (including the steps for beyond implementation such as enablement, support, integrations, etc.) in a cloud setting. Experience writing pipelines with common orchestration tooling (e.g., Flyte, Airflow, Dagster, etc.). Experience with DevOps and software deployment: containerization & container orchestration (e.g., Docker, Kubernetes, Helm, etc.), GitOps & CI/CD tooling (e.g., CircleCI, ArgoCD, etc.), observability/monitoring (e.g., DataDog, Grafana, etc.) Experience with infrastructure-as-code (e.g., Terraform), core- and data-related cloud services (e.g., AWS, Azure). Experience writing and working with microservices architectures (e.g., gRPC + protocol buffers). Current US Person status (U.S. citizen or permanent resident) with the ability to obtain and maintain a U.S. Department of Defense (DOD) Secret Clearance or higher. PREFERRED QUALIFICATIONS Ability to comprehend and appropriately modify software written in a systems or lower-level language such as C/C++, Zig, Rust, etc. 
Experience with setting up and managing infrastructure to support analytical workloads on-premises or in resource-constrained environments. Experience delivering and maintaining systems that securely egress data from air-gapped and security-hardened networks. Experience working with data formats (e.g. MCAP, HDF5, etc.) relatively common to robotics. Experience with dbt, Palantir Foundry, Trino/Presto, Apache Spark, Apache Kafka, Apache Flink, and/or in-memory databases (e.g., DuckDB, Polars). Experience with the Nix (dependency management & system configuration) ecosystem. Strong Linux fundamentals. Exposure to the the technical, programmatic, and operational challenges of developing and deploying autonomous weapon systems across command echelons. Deep intellectual interest in the intersection of analytics and the physical hardware world, motivated by Anduril’s mission. Prior defense, aerospace, or intelligence domain experience. US Salary Range$146,000—$194,000 USD  The salary range for this role is an estimate based on a wide range of compensation factors, inclusive of base salary only. Actual salary offer may vary based on (but not limited to) work experience, education and/or training, critical skills, and/or business considerations. Highly competitive equity grants are included in the majority of full time offers; and are considered part of Anduril's total compensation package. Additionally, Anduril offers top-tier benefits for full-time employees, including:  Healthcare Benefits  US Roles: Comprehensive medical, dental, and vision plans at little to no cost to you.  UK & AUS Roles: We cover full cost of medical insurance premiums for you and your dependents.  IE Roles: We offer an annual contribution toward your private health insurance for you and your dependents.  Additional Benefits  Income Protection: Anduril covers life and disability insurance for all employees.  Generous time off: Highly competitive PTO plans with a holiday hiatus in December. 
Caregiver & Wellness Leave is available to care for family members, bond with a new baby, or address your own medical needs.  Family Planning & Parenting Support: Coverage for fertility treatments (e.g., IVF, preservation), adoption, and gestational carriers, along with resources to support you and your partner from planning to parenting.  Mental Health Resources: Access free mental health resources 24/7, including therapy and life coaching. Additional work-life services, such as legal and financial support, are also available.  Professional Development: Annual reimbursement for professional development  Commuter Benefits: Company-funded commuter benefits based on your region.  Relocation Assistance: Available depending on role eligibility.  Retirement Savings Plan  US Roles: Traditional 401(k), Roth, and after-tax (mega backdoor Roth) options.  UK & IE Roles: Pension plan with employer match.  AUS Roles: Superannuation plan.  The recruiter assigned to this role can share more information about the specific compensation and benefit details associated with this role during the hiring process.  To view Anduril's candidate data privacy policy, please visit https://anduril.com/applicant-privacy-notice/. 
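The orchestration tools named in the qualifications above (Flyte, Airflow, Dagster) all solve the same core problem: running pipeline tasks in dependency order. A minimal sketch of that idea in plain Python with hypothetical task names, using the standard library's `graphlib` rather than any of those frameworks:

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on,
# mirroring how an Airflow or Dagster DAG declares upstream dependencies.
dag = {
    "extract": set(),
    "validate": {"extract"},
    "transform": {"validate"},
    "load": {"transform"},
    "report": {"load", "validate"},
}

def run_order(dag):
    # static_order() yields every task after all of its dependencies.
    return list(TopologicalSorter(dag).static_order())

order = run_order(dag)
print(order)
```

Real orchestrators add scheduling, retries, and parallel execution of independent tasks on top of exactly this topological ordering.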
Data Engineer
Data Science & Analytics
Apply

Big Data Solutions Consultant, Spark Expert

Databricks
-
Netherlands
Full-time
Remote
false
CSQ127R55 As a Big Data Solutions Consultant (Resident Solutions Architect) in our Professional Services team, you will work with clients on short- to medium-term engagements on their big data challenges using the Databricks platform. You will deliver data engineering, data science, and cloud technology projects that require integrating with client systems, training, and other technical tasks to help customers get the most value out of their data. RSAs are billable and know how to complete projects according to specification with excellent customer service. You will report to the regional Manager/Lead. The impact you will have: You will guide customers as they implement transformational big data projects, including end-to-end development and deployment of industry-leading big data and AI applications. You will ensure that Databricks best practices are used within all projects and that our quality of service and implementation standards are strictly followed. You will facilitate technical workshops, discovery and design sessions, and customer requirements gathering and scoping for new and existing strategic customers. Assist the Professional Services leader and project managers with level-of-effort estimation and mitigation of risk within customer proposals and statements of work. Architect, design, develop, deploy, operationalize, and document complex customer engagements, individually or as part of an extended team, as the technical lead and overall authority. Knowledge transfer, enablement, and mentoring of other team members, customers, and partners, including developing reusable project artifacts. Bring consulting experience to the team and share client engagement best practices with other teams. What we look for: 4+ years of experience in data engineering, data platforms & analytics. Working knowledge of two or more common cloud ecosystems (AWS, Azure, GCP). Comfort with object-oriented and functional programming in Scala and Python. Experience in building scalable streaming
and batch solutions using cloud-native components Strong knowledge of distributed computing with Apache Spark™ Travel to customers 30% of the time Nice to have: Databricks Certification About Databricks Databricks is the data and AI company. More than 10,000 organizations worldwide — including Comcast, Condé Nast, Grammarly, and over 50% of the Fortune 500 — rely on the Databricks Data Intelligence Platform to unify and democratize data, analytics and AI. Databricks is headquartered in San Francisco, with offices around the globe and was founded by the original creators of Lakehouse, Apache Spark™, Delta Lake and MLflow. To learn more, follow Databricks on Twitter, LinkedIn and Facebook. Benefits At Databricks, we strive to provide comprehensive benefits and perks that meet the needs of all of our employees. For specific details on the benefits offered in your region, please visit https://www.mybenefitsnow.com/databricks.  Our Commitment to Diversity and Inclusion At Databricks, we are committed to fostering a diverse and inclusive culture where everyone can excel. We take great care to ensure that our hiring practices are inclusive and meet equal employment opportunity standards. Individuals looking for employment at Databricks are considered without regard to age, color, disability, ethnicity, family or marital status, gender identity or expression, language, national origin, physical and mental ability, political affiliation, race, religion, sexual orientation, socio-economic status, veteran status, and other protected characteristics. Compliance If access to export-controlled technology or source code is required for performance of job duties, it is within Employer's discretion whether to apply for a U.S. government license for such positions, and Employer may decline to proceed with an applicant on this basis alone.
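For context on the Spark skills this role asks for: the heart of a typical batch job is a keyed aggregation that Spark distributes across executors. A serial pure-Python sketch of the same transform, with hypothetical records (the PySpark equivalent would be `df.groupBy("user").sum("amount")`):

```python
from collections import defaultdict

# Toy event records standing in for a DataFrame (hypothetical schema).
events = [
    {"user": "a", "amount": 10.0},
    {"user": "b", "amount": 5.0},
    {"user": "a", "amount": 7.5},
]

def total_by_user(rows):
    # Serial stand-in for a distributed group-by aggregation: Spark would
    # shuffle rows by key across executors before summing per partition.
    totals = defaultdict(float)
    for row in rows:
        totals[row["user"]] += row["amount"]
    return dict(totals)

print(total_by_user(events))  # {'a': 17.5, 'b': 5.0}
```

The point of Spark is that this shape of computation (map, shuffle by key, reduce) scales to datasets far larger than one machine's memory.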
Data Engineer
Data Science & Analytics
Machine Learning Engineer
Data Science & Analytics
Apply

Senior Database Engineer

MagicSchool AI
USD
160000
-
205000
United States
Full-time
Remote
true
WHO WE ARE: MagicSchool is the premier generative AI platform for teachers. We're just over 2 years old, and more than 6 million teachers from all over the world have joined our platform. Join a top team at a fast-growing company that is working towards real social impact. Make an account and try us out at our website and connect with our passionate community on our Wall of Love.
Responsibilities
Application Database Design
Partner with application development teams to design optimal Postgres data models that support new features and scale with business growth
Review and optimize Postgres data structures and query patterns to ensure efficient performance
Collaborate on API design decisions that impact database structure and query efficiency
Database Operations
Analyze database query load to find opportunities for efficiency gains, and implement them proactively
Implement and maintain database observability and monitoring systems
Manage database upgrades, backup strategies, and read replica configurations
Troubleshoot and resolve production database incidents as part of support rotation
Maintain and enhance adherence to DevOps principles in database operations
Data Integrity & User Privacy
Plan, review, and execute application data migrations, collaborating with product development teams to coordinate releases
Implement and execute efficient large-scale data migrations
Design and implement data security / privacy tooling
Data Platform
Collaborate with development and platform teams to architect large-scale new data persistence systems and migrate production traffic to them
Develop and promote standards and best practices of database design and development
Qualifications
5+ years of hands-on experience operating PostgreSQL in production environments
3+ years of hands-on experience asynchronously collaborating with development teams, either in application development or infrastructure as code
Strong SQL skills including complex queries, performance tuning, and data modeling
Experience with database migrations, schema evolution, and zero-downtime deployment strategies
Proficiency with database monitoring, performance analysis, and troubleshooting
Understanding of database internals: indexing strategies, query planning, and execution optimization
Experience working closely with application developers and understanding application data access patterns
Strong communication and collaboration skills
Comfort working in a fast-paced, growth-stage company
Additional Valued / Nice to Have Experience:
Backend or full-stack engineering experience in a TypeScript or Next.js environment
Python application, data pipeline, or data warehouse engineering
Experience with infrastructure-as-code / DevOps practices and environments
Experience with non-relational database usage in a production environment
Familiarity with the Supabase platform
Why Join Us?
Work on cutting-edge AI technology that directly impacts educators and students.
Join a mission-driven team passionate about making education more efficient and equitable.
Flexibility of working from home, while fostering a unique culture built on relationships, trust, communication, and collaboration with our team - no matter where they live.
Unlimited time off to empower our employees to manage their work-life balance. We work hard for our teachers and users, and encourage our employees to rest and take the time they need.
Choice of employer-paid health insurance plans so that you can take care of yourself and your family. Dental and vision are also offered at very low premiums.
Every employee is offered generous stock options, vested over 4 years.
Plus a 401k match & monthly wellness stipend
Our Values:
Educators are Magic: Educators are the most important ingredient in the educational process - they are the magic, not the AI. Trust them, empower them, and put them at the center of leading change in service of students and families.
Joy and Magic: Bring joy and magic into every learning experience - push the boundaries of what's possible with AI.
Community: Foster community that supports one another during a time of rapid technological change. Listen to them and serve their needs.
Innovation: The education system is outdated and in need of innovation and change - AI is an opportunity to bring equity, access, and serve the individual needs of students better than we ever have before.
Responsibility: Put responsibility and safety at the forefront of the technological change that AI is bringing to education.
Diversity: Diversity of thought, perspectives, and backgrounds helps us serve the wide audience of educators and students around the world.
Excellence: Educators and students deserve the best - and we strive for the highest quality in everything we do.
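The "database internals" qualification above (indexing strategies, query planning) is easy to demonstrate: an index turns a full table scan into a keyed lookup, which the query plan makes visible. A sketch using the standard library's SQLite as a stand-in for Postgres, where `EXPLAIN ANALYZE` plays the same role as SQLite's `EXPLAIN QUERY PLAN`:

```python
import sqlite3

# In-memory table with enough rows that the planner's choice matters.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
con.executemany("INSERT INTO users (email) VALUES (?)",
                [(f"u{i}@example.com",) for i in range(1000)])

def plan(query):
    # Return the query plan as one string for easy inspection.
    rows = con.execute("EXPLAIN QUERY PLAN " + query).fetchall()
    return " ".join(str(r) for r in rows)

q = "SELECT id FROM users WHERE email = 'u42@example.com'"
before = plan(q)   # full table scan: every row is examined
con.execute("CREATE INDEX idx_users_email ON users(email)")
after = plan(q)    # index lookup: only matching rows are touched

print("SCAN" in before, "idx_users_email" in after)
```

In Postgres the same experiment with `EXPLAIN (ANALYZE)` would show a Seq Scan replaced by an Index Scan, plus actual row counts and timings.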
Data Engineer
Data Science & Analytics
Apply

Database Specialist - Contract

PathAI
-
United States
Contractor
Remote
true
Who We Are PathAI's mission is to improve patient outcomes with AI-powered pathology. Our platform promises substantial improvements to the accuracy of diagnosis and the efficacy of treatment of diseases like cancer, leveraging modern approaches in machine learning. Our team, comprising diverse employees with a wide range of backgrounds and experiences, is passionate about solving challenging problems and making a huge impact. We are seeking an experienced Contract Database / Data Warehouse Specialist to enhance the scalability, performance, and maintainability of our ML data infrastructure. The ideal candidate will bring expertise in relational databases, ETL processes, and modern big data deployments. You will work closely with our MLOps and ML engineering teams to optimize storage usage, modernize ETL pipelines, deploy new technology, and/or build / enhance tools that support analytics and machine learning workflows. Contract Duration: Minimum 6 months Location: Remote (U.S.) What You’ll Do Analyze and optimize storage strategies for ML experiment data and metadata. Design and implement intelligent retention and expiration for large-scale datasets. Modernize and refactor ETL pipelines to improve scalability and ease of maintenance. Build and enhance database-backed applications supporting ML R&D and production analytics. Collaborate with ML engineers, SREs, and platform teams. Provide knowledge transfer for long-term maintainers. What You’ll Need Proven expertise with relational databases (e.g., Postgres, Amazon RDS, Aurora), including schema design, query optimization, and performance tuning. Strong experience with ETL development and cloud data warehousing (e.g., Snowflake, Redshift). Familiarity with big data deployments and scalable architectures such as Spark and Hive. Experience with Apache Airflow for systems automation. Proficiency in Python for application development, data processing and automation. 
Understanding of S3-based storage and large-scale data management strategies. Ability to write clear technical documentation and collaborate effectively across teams. Experience with query optimization, data partitioning strategies, and cost optimization in cloud environments. Nice to Have: Background in machine learning data pipelines or analytics-heavy environments. Knowledge of data governance, retention policies, or cost-optimization strategies in cloud environments. We Want to Hear From You: At PathAI, we are looking for individuals who are team players, are willing to do the work no matter how big or small it may be, and who are passionate about everything they do. If this sounds like you, even if you may not match the job description to a tee, we encourage you to apply. You could be exactly what we're looking for.  PathAI is an equal opportunity employer, dedicated to creating a workplace that is free of harassment and discrimination. We base our employment decisions on business needs, job requirements, and qualifications — that's all. We do not discriminate based on race, gender, religion, health, personal beliefs, age, family or parental status, or any other status. We don't tolerate any kind of discrimination or bias, and we are looking for teammates who feel the same way.  #LI-Remote
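Of the responsibilities above, "intelligent retention and expiration for large-scale datasets" typically reduces to a policy over artifact age and pin status. A hedged sketch with hypothetical artifact records (a real policy would also weigh lineage, legal holds, and storage tier):

```python
from datetime import datetime, timedelta

# Hypothetical experiment artifacts: (path, last_accessed, pinned).
artifacts = [
    ("run-001/metrics.parquet", datetime(2024, 1, 10), False),
    ("run-002/metrics.parquet", datetime(2025, 6, 1), False),
    ("baseline/weights.ckpt",   datetime(2023, 3, 5), True),
]

def expired(items, now, ttl_days=180):
    # An item is eligible for expiration when it is older than the TTL
    # and nobody has pinned it as a long-lived reference artifact.
    cutoff = now - timedelta(days=ttl_days)
    return [path for path, seen, pinned in items
            if seen < cutoff and not pinned]

print(expired(artifacts, now=datetime(2025, 7, 1)))
```

In production this decision usually runs as a scheduled job that tags or tiers objects (e.g., via S3 lifecycle rules) rather than deleting them outright.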
Data Engineer
Data Science & Analytics
MLOps / DevOps Engineer
Data Science & Analytics
Apply

AI Healthcare and Administration Tutor

X AI
USD
45
-
100
United States
Contractor
Remote
false
About xAI: xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company’s mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All engineers are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates. About the Role: We are seeking a dedicated AI Healthcare and Administration Data Specialist to enhance xAI’s AI models by providing high-quality data annotations and inputs tailored to healthcare and administration contexts. In this role, you will leverage your expertise in patient care coordination, medical billing, administrative workflows, and healthcare operations to support the training of AI systems. You will collaborate with technical teams to refine annotation tools and curate impactful data, ensuring our models effectively capture real-world healthcare and administrative dynamics. This role requires adaptability, strong analytical skills, and a passion for driving innovation in a fast-paced environment. Responsibilities: Utilize proprietary software to provide accurate input and labels for healthcare and administration projects, ensuring high-quality data for AI model training. Deliver curated, high-quality data for scenarios involving patient care coordination, medical billing, administrative workflows, and healthcare operations. Collaborate with technical staff to support the training of new AI tasks and contribute to the development of innovative technologies.
Assist in designing and improving efficient annotation tools tailored for healthcare and administration data. Select and analyze complex problems in healthcare and administration fields aligned with your expertise to enhance AI model performance. Interpret, analyze, and execute tasks based on evolving instructions, maintaining precision and adaptability. Required Qualifications Professional experience in healthcare administration or related fields (e.g., medical and health services manager, medical secretary, or administrative assistant). Proficiency in reading and writing informal and professional English. Strong communication, interpersonal, analytical, and organizational skills. Excellent reading comprehension and ability to exercise autonomous judgment with limited data. Passion for technological advancements and innovation in healthcare and administration processes. Preferred Qualifications Relevant certification or training (e.g., Certified Medical Manager, Certified Professional in Healthcare Management, or similar administrative certification). Experience mentoring or training others in healthcare administration or operational practices. Comfort with recording audio or video sessions for data collection. Familiarity with AI or data annotation workflows in a technical setting. Location, Hourly & Other Expectations This position is based in Palo Alto, CA (in-office, 5 days per week) or fully remote with strong self-motivation required. US-based candidates cannot be hired in Wyoming or Illinois at this time. Visa sponsorship is not available. Team members are expected to work from 9:00am - 5:30pm PST for the first two weeks of training, and 9:00am - 5:30pm in their own timezone thereafter. For remote work, candidates must use a Chromebook, Mac with macOS 11.0 or later, or Windows 10 or later, and have reliable access to a smartphone. Interview Process After submitting your application, our team will review your CV and statement of exceptional work. 
If your application advances, you will be invited to a 15-minute phone interview to discuss basic qualifications. Successful candidates will proceed to the main process, which includes: a technical deep-dive discussing your healthcare and administration expertise and data annotation experience; a take-home challenge focused on healthcare or administration data labeling or analysis; and a meet-and-greet with the wider team. Our goal is to complete the main interview process within one week. Compensation: $45/hour - $100/hour. The posted pay range is intended for U.S.-based candidates and depends on factors including relevant experience, skills, education, geographic location, and qualifications. For international candidates, our recruiting team can provide an estimated pay range for your location. Benefits: Hourly pay is just one part of our total rewards package at xAI. Specific benefits vary by country; depending on your country of residence you may have access to medical benefits. We do not offer benefits for part-time roles. xAI is an equal opportunity employer. California Consumer Privacy Act (CCPA) Notice
Data Engineer
Data Science & Analytics
Apply

Head of Data & Analytics

Sanity
USD
280000
-
330000
United States
Canada
Europe
Full-time
Remote
true
Build and lead our data function as we scale, turning insights into strategic decisions that drive our next phase of growth. At Sanity.io, we're building the future of AI-powered Content Operations. Our AI Content Operating System gives teams the freedom to model, create, and automate content the way their business works, accelerating digital development and supercharging content operations efficiency. Companies like SKIMS, Figma, Riot Games, Anthropic, COMPLEX, Nordstrom, and Morningbrew are using Sanity to power and automate their content operations. As our Head of Data & Analytics, you'll build and lead our data function during a pivotal growth phase. You'll be both player and coach – rolling up your sleeves for hands-on technical work while scaling a team of data engineers, analytics engineers, and data analysts. This role will shape how we leverage data to drive strategic decisions and accelerate our next phase of growth.
What you would do:
Build and scale the data team
Recruit and manage data engineers, analytics engineers, and data analysts
Foster a high-performing, collaborative culture focused on delivering actionable insights
Balance hands-on technical work with strategic leadership and team development
Establish scalable processes and best practices as the team grows
Drive data strategy and infrastructure
Own our data strategy and ensure alignment with business objectives
Design scalable data pipelines, architecture, and governance using modern tools like BigQuery, dbt, Airflow, Amplitude, and Looker
Partner with engineering to enhance product telemetry and data collection
Implement data quality frameworks and monitoring systems
Enable data-driven decision making
Work with product, sales, marketing, and leadership teams to deliver insights that drive business outcomes
Build dashboards and self-service analytics capabilities
Lead strategic analyses on customer behavior, product adoption, and business performance
Translate complex data into clear recommendations for technical and non-technical stakeholders
About you:
Based in: East Coast of North America or Europe
Experience: 5+ years in data roles with 3+ years building and scaling data teams at a fast-growing SaaS startup
Leadership: Drive to create change and ability to define strategy to develop Data and Analytics requirements and priorities. Proven ability to execute
Business acumen: Deep understanding of SaaS metrics and how data drives business strategy
Communication: Outstanding ability to influence stakeholders and translate technical concepts into business insights
Ownership mindset: High accountability and sense of urgency with a bias toward action and problem-solving – able to do IC work when needed
Technical expertise: Strong in SQL, Python, data modeling, and the modern data stack (we use BigQuery, dbt, Airflow, Looker)
What we can offer:
A highly skilled, inspiring, and supportive team
A positive, flexible, and trust-based work environment that encourages long-term professional and personal growth
A global, multi-culturally diverse group of colleagues and customers
Comprehensive health plans and perks
A healthy work-life balance that accommodates individual and family needs
Competitive stock options program
Salary Range: $280k - $330k annually. Final compensation within this range will be determined based on the candidate's experience and skill set.
Who we are: Sanity.io is a modern, flexible content operating system that replaces rigid legacy content management systems. One of our big differentiators is treating content as data so that it can be stored in a single source of truth, but seamlessly adapted and personalized for any channel without extra effort. Forward-thinking companies choose Sanity because they can create tailored content authoring experiences, customized workflows, and content models that reflect their business. Sanity recently raised an $85M Series C led by GP Bullhound and is also backed by leading investors like ICONIQ Growth, Threshold Ventures, Heavybit, and Shopify, as well as founders of companies like Vercel, WPEngine, Twitter, Mux, Netlify, and Heroku. This funding round has put Sanity in a strong position for accelerated growth in the coming years. You can only build a great company with a great culture. Sanity is a 200+ person company with highly committed and ambitious people. We are pioneers, we exist for our customers, we are "hel ved", and we love type two fun! Read more about our values here! Sanity.io pledges to be an organization that reflects the globally diverse audience that our product serves. We believe that in addition to hiring the best talent, a diversity of perspectives, ideas, and cultures leads to the creation of better products and services. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, marital status, disability, or gender identity.
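On the "deep understanding of SaaS metrics" requirement above: net revenue retention (NRR) is a typical example, computed here from hypothetical per-customer MRR snapshots at the start and end of a period:

```python
# Hypothetical monthly recurring revenue by customer, start vs. end of period.
start_mrr = {"acme": 100.0, "globex": 200.0, "initech": 50.0}
end_mrr   = {"acme": 120.0, "globex": 180.0}  # initech churned

def net_revenue_retention(start, end):
    # NRR: end-of-period revenue from customers who existed at period start
    # (expansion and churn both count), divided by their starting revenue.
    # New logos acquired during the period are deliberately excluded.
    retained = sum(end.get(customer, 0.0) for customer in start)
    return retained / sum(start.values())

print(round(net_revenue_retention(start_mrr, end_mrr), 3))
```

Here expansion at acme (+20) does not offset globex contraction (-20) and initech churn (-50), so NRR lands below 1.0; in a warehouse this same calculation is usually a cohort join in SQL or dbt.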
Data Engineer
Data Science & Analytics
Data Analyst
Data Science & Analytics
Apply

Data Engineer

Jasper
-
France
Full-time
Remote
false
Jasper is the leading AI marketing platform, enabling the world's most innovative companies to reimagine their end-to-end marketing workflows and drive higher ROI through increased brand consistency, efficiency, and personalization at scale. Jasper has been recognized as "one of the Top 15 Most Innovative AI Companies of 2024" by Fast Company and is trusted by nearly 20% of the Fortune 500 – including Prudential, Ulta Beauty, and Wayfair. Founded in 2021, Jasper is a remote-first organization with team members across the US, France, and Australia. About The Role: Jasper Research is seeking an experienced Data Engineer who will play a pivotal role in supporting our image research team to help design, scale, and maintain our data infrastructure, as well as the data processing pipelines powering the training of state-of-the-art multimodal models. In this role, you will work closely with our research scientists and research engineers to collect, clean, and process large-scale datasets from a variety of sources, ensuring that our models are built on the best possible data foundations. This role is open to candidates located in France. It will be a hybrid setup, which requires you to come into the office when necessary. The office is based at Station F in Paris, the vibrant hub of the French startup ecosystem.
Our efficient and lean team at Station F thrives on innovation and collaboration.
What you will do at Jasper:
Design and implement end-to-end scalable data pipelines to ingest, transform, and load data into our data warehouse.
Analyze existing datasets and implement robust data validation, deduplication, and bias mitigation processes to ensure the highest quality and diversity of training data.
Create training sets from existing data, using classical computer vision algorithms, vision models, and LLMs.
Optimize data loading, preprocessing, and augmentation workflows to eliminate bottlenecks and maximize training efficiency.
Document all data processes, schemas, and transformations to ensure full reproducibility and transparency for the research team.
Work hand-in-hand with research scientists and engineers to understand their data needs, provide actionable insights, and rapidly iterate on pipeline improvements.
Source new multimodal data from public sources.
What you will bring to Jasper:
Bachelor's or Master's degree in Computer Science, Data Engineering, or a related field.
Strong experience as a Data Engineer or in a similar data-focused role.
Strong experience in image manipulation at scale and an understanding of computer vision.
Hands-on experience with distributed computing frameworks and cloud platforms for distributed ML training.
Familiarity with cloud-based data warehousing and storage solutions (e.g., BigQuery).
Strong attention to detail, commitment to data quality, and a proactive approach to supporting research needs.
Preferred Qualifications:
Knowledge of data transformation and enrichment techniques, including clustering, deduplication, and synthetic data generation.
Experience with vector databases for ML data.
Proficiency in Python and SQL for data manipulation and analysis.
Proficiency in at least one ML library (TensorFlow, PyTorch, Jax); PyTorch preferred.
Contributions to open-source data tools or projects.
Familiarity with data privacy and compliance regulations (GDPR, CCPA).
Benefits & Perks:
Mutuelle coverage for hospitalisation and mental health care provided through Alan
Comprehensive healthcare plan
Flexible PTO with a FlexExperience budget (€552 annually) to help you make the most of your time away from work
FlexWellness program (€1,640 annually) to help support your personal health goals
Generous budget for home office set up
€1,375 annual learning and development stipend
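Deduplication, one of the data-quality processes mentioned above, is often the first pass over a scraped image corpus. A minimal sketch keyed on exact content hashes, with hypothetical file names (production pipelines usually add perceptual hashing to also catch near-duplicates):

```python
import hashlib

# Stand-ins for image byte payloads (hypothetical contents).
images = {
    "a.png": b"\x89PNG...cat",
    "b.png": b"\x89PNG...dog",
    "c.png": b"\x89PNG...cat",  # exact byte-for-byte duplicate of a.png
}

def deduplicate(blobs):
    # Keep the first file seen for each distinct content digest; later
    # files with an identical digest are dropped as exact duplicates.
    seen, keep = set(), []
    for name, data in blobs.items():
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            keep.append(name)
    return keep

print(deduplicate(images))  # ['a.png', 'b.png']
```

At the scale of multimodal training sets, the same idea runs as a distributed job, with the digest as the shuffle key.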
Data Engineer
Data Science & Analytics
Apply

Lead Data Engineer

Air Ops
-
United States
Full-time
Remote
false
About AirOps: Today thousands of leading brands and agencies use AirOps to win the battle for attention with content that both humans and agents love. We're building the platform and profession that will empower a million marketers to become modern leaders — not spectators — as AI reshapes how brands reach their audiences. We're backed by awesome investors, including Unusual Ventures, Wing VC, Founder Collective, XFund, Village Global, and Alt Capital, and we're building a world-class team with in-person hubs in San Francisco, New York, and Montevideo, Uruguay. About the Role: As Lead Data Engineer, you will own and scale the data platform that powers AirOps insights on AI search visibility and content performance. You will set technical direction, write production code, and build a small, high-output team that turns raw web, content, and AI agent data into trustworthy datasets. Your work will drive customer-facing analytics and product features while giving our content and growth teams a clear path from strategy to execution.
You value extreme ownership, sweat the details on data quality, and love partnering across functions to ship fast without losing rigor.
Key Responsibilities:
Data platform ownership: design, build, and operate batch and streaming pipelines that ingest data from crawlers, partner APIs, product analytics, and CRM.
Core modeling: define and maintain company-wide models for content entities, search queries, rankings, AI agent answers, engagement, and revenue attribution.
Orchestration and CI: implement workflow management with Airflow or Prefect, dbt-based transformations, version control, and automated testing.
Data quality and observability: set SLAs, add tests and data contracts, monitor lineage and freshness, and lead root cause analysis.
Warehouse and storage: run Snowflake or BigQuery and Postgres with strong performance, cost management, and partitioning strategies.
Semantic layer and metrics: deliver clear, documented metrics datasets that power dashboards, experiments, and product activation.
Product and customer impact: partner with Product and Customer teams to define tracking plans and measure content impact across on-site and off-site channels.
Tooling and vendors: evaluate, select, and integrate the right tools for ingestion, enrichment, observability, and reverse ETL.
Team leadership: hire, mentor, and level up data and analytics engineers; establish code standards, review practices, and runbooks.
Qualifications:
5+ years in data engineering with 2+ years leading projects
Expert SQL and Python with deep experience building production pipelines at scale
Hands-on with dbt and a workflow manager such as Airflow or Prefect
Strong background in dimensional and event-driven modeling and a company-wide metrics layer
Experience with Snowflake or BigQuery, plus Postgres for transactional use cases
Track record building data products for analytics and customer reporting
Cloud experience on AWS or GCP and infrastructure as code such as Terraform
Domain experience in SEO, content analytics, or growth experimentation is a plus
Clear communicator with a bias for action, curiosity, and a high bar for quality
Our Guiding Principles: Extreme Ownership; Quality; Curiosity and Play; Make Our Customers Heroes; Respectful Candor
Benefits:
Equity in a fast-growing startup
Competitive benefits package tailored to your location
Flexible time off policy
Generous parental leave
A fun-loving and (just a bit) nerdy team that loves to move fast!
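The freshness monitoring named in the responsibilities above usually starts with an SLA check like the following sketch, with hypothetical table names and timestamps (tools like dbt source freshness or a data observability platform automate this same comparison):

```python
from datetime import datetime, timedelta

# Hypothetical last-successful-load timestamps per warehouse table,
# as a freshness monitor would read them from load metadata.
last_loaded = {
    "rankings_daily": datetime(2025, 7, 1, 6, 0),
    "agent_answers": datetime(2025, 6, 29, 23, 0),
}

def freshness_violations(tables, now, sla_hours=24):
    # A table violates its SLA when its most recent load is older
    # than the agreed freshness window.
    cutoff = now - timedelta(hours=sla_hours)
    return sorted(name for name, loaded in tables.items() if loaded < cutoff)

print(freshness_violations(last_loaded, now=datetime(2025, 7, 1, 12, 0)))
```

In practice the violation list feeds an alert channel, and a data contract records who owns the fix for each stale table.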
Data Engineer
Data Science & Analytics
Apply