Top Data Engineer Job Openings in 2025

Looking for Data Engineer opportunities? This curated list features the latest Data Engineer job openings from AI-native companies. Whether you're an experienced professional or just entering the field, find roles that match your expertise, from startups to global tech leaders. Updated every day.


Analytics Engineer

Granola
GBP 100,000–130,000
United Kingdom
Full-time
Remote: No
Hey! We're team Granola. If you haven't already, check out what we're building and why you should work here.

We're looking for an Analytics Engineer who loves turning raw data into powerful insights and predictive models. You'll work closely with product, engineering, and ops to help shape our roadmap, from hypotheses to experiments to scalable models in production. You'll have a big impact on how we unlock value from our data, and you'll scale Granola's analytics and intelligence capability as we 100× in users.

In this role, you will:
- Own Granola's data infrastructure.
- Maintain and improve pipelines between our data warehouse and various data sources, including our production database, Amplitude/PostHog, and other SaaS tools we use, and build new pipelines (a minimal sketch of the underlying sync pattern follows this listing).
- Improve our data transformation process using SQL and dbt, optimising for performance and ease of querying.
- Make architecture decisions regarding data infrastructure.
- Define metrics and dashboards to track business health and model performance.
- Continuously evaluate and iterate on models in production: improving, retraining, and optimising.
- Mentor others in data thinking, share knowledge, and help elevate our analytical culture.

Your background looks something like:
- Several years doing applied data science / machine learning in a product environment (consumer SaaS, fintech, marketplace, or similar).
- Strong foundation in statistics, experimentation, causal inference, and common ML techniques (regression, tree ensembles, clustering, time series).
- Hands-on experience with ML frameworks and tools (e.g. Python, scikit-learn, PyTorch or TensorFlow, MLflow or equivalent).
- Experience deploying models (e.g. via REST endpoints, batch jobs, or a feature store) and monitoring them.
- Comfortable working with data pipelines, SQL, data engineering, ETL, and large datasets.
- Strong communication skills: able to explain technical insights to non-technical stakeholders.
- Bonus: experience with reinforcement learning, LLMs or generative AI, real-time streaming, or causal modelling.

As a person, you:
- Are first and foremost a builder: you want to go from zero to one and ship things that matter.
- Thrive working in person from our London office (most of the time).
- Love the challenges (and messiness) of a growing startup environment: you either have startup experience or are drawn to that kind of fast feedback loop.
- Value working with people who are kind, ambitious, and pragmatic.
- Are excited about AI/ML and driven by turning data into impact (direct experience not strictly required in every domain).
- Enjoy ambiguity, shifting priorities, and iterating quickly.

About the opportunity: We are living in the most exciting time for tool builders since Engelbart's demo in 1968. We want to assemble the best crew to build this future together, here in London. Our compensation philosophy is to pay slightly above market on salary and above market on equity. We do our best work in person, so our team spends time together five days per week in our new, bright, and spacious office at Old Street. We are happy to offer relocation assistance to candidates who'll be moving to London to join us.

Lastly, we think amazing talent comes from all kinds of life journeys and experiences. If what is written above speaks to you, whether you look like a fit on paper or not, please reach out.
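For readers curious what the pipeline work above looks like in practice, here is a minimal, hypothetical sketch of incremental extraction, the pattern behind keeping a warehouse in sync with a production database. Table and column names are invented for illustration, and sqlite3 stands in for both systems; this is not Granola's actual stack beyond the SQL/dbt mention above.

```python
import sqlite3

# Hypothetical example: copy only rows changed since the last sync
# (a "watermark") from a source table into a warehouse staging table.

def incremental_sync(src: sqlite3.Connection, dwh: sqlite3.Connection) -> int:
    # Read the high-water mark recorded by the previous run.
    row = dwh.execute(
        "SELECT COALESCE(MAX(updated_at), '1970-01-01') FROM stg_events"
    ).fetchone()
    watermark = row[0]

    # Pull only rows the source has touched since then.
    new_rows = src.execute(
        "SELECT id, payload, updated_at FROM events WHERE updated_at > ?",
        (watermark,),
    ).fetchall()

    # Upsert into staging so re-runs are idempotent.
    dwh.executemany(
        "INSERT OR REPLACE INTO stg_events (id, payload, updated_at) "
        "VALUES (?, ?, ?)",
        new_rows,
    )
    dwh.commit()
    return len(new_rows)
```

In a setup like the one described, dbt would then transform `stg_events` into query-friendly models; the watermark keeps each run cheap and idempotent.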
Data Engineer · Machine Learning Engineer · Data Science & Analytics

Lead Data Engineer

AirOps
United States
Full-time
Remote: No
About AirOps: Today thousands of leading brands and agencies use AirOps to win the battle for attention with content that both humans and agents love. We're building the platform and profession that will empower a million marketers to become modern leaders, not spectators, as AI reshapes how brands reach their audiences. We're backed by awesome investors, including Unusual Ventures, Wing VC, Founder Collective, XFund, Village Global, and Alt Capital, and we're building a world-class team with in-person hubs in San Francisco, New York, and Montevideo, Uruguay.

About the Role: As Lead Data Engineer, you will own and scale the data platform that powers AirOps insights on AI search visibility and content performance. You will set technical direction, write production code, and build a small, high-output team that turns raw web, content, and AI agent data into trustworthy datasets. Your work will drive customer-facing analytics and product features while giving our content and growth teams a clear path from strategy to execution. You value extreme ownership, sweat the details on data quality, and love partnering across functions to ship fast without losing rigor.

Key Responsibilities:
- Data platform ownership: design, build, and operate batch and streaming pipelines that ingest data from crawlers, partner APIs, product analytics, and CRM.
- Core modeling: define and maintain company-wide models for content entities, search queries, rankings, AI agent answers, engagement, and revenue attribution.
- Orchestration and CI: implement workflow management with Airflow or Prefect, dbt-based transformations, version control, and automated testing (a minimal orchestration sketch follows this listing).
- Data quality and observability: set SLAs, add tests and data contracts, monitor lineage and freshness, and lead root cause analysis.
- Warehouse and storage: run Snowflake or BigQuery and Postgres with strong performance, cost management, and partitioning strategies.
- Semantic layer and metrics: deliver clear, documented metrics datasets that power dashboards, experiments, and product activation.
- Product and customer impact: partner with Product and Customer teams to define tracking plans and measure content impact across on-site and off-site channels.
- Tooling and vendors: evaluate, select, and integrate the right tools for ingestion, enrichment, observability, and reverse ETL.
- Team leadership: hire, mentor, and level up data and analytics engineers; establish code standards, review practices, and runbooks.

Qualifications:
- 5+ years in data engineering with 2+ years leading projects
- Expert SQL and Python with deep experience building production pipelines at scale
- Hands-on with dbt and a workflow manager such as Airflow or Prefect
- Strong background in dimensional and event-driven modeling and a company-wide metrics layer
- Experience with Snowflake or BigQuery, plus Postgres for transactional use cases
- Track record building data products for analytics and customer reporting
- Cloud experience on AWS or GCP and infrastructure as code such as Terraform
- Domain experience in SEO, content analytics, or growth experimentation is a plus
- Clear communicator with a bias for action, curiosity, and a high bar for quality

Our Guiding Principles: Extreme Ownership · Quality · Curiosity and Play · Make Our Customers Heroes · Respectful Candor

Benefits:
- Equity in a fast-growing startup
- Competitive benefits package tailored to your location
- Flexible time off policy
- Generous parental leave
- A fun-loving and (just a bit) nerdy team that loves to move fast!
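To make the "Orchestration and CI" bullet concrete, here is a minimal, hypothetical sketch in Airflow 2.x TaskFlow style: an ingest task feeding a dbt build on a daily schedule. The DAG name, dbt project path, and task bodies are invented for illustration, not AirOps's actual pipeline.

```python
# A daily ingest-then-transform DAG: land raw data, then run dbt.
from datetime import datetime

from airflow.decorators import dag, task
from airflow.operators.bash import BashOperator


@dag(schedule="@daily", start_date=datetime(2025, 1, 1), catchup=False)
def content_metrics():
    @task
    def ingest_crawl_results() -> int:
        # In a real DAG this would pull from crawlers / partner APIs
        # and land raw records in the warehouse.
        return 0  # rows loaded

    # Transform raw tables into documented, tested metrics models.
    dbt_build = BashOperator(
        task_id="dbt_build",
        bash_command="dbt build --project-dir /opt/dbt/content_metrics",
    )

    ingest_crawl_results() >> dbt_build


content_metrics()
```

`dbt build` runs models and their tests together, which is one way to get the "automated testing" and freshness checks the listing asks for into the same scheduled run.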
Data Engineer · Data Science & Analytics

Founding Data Engineer

Elicit
USD 185,000–305,000
United States
Full-time
Remote: No
About Elicit: Elicit is an AI research assistant that uses language models to help professional researchers and high-stakes decision makers break down hard questions, gather evidence from scientific/academic sources, and reason through uncertainty.

What we're aiming for:
- Elicit radically increases the amount of good reasoning in the world.
- For experts, Elicit pushes the frontier forward.
- For non-experts, Elicit makes good reasoning more accessible. People who don't have the tools, expertise, time, or mental energy to make carefully-reasoned decisions on their own can do so with Elicit.
- Elicit is a scalable ML system based on human-understandable task decompositions, with supervision of process, not outcomes. This expands our collective understanding of safe AGI architectures.

Visit our Twitter to learn more about how Elicit is helping researchers and making progress on our mission.

Why we're hiring for this role. Two main reasons:
1. Currently, Elicit operates over academic papers and clinical trials. One of your key initial responsibilities will be to build a complete corpus of these documents, available as soon as they're published, combining different data sources and ingestion methods. Once that's done, there is a growing list of other document types and sources we'd love to integrate!
2. One of our main initiatives is to broaden the sorts of tasks you can complete in Elicit. We need a data engineer to figure out the best way to ingest massive amounts of heterogeneous data in such a way as to make it usable by LLMs. We need your help to integrate with our customers' custom data providers so that they can create task-specific workflows over them.

In general, we're looking for someone who can architect and implement robust, scalable solutions to handle our growing data needs while maintaining high performance and data quality.

Our tech stack:
- Data pipeline: Python, Flyte, Spark
- Probably less relevant to you, but in case of interest: the backend is Node and Python with event sourcing; the frontend is Next.js, TypeScript, and Tailwind
- We like static type checking in Python and TypeScript!
- All infrastructure runs in Kubernetes across a couple of clouds
- We use GitHub for code reviews and CI
- We deploy using the gitops pattern (i.e. deploys are defined and tracked by diffs in our k8s manifests)

Am I a good fit? Consider these questions:
- How would you optimize a Spark job that's processing a large amount of data but running slowly?
- What are the differences between RDD, DataFrame, and Dataset in Spark? When would you use each?
- How does data partitioning work in distributed systems, and why is it important?
- How would you implement a data pipeline to handle regular updates from multiple academic paper sources, ensuring efficient deduplication? (A sketch of one common approach appears after this listing.)

If you have a solid answer for these, without reference to documentation, then we should chat!

Location and travel: We have a lovely office in Oakland, CA; there are people there every day, but we don't all work from there all the time. It's important to us to spend time with our teammates, however, so we ask that all Elicians spend about 1 week out of every 6 with teammates. We wrote up more details on this page.

What you'll bring to the role:
- 5+ years of experience as a data engineer: owning make-or-break decisions about how to ingest, manage, and use data
- Strong proficiency in Python (5+ years of experience)
- You have created and owned a data platform at rapidly-growing startups: gathering needs from colleagues, planning an architecture, deploying the infrastructure, and implementing the tooling
- Experience architecting and optimizing large data pipelines, ideally with Spark, and ideally pipelines that directly support user-facing features (rather than internal BI, for example)
- Strong SQL skills, including understanding of aggregation functions, window functions, UDFs, self-joins, partitioning, and clustering approaches
- Experience with columnar data storage formats like Parquet
- Strong opinions, weakly held, about approaches to data quality management
- Creative and user-centric problem-solving
- You should be excited to play a key role in shipping new features to users, not just building out a data platform!

Nice to have:
- Experience developing deduplication processes for large datasets
- Hands-on experience with full-text extraction and processing from various document formats (PDF, HTML, XML, etc.)
- Familiarity with machine learning concepts and their application in search technologies
- Experience with distributed computing frameworks beyond Spark (e.g., Dask, Ray)
- Experience in science and academia: familiarity with academic publications, and the ability to accurately model the needs of our users
- Hands-on experience with industry-standard tools like Airflow, dbt, or Hadoop
- Hands-on experience with standard paradigms like data lake, data warehouse, or lakehouse

What you'll do. You'll own:
- Building and optimizing our academic research paper pipeline. You'll architect and implement robust, scalable systems to handle data ingestion while maintaining high performance and quality. You'll work on efficiently deduplicating hundreds of millions of research papers and calculating embeddings. Your goal will be to make Elicit the most complete and up-to-date database of scholarly sources.
- Expanding the datasets Elicit works over. Our users want Elicit to work over court documents, SEC filings, … your job will be to figure out how to ingest and index a rapidly increasing ontology of documents. We also want to support less structured documents, spreadsheets, presentations, all the way up to rich media like audio and video. Larger customers often want us to integrate private data into Elicit for their organisation to use. We'll look to you to define and build a secure, reliable, fast, and auditable approach to these data connectors.
- Data for our ML systems. You'll figure out the best way to preprocess all the data mentioned above to make it useful to models. We often need datasets for our model fine-tuning. You'll work with our ML engineers and evaluation experts to find, gather, version, and apply these datasets in training runs.

Your first week: start building foundational context.
- Get to know your team, our stack (including Python, Flyte, and Spark), and the product roadmap.
- Familiarize yourself with our current data pipeline architecture and identify areas for potential improvement.
- Make your first contribution to Elicit: complete your first Linear issue related to our data pipeline or academic paper processing, and have a PR merged into our monorepo, demonstrating your understanding of our development workflow.
- Gain understanding of our CI/CD pipeline, monitoring, and logging tools specific to our data infrastructure.

Your first month:
- You'll complete your first multi-issue project: tackle a significant data pipeline optimization or enhancement project, and collaborate with the team to implement improvements in our academic paper processing workflow.
- You're actively improving the team: contribute to regular team meetings and hack days, sharing insights from your data engineering expertise; add documentation or diagrams explaining our data pipeline architecture and best practices; suggest improvements to our data processing and storage methodologies.

Your first quarter:
- You're flying solo: independently implement significant enhancements to our data pipeline, improving efficiency and scalability, and make impactful decisions regarding our data architecture and processing strategies.
- You've developed an area of expertise: become the go-to resource for questions related to our academic paper processing pipeline and data infrastructure, and lead discussions on optimizing our data storage and retrieval processes for academic literature.
- You actively research and improve the product: propose and scope improvements to make Elicit more comprehensive and up-to-date in terms of scholarly sources, and identify and implement technical improvements to surpass competitors like Google Scholar in terms of coverage and data quality.

Compensation, benefits, and perks. In addition to working on important problems as part of a productive and positive team, we also offer great benefits (with some variation based on location):
- Flexible work environment: work from our office in Oakland or remotely with time zone overlap (between GMT and GMT-8), as long as you can travel for in-person retreats and coworking events
- Fully covered health, dental, vision, and life insurance for you, and generous coverage for the rest of your family
- Flexible vacation policy, with a minimum recommendation of 20 days/year + company holidays
- 401(k) with a 6% employer match
- A new Mac + $1,000 budget to set up your workstation or home office in your first year, then $500 every year thereafter
- $1,000 quarterly AI Experimentation & Learning budget, so you can freely experiment with new AI tools to incorporate into your workflow, take courses, purchase educational resources, or attend AI-focused conferences and events
- A team administrative assistant who can help you with personal and work tasks

You can find more reasons to work with us in this thread!

For all roles at Elicit, we use a data-backed compensation framework to keep salaries market-competitive, equitable, and simple to understand. For this role, we target starting ranges of:
- Senior (L4): $185–270k + equity
- Expert (L5): $215–305k + equity
- Principal (L6): >$260k + significant equity

We're optimizing for a hire who can contribute at an L4/senior level or above. We also offer above-market equity for all roles at Elicit, as well as employee-friendly equity terms (10-year exercise periods).
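As a companion to the deduplication question above, here is a minimal, hypothetical PySpark sketch of one common approach: build a normalized key (title plus year), then keep the best record per key with a window function. Column names, paths, and the ranking rule are invented for illustration; this is not Elicit's actual pipeline.

```python
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("paper-dedup").getOrCreate()

papers = spark.read.parquet("s3://bucket/raw_papers/")  # hypothetical path

# Build a normalized dedup key: lowercased title with whitespace
# collapsed, plus publication year. Real pipelines often add DOI
# matching and fuzzy title similarity on top of this exact key.
keyed = papers.withColumn(
    "dedup_key",
    F.concat_ws(
        "|",
        F.regexp_replace(F.lower(F.col("title")), r"\s+", " "),
        F.col("year").cast("string"),
    ),
)

# Within each key, prefer the most recently updated record.
w = Window.partitionBy("dedup_key").orderBy(F.col("updated_at").desc())
deduped = (
    keyed.withColumn("rn", F.row_number().over(w))
    .filter(F.col("rn") == 1)
    .drop("rn", "dedup_key")
)

deduped.write.mode("overwrite").parquet("s3://bucket/papers_deduped/")
```

Partitioning by the dedup key bounds the shuffle; at hundreds of millions of rows, salting hot keys and checkpointing intermediates are typical next optimizations.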
Data Engineer · Data Science & Analytics

Data Engineer

Bjak
South Korea
Full-time
Remote: No
Transform Language Models into Real-World Applications: We're building AI systems for a global audience. We are living in an era of AI transition; this new project team will focus on building applications that enable more real-world impact and the highest usage worldwide. This is a global role with a hybrid work arrangement, combining flexible remote work with in-office collaboration at our HQ. You'll work closely with regional teams across product, engineering, operations, infrastructure, and data to build and scale impactful AI solutions.

Why This Role Matters: You'll fine-tune state-of-the-art models, design evaluation frameworks, and bring AI features into production. Your work ensures our models are not only intelligent, but also safe, trustworthy, and impactful at scale.

What You'll Do:
- Collect, clean, and preprocess user-generated text and image data for fine-tuning large models (a minimal preprocessing sketch follows this listing)
- Design and manage scalable data labeling pipelines, leveraging both crowdsourcing and in-house labeling teams
- Build and maintain automated datasets for content moderation (e.g., safe vs. unsafe content)
- Collaborate with researchers and engineers to ensure datasets are high-quality, diverse, and aligned with model training needs

Who You Are:
- You like ownership and independence.
- You believe clarity comes from action: prototype, test, and iterate without waiting for perfect plans.
- You stay calm and effective in startup chaos; shifting priorities and building from zero don't faze you.
- You have a bias for speed: you believe it's better to deliver something valuable now than a perfect version much later.
- You see feedback and failure as part of growth; you're here to level up.
- You possess humility, hunger, and hustle, and lift others up as you go.

Requirements:
- Proven experience preparing datasets for machine learning or fine-tuning large models
- Strong skills in data cleaning, preprocessing, and transformation for both text and image data
- Hands-on experience with data labeling workflows and quality assurance for labeled data
- Familiarity with building and maintaining moderation datasets (safety, compliance, and filtering)
- Proficiency in scripting (Python, SQL) and working with large-scale data pipelines

What You'll Get:
- Flat structure and real ownership
- Full involvement in direction and consensus decision making
- Flexibility in work arrangement
- High-impact role with visibility across product, data, and engineering
- Top-of-market compensation and performance-based bonuses
- Global exposure to product development
- Lots of perks: housing rental subsidies, a quality company cafeteria, and overtime meals
- Health, dental & vision insurance
- Global travel insurance (for you & your dependents)
- Unlimited, flexible time off

Our Team & Culture: We're a high-density, high-performance team focused on high-quality work and global impact. We behave like owners. We value speed, clarity, and relentless ownership. If you're hungry to grow and care deeply about excellence, join us.

About Bjak: BJAK is Southeast Asia's #1 insurance aggregator with 8M+ users, fully owned by its employees. Headquartered in Malaysia and operating in Thailand, Taiwan, and Japan, we help millions of users access transparent and affordable financial protection through Bjak.com. We simplify complex financial products through cutting-edge technologies, including APIs, automation, and AI, to build the next generation of intelligent financial systems. If you're excited to build real-world AI systems and grow fast in a high-impact environment, we'd love to hear from you.
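To make the data-cleaning bullet concrete, here is a small, hypothetical Python sketch of the kind of text normalization and exact-duplicate filtering that typically precedes labeling and fine-tuning. The normalization rules are illustrative only, not Bjak's actual pipeline.

```python
import hashlib
import re
import unicodedata
from typing import Iterable, Iterator

def normalize(text: str) -> str:
    """Canonicalize user-generated text before labeling/fine-tuning."""
    text = unicodedata.normalize("NFKC", text)      # unify unicode forms
    text = re.sub(r"https?://\S+", "<URL>", text)   # mask raw links
    text = re.sub(r"\s+", " ", text).strip()        # collapse whitespace
    return text

def dedupe(records: Iterable[str]) -> Iterator[str]:
    """Drop exact duplicates by hashing normalized text."""
    seen: set[str] = set()
    for raw in records:
        clean = normalize(raw)
        key = hashlib.sha1(clean.encode("utf-8")).hexdigest()
        if clean and key not in seen:
            seen.add(key)
            yield clean

if __name__ == "__main__":
    sample = ["Great  product!", "great product!", "Visit https://x.example now"]
    # Near-duplicates differing only in case survive this exact-match pass;
    # fuzzy dedup (e.g. MinHash) would be a later stage.
    print(list(dedupe(sample)))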
Data Engineer · Data Science & Analytics

Data Engineer

Bjak
United Kingdom
Full-time
Remote: No
Same role description as the Bjak Data Engineer listing above.
Data Engineer · Data Science & Analytics

Data Engineer

Bjak
China
Full-time
Remote: No
Same role description as the Bjak Data Engineer listing above.
Data Engineer · Data Science & Analytics

Data Engineer

Bjak
Germany
Full-time
Remote: No
Same role description as the Bjak Data Engineer listing above.
Data Engineer · Machine Learning Engineer · Data Science & Analytics

Data Engineer

Bjak
United States
Full-time
Remote: No
Same role description as the Bjak Data Engineer listing above.
Data Engineer · Data Science & Analytics

Data Engineer

Bjak
Hong Kong
Full-time
Remote: No
Same role description as the Bjak Data Engineer listing above.
Data Engineer · Data Science & Analytics

Data Engineer

Bjak
Taiwan
Full-time
Remote: No
Same role description as the Bjak Data Engineer listing above.
Data Engineer · Data Science & Analytics

Data Engineer

Bjak
Japan
Full-time
Remote: No
Same role description as the Bjak Data Engineer listing above.
Data Engineer · Machine Learning Engineer · Data Science & Analytics

Data Operations Engineer

Labelbox
USD 70,000–150,000
United States · Poland
Full-time
Remote: No
Shape the Future of AI: At Labelbox, we're building the critical infrastructure that powers breakthrough AI models at leading research labs and enterprises. Since 2018, we've been pioneering data-centric approaches that are fundamental to AI development, and our work becomes even more essential as AI capabilities expand exponentially.

About Labelbox: We're the only company offering three integrated solutions for frontier AI development:
- Enterprise Platform & Tools: advanced annotation tools, workflow automation, and quality control systems that enable teams to produce high-quality training data at scale
- Frontier Data Labeling Service: specialized data labeling through Alignerr, leveraging subject matter experts for next-generation AI models
- Expert Marketplace: connecting AI teams with highly skilled annotators and domain experts for flexible scaling

Why Join Us:
- High-Impact Environment: we operate like an early-stage startup, focusing on impact over process. You'll take on expanded responsibilities quickly, with career growth directly tied to your contributions.
- Technical Excellence: work at the cutting edge of AI development, collaborating with industry leaders and shaping the future of artificial intelligence.
- Innovation at Speed: we celebrate those who take ownership, move fast, and deliver impact. Our environment rewards high agency and rapid execution.
- Continuous Growth: every role requires continuous learning and evolution. You'll be surrounded by curious minds solving complex problems at the frontier of AI.
- Clear Ownership: you'll know exactly what you're responsible for and have the autonomy to execute. We empower people to drive results through clear ownership and metrics.

Role Overview: We are seeking a skilled and detail-oriented Data Operations Engineer to support our data annotation and data quality assurance processes. In this role, you will play a critical part in optimizing, maintaining, and scaling our data labeling workflows, primarily using Labelbox. You will ensure that labelers can efficiently and accurately generate human-labeled data by building tools, using LLMs, automating common project management tasks, and troubleshooting complex issues within the production pipeline. Your ability to script in Python and apply engineering problem-solving principles to data operations will be key to improving both efficiency and quality across our projects.

Your Impact:
- Build, deploy, and maintain Python automation scripts and other tools to streamline the data annotation process, automate repetitive tasks, and reduce manual effort.
- Identify bottlenecks in the data labeling pipeline and implement solutions to enhance throughput, accuracy, and scalability of labeling operations.
- Work closely with the Project Management team to ensure that data labeling meets accuracy standards, and troubleshoot any issues related to data quality.
- Plan quality assurance workflows that use GenAI and open-source models to find data anomalies (see the sketch after this section).
- Set up monitoring tools to track the performance of data annotation operations, reporting key metrics and areas for improvement to leadership.
- Integrate and manage third-party API tools with Labelbox, ensuring seamless operation and data flow across platforms.
- Build and maintain internal tools with Retool and similar platforms.
- Provide ongoing technical support to project managers and labelers, assisting with technical challenges in Labelbox and associated tools.
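The LLM-as-a-judge prompting pattern named in the requirements below can be sketched in a few lines. This hypothetical example uses the OpenAI Python client; the model name, rubric, and review threshold are illustrative assumptions, not Labelbox's actual QA setup.

```python
# Hypothetical LLM-as-a-judge pass over human-labeled examples:
# grade each (text, label) pair against a rubric and flag
# low-scoring rows for human review.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

RUBRIC = (
    "You are auditing a content-moderation dataset. Given a text and its "
    "human label ('safe' or 'unsafe'), reply with only a number from 1-5 "
    "indicating how well the label fits the text."
)

def judge(text: str, label: str) -> int:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model choice
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": f"Text: {text}\nLabel: {label}"},
        ],
    )
    return int(resp.choices[0].message.content.strip())

rows = [("You are wonderful!", "safe"), ("You are wonderful!", "unsafe")]
flagged = [(t, l) for t, l in rows if judge(t, l) <= 2]  # route to humans
print(flagged)
```

A production version would batch requests, parse scores defensively, and calibrate the judge against a small set of gold labels before trusting it.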
What You Bring:
- 3+ years of working experience in a technical role, interfacing with technical and non-technical teams, and writing Python scripts for data processing
- 2+ years of experience using LLMs in prompting frameworks (e.g. LLM-as-a-judge)
- Some experience with machine learning models in scripts or data pipelines
- Bachelor's degree in Engineering, Computer Science, or a technical field
- Practical experience using LLMs or traditional models to assist annotation QA or generate/transform data
- Proficiency in Python scripting and experience with automation of operational tasks
- Experience with Labelbox or similar data annotation platforms
- Strong analytical and problem-solving skills with a demonstrated ability to optimize processes
- Experience with data pipelines, data analysis, and data workflow management
- Familiarity with cloud platforms such as AWS, GCP, or Azure
- English fluency
- Knowledge of statistical analysis techniques to uncover bad patterns in human-labeled data

Nice to have:
- Prior experience in a production or process engineering role, especially in data operations or similar environments
- Understanding of project management methodologies and the ability to work collaboratively across teams

Alignerr Services at Labelbox: As part of the Alignerr Services team, you'll lead implementation of customer projects and manage our elite network of AI experts who deliver high-quality human feedback crucial for AI advancement. Your team will oversee 250,000+ monthly hours of specialized work across RLHF, complex reasoning, and multimodal AI projects, resulting in quality improvements for frontier AI labs. You'll leverage our AI-powered talent acquisition system and exclusive access to 16M+ specialized professionals to rapidly build and deploy expert teams that help customers, which include the majority of leading AI labs and AI disruptors, achieve breakthrough AI capabilities through precisely aligned human data, directly contributing to the critical human element in advancing artificial intelligence.

Labelbox strives to ensure pay parity across the organization and discuss compensation transparently. The expected annual base salary range for United States-based candidates is below. This range is not inclusive of any potential equity packages or additional benefits. Exact compensation varies based on a variety of factors, including skills and competencies, experience, and geographical location. Annual base salary range: $70,000–$150,000 USD.

Life at Labelbox:
- Location: join our dedicated tech hubs in San Francisco or Wrocław, Poland
- Work style: hybrid model with 2 days per week in office, combining collaboration and flexibility
- Environment: fast-paced and high-intensity, perfect for ambitious individuals who thrive on ownership and quick decision-making
- Growth: career advancement opportunities directly tied to your impact
- Vision: be part of building the foundation for humanity's most transformative technology

Our Vision: We believe data will remain crucial in achieving artificial general intelligence. As AI models become more sophisticated, the need for high-quality, specialized training data will only grow. Join us in developing new products and services that enable the next generation of AI breakthroughs. Labelbox is backed by leading investors including SoftBank, Andreessen Horowitz, B Capital, Gradient Ventures, Databricks Ventures, and Kleiner Perkins. Our customers include Fortune 500 enterprises and leading AI labs.
Your Personal Data Privacy: Any personal information you provide Labelbox as a part of your application will be processed in accordance with Labelbox’s Job Applicant Privacy notice. Any emails from Labelbox team members will originate from a @labelbox.com email address. If you encounter anything that raises suspicions during your interactions, we encourage you to exercise caution and suspend or discontinue communications.
Data Engineer
Data Science & Analytics
MLOps / DevOps Engineer
Data Science & Analytics
Apply
Analytics Data Engineer
HappyRobot
USD 120,000 - 220,000
United States · Full-time · Remote: false
About HappyRobot

HappyRobot is a platform to build and deploy AI workers that automate communication. See a demo. Our AI workers connect to any system or data source to handle phone calls, email, messages, and more. We target the logistics industry, which relies heavily on communication to book, check on, and pay for freight. We work primarily with freight brokers, 3PLs, freight forwarders, shippers, warehouses, and other supply chain enterprises and tech startups.

We're thrilled to share that with our $44M Series B, HappyRobot has now raised a total of $62M, backed by leading investors who believe in our mission and vision for the future. We're looking for rockstars with a relentless drive, unstoppable energy, and a true passion for building something great: ready to embrace the challenge, push limits, and thrive in a fast-paced, high-intensity environment.

About the Role

- Build foundational data products, dashboards, and tools to enable self-serve analytics to scale across the company.
- Develop insightful and reliable dashboards to track performance of core metrics that will deliver insights to the whole company.
- Build and maintain robust data pipelines and models to ensure data quality.
- Partner with Product, Engineering, and Design teams to inform decisions.
- Translate complex data into clear, actionable insights for product teams.
- Love working with data. Love making the product better. Love finding the story behind the numbers.

Responsibilities

- Define, build, and maintain product metrics, dashboards, and pipelines.
- Write SQL and Python code to extract, transform, and analyze data.
- Design and run experiments (A/B tests) to support product development (a minimal significance-test sketch follows this listing).
- Proactively explore data to identify product opportunities and insights.
- Collaborate with cross-functional teams to ensure data-driven decisions.
- Ensure data quality, reliability, and documentation across analytics efforts.

Must Have

- 3+ years of experience as an Analytics Data Engineer or in similar Data Science & Analytics roles, preferably partnering with GTM and Product leads to build and report on key company-wide metrics.
- Strong SQL and data engineering skills to transform data into accurate, clean data models (e.g., dbt, Airflow, data warehouses).
- Advanced analytics experience: segmentation and cohort analysis.
- Proficiency in Python for data analysis and modeling.
- Excellent communication skills: able to explain complex data insights clearly.
- Curious, collaborative, and driven to make an impact in a fast-paced environment.

Nice to Have

- Experience in B2B SaaS or AI/ML products.
- Familiarity with product analytics tools (e.g., Mixpanel, Amplitude).
- Exposure to machine learning concepts or AI-powered systems.

Why join us?

- Opportunity to work at a high-growth AI startup, backed by top investors.
- Fast Growth: Backed by a16z and YC, on track for double-digit ARR.
- Ownership & Autonomy: Take full ownership of projects and ship fast.
- Top-Tier Compensation: Competitive salary + equity in a high-growth startup.
- Comprehensive Benefits: Healthcare, dental, and vision coverage.
- Work With the Best: Join a world-class team of engineers and builders.

Our Operating Principles

Extreme Ownership: We take full responsibility for our work, outcomes, and team success. No excuses, no blame-shifting; if something needs fixing, we own it and make it better. This means stepping up, even when it's not "your job." If a ball is dropped, we pick it up. If a customer is unhappy, we fix it. If a process is broken, we redesign it. We don't wait for someone else to solve it; we lead with accountability and expect the same from those around us.

Craftsmanship: Putting care and intention into every task, striving for excellence, and taking deep ownership of the quality and outcome of your work. Craftsmanship means never settling for "just fine." We sweat the details because details compound. Whether it's a product feature, an internal doc, or a sales call, we treat it as a reflection of our standards. We aim to deliver jaw-dropping customer experiences by being curious, meticulous, and proud of what we build, even when nobody's watching.

We are "majos": Be friendly and have fun with your coworkers. Always be genuine and honest, but kind. "Majo" is our way of saying: be a good human. Be approachable, helpful, and warm. We're building something ambitious, and it's easier (and more fun) when we enjoy the ride together. We give feedback with kindness, challenge each other with respect, and celebrate wins together without ego.

Urgency with Focus: Create the highest impact in the shortest amount of time. Move fast, but in the right direction. We operate with speed because time is our most limited resource. But speed without focus is chaos. We prioritize ruthlessly, act decisively, and stay aligned. We aim for high leverage: the biggest results from the simplest, smartest actions. We're running a high-speed marathon, not a sprint with no strategy.

Talent Density and Meritocracy: Hire only people who can raise the average; "exceptional performance is the passing grade." Ability trumps seniority. We believe the best teams are built on talent density: every hire should raise the bar. We reward contribution, not titles or tenure. We give ownership to those who earn it, and we all hold each other to a high standard. A-players want to work with other A-players; that's how we win.

First-Principles Thinking: Strip a problem to physics-level facts, ignore industry dogma, rebuild the solution from scratch. We don't copy-paste solutions. We go back to basics, ask why things are the way they are, and rebuild from the ground up if needed. This mindset pushes us to innovate, challenge stale assumptions, and move faster than incumbents. It's how we build what others think is impossible.

The personal data provided in your application and during the selection process will be processed by Happyrobot, Inc., acting as Data Controller. By sending us your CV, you consent to the processing of your personal data for the purpose of evaluating and selecting you as a candidate for the position. Your personal data will be treated confidentially and will only be used for the recruitment process of the selected job offer. Your personal data will be deleted after three months of inactivity, in compliance with the GDPR and legislation on the protection of personal data. If you wish to exercise your rights of access, rectification, deletion, portability, or opposition in relation to your personal data, you can do so through security@happyrobot.ai, subject to the GDPR. For more information, visit https://www.happyrobot.ai/privacy-policy. By submitting your application, you confirm that you have read and understood this clause and that you agree to the processing of your personal data as described.
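On the A/B testing mentioned in the responsibilities above, here is a minimal sketch of the kind of significance check involved, using a plain two-proportion z-test in standard-library Python; the conversion counts are made up for illustration.

```python
import math

def two_proportion_ztest(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided two-proportion z-test; returns (z statistic, p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p_value

if __name__ == "__main__":
    # Hypothetical experiment: control vs. variant conversion counts.
    z, p = two_proportion_ztest(conv_a=480, n_a=10_000, conv_b=560, n_b=10_000)
    print(f"z={z:.2f}, p={p:.4f}")  # significant at alpha=0.05 if p < 0.05
```

A real experimentation setup would also fix the sample size in advance and guard against peeking, but the readout step reduces to a test like this.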
Data Engineer
Data Science & Analytics
Apply
Backend Engineer - Data Warehouse
Taktile
Germany · Full-time · Remote: false
About the role

Join our Optimization team at Taktile as a Data Warehouse Engineer. You will build isolated, scalable data warehouse infrastructure for finance teams on AWS using Iceberg, Athena, Jupyter, and Parquet, and support the operational, ML/AI, and visualization tools that help customers derive value from their data. Your contributions will directly enhance our automated decisioning platform, allowing users to improve financial decision policies at scale through the use of production data.

Location: Taktile operates on a hybrid model. This role is based out of our Berlin HQ.

What You'll Do

- Build and maintain isolated, regionalized data warehouse infrastructure for use in customer-facing features.
- Develop data tools for improving policy performance, such as training ML models on historical data and backtesting at scale.
- Design and develop scalable, network-optimized RESTful APIs using Python on AWS, leveraging services such as Lambda and S3 together with SQL and Parquet (an Athena sketch follows this listing).
- Optimize data warehouse efficiency, conduct peer code reviews, and produce technical documentation.
- Collaborate with cross-functional teams in an Agile environment to translate business requirements into technical solutions.

Requirements

- Minimum of 5 years of experience in Python and SQL.
- Prior experience in data platform engineering, data visualization, machine learning, or artificial intelligence.
- Fluency in English, both written and spoken, is crucial to lead communication in our globally distributed environment.

Ideal, But Not Required

- Experience with Iceberg and Athena.
- Experience with FastAPI.
- Expertise in data engineering topics, SQL, and Parquet.
- Experience with AWS services and serverless architectures.

What we offer

- Work with colleagues who lift you up, challenge you, celebrate you, and help you grow. We come from many different backgrounds, but what we have in common is the desire to operate at the very top of our fields. If you are similarly capable, caring, and driven, you'll find yourself at home here.
- Make an impact and meaningfully shape an early-stage company.
- Experience a truly flat hierarchy and communicate directly with founding team members. Having an opinion and voicing your ideas is not only welcome but encouraged, especially when they challenge the status quo.
- Learn from experienced mentors and achieve tremendous personal and professional growth. Get to know and leverage our network of leading tech investors and advisors around the globe.
- Receive a top-of-market equity and cash compensation package.
- Get access to a self-development budget you can use to, e.g., attend conferences, buy books, or take classes.
- Receive a new Apple MacBook Pro, as well as a meaningful home office set-up.

Our stance

We're eager to meet talented and driven candidates regardless of whether they tick all the boxes. We're looking for someone who will add to our culture, not just fit within it. We strongly encourage individuals from groups traditionally underestimated and underrepresented in tech to apply. We seek to actively recognize and combat racism, sexism, ableism, and ageism. We embrace and support all gender identities and expressions, and celebrate love in its many forms. We won't inquire about how you identify or if you've experienced discrimination, but if you want to tell your story, we are all ears.

About us

Taktile is building the world's leading software platform for running critical and highly automated decisions. Our customers use our product to catch fraudsters, prevent money laundering, and expand access to credit for small businesses, among many other use cases. Taktile is already making millions of such decisions across the globe every day. Taktile is based in Berlin, London, and New York City. It was founded by machine learning and data science veterans with extensive experience building and running production ML in financial services. Our team consists of engineers, entrepreneurs, and researchers with a diverse set of backgrounds. Some of us attended top universities such as Harvard, Oxford, and Stanford, and some of us have no degree at all. We have accumulated extensive work experience at leading tech companies, at startups, and in the enterprise software sphere. Our backers include Y Combinator, Index Ventures, and stellar angels such as the founders of Looker, GitHub, Mulesoft, Datadog, and UiPath.
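To make the Athena/Parquet stack concrete, here is a hedged sketch of running a query through boto3 and polling for the result. The region, database, table, and output bucket are placeholders, not Taktile's infrastructure, and the sketch assumes AWS credentials are already configured.

```python
import time
import boto3  # assumes credentials are available in the environment

athena = boto3.client("athena", region_name="eu-central-1")

def run_athena_query(sql: str, database: str, output: str) -> list[dict]:
    """Start an Athena query, poll until it finishes, and return the raw rows."""
    qid = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output},
    )["QueryExecutionId"]
    while True:
        state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)  # simple polling; production code would back off and time out
    if state != "SUCCEEDED":
        raise RuntimeError(f"query {qid} ended in state {state}")
    return athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]

rows = run_athena_query(
    "SELECT decision_id, outcome FROM decisions LIMIT 10",  # hypothetical table
    database="analytics",
    output="s3://example-athena-results/",
)
```

Behind an API, a Lambda handler would typically wrap a helper like this and paginate get_query_results for larger result sets.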
Data Engineer
Data Science & Analytics
Apply
Big Data Architect
Databricks
Germany · Remote: false
CSQ426R218

We have 5 open positions based in our Germany offices.

As a Big Data Solutions Architect (Resident Solutions Architect) in our Professional Services team, you will work with clients on short- to medium-term engagements on their big data challenges using the Databricks Data Intelligence Platform. You will deliver data engineering, data science, and cloud technology projects that require integrating with client systems, training, and other technical tasks to help customers get the most value out of their data. RSAs are billable and know how to complete projects according to specification with excellent customer service. You will report to the regional Manager/Lead.

The impact you will have:

- Work on a variety of impactful customer technical projects, which may include designing and building reference architectures, creating how-tos, and productionalizing customer use cases.
- Work with engagement managers to scope a variety of professional services work with input from the customer.
- Guide strategic customers as they implement transformational big data projects and third-party migrations, including end-to-end design, build, and deployment of industry-leading big data and AI applications.
- Consult on architecture and design; bootstrap or implement customer projects, leading to the customer's successful understanding, evaluation, and adoption of Databricks.
- Provide an escalated level of support for customer operational issues.
- Work with the Databricks technical team, Project Manager, Architect, and customer team to ensure the technical components of the engagement are delivered to meet the customer's needs.
- Work with Engineering and Databricks Customer Support to provide product and implementation feedback and to guide rapid resolution for engagement-specific product and support issues.

What we look for:

- Proficiency in data engineering, data platforms, and analytics, with a strong track record of successful projects and in-depth knowledge of industry best practices.
- Comfortable writing code in either Python or Scala.
- Enterprise data warehousing experience (Teradata, Synapse, Snowflake, or SAP).
- Working knowledge of two or more common cloud ecosystems (AWS, Azure, GCP), with expertise in at least one.
- Deep experience with distributed computing with Apache Spark™ and knowledge of Spark runtime internals (a short PySpark sketch follows this listing).
- Familiarity with CI/CD for production deployments.
- Working knowledge of MLOps.
- Design and deployment of performant end-to-end data architectures.
- Experience with technical project delivery: managing scope and timelines.
- Documentation and whiteboarding skills.
- Experience working with clients and managing conflicts.
- Willingness to build skills in technical areas which support the deployment and integration of Databricks-based solutions.
- Travel is required up to 10%, more at peak times.
- Databricks Certification.

About Databricks

Databricks is the data and AI company. More than 10,000 organizations worldwide, including Comcast, Condé Nast, Grammarly, and over 50% of the Fortune 500, rely on the Databricks Data Intelligence Platform to unify and democratize data, analytics, and AI. Databricks is headquartered in San Francisco, with offices around the globe, and was founded by the original creators of the lakehouse architecture, Apache Spark™, Delta Lake, and MLflow. To learn more, follow Databricks on Twitter, LinkedIn, and Facebook.

Benefits

At Databricks, we strive to provide comprehensive benefits and perks that meet the needs of all of our employees. For specific details on the benefits offered in your region, please visit https://www.mybenefitsnow.com/databricks.

Our Commitment to Diversity and Inclusion

At Databricks, we are committed to fostering a diverse and inclusive culture where everyone can excel. We take great care to ensure that our hiring practices are inclusive and meet equal employment opportunity standards. Individuals looking for employment at Databricks are considered without regard to age, color, disability, ethnicity, family or marital status, gender identity or expression, language, national origin, physical and mental ability, political affiliation, race, religion, sexual orientation, socio-economic status, veteran status, and other protected characteristics.

Compliance

If access to export-controlled technology or source code is required for performance of job duties, it is within Employer's discretion whether to apply for a U.S. government license for such positions, and Employer may decline to proceed with an applicant on this basis alone.
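As a flavor of the Spark work the role centers on, here is a minimal PySpark sketch of a daily aggregation job; the S3 paths and column names are invented for illustration and not tied to any particular engagement.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("events-daily-agg").getOrCreate()

# Hypothetical event table: aggregate raw events to daily counts per customer.
events = spark.read.parquet("s3://example-bucket/events/")
daily = (
    events
    .withColumn("event_date", F.to_date("event_ts"))
    .groupBy("customer_id", "event_date")
    .agg(
        F.count("*").alias("events"),
        F.countDistinct("session_id").alias("sessions"),
    )
)
# Partitioning by date keeps downstream reads pruned to the days they need.
daily.write.mode("overwrite").partitionBy("event_date").parquet("s3://example-bucket/daily_agg/")
```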
Data Engineer
Data Science & Analytics
Solutions Architect
Software Engineering
Apply
Finance Data Engineer
Synthesia
United Kingdom · Full-time · Remote: true
Welcome to the video-first world

From your everyday PowerPoint presentations to Hollywood movies, AI will transform the way we create and consume content. Today, people want to watch and listen, not read, both at home and at work. If you're reading this and nodding, check out our brand video. Despite the clear preference for video, communication and knowledge sharing in the business environment are still dominated by text, largely because high-quality video production remains complex and challenging to scale. Until now.

Meet Synthesia

We're on a mission to make video easy for everyone. Born in an AI lab, our AI video communications platform simplifies the entire video production process, making it easy for everyone, regardless of skill level, to create, collaborate, and share high-quality videos. Whether it's for delivering essential training to employees and customers or marketing products and services, Synthesia enables large organizations to communicate and share knowledge through video quickly and efficiently. We're trusted by leading brands such as Heineken, Zoom, Xerox, McDonald's, and more. Read stories from happy customers and what 1,200+ people say on G2. In 2023, we were one of 7 European companies to reach unicorn status. In February 2024, G2 named us the fastest growing company in the world. We've raised over $150M in funding from top-tier investors, including Accel, Nvidia, Kleiner Perkins, Google, and top founders and operators from Stripe, Datadog, Miro, Webflow, and Facebook.

About the role

Synthesia is building a modern, AI-powered back office. We are hiring a full-stack data engineer to design and run the data layer that powers Finance, ensuring data from finance systems lands securely in a data warehouse and is easily consumed via Copilot (LLM) and Omni dashboards. You'll be a cornerstone in our shift to modern finance workflows.

Key Responsibilities

- Design, develop, and maintain robust ETL/ELT pipelines to ingest, transform, and securely store data from NetSuite and other finance systems into Snowflake, ensuring data integrity, compliance, and security best practices (e.g., encryption, access controls, and auditing). A minimal orchestration sketch follows this listing.
- Collaborate with finance and data teams to define data models, schemas, and governance policies that support modern finance workflows, including automated reporting, forecasting, and anomaly detection.
- Implement data retrieval mechanisms optimized for LLM-based querying via Copilot and/or similar tools, enabling natural language access to financial data while maintaining accuracy and contextual relevance.
- Build and optimize interactive dashboards in Omni for real-time visualization and analysis of key metrics, such as financial performance and operational KPIs.
- Monitor and troubleshoot data pipelines, performing root-cause analysis on issues related to data quality, latency, or availability, and implementing proactive solutions to ensure high reliability.
- Document processes, architectures, and best practices to facilitate knowledge sharing and scalability within the team.

Qualifications

- Bachelor's or Master's degree in Computer Science, Finance, Information Systems, or a related field.
- 5+ years of experience as a data engineer or analytics engineer, with a proven track record in full-stack data development (from ingestion to visualization).
- Strong expertise in Snowflake, including data modeling, warehousing, and performance optimization.
- Hands-on experience with ETL tools (e.g., Apache Airflow, dbt, Fivetran) and integrating data from ERP systems like NetSuite.
- Proficiency in SQL, Python, and/or other scripting languages for data processing and automation.
- Familiarity with LLM integrations (e.g., for natural language querying) and dashboarding tools like Omni or similar (e.g., Tableau, Looker).
- Solid understanding of data security principles, including GDPR/CCPA compliance, role-based access, and encryption in cloud environments.
- Excellent problem-solving skills, with the ability to work cross-functionally in agile teams.

At Synthesia we expect everyone to:

- Put the Customer First
- Own It & Go Direct
- Be Fast & Experimental
- Make the Journey Fun

You can read more about this on our public Notion page.

Location: London or UK remote.

UK Benefits

📍 A hybrid, flexible approach to work where you have access to a lovely office space in Oxford Circus, with free lunches on Wednesdays and Fridays
💸 A competitive salary + stock options
🏝 25 days of annual leave + public holidays
🏥 Private healthcare through AXA
❣️ Pension contribution: Synthesia contributes 3% and employees contribute 5% on qualifying earnings
🍼 Paid parental leave entitling primary caregivers to 16 weeks of full pay, and secondary caregivers to 5 weeks of full pay
👉 A generous recruitment referral scheme if you help us to hire
💻 The equipment you need to be successful in your role
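The NetSuite-to-Snowflake flow described above could be orchestrated with an Airflow DAG along these lines. This is a sketch, not Synthesia's actual setup: the task callables are stubs, the DAG id and schedule are illustrative, and the `schedule` argument assumes Airflow 2.4+.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_netsuite(**context):
    """Stub: pull the previous day's journal entries from the NetSuite API."""
    ...

def load_snowflake(**context):
    """Stub: stage the extract to cloud storage and COPY INTO a Snowflake table."""
    ...

with DAG(
    dag_id="netsuite_to_snowflake_daily",  # hypothetical name
    start_date=datetime(2025, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_netsuite", python_callable=extract_netsuite)
    load = PythonOperator(task_id="load_snowflake", python_callable=load_snowflake)
    extract >> load  # load only runs after a successful extract
```

In a real deployment the load step would typically hand off to dbt for in-warehouse transformation, keeping extraction and modeling concerns separate.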
Data Engineer
Data Science & Analytics
Apply
Data Engineer
Sierra
USD 220,000 - 340,000
United States · Full-time · Remote: false
About us

At Sierra, we're creating a platform to help businesses build better, more human customer experiences with AI. We are primarily an in-person company based in San Francisco, with growing offices in Atlanta, New York, and London. We are guided by a set of values that are at the core of our actions and define our culture: Trust, Customer Obsession, Craftsmanship, Intensity, and Family. These values are the foundation of our work, and we are committed to upholding them in everything we do.

Our co-founders are Bret Taylor and Clay Bavor. Bret currently serves as Board Chair of OpenAI. Previously, he was co-CEO of Salesforce (which had acquired the company he founded, Quip) and CTO of Facebook. Bret was also one of Google's earliest product managers and co-creator of Google Maps. Before founding Sierra, Clay spent 18 years at Google, where he most recently led Google Labs. Earlier, he started and led Google's AR/VR effort, Project Starline, and Google Lens. Before that, Clay led the product and design teams for Google Workspace.

What you'll do

Sierra is in the process of building out its core data foundations, and you'll play a pivotal role in shaping the company's data strategy and infrastructure. Partnering with engineering, product, and GTM teams, you'll design and operate scalable batch and real-time data systems, create trusted data models, and build the pipelines that power experimentation, analytics, and AI development. You'll ensure that every area of the business, from customer experience to go-to-market execution, has access to high-quality, reliable data to drive insight and innovation. Beyond infrastructure, you'll influence how data is captured, governed, and leveraged across Sierra, empowering decision-making at scale. This is a unique opportunity to establish the foundations of Sierra's data ecosystem, drive standards for reliability and trust, and enable the next generation of AI-powered customer interactions. (A small streaming-ingest sketch follows this listing.)

What you'll bring

- Proven Experience: Extensive experience in data engineering, with a track record of designing and operating data pipelines, systems, and models at scale.
- Curiosity & Customer Obsession: Passion for building trustworthy data systems that empower teams to better understand users and deliver impactful product experiences.
- Adaptability and Resilience: Comfort working in a fast-paced startup environment, able to adapt to evolving priorities and deliver reliable solutions amidst ambiguity.
- Technical Proficiency: Strong proficiency in SQL and Python, with expertise in distributed data processing frameworks (e.g., Spark, Flink, Kafka) and cloud-based platforms (AWS, GCP).
- Data Architecture Skills: Deep experience with data modeling, warehousing, and designing schemas optimized for analytics, experimentation, and AI/ML workloads.
- Data Quality & Governance: Strong understanding of data validation, monitoring, compliance, and best practices for ensuring data integrity across pipelines.
- Excellent Communication: Ability to translate technical infrastructure and data design trade-offs into clear recommendations for product, engineering, and business stakeholders.
- Great Collaboration: Proven ability to partner closely with product, ML, analytics, and GTM teams to deliver data foundations that unlock business and product innovation.

Even better...

- Experience with AWS Glue, Athena, Kafka, Flink/Spark, dbt, Airflow/Dagster, and Terraform (open to equivalents).
- Experience working with large language models (LLMs), conversational AI, or agent-based systems.
- Familiarity with building or improving data platforms for advanced analytics.

Our values

Trust: We build trust with our customers with our accountability, empathy, quality, and responsiveness. We build trust in AI by making it more accessible, safe, and useful. We build trust with each other by showing up for each other professionally and personally, creating an environment that enables all of us to do our best work.

Customer Obsession: We deeply understand our customers' business goals and relentlessly focus on driving outcomes, not just technical milestones. Everyone at the company knows and spends time with our customers. When our customer is having an issue, we drop everything and fix it.

Craftsmanship: We get the details right, from the words on the page to the system architecture. We have good taste. When we notice something isn't right, we take the time to fix it. We are proud of the products we produce. We continuously self-reflect to continuously self-improve.

Intensity: We know we don't have the luxury of patience. We play to win. We care about our product being the best, and when it isn't, we fix it. When we fail, we talk about it openly and without blame so we succeed the next time.

Family: We know that balance and intensity are compatible, and we model it in our actions and processes. We are the best technology company for parents. We support and respect each other and celebrate each other's personal and professional achievements.

What we offer

We want our benefits to reflect our values and offer the following to full-time employees:

- Flexible (unlimited) paid time off
- Medical, dental, and vision benefits for you and your family
- Life insurance and disability benefits
- Retirement plan (e.g., 401K, pension) with Sierra match
- Parental leave
- Fertility and family building benefits through Carrot
- Lunch, as well as delicious snacks and coffee to keep you energized
- Discretionary benefit stipend giving people the ability to spend where it matters most
- Free alphorn lessons

These benefits are further detailed in Sierra's policies and are subject to change at any time, consistent with the terms of any applicable compensation or benefits plans. Eligible full-time employees can participate in Sierra's equity plans subject to the terms of the applicable plans and policies.

Be you, with us

We're working to bring the transformative power of AI to every organization in the world. To do so, it is important to us that the diversity of our employees represents the diversity of our customers. We believe that our work and culture are better when we encourage, support, and respect different skills and experiences represented within our team. We encourage you to apply even if your experience doesn't precisely match the job description. We strive to evaluate all applicants consistently without regard to race, color, religion, gender, national origin, age, disability, veteran status, pregnancy, gender expression or identity, sexual orientation, citizenship, or any other legally protected class.
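For the real-time side of the role, here is a hedged sketch of a validated Kafka ingest loop using the kafka-python package; the topic, broker address, and event schema are invented for illustration and not Sierra's actual systems.

```python
import json

from kafka import KafkaConsumer  # assumes the kafka-python package is installed

# Hypothetical topic and broker; a minimal validate-then-ingest loop.
consumer = KafkaConsumer(
    "conversation-events",
    bootstrap_servers="localhost:9092",
    group_id="analytics-ingest",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    auto_offset_reset="earliest",
)

REQUIRED_FIELDS = {"event_id", "agent_id", "ts"}

for message in consumer:
    event = message.value
    missing = REQUIRED_FIELDS - event.keys()
    if missing:
        # Route malformed records to a dead-letter path rather than dropping silently.
        print(f"skipping {event.get('event_id')}: missing {sorted(missing)}")
        continue
    # ...write validated events to the batch layer (e.g., Parquet on object storage)...
```

A production pipeline would enforce the schema with a registry and handle offsets and retries explicitly; the point here is only the validate-at-the-boundary pattern.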
Data Engineer
Data Science & Analytics
Apply
Data Annotation Intern
Metropolis
India · Intern · Remote: false
Who we are

Metropolis is an artificial intelligence company that uses computer vision technology to enable frictionless, checkout-free experiences in the real world. Today, we are reimagining parking to enable millions of consumers to just "drive in and drive out." We envision a future where people transact in the real world with a speed, ease, and convenience that is unparalleled, even online. Tomorrow, we will power checkout-free experiences anywhere you go to make the everyday experiences of living, working, and playing remarkable, giving us back our most valuable asset: time.

Who you are

We are seeking a meticulous and detail-oriented Data Annotation Intern to join our data team. In this role, you will play a crucial part in the development of our machine learning and AI models by accurately labeling, tagging, and categorizing large datasets. Your work will directly impact the performance and quality of our AI systems, requiring a high level of concentration and adherence to specific project guidelines. This is an entry-level position, perfect for individuals who are organized, focused, and want to contribute to the cutting edge of technology.

Duration: 6 months

What you'll do

- Accurately label and annotate data according to project specifications and guidelines.
- Work with various data types, including images and text, to perform tasks such as:
  - Image annotation: drawing bounding boxes, polygons, or keypoints to identify objects in images (a small QA sketch follows this listing).
  - Text annotation: categorizing text, sentiment analysis, and named entity recognition.
- Review and quality-check annotated data to ensure a high level of accuracy and consistency.
- Communicate with team leads to clarify project guidelines or address challenges.
- Maintain data security and confidentiality at all times.

What we're looking for

- A graduate or equivalent qualification is required.
- Exceptional attention to detail and a methodical approach to tasks.
- Strong comprehension skills and the ability to follow complex instructions precisely.
- Good communication skills to collaborate with a team.
- Ability to work in a fast-paced environment and meet daily quality and productivity targets.

While not required, these are a plus:
- Previous experience in data entry, quality assurance, or other detail-oriented roles. We provide all necessary training.

Our Stack

- Languages + frameworks: TypeScript, React, Scala (principally), Java (limited)
- Datastores: MySQL, PostgreSQL, Snowflake
- Cloud: AWS
- Version control: Git & GitHub
- AI tooling: Copilot on GitHub
- Observability: Datadog

When you join Metropolis, you'll join a team of world-class product leaders and engineers, building an ecosystem of technologies at the intersection of parking, mobility, and real estate. Our goal is to build an inclusive culture where everyone has a voice and the best idea wins. You will play a key role in building and maintaining this culture as our organization grows.

#LI-AR1 #LI-Onsite

Metropolis values in-person collaboration to drive innovation, strengthen culture, and enhance the Member experience. Our corporate team members follow our office-first model, which requires employees to be on-site at least four days a week, fostering organic interactions that spark creativity and connection.

Metropolis may utilize an automated employment decision tool (AEDT) to assess or evaluate your candidacy for employment or promotion. AEDTs are used to assist in assessing a candidate's application relative to the required job qualifications and responsibilities listed in the job posting. As part of this process, Metropolis retains data relevant to your candidacy, including personal information, for a period that is reasonably necessary for the use of the tool. If you are hired for the position, your data may become part of your employee records.

Metropolis Technologies is an equal opportunity employer. We make all hiring decisions based on merit, qualifications, and business needs, without regard to race, color, religion, sex (including gender identity, sexual orientation, or pregnancy), national origin, disability, veteran status, or any other protected characteristic under federal, state, or local law.
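To illustrate the kind of quality check behind bounding-box work, here is a minimal sketch of a per-annotation validator; the field names and thresholds are invented, not Metropolis's internal tooling.

```python
from dataclasses import dataclass

@dataclass
class BoundingBox:
    """One labeled object in an image, in pixel coordinates."""
    label: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int

def validate_box(box: BoundingBox, img_w: int, img_h: int) -> list[str]:
    """Return a list of QA errors for one annotation; an empty list means it passes."""
    errors = []
    if not (0 <= box.x_min < box.x_max <= img_w):
        errors.append("x coordinates out of order or outside image width")
    if not (0 <= box.y_min < box.y_max <= img_h):
        errors.append("y coordinates out of order or outside image height")
    if (box.x_max - box.x_min) * (box.y_max - box.y_min) < 4:
        errors.append("box area suspiciously small")  # illustrative threshold
    if not box.label.strip():
        errors.append("missing label")
    return errors

print(validate_box(BoundingBox("car", 10, 20, 110, 90), img_w=640, img_h=480))  # []
```

Checks like these are typically run automatically before human review, so reviewers spend their time on genuinely ambiguous annotations rather than mechanical errors.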
Data Engineer
Data Science & Analytics
Apply
Senior Data Engineer
Maincode
AUD 150,000 - 180,000
Australia · Full-time · Remote: false
Overview

Maincode is building sovereign AI models in Australia. We are training foundation models from scratch, designing new reasoning architectures, and deploying them on state-of-the-art GPU clusters. Our models are built on datasets we create ourselves: curated, cleaned, and engineered for performance at scale. This is not buying off-the-shelf corpora or scraping without thought. This is building world-class datasets from the ground up.

As a Senior Data Engineer, you will lead the design and construction of these datasets. You will work hands-on to source, clean, transform, and structure massive amounts of raw data into training-ready form. You will design the architecture that powers data ingestion, validation, and storage for multi-terabyte to petabyte-scale AI training. You will collaborate with AI researchers and engineers to ensure every byte is high quality, relevant, and optimised for training cutting-edge large language models and other architectures.

This is a deep technical role. You will be writing code, building pipelines, defining schemas, and debugging unusual data edge cases at scale. You will think like both a data scientist and a systems engineer, designing for correctness, scalability, and future-proofing. If you want to build the datasets that power sovereign AI from first principles, this is your team.

What you'll do

- Design and build large-scale data ingestion and curation pipelines for AI training datasets (a toy curation pass is sketched after this listing).
- Source, filter, and process diverse data types, including text, structured data, code, and multimodal data, from raw form to model-ready format.
- Implement robust quality control and validation systems to ensure dataset integrity, relevance, and ethical compliance.
- Architect storage and retrieval systems optimised for distributed training at scale.
- Build tooling to track dataset lineage, reproducibility, and metadata at all stages of the pipeline.
- Work closely with AI researchers to align datasets with evolving model architectures and training objectives.
- Collaborate with DevOps and ML engineers to integrate data systems into large-scale training workflows.
- Continuously improve ingestion speed, preprocessing efficiency, and data freshness for iterative training cycles.

Who you are

- Passionate about building world-class datasets for AI training, from raw source to training-ready.
- Experienced in Python and data engineering frameworks such as Apache Spark, Ray, or Dask.
- Skilled in working with distributed data storage and processing systems such as S3, HDFS, or cloud object storage.
- Strong understanding of data quality, validation, and reproducibility in large-scale ML workflows.
- Familiar with ML frameworks like PyTorch or JAX, and how data pipelines interact with them.
- Comfortable working with multi-terabyte or larger datasets.
- Hands-on and pragmatic: you like solving real data problems with code and automation.
- Motivated to help build sovereign AI capability in Australia.

Why Maincode

We are a small team building some of the most advanced AI systems in Australia. We create new foundation models from scratch, not just fine-tune existing ones, and we build the datasets they run on from the ground up. We operate our own GPU clusters, run large-scale training, and integrate research and engineering closely to push the frontier of what is possible.

You will be surrounded by people who:
- Care deeply about data quality and architecture, not just volume
- Build systems that scale reliably and repeatably
- Take pride in learning, experimenting, and shipping
- Want to help Australia build independent, world-class AI systems
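As a toy illustration of the curation work described above, here is a single-machine sketch of an exact-dedup and quality-filter pass over a text corpus. The thresholds are invented for illustration; a real pipeline at this scale would run distributed (e.g., on Spark or Ray) and add near-duplicate detection.

```python
import hashlib

def clean_corpus(docs: list[str]) -> list[str]:
    """Drop exact duplicates and obviously low-quality documents."""
    seen: set[str] = set()
    kept: list[str] = []
    for text in docs:
        normalized = " ".join(text.split()).lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate of an earlier document
        seen.add(digest)
        words = normalized.split()
        if len(words) < 20:
            continue  # too short to be useful training text (illustrative cutoff)
        if len(set(words)) / len(words) < 0.3:
            continue  # highly repetitive content (illustrative cutoff)
        kept.append(text)
    return kept

docs = ["the same sentence " * 30, "A short note.", "Genuinely varied prose " * 10]
print(len(clean_corpus(docs)))  # duplicates and junk removed
```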
Data Engineer
Data Science & Analytics
Apply
Principal Data Engineer
Worth AI
United States · Full-time · Remote: true
Worth AI, a leader in the computer software industry, is looking for a talented and experienced Principal Data Engineer to join its innovative team. At Worth AI, we are on a mission to revolutionize decision-making with the power of artificial intelligence while fostering an environment of collaboration and adaptability, aiming to make a meaningful impact in the tech landscape. Our team values include extreme ownership, one team, and creating raving fans, both for our employees and our customers.

Worth is looking for a Principal Data Engineer to own the company-wide data architecture and platform: design and scale reliable batch/streaming pipelines, institute data quality and governance, and enable analytics/ML with secure, cost-efficient systems. You will partner with engineering, product, analytics, and security to turn business needs into durable data products.

Responsibilities

Architecture & Strategy
- Define end-to-end data architecture (lake/lakehouse/warehouse, batch/streaming, CDC, metadata).
- Set standards for schemas, contracts, orchestration, storage layers, and semantic/metrics models.
- Publish roadmaps, ADRs/RFCs, and "north star" target states; guide build-vs-buy decisions.

Platform & Pipelines
- Design and build scalable, observable ELT/ETL and event pipelines.
- Establish ingestion patterns (CDC, file, API, message bus) and schema-evolution policies.
- Provide self-service tooling for analysts and scientists (dbt, notebooks, catalogs, feature stores).
- Ensure workflow reliability (idempotency, retries, backfills, SLAs).

Data Quality & Governance
- Define dataset SLAs/SLOs, freshness, lineage, and data certification tiers (a minimal contract-check sketch follows this listing).
- Enforce contracts and validation tests; deploy anomaly detection and incident runbooks.
- Partner with governance on cataloging, PII handling, retention, and access policies.

Reliability, Performance & Cost
- Lead capacity planning, partitioning/clustering, and query optimization.
- Introduce SRE-style practices for data (error budgets, postmortems).
- Drive FinOps for storage/compute; monitor and reduce cost per TB/query/job.

Security & Compliance
- Implement encryption, tokenization, and row/column-level security; manage secrets and audits.
- Align with SOC 2 and privacy regulations (e.g., GDPR/CCPA; HIPAA if applicable).

ML & Analytics Enablement
- Deliver versioned, documented datasets and features for BI and ML.
- Operationalize training/serving data flows, drift signals, and feature-store governance.
- Build and maintain the semantic layer and metrics consistency for experimentation and BI.

Leadership & Collaboration
- Provide technical leadership across squads; mentor senior/staff engineers.
- Run design reviews and drive consensus on complex trade-offs.
- Translate business goals into data products with product/analytics leaders.

Requirements

- 10+ years in data engineering (including 3+ years at staff/principal or equivalent scope).
- Proven leadership of company-wide data architecture and platform initiatives.
- Deep experience with at least one cloud (AWS) and a modern warehouse or lakehouse (e.g., Snowflake, Redshift, Databricks).
- Strong SQL and one programming language (Python or Scala/Java).
- Orchestration (Airflow/Dagster/Prefect), transformations (dbt or equivalent), and streaming (Kafka/Kinesis/PubSub).
- Data modeling (3NF, star, data vault) and semantic/metrics layers.
- Data quality testing, lineage, and observability in production environments.
- Security best practices: RBAC/ABAC, encryption, key management, auditability.

Nice to Have

- Feature stores and ML data ops; experimentation frameworks.
- Cost optimization at scale; multi-tenant architectures.
- Governance tools (DataHub/Collibra/Alation), OpenLineage, and testing frameworks (Great Expectations/Deequ).
- Compliance exposure (SOC 2, GDPR/CCPA; HIPAA/PCI where relevant).
- Model features sourced from complex third-party data (KYB/KYC, credit bureaus, fraud detection APIs).

Benefits

- Health Care Plan (Medical, Dental & Vision)
- Retirement Plan (401k, IRA)
- Life Insurance
- Unlimited Paid Time Off
- 9 Paid Holidays
- Family Leave
- Work From Home
- Free Food & Snacks (access to an Industrious co-working membership!)
- Wellness Resources
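Data contracts and freshness SLAs, as referenced above, can be illustrated with a minimal plain-Python sketch; production platforms would typically use a framework such as Great Expectations instead, and the field names and SLA here are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical contract: required fields, expected types, and a 24-hour freshness SLA.
CONTRACT = {
    "account_id": str,
    "risk_score": float,
    "updated_at": datetime,
}
FRESHNESS_SLA = timedelta(hours=24)

def validate_row(row: dict) -> list[str]:
    """Return contract violations for one row; an empty list means the row passes."""
    errors = []
    for field, expected in CONTRACT.items():
        if field not in row:
            errors.append(f"missing field: {field}")
        elif not isinstance(row[field], expected):
            errors.append(
                f"{field}: expected {expected.__name__}, got {type(row[field]).__name__}"
            )
    ts = row.get("updated_at")
    if isinstance(ts, datetime) and datetime.now(timezone.utc) - ts > FRESHNESS_SLA:
        errors.append("freshness SLA violated")
    return errors

row = {"account_id": "A123", "risk_score": 0.87, "updated_at": datetime.now(timezone.utc)}
print(validate_row(row))  # []
```

In a real platform the same checks would run as pipeline tests, with violations feeding the anomaly detection and incident runbooks the role describes.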
Data Engineer
Data Science & Analytics
Apply