Top Data Engineer Job Openings in 2025
Looking for opportunities in data engineering? This curated list features the latest Data Engineer job openings from AI-native companies. Whether you're an experienced professional or just entering the field, find roles that match your expertise, from startups to global tech leaders. Updated every day.
Founding Data Engineer
Elicit
11-50
USD
185000
-
305000
United States
Full-time
Remote: false
About Elicit

Elicit is an AI research assistant that uses language models to help professional researchers and high-stakes decision makers break down hard questions, gather evidence from scientific/academic sources, and reason through uncertainty.

What we're aiming for:
- Elicit radically increases the amount of good reasoning in the world.
- For experts, Elicit pushes the frontier forward.
- For non-experts, Elicit makes good reasoning more accessible. People who don't have the tools, expertise, time, or mental energy to make carefully-reasoned decisions on their own can do so with Elicit.
- Elicit is a scalable ML system based on human-understandable task decompositions, with supervision of process, not outcomes. This expands our collective understanding of safe AGI architectures.

Visit our Twitter to learn more about how Elicit is helping researchers and making progress on our mission.

Why we're hiring for this role

Two main reasons:
- Currently, Elicit operates over academic papers and clinical trials. One of your key initial responsibilities will be to build a complete corpus of these documents, available as soon as they're published, combining different data sources and ingestion methods. Once that's done, there is a growing list of other document types and sources we'd love to integrate!
- One of our main initiatives is to broaden the sorts of tasks you can complete in Elicit. We need a data engineer to figure out the best way to ingest massive amounts of heterogeneous data in such a way as to make it usable by LLMs. We need your help to integrate with our customers' custom data providers so that they can create task-specific workflows over them.

In general, we're looking for someone who can architect and implement robust, scalable solutions to handle our growing data needs while maintaining high performance and data quality.

Our tech stack
- Data pipeline: Python, Flyte, Spark
- Probably less relevant to you, but in case of interest: the backend is Node and Python (event sourcing); the frontend is Next.js, TypeScript, and Tailwind
- We like static type checking in Python and TypeScript!
- All infrastructure runs in Kubernetes across a couple of clouds
- We use GitHub for code reviews and CI
- We deploy using the gitops pattern (i.e. deploys are defined and tracked by diffs in our k8s manifests)

Am I a good fit?

Consider the questions:
- How would you optimize a Spark job that's processing a large amount of data but running slowly?
- What are the differences between RDD, DataFrame, and Dataset in Spark? When would you use each?
- How does data partitioning work in distributed systems, and why is it important?
- How would you implement a data pipeline to handle regular updates from multiple academic paper sources, ensuring efficient deduplication?

If you have a solid answer for these—without reference to documentation—then we should chat! (A sketch of the deduplication question appears at the end of this listing.)

Location and travel

We have a lovely office in Oakland, CA; there are people there every day but we don't all work from there all the time. It's important to us to spend time with our teammates, however, so we ask that all Elicians spend about 1 week out of every 6 with teammates. We wrote up more details on this page.

What you'll bring to the role
- 5+ years of experience as a data engineer: owning make-or-break decisions about how to ingest, manage, and use data
- Strong proficiency in Python (5+ years of experience)
- You have created and owned a data platform at rapidly-growing startups—gathering needs from colleagues, planning an architecture, deploying the infrastructure, and implementing the tooling
- Experience with architecting and optimizing large data pipelines, ideally with particular experience with Spark; ideally these are pipelines which directly support user-facing features (rather than internal BI, for example)
- Strong SQL skills, including understanding of aggregation functions, window functions, UDFs, self-joins, partitioning, and clustering approaches
- Experience with columnar data storage formats like Parquet
- Strong opinions, weakly held, about approaches to data quality management
- Creative and user-centric problem-solving
- You should be excited to play a key role in shipping new features to users—not just building out a data platform!

Nice to have
- Experience in developing deduplication processes for large datasets
- Hands-on experience with full-text extraction and processing from various document formats (PDF, HTML, XML, etc.)
- Familiarity with machine learning concepts and their application in search technologies
- Experience with distributed computing frameworks beyond Spark (e.g., Dask, Ray)
- Experience in science and academia: familiarity with academic publications, and the ability to accurately model the needs of our users
- Hands-on experience with industry-standard tools like Airflow, DBT, or Hadoop
- Hands-on experience with standard paradigms like data lake, data warehouse, or lakehouse

What you'll do

You'll own:
- Building and optimizing our academic research paper pipeline. You'll architect and implement robust, scalable systems to handle data ingestion while maintaining high performance and quality. You'll work on efficiently deduplicating hundreds of millions of research papers and calculating embeddings. Your goal will be to make Elicit the most complete and up-to-date database of scholarly sources.
- Expanding the datasets Elicit works over. Our users want Elicit to work over court documents, SEC filings, … your job will be to figure out how to ingest and index a rapidly increasing ontology of documents. We also want to support less structured documents, spreadsheets, presentations, all the way up to rich media like audio and video. Larger customers often want us to integrate private data into Elicit for their organisation to use. We'll look to you to define and build a secure, reliable, fast, and auditable approach to these data connectors.
- Data for our ML systems. You'll figure out the best way to preprocess all the data mentioned above to make it useful to models. We often need datasets for our model fine-tuning. You'll work with our ML engineers and evaluation experts to find, gather, version, and apply these datasets in training runs.

Your first week:
- Start building foundational context: get to know your team, our stack (including Python, Flyte, and Spark), and the product roadmap. Familiarize yourself with our current data pipeline architecture and identify areas for potential improvement.
- Make your first contribution to Elicit: complete your first Linear issue related to our data pipeline or academic paper processing. Have a PR merged into our monorepo, demonstrating your understanding of our development workflow. Gain understanding of our CI/CD pipeline, monitoring, and logging tools specific to our data infrastructure.

Your first month:
- You'll complete your first multi-issue project: tackle a significant data pipeline optimization or enhancement project. Collaborate with the team to implement improvements in our academic paper processing workflow.
- You're actively improving the team: contribute to regular team meetings and hack days, sharing insights from your data engineering expertise. Add documentation or diagrams explaining our data pipeline architecture and best practices. Suggest improvements to our data processing and storage methodologies.

Your first quarter:
- You're flying solo: independently implement significant enhancements to our data pipeline, improving efficiency and scalability. Make impactful decisions regarding our data architecture and processing strategies.
- You've developed an area of expertise: become the go-to resource for questions related to our academic paper processing pipeline and data infrastructure. Lead discussions on optimizing our data storage and retrieval processes for academic literature.
- You actively research and improve the product: propose and scope improvements to make Elicit more comprehensive and up-to-date in terms of scholarly sources. Identify and implement technical improvements to surpass competitors like Google Scholar in terms of coverage and data quality.

Compensation, benefits, and perks

In addition to working on important problems as part of a productive and positive team, we also offer great benefits (with some variation based on location):
- Flexible work environment: work from our office in Oakland or remotely with time zone overlap (between GMT and GMT-8), as long as you can travel for in-person retreats and coworking events
- Fully covered health, dental, vision, and life insurance for you, and generous coverage for the rest of your family
- Flexible vacation policy, with a minimum recommendation of 20 days/year + company holidays
- 401(k) with a 6% employer match
- A new Mac + $1,000 budget to set up your workstation or home office in your first year, then $500 every year thereafter
- $1,000 quarterly AI Experimentation & Learning budget, so you can freely experiment with new AI tools to incorporate into your workflow, take courses, purchase educational resources, or attend AI-focused conferences and events
- A team administrative assistant who can help you with personal and work tasks

You can find more reasons to work with us in this thread!

For all roles at Elicit, we use a data-backed compensation framework to keep salaries market-competitive, equitable, and simple to understand. For this role, we target starting ranges of:
- Senior (L4): $185-270k + equity
- Expert (L5): $215-305k + equity
- Principal (L6): >$260k + significant equity

We're optimizing for a hire who can contribute at an L4/senior level or above. We also offer above-market equity for all roles at Elicit, as well as employee-friendly equity terms (10-year exercise periods).
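The deduplication question above invites a concrete answer. Here is a minimal sketch in PySpark (the stack named in the listing) of one way to deduplicate incoming paper records; the bucket paths, column names, and normalization rules are illustrative assumptions, not Elicit's actual pipeline.

```python
# Minimal dedup sketch: keep one canonical record per paper across sources.
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("paper-dedup").getOrCreate()

papers = spark.read.parquet("s3://bucket/papers/")  # hypothetical path

# Normalize the join key: DOIs are case-insensitive; fall back to a
# whitespace-normalized title when the DOI is missing.
normalized = papers.withColumn(
    "dedup_key",
    F.coalesce(
        F.lower(F.col("doi")),
        F.lower(F.regexp_replace("title", r"\s+", " ")),
    ),
)

# Keep the most recently updated record per key.
w = Window.partitionBy("dedup_key").orderBy(F.col("updated_at").desc())
deduped = (
    normalized
    .withColumn("rn", F.row_number().over(w))
    .filter(F.col("rn") == 1)
    .drop("rn", "dedup_key")
)

deduped.write.mode("overwrite").parquet("s3://bucket/papers_deduped/")
```

Normalizing the key before windowing keeps the job to a single shuffle; at hundreds of millions of rows you would also watch for key skew and consider repartitioning or salting hot keys.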
Data Engineer
Data Science & Analytics
Apply
September 24, 2025
Data Operations Engineer
Labelbox
201-500
USD
70000
-
150000
United States
Poland
Full-time
Remote: false
Shape the Future of AI

At Labelbox, we're building the critical infrastructure that powers breakthrough AI models at leading research labs and enterprises. Since 2018, we've been pioneering data-centric approaches that are fundamental to AI development, and our work becomes even more essential as AI capabilities expand exponentially.

About Labelbox

We're the only company offering three integrated solutions for frontier AI development:
- Enterprise Platform & Tools: advanced annotation tools, workflow automation, and quality control systems that enable teams to produce high-quality training data at scale
- Frontier Data Labeling Service: specialized data labeling through Alignerr, leveraging subject matter experts for next-generation AI models
- Expert Marketplace: connecting AI teams with highly skilled annotators and domain experts for flexible scaling

Why Join Us
- High-Impact Environment: we operate like an early-stage startup, focusing on impact over process. You'll take on expanded responsibilities quickly, with career growth directly tied to your contributions.
- Technical Excellence: work at the cutting edge of AI development, collaborating with industry leaders and shaping the future of artificial intelligence.
- Innovation at Speed: we celebrate those who take ownership, move fast, and deliver impact. Our environment rewards high agency and rapid execution.
- Continuous Growth: every role requires continuous learning and evolution. You'll be surrounded by curious minds solving complex problems at the frontier of AI.
- Clear Ownership: you'll know exactly what you're responsible for and have the autonomy to execute. We empower people to drive results through clear ownership and metrics.

Role Overview

We are seeking a skilled and detail-oriented Data Operations Engineer to support our data annotation and data quality assurance processes. In this role, you will play a critical part in optimizing, maintaining, and scaling our data labeling workflows, primarily using Labelbox. You will ensure that labelers are able to efficiently and accurately generate human-labeled data by building tools, using LLMs, automating common project management tasks, and troubleshooting complex issues within the production pipeline. Your ability to script in Python and apply engineering problem-solving principles to data operations will be key to improving both efficiency and quality across our projects.

Your Impact
- Build, deploy, and maintain Python automation scripts and other tools to streamline the data annotation process, automate repetitive tasks, and reduce manual effort.
- Identify bottlenecks in the data labeling pipeline and implement solutions to enhance throughput, accuracy, and scalability of labeling operations.
- Work closely with the Project Management team to ensure that data labeling meets accuracy standards, and troubleshoot any issues related to data quality.
- Plan quality assurance workflows that use GenAI and open-source models to find data anomalies (a minimal sketch follows this listing).
- Set up monitoring tools to track the performance of data annotation operations, reporting key metrics and areas for improvement to leadership.
- Integrate and manage third-party API tools with Labelbox, ensuring seamless operation and data flow across platforms.
- Build and maintain internal tools with Retool and similar platforms.
- Provide ongoing technical support to the project managers and labelers, assisting with technical challenges in Labelbox and associated tools.

What You Bring
- 3+ years of working experience in a technical role, interfacing with technical and non-technical teams, and writing Python scripts for data processing.
- 2+ years of experience using LLMs in prompting frameworks (e.g., LLM-as-a-judge).
- Some experience with machine learning models in scripts or data pipelines.
- Bachelor's degree in Engineering, Computer Science, or a technical field.
- Practical experience using LLMs or traditional models to assist annotation QA or generate/transform data.
- Proficiency in Python scripting and experience with automation of operational tasks.
- Experience with Labelbox or similar data annotation platforms.
- Strong analytical and problem-solving skills with a demonstrated ability to optimize processes.
- Experience with data pipelines, data analysis, and data workflow management.
- Familiarity with cloud platforms such as AWS, GCP, or Azure.
- English fluency.
- Knowledge of statistical analysis techniques to uncover bad patterns in human-labeled data.

Nice to have
- Prior experience in a production or process engineering role, especially in data operations or similar environments.
- Understanding of project management methodologies and the ability to work collaboratively across teams.

Alignerr Services at Labelbox

As part of the Alignerr Services team, you'll lead implementation of customer projects and manage our elite network of AI experts who deliver high-quality human feedback crucial for AI advancement. Your team will oversee 250,000+ monthly hours of specialized work across RLHF, complex reasoning, and multimodal AI projects, resulting in quality improvements for frontier AI labs. You'll leverage our AI-powered talent acquisition system and exclusive access to 16M+ specialized professionals to rapidly build and deploy expert teams that help customers, which include the majority of leading AI labs and AI disruptors, achieve breakthrough AI capabilities through precisely aligned human data—directly contributing to the critical human element in advancing artificial intelligence.

Labelbox strives to ensure pay parity across the organization and discusses compensation transparently. The expected annual base salary range for United States-based candidates is below. This range is not inclusive of any potential equity packages or additional benefits. Exact compensation varies based on a variety of factors, including skills and competencies, experience, and geographical location.

Annual base salary range: $70,000—$150,000 USD

Life at Labelbox
- Location: join our dedicated tech hubs in San Francisco or Wrocław, Poland
- Work Style: hybrid model with 2 days per week in office, combining collaboration and flexibility
- Environment: fast-paced and high-intensity, perfect for ambitious individuals who thrive on ownership and quick decision-making
- Growth: career advancement opportunities directly tied to your impact
- Vision: be part of building the foundation for humanity's most transformative technology

Our Vision

We believe data will remain crucial in achieving artificial general intelligence. As AI models become more sophisticated, the need for high-quality, specialized training data will only grow. Join us in developing new products and services that enable the next generation of AI breakthroughs. Labelbox is backed by leading investors including SoftBank, Andreessen Horowitz, B Capital, Gradient Ventures, Databricks Ventures, and Kleiner Perkins. Our customers include Fortune 500 enterprises and leading AI labs.
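The QA workflow described above (LLM-as-a-judge over human-labeled data) can be sketched in a few lines of Python. This is a generic illustration with a stubbed model call and hypothetical field names; it does not use the real Labelbox SDK.

```python
# Minimal LLM-as-a-judge sketch: flag suspect labels for human re-review.
import json

def call_llm(prompt: str) -> str:
    """Stand-in for a real model client (OpenAI, vendor SDK, etc.)."""
    raise NotImplementedError

RUBRIC = (
    "You are auditing a data label. Given the task instructions, the input, "
    'and the human label, reply with JSON: {"verdict": "pass" or "fail", '
    '"reason": "..."}'
)

def judge_label(instructions: str, item: str, label: str) -> dict:
    prompt = f"{RUBRIC}\n\nInstructions: {instructions}\nInput: {item}\nLabel: {label}"
    return json.loads(call_llm(prompt))

def flag_for_review(rows):
    """Yield rows whose labels the judge rejects."""
    for row in rows:  # rows: dicts with "instructions", "input", "label"
        verdict = judge_label(row["instructions"], row["input"], row["label"])
        if verdict["verdict"] == "fail":
            yield {**row, "judge_reason": verdict["reason"]}
```

In practice the judge itself gets audited: a sample of its pass/fail verdicts is spot-checked by humans so the automation does not silently drift.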
Your Personal Data Privacy: Any personal information you provide Labelbox as a part of your application will be processed in accordance with Labelbox’s Job Applicant Privacy notice. Any emails from Labelbox team members will originate from a @labelbox.com email address. If you encounter anything that raises suspicions during your interactions, we encourage you to exercise caution and suspend or discontinue communications.
Data Engineer
Data Science & Analytics
MLOps / DevOps Engineer
Data Science & Analytics
Apply
September 19, 2025
Member of Engineering (Human Data)
Poolside
201-500
Full-time
Remote: true
ABOUT POOLSIDE

In this decade, the world will create artificial intelligence that reaches human-level intelligence (and beyond) by combining learning and search. There will only be a small number of companies who will achieve this. Their ability to stack advantages and pull ahead will determine who survives and wins. These companies will move faster than anyone else. They will attract the world's most capable talent. They will be at the forefront of applied research and engineering at scale. They will create powerful economic engines. They will continue to scale their training to larger & more capable models. They will be given the right to raise large amounts of capital along their journey to enable this.

poolside exists to be one of these companies - to build a world where AI will drive the majority of economically valuable work and scientific progress.

We believe that software development will be the first major capability in neural networks that reaches human-level intelligence, because it's the domain where we can best combine search and learning approaches.

At poolside we believe our applied research needs to culminate in products that are put in the hands of people. Today we focus on building for a developer-led, increasingly AI-assisted world. We believe that the current capabilities of AI lead to incredible tooling that can assist developers in their day-to-day work. We also believe that as we increase the capabilities of our models, we increasingly empower anyone in the world to be able to build software. We envision a future where not 100 million people can build software but 2 billion people can.

View GDPR Policy

ABOUT OUR TEAM

We are a remote-first team that sits across Europe and North America and comes together once a month in person for 3 days, and for longer offsites twice a year. Our R&D and production teams are a combination of more research- and more engineering-oriented profiles; however, everyone deeply cares about the quality of the systems we build and has a strong underlying knowledge of software development. We believe that good engineering leads to faster development iterations, which allows us to compound our efforts.

ABOUT THE ROLE

As a Member of Engineering (Human Data), you will lead the development and management of high-quality data labeling pipelines that support our large language models. This role involves building an internal labeling team, working closely with vendors, and designing scalable processes for data annotation. While the position does not include customer-facing responsibilities, your work will be critical to the success of our AI models, ensuring that they are trained on top-tier labeled data using crowdsourcing and other data collection techniques.

YOUR MISSION

To build and optimize scalable data labeling pipelines that power the success of our machine learning models.

RESPONSIBILITIES
- Design, develop, and implement scalable data labeling pipelines that integrate into model training workflows
- Manage and expand the internal data labeling team to meet the company's growing needs
- Collaborate with external vendors to source and manage crowdsourced data labeling efforts, ensuring timely and high-quality delivery
- Monitor and improve labeling processes by conducting experiments, ensuring data quality, and optimizing performance across labeling projects
- Set up metrics and QA processes to evaluate the quality of labeled data and continuously improve output (a minimal sketch of such metrics follows this listing)
- Work cross-functionally with researchers and engineers to align labeling pipelines with model training needs
- Identify new tools and technologies to streamline labeling processes and increase efficiency

SKILLS & EXPERIENCE
- Experience with designing and managing data labeling processes, with a strong emphasis on crowdsourcing solutions
- 2+ years of experience in a technical role such as Data Engineer, Data Scientist, Technical Project Manager, or similar, ideally in machine learning/data-focused environments
- Familiarity with managing vendors and crowdsourcing platforms to handle large-scale data labeling efforts
- Strong understanding of data quality metrics such as accuracy, precision, recall, and F1 score
- Proven ability to develop complex pipelines with multiple stages, particularly for data annotation and machine learning training
- Experience with cloud platforms and tools such as AWS, GCP, Kubernetes, and CI/CD systems is a plus
- Ability to collaborate with technical teams and ensure labeling processes align with overall model development needs
- Mandatory experience with crowdsourcing platforms (e.g., Scale AI, Toloka, or similar) for data labeling
- Strong problem-solving skills and ability to work independently in a fast-paced environment

PROCESS
- Intro call with Eiso, our CTO & Co-Founder
- Technical interview(s) with one of our Founding Engineers
- Team fit call with the People team
- Final interview with Eiso, our CTO & Co-Founder

BENEFITS
- Fully remote work & flexible hours
- 37 days/year of vacation & holidays
- Health insurance allowance for you and dependents
- Company-provided equipment
- Wellbeing, always-be-learning, and home office allowances
- Frequent team get-togethers
- Great diverse & inclusive people-first culture
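The listing asks for a working grasp of accuracy, precision, recall, and F1 for labeled-data QA. Here is a minimal sketch with scikit-learn that compares crowd labels against an expert-reviewed gold subset and measures inter-annotator agreement; the label values are illustrative.

```python
# Minimal labeled-data QA metrics sketch.
from sklearn.metrics import (
    accuracy_score,
    cohen_kappa_score,
    precision_recall_fscore_support,
)

def score_batch(gold: list[str], labeled: list[str]) -> dict:
    """Compare crowd labels against a gold (expert-reviewed) subset."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        gold, labeled, average="macro", zero_division=0
    )
    return {
        "accuracy": accuracy_score(gold, labeled),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

def agreement(annotator_a: list[str], annotator_b: list[str]) -> float:
    """Cohen's kappa between two annotators on the same items."""
    return cohen_kappa_score(annotator_a, annotator_b)

print(score_batch(["spam", "ham", "spam"], ["spam", "spam", "spam"]))
```

Tracking these per batch and per vendor makes it possible to gate deliveries on a quality threshold rather than inspecting everything by hand.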
Data Engineer
Data Science & Analytics
Data Scientist
Data Science & Analytics
MLOps / DevOps Engineer
Data Science & Analytics
Apply
September 11, 2025
Analytics Data Engineer
HappyRobot
51-100
USD
120000
-
220000
United States
Full-time
Remote: false
About HappyRobot

HappyRobot is a platform to build and deploy AI workers that automate communication. Our AI workers connect to any system or data source to handle phone calls, email, messages…

We target the logistics industry, which relies heavily on communication to book, check on, and pay for freight. We primarily work with freight brokers, 3PLs, freight forwarders, shippers, warehouses, and other supply chain enterprises and tech startups.

We're thrilled to share that with our $44M Series B, HappyRobot has now raised a total of $62M — backed by leading investors who believe in our mission and vision for the future.

We're looking for rockstars with a relentless drive, unstoppable energy, and a true passion for building something great—ready to embrace the challenge, push limits, and thrive in a fast-paced, high-intensity environment.

About the Role
- Build foundational data products, dashboards, and tools to enable self-serve analytics to scale across the company.
- Develop insightful and reliable dashboards to track performance of core metrics that will deliver insights to the whole company.
- Build and maintain robust data pipelines and models to ensure data quality.
- Partner with Product, Engineering, and Design teams to inform decisions.
- Translate complex data into clear, actionable insights for product teams.
- Love working with data. Love making product better. Love finding the story behind the numbers.

Responsibilities
- Define, build, and maintain product metrics, dashboards, and pipelines.
- Write SQL and Python code to extract, transform, and analyze data.
- Design and run experiments (A/B tests) to support product development (a minimal analysis sketch follows this listing).
- Proactively explore data to identify product opportunities and insights.
- Collaborate with cross-functional teams to ensure data-driven decisions.
- Ensure data quality, reliability, and documentation across analytics efforts.

Must Have
- 3+ years of experience as an Analytics Data Engineer or in similar Data Science & Analytics roles, preferably partnering with GTM and Product leads to build and report on key company-wide metrics.
- Strong SQL and data engineering skills to transform data into accurate, clean data models (e.g., dbt, Airflow, data warehouses).
- Advanced analytics experience: segmentation and cohort analysis.
- Proficiency in Python for data analysis and modeling.
- Excellent communication skills: able to explain complex data insights clearly.
- Curious, collaborative, and driven to make an impact in a fast-paced environment.

Nice to Have
- Experience in B2B SaaS or AI/ML products.
- Familiarity with product analytics tools (e.g., Mixpanel, Amplitude).
- Exposure to machine learning concepts or AI-powered systems.

Why join us?
- Opportunity to work at a high-growth AI startup, backed by top investors.
- Fast Growth - backed by a16z and YC, on track for double-digit ARR.
- Ownership & Autonomy - take full ownership of projects and ship fast.
- Top-Tier Compensation - competitive salary + equity in a high-growth startup.
- Comprehensive Benefits - healthcare, dental, vision coverage.
- Work With the Best - join a world-class team of engineers and builders.

Our Operating Principles

Extreme Ownership
We take full responsibility for our work, outcomes, and team success. No excuses, no blame-shifting — if something needs fixing, we own it and make it better. This means stepping up, even when it's not "your job." If a ball is dropped, we pick it up. If a customer is unhappy, we fix it. If a process is broken, we redesign it. We don't wait for someone else to solve it — we lead with accountability and expect the same from those around us.

Craftsmanship
Putting care and intention into every task, striving for excellence, and taking deep ownership of the quality and outcome of your work. Craftsmanship means never settling for "just fine." We sweat the details because details compound. Whether it's a product feature, an internal doc, or a sales call — we treat it as a reflection of our standards. We aim to deliver jaw-dropping customer experiences by being curious, meticulous, and proud of what we build — even when nobody's watching.

We are "majos"
Be friendly & have fun with your coworkers. Always be genuine & honest, but kind. "Majo" is our way of saying: be a good human. Be approachable, helpful, and warm. We're building something ambitious, and it's easier (and more fun) when we enjoy the ride together. We give feedback with kindness, challenge each other with respect, and celebrate wins together without ego.

Urgency with Focus
Create the highest impact in the shortest amount of time. Move fast, but in the right direction. We operate with speed because time is our most limited resource. But speed without focus is chaos. We prioritize ruthlessly, act decisively, and stay aligned. We aim for high leverage: the biggest results from the simplest, smartest actions. We're running a high-speed marathon — not a sprint with no strategy.

Talent Density and Meritocracy
Hire only people who can raise the average; "exceptional performance is the passing grade." Ability trumps seniority. We believe the best teams are built on talent density — every hire should raise the bar. We reward contribution, not titles or tenure. We give ownership to those who earn it, and we all hold each other to a high standard. A-players want to work with other A-players — that's how we win.

First-Principles Thinking
Strip a problem to physics-level facts, ignore industry dogma, rebuild the solution from scratch. We don't copy-paste solutions. We go back to basics, ask why things are the way they are, and rebuild from the ground up if needed. This mindset pushes us to innovate, challenge stale assumptions, and move faster than incumbents. It's how we build what others think is impossible.

The personal data provided in your application and during the selection process will be processed by Happyrobot, Inc., acting as Data Controller. By sending us your CV, you consent to the processing of your personal data for the purpose of evaluating and selecting you as a candidate for the position. Your personal data will be treated confidentially and will only be used for the recruitment process of the selected job offer. Your personal data will be deleted after three months of inactivity, in compliance with the GDPR and legislation on the protection of personal data. If you wish to exercise your rights of access, rectification, deletion, portability, or opposition in relation to your personal data, you can do so through security@happyrobot.ai, subject to the GDPR. For more information, visit https://www.happyrobot.ai/privacy-policy. By submitting your application, you confirm that you have read and understood this clause and that you agree to the processing of your personal data as described.
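For the A/B-testing responsibility flagged above, a minimal analysis sketch: a two-proportion z-test on conversion counts using statsmodels. The counts are invented for illustration.

```python
# Minimal A/B test readout: did the treatment change conversion rate?
from statsmodels.stats.proportion import proportions_ztest

# (conversions, exposures) for control and treatment
conversions = [412, 468]
exposures = [10_000, 10_050]

stat, p_value = proportions_ztest(count=conversions, nobs=exposures)
lift = conversions[1] / exposures[1] - conversions[0] / exposures[0]
print(f"absolute lift: {lift:.4f}, p-value: {p_value:.4f}")
```

In practice you would fix the metric and the required sample size before the experiment starts, and only read the p-value once the planned exposure is reached.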
Data Engineer
Data Science & Analytics
Apply
September 6, 2025
Big Data Architect
Databricks
5000+
Germany
Remote: false
CSQ426R218

We have 5 open positions based in our Germany offices. As a Big Data Solutions Architect (Resident Solutions Architect) in our Professional Services team, you will work with clients on short- to medium-term customer engagements on their big data challenges using the Databricks Data Intelligence Platform. You will deliver data engineering, data science, and cloud technology projects that require integrating with client systems, training, and other technical tasks that help customers get the most value out of their data. RSAs are billable and know how to complete projects according to specification with excellent customer service. You will report to the regional Manager/Lead.

The impact you will have:
- Work on a variety of impactful customer technical projects, which may include designing and building reference architectures, creating how-tos, and productionizing customer use cases
- Work with engagement managers to scope a variety of professional services work with input from the customer
- Guide strategic customers as they implement transformational big data projects and third-party migrations, including end-to-end design, build, and deployment of industry-leading big data and AI applications
- Consult on architecture and design; bootstrap or implement customer projects, leading to customers' successful understanding, evaluation, and adoption of Databricks
- Provide an escalated level of support for customer operational issues
- Work with the Databricks technical team, Project Manager, Architect, and Customer team to ensure the technical components of the engagement are delivered to meet the customer's needs
- Work with Engineering and Databricks Customer Support to provide product and implementation feedback and to guide rapid resolution for engagement-specific product and support issues

What we look for:
- Proficiency in data engineering, data platforms, and analytics, with a strong track record of successful projects and in-depth knowledge of industry best practices
- Comfortable writing code in either Python or Scala
- Enterprise data warehousing experience (Teradata, Synapse, Snowflake, or SAP)
- Working knowledge of two or more common cloud ecosystems (AWS, Azure, GCP), with expertise in at least one
- Deep experience with distributed computing with Apache Spark™ and knowledge of Spark runtime internals
- Familiarity with CI/CD for production deployments
- Working knowledge of MLOps
- Design and deployment of performant end-to-end data architectures
- Experience with technical project delivery - managing scope and timelines
- Documentation and whiteboarding skills
- Experience working with clients and managing conflicts
- Ability to build skills in technical areas which support the deployment and integration of Databricks-based solutions to complete customer projects
- Travel is required up to 10%, more at peak times
- Databricks Certification

About Databricks

Databricks is the data and AI company. More than 10,000 organizations worldwide — including Comcast, Condé Nast, Grammarly, and over 50% of the Fortune 500 — rely on the Databricks Data Intelligence Platform to unify and democratize data, analytics and AI. Databricks is headquartered in San Francisco, with offices around the globe, and was founded by the original creators of the lakehouse architecture, Apache Spark™, Delta Lake and MLflow. To learn more, follow Databricks on Twitter, LinkedIn and Facebook.
Benefits
At Databricks, we strive to provide comprehensive benefits and perks that meet the needs of all of our employees. For specific details on the benefits offered in your region, please visit https://www.mybenefitsnow.com/databricks.
Our Commitment to Diversity and Inclusion

At Databricks, we are committed to fostering a diverse and inclusive culture where everyone can excel. We take great care to ensure that our hiring practices are inclusive and meet equal employment opportunity standards. Individuals looking for employment at Databricks are considered without regard to age, color, disability, ethnicity, family or marital status, gender identity or expression, language, national origin, physical and mental ability, political affiliation, race, religion, sexual orientation, socio-economic status, veteran status, and other protected characteristics.

Compliance

If access to export-controlled technology or source code is required for performance of job duties, it is within Employer's discretion whether to apply for a U.S. government license for such positions, and Employer may decline to proceed with an applicant on this basis alone.
Data Engineer
Data Science & Analytics
Solutions Architect
Software Engineering
Apply
September 1, 2025
Finance Data Engineer
Synthesia
501-1000
United Kingdom
Full-time
Remote: true
Welcome to the video-first world

From your everyday PowerPoint presentations to Hollywood movies, AI will transform the way we create and consume content. Today, people want to watch and listen, not read — both at home and at work. If you're reading this and nodding, check out our brand video. Despite the clear preference for video, communication and knowledge sharing in the business environment are still dominated by text, largely because high-quality video production remains complex and challenging to scale — until now…

Meet Synthesia

We're on a mission to make video easy for everyone. Born in an AI lab, our AI video communications platform simplifies the entire video production process, making it easy for everyone, regardless of skill level, to create, collaborate, and share high-quality videos. Whether it's for delivering essential training to employees and customers or marketing products and services, Synthesia enables large organizations to communicate and share knowledge through video quickly and efficiently. We're trusted by leading brands such as Heineken, Zoom, Xerox, McDonald's and more. Read stories from happy customers and what 1,200+ people say on G2.

In 2023, we were one of 7 European companies to reach unicorn status. In February 2024, G2 named us the fastest growing company in the world. We've raised over $150M in funding from top-tier investors, including Accel, Nvidia, Kleiner Perkins, Google, and top founders and operators from Stripe, Datadog, Miro, Webflow, and Facebook.

About the role

Synthesia is building a modern, AI-powered back office. We are hiring a full-stack data engineer to design and run the data layer that powers Finance, ensuring data from finance systems lands securely in a data warehouse and is easily consumed via Copilot (LLM) and Omni dashboards. You'll be a cornerstone in our shift to modern finance workflows.

Key Responsibilities:
- Design, develop, and maintain robust ETL/ELT pipelines to ingest, transform, and securely store data from NetSuite and other finance systems into Snowflake, ensuring data integrity, compliance, and security best practices (e.g., encryption, access controls, and auditing). A minimal pipeline sketch follows this listing.
- Collaborate with finance and data teams to define data models, schemas, and governance policies that support modern finance workflows, including automated reporting, forecasting, and anomaly detection.
- Implement data retrieval mechanisms optimized for LLM-based querying via Copilot and/or similar tools, enabling natural language access to financial data while maintaining accuracy and contextual relevance.
- Build and optimize interactive dashboards in Omni for real-time visualization and analysis of key metrics, such as financial performance and operational KPIs.
- Monitor and troubleshoot data pipelines, performing root-cause analysis on issues related to data quality, latency, or availability, and implementing proactive solutions to ensure high reliability.
- Document processes, architectures, and best practices to facilitate knowledge sharing and scalability within the team.

Qualifications:
- Bachelor's or Master's degree in Computer Science, Finance, Information Systems, or a related field.
- 5+ years of experience as a data engineer or analytics engineer, with a proven track record in full-stack data development (from ingestion to visualization).
- Strong expertise in Snowflake, including data modeling, warehousing, and performance optimization.
- Hands-on experience with ETL tools (e.g., Apache Airflow, dbt, Fivetran) and integrating data from ERP systems like NetSuite.
- Proficiency in SQL, Python, and/or other scripting languages for data processing and automation.
- Familiarity with LLM integrations (e.g., for natural language querying) and dashboarding tools like Omni or similar (e.g., Tableau, Looker).
- Solid understanding of data security principles, including GDPR/CCPA compliance, role-based access, and encryption in cloud environments.
- Excellent problem-solving skills, with the ability to work cross-functionally in agile teams.

At Synthesia we expect everyone to:
- Put the Customer First
- Own it & Go Direct
- Be Fast & Experimental
- Make the Journey Fun

You can read more about this in our public Notion page.

Location: London (Oxford Circus office) or UK Remote

UK Benefits
📍 A hybrid, flexible approach to work where you have access to a lovely office space in Oxford Circus, with free lunches on Wednesdays and Fridays
💸 A competitive salary + stock options
🏝 25 days of annual leave + public holidays
🏥 Private healthcare through AXA
❣️ Pension contribution - Synthesia contributes 3% and employees contribute 5% on qualifying earnings
🍼 Paid parental leave, entitling primary caregivers to 16 weeks of full pay and secondary caregivers to 5 weeks of full pay
👉 A generous recruitment referral scheme if you help us to hire
💻 The equipment you need to be successful in your role
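A minimal sketch of the pipeline shape the first responsibility describes (finance systems into Snowflake, then transformations), written as an Airflow 2.x DAG. The task callables are stubs and the DAG id is hypothetical; the actual tooling and schedule may differ.

```python
# Minimal ELT skeleton: extract from the ERP, load to the warehouse, transform.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_netsuite(**context):
    """Pull incremental journal entries from the ERP (stubbed)."""
    ...

def load_snowflake(**context):
    """Stage the extract into Snowflake, encrypted and access-controlled (stubbed)."""
    ...

def run_dbt_models(**context):
    """Kick off dbt transformations for reporting marts (stubbed)."""
    ...

with DAG(
    dag_id="finance_elt",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_netsuite", python_callable=extract_netsuite)
    load = PythonOperator(task_id="load_snowflake", python_callable=load_snowflake)
    transform = PythonOperator(task_id="run_dbt_models", python_callable=run_dbt_models)

    extract >> load >> transform  # linear dependency chain
```

Keeping extraction, loading, and transformation as separate tasks means a failed dbt run can be retried without re-hitting the ERP's API.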
Data Engineer
Data Science & Analytics
Apply
August 27, 2025
Data Engineer
Sierra
201-500
USD
220000
-
340000
United States
Full-time
Remote: false
About us

At Sierra, we're creating a platform to help businesses build better, more human customer experiences with AI. We are primarily an in-person company based in San Francisco, with growing offices in Atlanta, New York, and London. We are guided by a set of values that are at the core of our actions and define our culture: Trust, Customer Obsession, Craftsmanship, Intensity, and Family. These values are the foundation of our work, and we are committed to upholding them in everything we do.

Our co-founders are Bret Taylor and Clay Bavor. Bret currently serves as Board Chair of OpenAI. Previously, he was co-CEO of Salesforce (which had acquired the company he founded, Quip) and CTO of Facebook. Bret was also one of Google's earliest product managers and co-creator of Google Maps. Before founding Sierra, Clay spent 18 years at Google, where he most recently led Google Labs. Earlier, he started and led Google's AR/VR effort, Project Starline, and Google Lens. Before that, Clay led the product and design teams for Google Workspace.

What you'll do

Sierra is in the process of building out its core data foundations, and you'll play a pivotal role in shaping the company's data strategy and infrastructure. Partnering with engineering, product, and GTM teams, you'll design and operate scalable batch and real-time data systems, create trusted data models, and build the pipelines that power experimentation, analytics, and AI development. You'll ensure that every area of the business—from customer experience to go-to-market execution—has access to high-quality, reliable data to drive insight and innovation. Beyond infrastructure, you'll influence how data is captured, governed, and leveraged across Sierra, empowering decision-making at scale. This is a unique opportunity to establish the foundations of Sierra's data ecosystem, drive standards for reliability and trust, and enable the next generation of AI-powered customer interactions.

What you'll bring
- Proven Experience: extensive experience in data engineering, with a track record of designing and operating data pipelines, systems, and models at scale.
- Curiosity & Customer Obsession: passion for building trustworthy data systems that empower teams to better understand users and deliver impactful product experiences.
- Adaptability and Resilience: comfort working in a fast-paced startup environment, able to adapt to evolving priorities and deliver reliable solutions amidst ambiguity.
- Technical Proficiency: strong proficiency in SQL and Python, with expertise in distributed data processing frameworks (e.g., Spark, Flink, Kafka) and cloud-based platforms (AWS, GCP).
- Data Architecture Skills: deep experience with data modeling, warehousing, and designing schemas optimized for analytics, experimentation, and AI/ML workloads.
- Data Quality & Governance: strong understanding of data validation, monitoring, compliance, and best practices for ensuring data integrity across pipelines.
- Excellent Communication: ability to translate technical infrastructure and data design trade-offs into clear recommendations for product, engineering, and business stakeholders.
- Great Collaboration: proven ability to partner closely with product, ML, analytics, and GTM teams to deliver data foundations that unlock business and product innovation.

Even better...
- Experience with (open to equivalents) AWS Glue, Athena, Kafka, Flink/Spark, dbt, Airflow/Dagster, Terraform.
- Experience working with large language models (LLMs), conversational AI, or agent-based systems.
- Familiarity with building or improving data platforms for advanced analytics.

Our values
- Trust: We build trust with our customers with our accountability, empathy, quality, and responsiveness. We build trust in AI by making it more accessible, safe, and useful. We build trust with each other by showing up for each other professionally and personally, creating an environment that enables all of us to do our best work.
- Customer Obsession: We deeply understand our customers' business goals and relentlessly focus on driving outcomes, not just technical milestones. Everyone at the company knows and spends time with our customers. When our customer is having an issue, we drop everything and fix it.
- Craftsmanship: We get the details right, from the words on the page to the system architecture. We have good taste. When we notice something isn't right, we take the time to fix it. We are proud of the products we produce. We continuously self-reflect to continuously self-improve.
- Intensity: We know we don't have the luxury of patience. We play to win. We care about our product being the best, and when it isn't, we fix it. When we fail, we talk about it openly and without blame so we succeed the next time.
- Family: We know that balance and intensity are compatible, and we model it in our actions and processes. We are the best technology company for parents. We support and respect each other and celebrate each other's personal and professional achievements.

What we offer

We want our benefits to reflect our values and offer the following to full-time employees:
- Flexible (unlimited) paid time off
- Medical, dental, and vision benefits for you and your family
- Life insurance and disability benefits
- Retirement plan (e.g., 401(k), pension) with Sierra match
- Parental leave
- Fertility and family-building benefits through Carrot
- Lunch, as well as delicious snacks and coffee to keep you energized
- Discretionary benefit stipend giving people the ability to spend where it matters most
- Free alphorn lessons

These benefits are further detailed in Sierra's policies and are subject to change at any time, consistent with the terms of any applicable compensation or benefits plans. Eligible full-time employees can participate in Sierra's equity plans subject to the terms of the applicable plans and policies.

Be you, with us

We're working to bring the transformative power of AI to every organization in the world. To do so, it is important to us that the diversity of our employees represents the diversity of our customers. We believe that our work and culture are better when we encourage, support, and respect different skills and experiences represented within our team. We encourage you to apply even if your experience doesn't precisely match the job description. We strive to evaluate all applicants consistently without regard to race, color, religion, gender, national origin, age, disability, veteran status, pregnancy, gender expression or identity, sexual orientation, citizenship, or any other legally protected class.
Data Engineer
Data Science & Analytics
Apply
August 21, 2025
Senior Data Engineer
Maincode
11-50
AUD
150000
-
180000
Australia
Full-time
Remote: false
Overview

Maincode is building sovereign AI models in Australia. We are training foundation models from scratch, designing new reasoning architectures, and deploying them on state-of-the-art GPU clusters. Our models are built on datasets we create ourselves: curated, cleaned, and engineered for performance at scale. This is not buying off-the-shelf corpora or scraping without thought. This is building world-class datasets from the ground up.

As a Senior Data Engineer, you will lead the design and construction of these datasets. You will work hands-on to source, clean, transform, and structure massive amounts of raw data into training-ready form. You will design the architecture that powers data ingestion, validation, and storage for multi-terabyte to petabyte-scale AI training. You will collaborate with AI Researchers and Engineers to ensure every byte is high quality, relevant, and optimised for training cutting-edge large language models and other architectures.

This is a deep technical role. You will be writing code, building pipelines, defining schemas, and debugging unusual data edge cases at scale. You will think like both a data scientist and a systems engineer, designing for correctness, scalability, and future-proofing. If you want to build the datasets that power sovereign AI from first principles, this is your team.

What you'll do
- Design and build large-scale data ingestion and curation pipelines for AI training datasets
- Source, filter, and process diverse data types, including text, structured data, code, and multimodal, from raw form to model-ready format
- Implement robust quality control and validation systems to ensure dataset integrity, relevance, and ethical compliance (a minimal sketch follows this listing)
- Architect storage and retrieval systems optimised for distributed training at scale
- Build tooling to track dataset lineage, reproducibility, and metadata at all stages of the pipeline
- Work closely with AI Researchers to align datasets with evolving model architectures and training objectives
- Collaborate with DevOps and ML engineers to integrate data systems into large-scale training workflows
- Continuously improve ingestion speed, preprocessing efficiency, and data freshness for iterative training cycles

Who you are
- Passionate about building world-class datasets for AI training, from raw source to training-ready
- Experienced in Python and data engineering frameworks such as Apache Spark, Ray, or Dask
- Skilled in working with distributed data storage and processing systems such as S3, HDFS, or cloud object storage
- Strong understanding of data quality, validation, and reproducibility in large-scale ML workflows
- Familiar with ML frameworks like PyTorch or JAX, and how data pipelines interact with them
- Comfortable working with multi-terabyte or larger datasets
- Hands-on and pragmatic: you like solving real data problems with code and automation
- Motivated to help build sovereign AI capability in Australia

Why Maincode

We are a small team building some of the most advanced AI systems in Australia. We create new foundation models from scratch, not just fine-tune existing ones, and we build the datasets they run on from the ground up. We operate our own GPU clusters, run large-scale training, and integrate research and engineering closely to push the frontier of what is possible.

You will be surrounded by people who:
- Care deeply about data quality and architecture, not just volume
- Build systems that scale reliably and repeatably
- Take pride in learning, experimenting, and shipping
- Want to help Australia build independent, world-class AI systems
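A minimal sketch of the quality-control idea referenced in "What you'll do": cheap heuristic filters a raw text document must pass before entering a training corpus. The thresholds are illustrative assumptions, not Maincode's criteria.

```python
# Minimal corpus quality-filter sketch: drop short, repetitive, or binary-ish docs.
def passes_quality_filters(doc: str) -> bool:
    words = doc.split()
    if len(words) < 20:                      # too short to carry signal
        return False
    if len(set(words)) / len(words) < 0.3:   # highly repetitive text
        return False
    non_ascii = sum(1 for ch in doc if ord(ch) > 127)
    if non_ascii / max(len(doc), 1) > 0.5:   # likely binary data or mojibake
        return False
    return True

sample = [
    "too short",
    "spam spam spam " * 30,
    "The quick brown fox jumps over the lazy dog near the quiet river "
    "while curious children watch from the old wooden bridge.",
]
print([passes_quality_filters(d) for d in sample])  # [False, False, True]
```

At petabyte scale the same predicates run as a distributed map (Spark, Ray, or Dask, as the listing names), usually followed by heavier stages like deduplication and model-based scoring.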
Data Engineer
Data Science & Analytics
Apply
August 14, 2025
Data Engineer
Krea
51-100
United States
Full-time
Remote: false
About Krea

At Krea, we are building next-generation AI creative tools. We are dedicated to making AI intuitive and controllable for creatives. Our mission is to build tools that empower human creativity, not replace it. We believe AI is a new medium that allows us to express ourselves through various formats—text, images, video, sound, and even 3D. We're building better, smarter, and more controllable tools to harness this medium.

This job

Data is one of the fundamental pieces of Krea. Huge amounts of data power our AI training pipelines, our analytics and observability, and many of the core systems that make Krea tick.

As a data engineer, you will…
- … build distributed systems to process gigantic (petabytes) amounts of files of all kinds (images, video, and even 3D data). You should feel comfortable solving scaling problems as you go.
- … work closely with our research team to build ML pipelines and deploy models to make sense of raw data.
- … play with massive amounts of compute on huge Kubernetes GPU clusters - our main GPU cluster takes up an entire datacenter from our provider.
- … learn machine learning engineering (ML experience is a bonus, but you can also learn it on the job) from world-class researchers on a small yet highly effective tight-knit team.

Example projects
- Find clean scenes in millions of videos, running distributed data pipelines that detect shot boundaries and save timestamps of clips (a minimal single-machine sketch follows this listing).
- Solve orchestration and scaling issues with a large-scale distributed GPU job processing system on Kubernetes.
- Build systems to deploy and combine different LLMs to caption massive amounts of multimedia data in a variety of different ways.
- Design multi-stage pipelines to turn petabytes of raw data into clean downstream datasets, with metadata, annotations, and filters.

Strong candidates may have experience with…
- Python, PyArrow, DuckDB, SQL, massive relational databases, PyTorch, Pandas, NumPy…
- Kubernetes
- Designing and implementing large-scale ETL systems
- Fundamental knowledge of containerization, operating systems, file systems, and networking
- Distributed systems design

About us

We're building AI creative tooling. We've raised over $83M from the best investors in Silicon Valley. We're a team of 12 with millions of active users, scaling aggressively.
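The first example project above can be sketched on a single machine with OpenCV before being scaled out: compare color histograms of consecutive frames and record a timestamp whenever similarity drops. The threshold is an illustrative assumption.

```python
# Minimal shot-boundary detection sketch using frame-histogram correlation.
import cv2

def shot_boundaries(path: str, threshold: float = 0.4) -> list[float]:
    """Return timestamps (seconds) where the scene appears to cut."""
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    prev_hist, cuts, frame_idx = None, [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # 8x8x8-bin BGR histogram, L2-normalized for comparison.
        hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8], [0, 256] * 3)
        hist = cv2.normalize(hist, hist).flatten()
        if prev_hist is not None:
            # Correlation near 1.0 means similar frames; a sharp drop signals a cut.
            sim = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL)
            if sim < 1.0 - threshold:
                cuts.append(frame_idx / fps)
        prev_hist, frame_idx = hist, frame_idx + 1
    cap.release()
    return cuts

print(shot_boundaries("sample.mp4"))  # hypothetical input file
```

In a distributed version, each worker would process a shard of video keys and write cut timestamps to shared storage; learned shot detectors typically replace the histogram heuristic once accuracy matters.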
Data Engineer
Data Science & Analytics
Machine Learning Engineer
Data Science & Analytics
Apply
July 31, 2025
Staff Data Engineer
Thoughtful
101-200
USD
190000
-
250000
United States
Full-time
Remote: true
Join Our Mission to Revolutionize Healthcare

Thoughtful is pioneering a new approach to automation for all healthcare providers! Our AI-powered Revenue Cycle Automation platform enables the healthcare industry to automate and improve its core business operations.

We're looking for Staff Data Engineers to help scale and strengthen our data platform. Our data stack today consists of Aurora RDS, AWS Glue, Apache Iceberg, S3 (Parquet), Spark, and Athena - supporting a range of use cases from operational reporting to downstream services. We're looking to grow the team with engineers who can help improve performance, increase reliability, and expand the platform's capabilities as our data volume and complexity continue to grow. You'll work closely with other engineers to evolve our existing pipelines, improve observability and data quality, and enable faster, more flexible access to data across the company. The platform is deployed on AWS using OpenTofu, and we're looking for engineers who bring strong cloud infrastructure fundamentals alongside deep experience in data engineering.

Your Role:
- Build: develop and maintain data pipelines and transformations across the stack, from ingesting transactional data into the data lakehouse to refining data up the medallion architecture (a minimal sketch of one such hop follows this listing).
- Optimize: tune performance, storage layout, and cost-efficiency across our data storage and query engines.
- Extend: help design and implement new data ingestion patterns and improve platform observability and reliability.
- Collaborate: partner with engineering, product, and operations teams to deliver well-structured, trustworthy data for diverse use cases.
- Contribute: help establish and evolve best practices for our data infrastructure, from pipeline design to OpenTofu-managed resource provisioning.
- Secure: help design and implement a data governance strategy to secure our data lakehouse.

Your Qualifications:
- 8-10+ years of experience building and maintaining data pipelines in production environments
- Strong knowledge of the data lakehouse ecosystem, with an emphasis on AWS data services - particularly Glue, S3, Athena/Trino/PrestoDB, and Aurora
- Proficiency in Python, Spark, and Athena/Trino/PrestoDB for data transformation and orchestration
- Experience managing infrastructure with OpenTofu/Terraform or other Infrastructure-as-Code tools
- Solid understanding of data modeling, partitioning strategies, schema evolution, and performance tuning
- Comfortable working with cloud-native data pipelines and batch processing (streaming experience is a plus but not required)

What Sets You Apart:
- Systems thinker - you understand the tradeoffs in data architecture and design for long-term stability and clarity
- Outcome-driven - you focus on building useful, maintainable systems that serve real business needs
- Strong collaborator - you're comfortable working across teams and surfacing data requirements early
- Practical and hands-on - able to dive into logs, schemas, and IAM policies when needed
- Thoughtful contributor - committed to improving code quality, developer experience, and documentation across the board

Why Thoughtful?
- Competitive compensation
- Equity participation: employee stock options
- Health benefits: comprehensive medical, dental, and vision insurance
- Time off: generous leave policies and paid company holidays

California Salary Range: $190,000—$250,000 USD
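A minimal sketch of one medallion hop named in the "Build" bullet: refining raw (bronze) records into a cleaned, partitioned silver table with PySpark. Paths, table layout, and columns are illustrative assumptions, not Thoughtful's schema.

```python
# Minimal bronze -> silver refinement sketch for a lakehouse table.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("bronze-to-silver").getOrCreate()

bronze = spark.read.parquet("s3://lakehouse/bronze/claims/")  # hypothetical path

silver = (
    bronze
    .dropDuplicates(["claim_id"])                 # makes re-runs idempotent
    .filter(F.col("claim_id").isNotNull())        # reject records missing the key
    .withColumn("billed_amount", F.col("billed_amount").cast("decimal(12,2)"))
    .withColumn("service_date", F.to_date("service_date"))
)

# Partition by date so Athena/Trino queries can prune files efficiently.
silver.write.mode("overwrite").partitionBy("service_date").parquet(
    "s3://lakehouse/silver/claims/"
)
```

With Iceberg in the mix, the write would target an Iceberg table instead of raw Parquet paths, which is what gives you schema evolution and snapshot-based rollback.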
Data Engineer
Data Science & Analytics
Apply
July 28, 2025
Data Center Research & Development Engineer - Stargate
OpenAI
5000+
USD
240000
-
400000
United States
Full-time
Remote: false
About the Team:
OpenAI, in close collaboration with our capital partners, is embarking on a journey to build the world’s most advanced AI infrastructure ecosystem. The Data Center Engineering team is at the core of this mission. This team sets the infrastructure strategy, develops cutting-edge engineering solutions, partners with research teams to define infrastructure performance requirements, and creates reference designs to enable rapid global expansion in collaboration with our partners. As a key member of this team, you will help design and deliver next-generation power, cooling, and hardware solutions for high-density rack deployments in some of the largest data centers in the world. You will work closely with stakeholders across research, site selection, design, construction, commissioning, hardware engineering, deployment, operations, and global partners to bring OpenAI’s infrastructure vision to life.

About the Role:
We’re seeking a seasoned data center R&D engineer with extensive experience in designing, validation-testing, commissioning, and operating large-scale power, cooling, and high-performance computing systems. This role focuses on developing and validating new infrastructure and hardware, including high-voltage rectifiers, UPS systems, battery storage, transformers, DC-to-DC converters, and power supplies. You will lead the design and buildout of a hardware validation lab, create detailed models and test procedures, and ensure hardware compatibility across edge-case data center operating conditions. You will also work closely with hardware vendors to assess manufacturing test protocols, throughput, and liquid-cooled GPU rack performance. A strong foundation in technical design, operational leadership, and vendor collaboration is critical, with an opportunity to lead high-impact infrastructure programs.

What You'll Do:
- Oversee electrical, mechanical, controls, and telemetry design and operations for large-scale data centers, including review of building and MEP drawings across all project phases, from concept design through permitting, construction, commissioning, and production.
- Develop, test, and implement operational procedures and workflows from design through commissioning and deployment.
- Perform validation testing of all critical equipment and hardware in partnership with equipment vendors and ODMs.
- Lead the buildout of the R&D lab, including equipment selection, test infrastructure for high-density liquid-cooled racks, and staffing plans.
- Select and manage engineering tools (CAD, CFD, PLM, PDM, electrical, mechanical, power/network management).
- Collaborate with external vendors to select, procure, and manage critical infrastructure equipment (e.g., UPS, generators, transformers, DC-to-DC converters, chillers, VFDs).
- Ensure seamless integration of power, cooling, controls, networking, and construction systems into facility design.
- Provide technical direction to teams and vendors, ensuring safety, quality, and compliance with local codes, standards, and regulations.
- Manage vendor relationships and ensure adherence to safety, performance, and operational standards.

Qualifications:
- 20+ years of experience in data center design, operations, and critical systems maintenance.
- Proven leadership across design, commissioning, and operation of large-scale data center campuses.
- Deep expertise in infrastructure systems (power, cooling, controls, networking) and operational workflows.
- Hands-on experience with critical infrastructure equipment and testing protocols.
- Strong track record in lab development, equipment selection, and facility operations.
- Familiarity with engineering tools (CAD, CFD, PLM, etc.) and their integration across teams.
- Experience navigating regulatory environments and working with government agencies.
- Excellent cross-functional communication and stakeholder collaboration.
- Bachelor’s degree in engineering required; advanced degree and PE certification preferred.

Preferred Skills:
- Expertise in equipment design, agency certification, and validation testing.
- Experience in global, matrixed organizations and multi-site operations.
- Skilled in vendor negotiations and supply chain management.
- Familiarity with sustainable and energy-efficient data center design principles.

About OpenAI
OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity. We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristic. For additional information, please see OpenAI’s Affirmative Action and Equal Employment Opportunity Policy Statement.

Qualified applicants with arrest or conviction records will be considered for employment in accordance with applicable law, including the San Francisco Fair Chance Ordinance, the Los Angeles County Fair Chance Ordinance for Employers, and the California Fair Chance Act. For unincorporated Los Angeles County workers: we reasonably believe that criminal history may have a direct, adverse and negative relationship with the following job duties, potentially resulting in the withdrawal of a conditional offer of employment: protect computer hardware entrusted to you from theft, loss or damage; return all computer hardware in your possession (including the data contained therein) upon termination of employment or end of assignment; and maintain the confidentiality of proprietary, confidential, and non-public information. In addition, job duties require access to secure and protected information technology systems and related data security obligations.

We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link. OpenAI Global Applicant Privacy Policy

At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.
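The validation-lab responsibilities above ultimately reduce to automated pass/fail checks over equipment telemetry. As a toy illustration only, here is a Python sketch that flags readings outside an allowed operating envelope; the field names and limits are invented for this example, not OpenAI's test procedures, which would come from actual equipment specifications.

```python
from dataclasses import dataclass

@dataclass
class RackReading:
    """One telemetry sample from a rack under test (hypothetical fields)."""
    rack_id: str
    inlet_coolant_c: float   # liquid-cooling inlet temperature, Celsius
    bus_voltage_v: float     # DC bus voltage at the rack

# Invented operating envelope; real limits come from equipment specs.
LIMITS = {"inlet_coolant_c": (15.0, 45.0), "bus_voltage_v": (47.0, 55.0)}

def validate(readings: list[RackReading]) -> list[str]:
    """Return a human-readable failure for any reading outside LIMITS."""
    failures = []
    for r in readings:
        for field, (lo, hi) in LIMITS.items():
            value = getattr(r, field)
            if not lo <= value <= hi:
                failures.append(f"{r.rack_id}: {field}={value} outside [{lo}, {hi}]")
    return failures

if __name__ == "__main__":
    sample = [RackReading("rack-01", 44.2, 52.1), RackReading("rack-02", 48.9, 46.5)]
    for failure in validate(sample):
        print("FAIL:", failure)
```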
Data Engineer
Data Science & Analytics
Apply
July 17, 2025
Staff Data Engineer
Glean Work
1001-5000
-
India
Full-time
Remote
false
About Glean:
Founded in 2019, Glean is an innovative AI-powered knowledge management platform designed to help organizations quickly find, organize, and share information across their teams. By integrating seamlessly with tools like Google Drive, Slack, and Microsoft Teams, Glean ensures employees can access the right knowledge at the right time, boosting productivity and collaboration. The company’s cutting-edge AI technology simplifies knowledge discovery, making it faster and more efficient for teams to leverage their collective intelligence.

Glean was born from Founder & CEO Arvind Jain’s deep understanding of the challenges employees face in finding and understanding information at work. Seeing firsthand how fragmented knowledge and sprawling SaaS tools made it difficult to stay productive, he set out to build a better way: an AI-powered enterprise search platform that helps people quickly and intuitively access the information they need. Since then, Glean has evolved into the leading Work AI platform, combining enterprise-grade search, an AI assistant, and powerful application- and agent-building capabilities to fundamentally redefine how employees work.

About the Role:
Glean is building a world-class Data Organization composed of data science, applied science, data engineering, and business intelligence groups. Our data engineering group is based in our Bangalore, India office. In this role, you will work on customer-facing and Glean employee-facing analytics initiatives.

Customer-facing analytics initiatives: Customers rely on in-product dashboards and, if they have the willingness and resources, self-serve data analytics to understand how Glean is being used at their company, both to get a better sense of Glean’s ROI and to partner with Glean to increase user adoption. You're expected to partner with backend and data science to:
- maintain and improve the data platform behind these operations
- reflect usage of new features
- reflect changes in the underlying product usage logs for existing features
- identify and close data quality issues, e.g. gaps with internal tracking, and backfill the changes
- triage issues customers report to us within appropriate SLAs
- help close customer-facing technical documentation gaps

You will:
- Help improve the availability of high-value upstream raw data by:
  - channeling inputs from data science and business intelligence to identify the biggest gaps in data foundations
  - partnering with Go-to-Market & Finance operations groups to create streamlined data management processes in enterprise apps like Salesforce, Marketo, and various accounting software
  - partnering with Product Engineering teams as they craft product logging initiatives and processes
- Architect and implement key tables that transform structured and unstructured data into models usable by the data, operations, and engineering orgs.
- Ensure and maintain the quality and availability of internally used tables within reasonable SLAs.
- Own and improve the reliability, efficiency, and scalability of ETL tooling, including but not limited to dbt, BigQuery, and Sigma. This includes identifying, implementing, and disseminating best practices.

About you:
- You have 9+ years of work experience in software/data engineering (the former is strongly preferred) with a bachelor's degree, 7+ years with a master's degree, or 5+ years with a PhD.
- You've served as a tech lead and have mentored several data engineers.

For customer-facing analytics initiatives:
- You have experience architecting, implementing, and maintaining robust data platform solutions for external-facing data products.
- You have experience implementing and maintaining large-scale data processing tools like Beam and Spark.
- You have experience working with stakeholders and peers from different time zones and roles (e.g. ENG, PM, data science, GTM), often as the main data engineering point of contact.

For internal-facing analytics initiatives:
- You have experience in full-cycle data warehousing projects, including requirements analysis, proofs of concept, design, development, testing, and implementation.
- You have experience in database design, architecture, and cost-efficient scaling.
- You have experience with cloud-based data tools like BigQuery and dbt.
- You have experience with data pipelining tools like Airbyte, Apache, Stitch, Hevo Data, and Fivetran.

General qualifications:
- You have a high degree of proficiency with SQL and are able to set best practices and up-level our growing SQL user base within the organization.
- You are proficient in at least one of Python, Java, and Golang.
- You are familiar with cloud computing services like GCP and/or AWS.
- You are concise and precise in written and verbal communication. Technical documentation is your strong suit.

You are a particularly good fit if:
- You have 1+ years of tech lead management experience. Note this is distinct from tech lead experience and involves formally managing others.
- You have experience working with customers directly in a B2B setting.
- You have experience with Salesforce, Marketo, and Google Analytics.
- You have experience in distributed data processing and storage, e.g. HDFS.

Location: This role is hybrid (3 days a week in our Bangalore office).

We are a diverse bunch of people and we want to continue to attract and retain a diverse range of people into our organization. We're committed to an inclusive and diverse company. We do not discriminate based on gender, ethnicity, sexual orientation, religion, civil or family status, age, disability, or race.
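As a flavor of the SLA-driven table ownership this role describes, here is a minimal sketch of a freshness check using the google-cloud-bigquery client. The table name, timestamp column, and SLA threshold are hypothetical, and the script assumes ambient GCP credentials and a TIMESTAMP column; it illustrates the pattern, not Glean's actual tooling.

```python
from datetime import datetime, timezone

from google.cloud import bigquery

# Hypothetical internally owned table of product usage events.
TABLE = "analytics.product_usage_events"
FRESHNESS_SLA_HOURS = 6

def check_freshness(client: bigquery.Client) -> bool:
    """Return True if the newest row in TABLE is within the freshness SLA."""
    query = f"SELECT MAX(event_ts) AS latest FROM `{TABLE}`"
    latest = next(iter(client.query(query).result())).latest
    age_hours = (datetime.now(timezone.utc) - latest).total_seconds() / 3600
    print(f"{TABLE}: newest event is {age_hours:.1f}h old (SLA: {FRESHNESS_SLA_HOURS}h)")
    return age_hours <= FRESHNESS_SLA_HOURS

if __name__ == "__main__":
    if not check_freshness(bigquery.Client()):
        raise SystemExit("Freshness SLA violated; alert the on-call data engineer.")
```

In practice a check like this would live in the orchestrator (or as a dbt source freshness test) rather than a standalone script, but the shape is the same: query, compare against the SLA, page on violation.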
Data Engineer
Data Science & Analytics
Apply
July 14, 2025
Senior Analytics Engineer
Harvey
501-1000
USD
0
170000
-
200000
United States
Full-time
Remote
false
Why Harvey
Harvey is a secure AI platform for legal and professional services that augments productivity and automates complex workflows. Harvey uses algorithms with reasoning-adept LLMs that have been customized and developed by our expert team of lawyers, engineers, and research scientists. We've found product-market fit and are scaling our team very quickly. Some reasons to join Harvey:
- Exceptional product-market fit: We have partnered with the largest law firms and professional service providers in the world, including Paul Weiss, A&O Shearman, Ashurst, O'Melveny & Myers, PwC, KKR, and many others.
- Strategic investors: Raised over $500 million from strategic investors including Sequoia, Google Ventures, Kleiner Perkins, and OpenAI.
- World-class team: Harvey is hiring the best talent from DeepMind, Google Brain, Stripe, FAIR, Tesla Autopilot, Glean, Superhuman, Figma, and more.
- Partnerships: Our engineers and researchers work directly with OpenAI to build the future of generative AI and redefine professional services.
- Performance: 4x ARR in 2024.
- Competitive compensation.

Role Overview
We're looking for a versatile Senior Analytics Engineer to architect the data backbone that powers decision-making at Harvey. With product-market fit already proven and demand surging across diverse customer segments, you'll design clean, reliable pipelines and semantic data models that turn raw events into immediately usable insights. As the first Analytics Engineer on our team, you'll choose and implement the right data stack, champion best practices in testing and documentation, and collaborate closely with product, GTM, and leadership to ensure every team can answer its own questions with confidence. If you combine engineering rigor with a love of storytelling through data, and want to shape analytics from the ground up, we'd love to meet you.

What You'll Do
- Design and build scalable data models and pipelines using dbt to transform raw data into clean, reliable assets that power company-wide analytics and decision-making.
- Define and implement a robust semantic layer (e.g. LookML/Omni) that standardizes key business metrics, dimensions, and data products, ensuring self-serve capabilities for stakeholders across teams.
- Partner cross-functionally with Product, GTM, Finance, and the Exec Team to deliver intuitive, consistent dashboards and analytical tools that surface real-time business health metrics.
- Establish and champion data modeling standards and best practices, guiding the organization in how to model data for accuracy, usability, and long-term maintainability.
- Collaborate with engineering to make key decisions on data architecture, co-design data schemas, and implement orchestration strategies that ensure reliability and performance of the data warehouse.
- Lead data governance initiatives, ensuring high standards of data quality, consistency, documentation, and access control across the analytics ecosystem.
- Empower stakeholders with data by making analytical assets easily discoverable, reliable, and well-documented, turning complex datasets into actionable insights for the business.

What You Have
- 5+ years of experience in Analytics Engineering, Data Engineering, Data Science, or a similar field.
- Deep expertise in SQL, dbt, Python, and modern BI/semantic-layer tools like Looker or Omni.
- Skilled at defining core business and product metrics, uncovering insights, and resolving data inconsistencies across complex systems.
- Strong familiarity with version control (GitHub), CI/CD, and modern development workflows.
- Bias for action: you prefer launching usable, iterative data models that deliver immediate value over waiting for perfect solutions.
- Strong communicator who can build trusted partnerships across Product, GTM, Finance, and Exec stakeholders.
- Comfortable working through ambiguity in fast-moving, cross-functional environments.
- Balances big-picture thinking with precision in execution, knowing when to sweat the details and when to move quickly.
- Experience operating in a B2B or commercial setting, with an understanding of customer lifecycle and revenue-driving metrics.

Bonus
- Early employee at a hyper-growth startup.
- Experience with or knowledge of AI and LLMs.
- Data engineering experience.
- Experience managing a data warehouse (preferably Snowflake).
- Experience at world-class enterprise orgs (e.g., Brex, Ramp, Stripe, Palantir).

Compensation Range: $170,000 - $200,000 USD

Please find our CA applicant privacy notice here. Harvey is an equal opportunity employer and does not discriminate on the basis of race, gender, sexual orientation, gender identity/expression, national origin, disability, age, genetic information, veteran status, marital status, pregnancy or related condition, or any other basis protected by law. We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made by emailing interview-help@harvey.ai.
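One way to picture the semantic-layer work described above: define each core metric exactly once so every dashboard computes it the same way. The pandas sketch below does this for a made-up weekly-active-users metric over toy event data; Harvey's stack centers on dbt plus a LookML/Omni-style layer, so treat this as an illustration of the idea, not the implementation.

```python
import pandas as pd

# Toy raw event export: one row per user action (invented data).
events = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 3],
    "event_ts": pd.to_datetime([
        "2025-06-02", "2025-06-03", "2025-06-04",
        "2025-06-02", "2025-06-09", "2025-06-10",
    ]),
})

# Canonical metric definition: a user is "active" in a week if they
# logged at least one event. Defining this once keeps every consumer
# of weekly_active_users consistent.
weekly_active = (
    events.assign(week=events["event_ts"].dt.to_period("W").dt.start_time)
          .groupby("week")["user_id"]
          .nunique()
          .rename("weekly_active_users")
)
print(weekly_active)
```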
Data Engineer
Data Science & Analytics
Data Scientist
Data Science & Analytics
Apply
July 3, 2025
Data Infrastructure Engineer
HeyGen
201-500
-
United States
Canada
Full-time
Remote
false
About HeyGen
At HeyGen, our mission is to make visual storytelling accessible to all. Over the last decade, visual content has become the preferred method of information creation, consumption, and retention. But the ability to create such content, in particular videos, continues to be costly and challenging to scale. Our ambition is to build technology that equips more people with the power to reach, captivate, and inspire audiences.
Learn more at www.heygen.com. Visit our Mission and Culture doc here.

Position Summary:
At HeyGen, we are at the forefront of developing applications powered by our cutting-edge AI research. As a Data Infrastructure Engineer, you will lead the development of fundamental data systems and infrastructure. These systems are essential for powering our innovative applications, including Avatar IV, Photo Avatar, Instant Avatar, Interactive Avatar, and Video Translation. Your role will be crucial in enhancing the efficiency and scalability of these systems, which are vital to HeyGen's success.

Key Responsibilities:
- Design, build, and maintain the data infrastructure and systems needed to support our AI applications. Examples include:
  - large-scale data acquisition
  - multi-modal data processing frameworks and applications
  - storage and computation efficiency
  - AI model evaluation and productionization infrastructure
- Collaborate with data scientists and machine learning engineers to understand their computational and data needs and provide efficient solutions.
- Stay up to date with the latest industry trends in data infrastructure technologies and advocate for best practices and continuous improvement.
- Assist in budget planning and management of cloud resources and other infrastructure expenses.

Qualifications:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- Proven experience managing infrastructure for large-scale AI or machine learning projects.
- Excellent problem-solving skills and the ability to work independently or as part of a team.
- Proficiency in Python.
- Experience optimizing computational workflows.
- Familiarity with AI and machine learning frameworks like TensorFlow or PyTorch.

Preferred Qualifications:
- Experience with GPU computing.
- Experience with distributed data processing systems.
- Experience building large-scale batch inference systems.
- Prior experience in a startup or fast-paced tech environment.

What HeyGen Offers:
- Competitive salary and benefits package.
- Dynamic and inclusive work environment.
- Opportunities for professional growth and advancement.
- Collaborative culture that values innovation and creativity.
- Access to the latest technologies and tools.

HeyGen is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.
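For a sense of the large-scale batch processing this role touches, here is a minimal sketch of fan-out batch work with Ray, one common distributed framework (the listing does not name one, so this is an assumption). The preprocess body and file paths are hypothetical placeholders.

```python
import ray

ray.init()  # local cluster for illustration; production would target a real cluster

@ray.remote
def preprocess(path: str) -> dict:
    """Hypothetical per-file work: decode media, extract features, etc."""
    # A real implementation would load and transform the file here.
    return {"path": path, "status": "ok"}

# Fan a batch of files out across workers, then gather the results.
files = [f"s3://bucket/video_{i}.mp4" for i in range(100)]  # placeholder paths
results = ray.get([preprocess.remote(p) for p in files])
print(sum(r["status"] == "ok" for r in results), "files processed")
```

The same fan-out/gather shape also covers batch inference: swap the preprocess body for a model forward pass and shard inputs by worker.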
Data Engineer
Data Science & Analytics
MLOps / DevOps Engineer
Data Science & Analytics
Apply
July 2, 2025
Member of Technical Staff, Multilingual
Cohere
501-1000
0
0
-
0
United Kingdom
Canada
United States
Full-time
Remote
true
Who are we?
Our mission is to scale intelligence to serve humanity. We're training and deploying frontier models for developers and enterprises who are building AI systems to power magical experiences like content generation, semantic search, RAG, and agents. We believe that our work is instrumental to the widespread adoption of AI.

We obsess over what we build. Each one of us is responsible for contributing to increasing the capabilities of our models and the value they drive for our customers. We like to work hard and move fast to do what's best for our customers.

Cohere is a team of researchers, engineers, designers, and more, who are passionate about their craft. Each person is one of the best in the world at what they do. We believe that a diverse range of perspectives is a requirement for building great products. Join us on our mission and shape the future!

Why this role?
As a Member of Technical Staff on the Multilingual team, you will play a crucial role in developing and enhancing our language models to support a wide range of languages. Your primary focus will be on data engineering tasks, including data collection, cleaning, and preparation, to ensure our models perform exceptionally across various linguistic contexts. You will work closely with our research scientists, machine learning engineers, and product teams to deliver high-quality, multilingual AI solutions.

This role:
- Is perfect for someone passionate about languages and AI, with a keen eye for detail and a talent for coding.
- Offers the opportunity to contribute to cutting-edge language technology, making a global impact.
- Requires a self-starter who can work independently and deliver results efficiently.

Please Note: We have offices in London, Toronto, and New York, but we also embrace being remote-friendly! Applicants for this role may work anywhere between UTC−06:00 and UTC+01:00.

As a Member of Technical Staff for Multilingual, you will:
- Design and implement data pipelines to process and prepare multilingual datasets.
- Collaborate with researchers to understand data requirements and model performance.
- Develop tools and scripts to automate data-related tasks and improve efficiency.
- Ensure data quality and integrity through rigorous testing and validation.
- Stay updated on the latest advancements in multilingual data processing and contribute to the team's knowledge base.

You may be a good fit if you have:
- A Bachelor's degree in Computer Science, Data Science, or a related field (Master's preferred), or equivalent experience.
- Experience analyzing datasets with respect to quality and suitability for training ML models.
- Strong software engineering skills, particularly in Python.
- A strong understanding of data structures, algorithms, and software design principles.
- Excellent problem-solving skills and the ability to work in a fast-paced environment.
- Excellent communication skills to collaborate effectively with cross-functional teams and present findings.
- Prior experience with multilingual data and a passion for natural language processing (a plus).
- Bonus: one or more first-author papers at top-tier venues (such as NeurIPS, ICML, ICLR, AIStats, MLSys, JMLR, AAAI, Nature, COLING, ACL, EMNLP).

If some of the above doesn't line up perfectly with your experience, we still encourage you to apply! If you want to work really hard on a glorious mission with teammates who want the same thing, Cohere is the place for you.

We value and celebrate diversity and strive to create an inclusive work environment for all. We welcome applicants from all backgrounds and are committed to providing equal opportunities. Should you require any accommodations during the recruitment process, please submit an Accommodations Request Form, and we will work together to meet your needs.

Full-Time Employees at Cohere enjoy these perks:
🤝 An open and inclusive culture and work environment
🧑‍💻 Work closely with a team on the cutting edge of AI research
🍽 Weekly lunch stipend, in-office lunches & snacks
🦷 Full health and dental benefits, including a separate budget to take care of your mental health
🐣 100% Parental Leave top-up for 6 months for employees based in Canada, the US, and the UK
🎨 Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
🏙 Remote-flexible, offices in Toronto, New York, San Francisco, and London, and a co-working stipend
✈️ 6 weeks of vacation

Note: This post is co-authored by both Cohere humans and Cohere technology.
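The data collection and cleaning work this role describes lends itself to small, composable pipeline stages. Below is a minimal Python sketch of one such stage: length filtering, exact-duplicate removal, and language identification over a raw multilingual corpus. It uses the open-source langdetect package purely as a stand-in for whatever language-ID model a team like this would actually run, and the thresholds and language set are invented for illustration.

```python
import hashlib

from langdetect import detect
from langdetect.lang_detect_exception import LangDetectException

KEEP_LANGS = frozenset({"en", "fr", "de"})  # illustrative target languages
MIN_CHARS = 20                              # drop fragments too short to train on

def clean_corpus(lines):
    """Yield deduplicated lines in the target languages. Toy sketch only."""
    seen = set()
    for line in lines:
        text = line.strip()
        if len(text) < MIN_CHARS:
            continue
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        if digest in seen:  # exact-duplicate removal via content hash
            continue
        try:
            if detect(text) not in KEEP_LANGS:
                continue
        except LangDetectException:  # e.g. strings with no letters
            continue
        seen.add(digest)
        yield text
```

Real multilingual pipelines replace each stage with something stronger (near-duplicate detection, learned quality filters), but the staged-generator shape composes and scales well.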
Data Engineer
Data Science & Analytics
Machine Learning Engineer
Data Science & Analytics
Apply
June 24, 2025
Data Engineer
Imbue
51-100
USD
0
170000
-
350000
United States
Full-time
Remote
false
Summary
We're a small, cross-functional team focused on building AI systems that reason and code. We care deeply about understanding how people interact with these systems and how we can use data to make them safer, smarter, and more useful.
We're looking for a Data Engineer to build and own the pipelines and data infrastructure that power our product and research efforts. Your work will directly support model training, evaluation, product analytics, and safety systems. You’ll partner closely with team members building our coding agents to make sure we’re capturing the right signals and using them well.
If you're excited about turning messy product data into actionable insights and building systems that can scale with our research, we'd love to connect!
Example Projects
- Combine synthetic data generation with human annotation platforms to produce high-quality data that advances our product and research roadmap.
- Design and build resilient, scalable pipelines (ETL and ELT) for batch and streaming data (a minimal sketch follows this list).
- Develop and maintain infrastructure to support self-serve analytics, experimentation, and dataset generation. Prototype, evaluate, and make "build vs. buy" decisions.
- Help define and improve data modeling practices across the company, including instrumentation standards, dimensional modeling for analytics, and feature stores for machine learning (ML).
- Build integrations with ML infrastructure to support training pipelines, inference logging, and model monitoring (MLOps).
- Debug pipeline failures, automate deployment processes, and improve data quality and reusability.
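Picking up the ETL bullet above, here is a minimal sketch of the shape such a pipeline can take in Prefect, one of the orchestrators named in the next section. The task bodies and payloads are hypothetical placeholders, not Imbue's pipeline.

```python
from prefect import flow, task

@task(retries=3, retry_delay_seconds=60)
def extract(day: str) -> list[dict]:
    """Hypothetical source pull, e.g. one day of product events."""
    return [{"day": day, "event": "agent_run"}]  # placeholder payload

@task
def transform(rows: list[dict]) -> list[dict]:
    """Placeholder cleanup step; real logic would validate and enrich."""
    return [r | {"clean": True} for r in rows]

@task
def load(rows: list[dict]) -> None:
    """Stand-in for a warehouse write."""
    print(f"loaded {len(rows)} rows")

@flow
def daily_etl(day: str = "2025-06-20") -> None:
    load(transform(extract(day)))

if __name__ == "__main__":
    daily_etl()
```

The value of the orchestrator shows up in what the decorators buy you: retries on the flaky extract step, observable task runs, and a scheduling surface for the daily run.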
You are
- A strong software engineer with 5+ years of experience, ideally working with large-scale data systems.
- Experienced in designing and maintaining data pipelines and infrastructure, especially for analytics, experimentation, and ML.
- Comfortable with tools for data orchestration (Airflow, Prefect), batch or streaming processing (Spark, Ray, Flink), and event tracking and analytics (Amplitude, PostHog).
- Experienced with cloud-based infrastructure and storage (e.g., S3, GCP, Snowflake, or Redshift), and thoughtful about cost-performance tradeoffs.
- Familiar with MLOps, model-serving infrastructure, or ML workflows.
- Pragmatic and principled! You know when to optimize and when to ship.
Compensation and Benefits
- Competitive compensation, equity, and benefits.
- Lunch provided daily to onsite employees.
- $250 lifestyle stipend per month.
- Generous budget for self-improvement: coaching, courses, conferences, etc.
- Actively co-create and participate in a positive, intentional team culture.
- Spend time learning, reading papers, and deeply understanding prior work.
- Frequent team events, dinners, off-sites, and hangouts.
- Compensation packages are highly variable based on a variety of factors. If your salary requirements fall outside of the stated range, we still encourage you to apply. The range for this role is $170,000–$350,000 cash and $10,000–$2,000,000 in equity.
How to apply
All submissions are reviewed by a person, so we encourage you to include notes on why you're interested in working with us. If you have any other work you can showcase (open-source code, side projects, etc.), please include it! We know that talent comes from many backgrounds, and we aim to build a team with diverse skill sets that spike strongly in different areas.
About us
Imbue builds AI systems that reason and code, enabling AI agents to accomplish larger goals and safely work in the real world. We train our own foundation models optimized for reasoning and prototype agents on top of these models. By using these agents extensively, we gain insights into improving both the capabilities of the underlying models and the interaction design for agents.
We aim to rekindle the dream of the *personal* computer, where computers become truly intelligent tools that empower us, giving us freedom, dignity, and agency to pursue the things we love.
Data Engineer
Data Science & Analytics
Apply
June 20, 2025
Data Center Technician
TensorWave
51-100
-
United States
Full-time
Remote
false
Data Engineer
Data Science & Analytics
Apply
June 10, 2025
Audio Data Engineer – Speech Cleaning & Pipeline Automation (TTS)
Hippocratic AI
201-500
-
United States
Full-time
Remote
false
Data Engineer
Data Science & Analytics
Apply
June 10, 2025
QA automation engineer
Writer
1001-5000
-
United States
Full-time
Remote
true
Data Engineer
Data Science & Analytics
Apply
June 6, 2025
Datacenter Operations Technician
X AI
5000+
-
United States
Full-time
Remote
false
Data Engineer
Data Science & Analytics
Apply
May 21, 2025