Top Data Engineer Job Openings in 2025
Looking for opportunities as a Data Engineer? This curated list features the latest Data Engineer job openings from AI-native companies. Whether you're an experienced professional or just entering the field, find roles that match your expertise, from startups to global tech leaders. Updated every day.
Senior Data Engineer
Maincode
1-10
AUD 150,000 - 180,000
Australia
Full-time
Remote
false
Overview
Maincode is building sovereign AI models in Australia. We are training foundation models from scratch, designing new reasoning architectures, and deploying them on state-of-the-art GPU clusters. Our models are built on datasets we create ourselves, curated, cleaned, and engineered for performance at scale. This is not buying off-the-shelf corpora or scraping without thought. This is building world-class datasets from the ground up.

As a Senior Data Engineer, you will lead the design and construction of these datasets. You will work hands-on to source, clean, transform, and structure massive amounts of raw data into training-ready form. You will design the architecture that powers data ingestion, validation, and storage for multi-terabyte to petabyte-scale AI training. You will collaborate with AI Researchers and Engineers to ensure every byte is high quality, relevant, and optimised for training cutting-edge large language models and other architectures.

This is a deep technical role. You will be writing code, building pipelines, defining schemas, and debugging unusual data edge cases at scale. You will think like both a data scientist and a systems engineer, designing for correctness, scalability, and future-proofing. If you want to build the datasets that power sovereign AI from first principles, this is your team.
What you’ll do
- Design and build large-scale data ingestion and curation pipelines for AI training datasets
- Source, filter, and process diverse data types, including text, structured data, code, and multimodal content, from raw form to model-ready format
- Implement robust quality control and validation systems to ensure dataset integrity, relevance, and ethical compliance
- Architect storage and retrieval systems optimised for distributed training at scale
- Build tooling to track dataset lineage, reproducibility, and metadata at all stages of the pipeline
- Work closely with AI Researchers to align datasets with evolving model architectures and training objectives
- Collaborate with DevOps and ML engineers to integrate data systems into large-scale training workflows
- Continuously improve ingestion speed, preprocessing efficiency, and data freshness for iterative training cycles
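For illustration, a minimal sketch of the kind of quality-control and lineage-tracking stage described above: validating raw records and fingerprinting the cleaned batch so a dataset build is reproducible. The function and field names (`validate_and_fingerprint`, `text`, `source`) are hypothetical, not part of Maincode's actual stack.

```python
import hashlib
import json

def validate_and_fingerprint(records, required_fields):
    """Drop records missing required fields and emit lineage metadata.

    A simplified stand-in for a dataset quality-control stage; the
    deterministic content hash supports reproducibility checks.
    """
    clean, dropped = [], 0
    for rec in records:
        if all(rec.get(f) not in (None, "") for f in required_fields):
            clean.append(rec)
        else:
            dropped += 1
    # Stable serialization so the same cleaned batch always hashes the same.
    payload = json.dumps(clean, sort_keys=True).encode()
    metadata = {
        "rows_in": len(records),
        "rows_out": len(clean),
        "rows_dropped": dropped,
        "content_sha256": hashlib.sha256(payload).hexdigest(),
    }
    return clean, metadata

records = [
    {"text": "hello world", "source": "crawl-a"},
    {"text": "", "source": "crawl-b"},          # fails validation: empty text
    {"text": "sovereign AI", "source": "crawl-a"},
]
clean, meta = validate_and_fingerprint(records, ["text", "source"])
```

Recording this metadata alongside each pipeline stage is one common way to make multi-terabyte dataset builds auditable end to end.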
Who you are
- Passionate about building world-class datasets for AI training, from raw source to training-ready
- Experienced in Python and data engineering frameworks such as Apache Spark, Ray, or Dask
- Skilled in working with distributed data storage and processing systems such as S3, HDFS, or cloud object storage
- Strong understanding of data quality, validation, and reproducibility in large-scale ML workflows
- Familiar with ML frameworks like PyTorch or JAX, and how data pipelines interact with them
- Comfortable working with multi-terabyte or larger datasets
- Hands-on and pragmatic: you like solving real data problems with code and automation
- Motivated to help build sovereign AI capability in Australia
Why Maincode
We are a small team building some of the most advanced AI systems in Australia. We create new foundation models from scratch, not just fine-tune existing ones, and we build the datasets they run on from the ground up. We operate our own GPU clusters, run large-scale training, and integrate research and engineering closely to push the frontier of what is possible.

You will be surrounded by people who:
- Care deeply about data quality and architecture, not just volume
- Build systems that scale reliably and repeatably
- Take pride in learning, experimenting, and shipping
- Want to help Australia build independent, world-class AI systems
Data Engineer
Data Science & Analytics
Apply
August 14, 2025
Principal Data Engineer
Worth AI
11-50
-
United States
Full-time
Remote
true
Worth AI, a leader in the computer software industry, is looking for a talented and experienced Principal Data Engineer to join their innovative team. At Worth AI, we are on a mission to revolutionize decision-making with the power of artificial intelligence while fostering an environment of collaboration and adaptability, aiming to make a meaningful impact in the tech landscape. Our team values include extreme ownership, one team, and creating raving fans, both for our employees and our customers.

Worth is looking for a Principal Data Engineer to own the company-wide data architecture and platform: design and scale reliable batch/streaming pipelines, institute data quality and governance, and enable analytics/ML with secure, cost-efficient systems. Partner with engineering, product, analytics, and security to turn business needs into durable data products.

Responsibilities

Architecture & Strategy
- Define end-to-end data architecture (lake/lakehouse/warehouse, batch/streaming, CDC, metadata).
- Set standards for schemas, contracts, orchestration, storage layers, and semantic/metrics models.
- Publish roadmaps, ADRs/RFCs, and “north star” target states; guide build-vs-buy decisions.

Platform & Pipelines
- Design and build scalable, observable ELT/ETL and event pipelines.
- Establish ingestion patterns (CDC, file, API, message bus) and schema-evolution policies.
- Provide self-service tooling for analysts/scientists (dbt, notebooks, catalogs, feature stores).
- Ensure workflow reliability (idempotency, retries, backfills, SLAs).

Data Quality & Governance
- Define dataset SLAs/SLOs, freshness, lineage, and data certification tiers.
- Enforce contracts and validation tests; deploy anomaly detection and incident runbooks.
- Partner with governance on cataloging, PII handling, retention, and access policies.

Reliability, Performance & Cost
- Lead capacity planning, partitioning/clustering, and query optimization.
- Introduce SRE-style practices for data (error budgets, postmortems).
- Drive FinOps for storage/compute; monitor and reduce cost per TB/query/job.

Security & Compliance
- Implement encryption, tokenization, and row/column-level security; manage secrets and audits.
- Align with SOC 2 and privacy regulations (e.g., GDPR/CCPA; HIPAA if applicable).

ML & Analytics Enablement
- Deliver versioned, documented datasets/features for BI and ML.
- Operationalize training/serving data flows, drift signals, and feature-store governance.
- Build and maintain the semantic layer and metrics consistency for experimentation/BI.

Leadership & Collaboration
- Provide technical leadership across squads; mentor senior/staff engineers.
- Run design reviews and drive consensus on complex trade-offs.
- Translate business goals into data products with product/analytics leaders.

Requirements
- 10+ years in data engineering (including 3+ years at staff/principal or equivalent scope).
- Proven leadership of company-wide data architecture and platform initiatives.
- Deep experience with at least one cloud (AWS) and a modern warehouse or lakehouse (e.g., Snowflake, Redshift, Databricks).
- Strong SQL and one programming language (Python or Scala/Java).
- Orchestration (Airflow/Dagster/Prefect), transformations (dbt or equivalent), and streaming (Kafka/Kinesis/PubSub).
- Data modeling (3NF, star, data vault) and semantic/metrics layers.
- Data quality testing, lineage, and observability in production environments.
- Security best practices: RBAC/ABAC, encryption, key management, auditability.

Nice to Have
- Feature stores and ML data ops; experimentation frameworks.
- Cost optimization at scale; multi-tenant architectures.
- Governance tools (DataHub/Collibra/Alation), OpenLineage, and testing frameworks (Great Expectations/Deequ).
- Compliance exposure (SOC 2, GDPR/CCPA; HIPAA/PCI where relevant).
- Model features sourced from complex 3rd-party data (KYB/KYC, credit bureaus, fraud detection APIs).
Benefits
- Health Care Plan (Medical, Dental & Vision)
- Retirement Plan (401k, IRA)
- Life Insurance
- Unlimited Paid Time Off
- 9 Paid Holidays
- Family Leave
- Work From Home
- Free Food & Snacks (access to Industrious co-working membership!)
- Wellness Resources
Data Engineer
Data Science & Analytics
Apply
August 14, 2025
Profasee - Senior Python Engineer (Data Pipelines & APIs)
Darwin AI
51-100
USD 60,000 - 80,000
Argentina
Full-time
Remote
true
Architect the Pipes. Fuel the Models. Move the Market.
Location: Remote-first (strong, reliable internet required)
Team: Data Engineering & ML (reporting to CTO)

What we are looking for
We are looking for an entrepreneurial, business-minded Python Engineer to help us disrupt the e-commerce industry. We believe it is time for technology and data to help merchants deliver the best pricing to their customers, no matter what. This is an opportunity to join a start-up led by a proven founder in an industry that is growing exponentially, to work alongside a small and focused Data Science (DS) & Machine Learning (ML) Engineering team, and to work on something not only exciting and fun, but also creative in ways that most people will never experience in their lifetime.

What you will do
You will work side by side with e-commerce experts and the DS/ML team to build, test, and maintain systems that collect, manage, and convert raw data into usable information for the DS/ML team to interpret. You will be responsible for building and maintaining data pipelines, storage systems, and APIs to access the data. You will also support the ML engineering team with data engineering. The ideal candidate writes well-tested code that is easy to read and understand, communicates effectively with team members, asks for help and feedback when needed, and shares knowledge.
- Write code and tests for pulling data from 3rd-party integrations and loading it into a structured database for each data source (ETL).
- Provide API access to the data collected from 3rd parties.
- Analyze data sources to help decide which are useful for the DS/ML team.
- Maintain the data pipeline architecture, automation, scalability, and error handling.
- Monitor pipeline performance and stability.

Who you are
- You enjoy building the tools and systems for delivering a production product.
- You are comfortable in a small, dynamic team, where you will be designing and leading initiatives.
- You are self-motivated and comfortable working in a 100% remote environment.
- You can balance the need to move quickly with the desire to build for the future.
- You are willing to bring your expertise to the table, and willing to let others bring their expertise as well.

Requirements
- 5+ years of software development experience
- Proficient with the Python programming language
- Proficient with SQL and relational databases (PostgreSQL, MySQL, etc.)
- Proficient at integrating with APIs, handling data formats (JSON, CSV, XML, etc.), rate limiting, and error handling
- Experience with distributed task queues (e.g. Celery), multiprocessing, and job scheduling
- Understanding of the infrastructure used to build a production application (e.g. NGINX, RabbitMQ, error and performance monitoring tools, AWS services such as ECS/EKS)
- Strong communication skills, especially with non-software developers
- Experience with containerization (Docker/Kubernetes)
- Experience designing and delivering the architecture required to run code in production
- Experience with ETL pipelines

Good to have
- Experience working in early-stage startups and/or developing your own projects
- Experience with NoSQL and non-relational databases
- Experience with machine learning and data science
- Experience with distributed computing frameworks such as Apache Spark, Apache Iceberg, or Hadoop
- Experience with data orchestration tools such as Apache Airflow, Prefect, or Luigi

Interview Process
1. Silver screening interview
2. Silver technical interview
3. Logic quiz
4. Client screening interview
5. Client live coding interview
6. Client behavioral interview
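As a rough sketch of the ETL duties described above, here is a 3rd-party JSON payload loaded into a structured table with an idempotent upsert, using SQLite as a stand-in for PostgreSQL/MySQL. The table and field names are invented for the example.

```python
import json
import sqlite3

def load_orders(conn, raw_json):
    """Parse a 3rd-party JSON payload and upsert it into a structured table.

    The upsert keyed on the primary key makes re-runs safe: replaying the
    same payload (e.g. after a transient failure) cannot create duplicates.
    """
    rows = json.loads(raw_json)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(id INTEGER PRIMARY KEY, sku TEXT, price REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (id, sku, price) VALUES (:id, :sku, :price) "
        "ON CONFLICT(id) DO UPDATE SET sku=excluded.sku, price=excluded.price",
        rows,
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

conn = sqlite3.connect(":memory:")
payload = json.dumps([
    {"id": 1, "sku": "A-100", "price": 19.99},
    {"id": 2, "sku": "B-200", "price": 5.50},
])
count = load_orders(conn, payload)
count_after_replay = load_orders(conn, payload)  # idempotent re-run
```

A production version would add per-source schemas, rate-limited API fetching, and error handling around the parse step, but the extract-parse-upsert shape is the same.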
Data Engineer
Data Science & Analytics
Apply
August 14, 2025
Data Engineer
Air Ops
1-10
-
United States
Full-time
Remote
false
About AirOps
Today thousands of leading brands and agencies use AirOps to win the battle for attention with content that both humans and agents love. We’re building the platform and profession that will empower a million marketers to become modern leaders, not spectators, as AI reshapes how brands reach their audiences. We’re backed by awesome investors, including Unusual Ventures, Wing VC, Founder Collective, XFund, Village Global, and Alt Capital, and we’re building a world-class team with in-person hubs in San Francisco, New York, and Montevideo, Uruguay.

What You’ll Own
We’re hiring a Data Engineer to design and maintain the high-scale data infrastructure that powers the AirOps platform. You will build robust ingestion, cleanup, and integration pipelines, ensuring that the data our customers rely on for brand visibility insights is accurate, reliable, and ready for analysis.

Responsibilities
- Design, build, and maintain scalable ETL/ELT pipelines for ingesting and transforming large volumes of data
- Implement automated data validation, monitoring, and alerting to ensure quality and reliability
- Integrate diverse internal and external data sources into unified, queryable datasets
- Optimize storage and query performance for analytical workloads
- Collaborate with data scientists to productionize ML models and ensure they run reliably at scale
- Work with product and engineering teams to meet data needs for new features and insights
- Maintain cost efficiency and operational excellence in cloud environments

Your Experience
- 4+ years of experience in data engineering, ideally in AI, SaaS, or data-intensive products
- Strong fluency in Python and SQL
- Experience with modern data modeling tools such as dbt
- Experience with data warehouses and OLAP databases (e.g., Redshift, Snowflake, BigQuery, ClickHouse)
- Proven ability to design and maintain production-grade data pipelines in cloud environments (AWS, GCP, or similar)
- Familiarity with orchestration frameworks (Airflow, Dagster, Prefect)
- Comfort operating in fast-paced, ambiguous environments where you ship quickly and iterate

About You
- You love building systems that make data accurate, reliable, and accessible at scale
- You think in terms of automation and scalability, not manual workarounds
- You collaborate well with data scientists, product managers, and engineers
- You enjoy working with large, complex datasets and solving performance challenges
- You take pride in operational excellence and care about the quality of the data you deliver

Our Guiding Principles
- Extreme Ownership
- Quality
- Curiosity and Play
- Make Our Customers Heroes
- Respectful Candor

Benefits
- Equity in a fast-growing startup
- Competitive benefits package tailored to your location
- Flexible time off policy
- Generous parental leave
- A fun-loving and (just a bit) nerdy team that loves to move fast!
Data Engineer
Data Science & Analytics
Apply
August 11, 2025
Data Engineer
Taktile
101-200
-
Germany
Full-time
Remote
false
About the role:
Join our Optimization team at Taktile as a Data Engineer. You will build isolated, scalable data warehouse infrastructure for finance teams on AWS using Iceberg, Athena, Jupyter, and Parquet, and support the operational, ML/AI, and visualization tools that help customers derive value from their data. Your contributions will directly enhance our automated decisioning platform, allowing users to improve financial decision policies at scale through the use of production data.
Location: Taktile operates on a hybrid model. This role is based out of our Berlin HQ.

What you'll do:
- Build and maintain isolated, regionalized data warehouse infrastructure for use in customer-facing features.
- Develop data tools for improving policy performance, such as training ML models on historical data and backtesting at scale.
- Design and develop scalable, network-optimized RESTful APIs using Python on AWS, leveraging services such as Lambda and S3 together with SQL and Parquet.
- Optimize data warehouse efficiency, conduct peer code reviews, and produce technical documentation.
- Collaborate with cross-functional teams in an Agile environment to translate business requirements into technical solutions.

Requirements:
- Minimum of 5 years of experience in Python and SQL
- Prior experience in data engineering, data visualization, machine learning, or artificial intelligence
- Fluency in English, both written and spoken, is crucial to lead communication in our globally distributed environment.

Ideal, but not required:
- Experience with Iceberg/Athena
- Experience with Python and FastAPI
- Expertise in data engineering topics, SQL, and Parquet
- Experience with AWS services and serverless architectures.

What we offer:
- Work with colleagues that lift you up, challenge you, celebrate you, and help you grow. We come from many different backgrounds, but what we have in common is the desire to operate at the very top of our fields. If you are similarly capable, caring, and driven, you'll find yourself at home here.
- Make an impact and meaningfully shape an early-stage company.
- Experience a truly flat hierarchy and communicate directly with founding team members. Having an opinion and voicing your ideas is not only welcome but encouraged, especially when they challenge the status quo.
- Learn from experienced mentors and achieve tremendous personal and professional growth.
- Get to know and leverage our network of leading tech investors and advisors around the globe.
- Receive a top-of-market equity and cash compensation package.
- Get access to a self-development budget you can use to, e.g., attend conferences, buy books, or take classes.
- Receive a new Apple MacBook Pro, as well as a meaningful home office set-up.

Our stance:
We're eager to meet talented and driven candidates regardless of whether they tick all the boxes. We're looking for someone who will add to our culture, not just fit within it. We strongly encourage individuals from groups traditionally underestimated and underrepresented in tech to apply. We seek to actively recognize and combat racism, sexism, ableism, and ageism. We embrace and support all gender identities and expressions, and celebrate love in its many forms. We won't inquire about how you identify or whether you've experienced discrimination, but if you want to tell your story, we are all ears.

About us:
Taktile is building the world's leading software platform for running critical and highly-automated decisions. Our customers use our product to catch fraudsters, prevent money laundering, and expand access to credit for small businesses, among many other use cases. Taktile already makes millions of such decisions across the globe every day. Taktile is based in Berlin, London, and New York City. It was founded by machine learning and data science veterans with extensive experience building and running production ML in financial services. Our team consists of engineers, entrepreneurs, and researchers with a diverse set of backgrounds. Some of us attended top universities such as Harvard, Oxford, and Stanford, and some of us have no degree at all. We have accumulated extensive work experience at leading tech companies, at startups, and in the enterprise software sphere. Our backers include Y Combinator, Index Ventures, and stellar angels such as the founders of Looker, GitHub, Mulesoft, Datadog, and UiPath.
Data Engineer
Data Science & Analytics
Apply
August 8, 2025
Data Engineer, Public Sector
Scale AI
5000+
USD 119,000 - 155,000
United States
Full-time
Remote
false
Data Engineer, Public Sector
As a Data Engineer for the Public Sector business unit, you will build Scale's analytical and business-intelligence infrastructure. Scale's customers process millions of tasks through our APIs, and we're looking for a talented Data Engineer to build scalable solutions to support this growth. You will have widespread purview, with responsibility for understanding, mining, aggregating, and exposing data across the entire business unit to support timely and efficient decision-making and data exploration. You will also implement Scale's data warehouse, data mart, and business intelligence reporting environments, and help users transition their workflows to these systems. This role requires collaboration with leadership and cross-functional teams to solve complex problems and develop sustainable, scalable data solutions. Your responsibilities will include both ad-hoc analyses and the creation of core data models and pipelines, directly impacting how Scale operates and evaluates its performance.

You will:
- Work with operations, finance, and engineering to drive the development of pipelines that provide single-source-of-truth foundational accuracy
- Continually improve ongoing data pipelines and simplify self-service support for business stakeholders
- Perform regular system audits and create data quality tests to ensure complete and accurate reporting of data/metrics
- Develop repeatable, scalable analytical solutions, such as data models, improved pipelines, or better underlying tables
- Have an active Secret security clearance (Top Secret preferred)

Ideally You’d Have:
- 2+ years of relevant work experience in a role requiring application of data modeling and analytic skills
- Ability to create extensible and scalable data schemas and pipelines that lay the foundation for downstream analysis
- Mastery of SQL and relational databases; experience with programming languages (e.g., Python/R)
- Experience building a reliable transformation layer and pipelines from ambiguous business processes, using tools such as dbt, to create a foundation for data insights

Compensation packages at Scale for eligible roles include base salary, equity, and benefits. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position, determined by work location and additional factors, including job-related skills, experience, interview performance, and relevant education or training. Scale employees in eligible roles are also granted equity-based compensation, subject to Board of Directors approval. Your recruiter can share more about the specific salary range for your preferred location during the hiring process and confirm whether the hired role will be eligible for an equity grant. You'll also receive benefits including, but not limited to: comprehensive health, dental, and vision coverage, retirement benefits, a learning and development stipend, and generous PTO.
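The data quality tests this posting mentions can be approximated with dbt-style schema checks (unique key, not-null columns, accepted values) written as plain Python. Column names here are hypothetical, chosen only to illustrate the pattern.

```python
def run_quality_tests(rows, key, not_null, accepted=None):
    """Return a list of failed checks, mirroring dbt-style schema tests:
    a unique key, not-null columns, and accepted-values constraints."""
    failures = []
    keys = [r[key] for r in rows]
    if len(keys) != len(set(keys)):
        failures.append(f"duplicate values in key column '{key}'")
    for col in not_null:
        if any(r.get(col) is None for r in rows):
            failures.append(f"nulls in column '{col}'")
    for col, allowed in (accepted or {}).items():
        bad = {r[col] for r in rows} - set(allowed)
        if bad:
            failures.append(f"unexpected values in '{col}': {sorted(bad)}")
    return failures

rows = [
    {"task_id": "t1", "status": "done", "cost": 1.5},
    {"task_id": "t2", "status": "pending", "cost": None},  # null cost
]
failures = run_quality_tests(
    rows,
    key="task_id",
    not_null=["cost"],
    accepted={"status": {"done", "pending", "failed"}},
)
```

Running checks like these on every pipeline run, and alerting on a non-empty failure list, is one common way to keep metric reporting complete and accurate.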
Additionally, this role may be eligible for additional benefits such as a commuter stipend. The base salary range for this full-time position in Washington, DC is $119,000 - $155,000 USD.

PLEASE NOTE: Our policy requires a 90-day waiting period before reconsidering candidates for the same role. This allows us to ensure a fair and thorough evaluation of all applicants.

About Us:
At Scale, we believe that the transition from traditional software to AI is one of the most important shifts of our time. Our mission is to make that happen faster across every industry, and our team is transforming how organizations build and deploy AI. Our products power the world's most advanced LLMs, generative models, and computer vision models. We are trusted by generative AI companies such as OpenAI, Meta, and Microsoft, government agencies like the U.S. Army and U.S. Air Force, and enterprises including GM and Accenture. We are expanding our team to accelerate the development of AI applications.

We believe that everyone should be able to bring their whole selves to work, which is why we are proud to be an inclusive and equal opportunity workplace. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability status, gender identity, or Veteran status. We are committed to working with and providing reasonable accommodations to applicants with physical and mental disabilities. If you need assistance and/or a reasonable accommodation in the application or recruiting process due to a disability, please contact us at accommodations@scale.com. Please see the United States Department of Labor's Know Your Rights poster for additional information. We comply with the United States Department of Labor's Pay Transparency provision.
PLEASE NOTE: We collect, retain and use personal data for our professional business purposes, including notifying you of job opportunities that may be of interest and sharing with our affiliates. We limit the personal data we collect to that which we believe is appropriate and necessary to manage applicants’ needs, provide our services, and comply with applicable laws. Any information we collect in connection with your application will be treated in accordance with our internal policies and programs designed to protect personal data. Please see our privacy policy for additional information.
Data Engineer
Data Science & Analytics
Apply
August 7, 2025
Cashea - Data Engineering Manager
Darwin AI
51-100
USD 80,000 - 90,000
Argentina
Full-time
Remote
true
Hello! We’re Cashea 👋, and we're on a mission to bring optimism back to Venezuelans 🇻🇪 by building innovative financial products. Since our launch in 2022, we've been dedicated to democratizing financial inclusion through cutting-edge technology. In that time, we've grown to serve over 6 million customers, developed a range of products for both consumers and merchants, and become a trusted name in Venezuela, winning both minds and hearts 💛.

About the role:
As Data Engineering Manager, you will design and develop scalable real-time and batch data pipelines to support analytical products like dashboards, recommendation engines, and automation tools. You will collaborate with data scientists and business teams to create APIs and data services that expose data and machine learning outputs. You’ll build backend infrastructure for analytics platforms using GCP services like BigQuery, Dataflow, and Pub/Sub, and work with front-end teams to support the creation of user-facing analytical tools. Additionally, you will lead the implementation of cost-effective architectures, ensure data quality and security, and mentor junior engineers while contributing to a strong engineering culture.
Overall you will...
- Design and develop scalable, real-time and batch pipelines to support analytical products such as dashboards, recommendation engines, and task automation tools.
- Collaborate with data scientists and business teams to create APIs and data services that expose data and machine learning outputs to internal and external consumers.
- Build the backend infrastructure for advanced analytics platforms using GCP services such as BigQuery, Dataflow, and Pub/Sub, or similar.
- Work with front-end and full-stack developers to support the creation of user-facing analytical tools and dashboards.
- Lead the implementation of scalable, cost-effective architectures for data products.
- Participate in the design of data models and schemas that support analytics-driven software.
- Ensure data quality, security, and compliance with regulatory requirements such as GDPR and PCI DSS.
- Mentor junior engineers and contribute to building a strong engineering culture.

Qualifications
- 6+ years of experience in Data Architecture and/or Software Engineering, plus at least 1 year of experience as a lead, ideally in fintech, SaaS, or analytics-heavy industries.
- Strong experience with GCP technologies is mandatory (BigQuery, Dataflow, Pub/Sub, Cloud Run).
- Proficiency in building RESTful APIs or other data services for analytics consumption.
- Expertise in ETL/ELT pipelines using tools like dbt/Dataform or Airflow.
- Solid programming skills in Spark, Python, or Java, and SQL expertise.
- Familiarity with creating infrastructure for data products and integrating with front-end or external systems.
- Knowledge of governance and data security standards.
- Ability to work cross-functionally with data scientists, product managers, and business teams.
- Spanish and English proficiency
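As a toy illustration of the real-time pipeline work described above, here is a tumbling-window count over an event stream, the kind of windowed aggregation a Dataflow/Pub/Sub pipeline would run at scale. Event fields and window size are invented for the example.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds=60):
    """Aggregate a stream of (timestamp, merchant_id) events into
    per-window counts: a minimal, in-memory stand-in for the windowed
    aggregations a streaming pipeline would compute."""
    windows = defaultdict(int)
    for ts, merchant in events:
        # Align each event to the start of its fixed-size window.
        window_start = ts - (ts % window_seconds)
        windows[(window_start, merchant)] += 1
    return dict(windows)

events = [
    (0, "m1"), (30, "m1"), (59, "m2"),   # first minute
    (60, "m1"), (95, "m2"),              # second minute
]
counts = tumbling_window_counts(events, window_seconds=60)
```

A real pipeline would add watermarks and late-data handling, but the window-alignment arithmetic is the core of the computation.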
Data Engineer
Data Science & Analytics
Apply
August 6, 2025
Director of Data Engineering
Ironclad
501-1000
USD 265,000 - 294,000
United States
Full-time
Remote
false
Ironclad is the leading AI-powered contract lifecycle management platform, processing billions of contracts every year.
Every business is powered by contracts, but managing them can slow companies down and cost millions of dollars. Global innovators like L’Oréal, OpenAI, and Salesforce trust Ironclad to transform contracting into a strategic advantage - accelerating revenue, reducing risk, and driving efficiency. It’s the only platform that manages every type of contract workflow, whether a sales agreement, an HR agreement or a complex NDA.
We’re building the future of intelligent contracting and writing the narrative for how contracts unlock strategic growth. Forrester Wave and Gartner Magic Quadrant have consistently recognized Ironclad as a leader in our category. We’ve also been named one of Fortune’s Great Places to Work six years running, featured on Glassdoor’s Best Places to Work, and recognized by Forbes’ 50 Most Promising AI Companies.
We’re backed by leading investors like Accel, Sequoia, Y Combinator, and BOND. We’d love for you to join us!
This is a hybrid role. Office attendance is required at least twice a week on Tuesdays and Thursdays for collaboration and connection. There may be additional in-office days for team or company events.
About Ironclad IT
Our IT team plays a pivotal role in creating platforms for success so our Ironclad team can execute on our vision. As an IT teammate, you’ll partner with our Operations team on growth strategies and execute on technology needs as we grow. You also bring empathy to understand employee technology needs. You’ll enable the team, build out new offices, and design IT strategy for a fast-paced, fast-growing technology company.

Roles & Responsibilities:
- Lead, mentor, and manage a team of data engineers, fostering a culture of collaboration, innovation, and continuous improvement.
- Collaborate with other technical and non-technical teams to understand business requirements and translate them into efficient data engineering solutions.
- Design, implement, and maintain data pipelines and ETL processes that support data integration, transformation, and loading into the data warehouse.
- Ensure data quality, reliability, and consistency by implementing robust testing, monitoring, and validation processes.
- Stay current with industry trends and best practices in data engineering, cloud technologies, and data architecture.
- Drive the adoption of best practices for data governance, data security, and compliance within the data engineering team.
- Manage project timelines, resource allocation, and priorities to ensure timely delivery of data engineering projects.
- Collaborate with other directors and leaders to align data engineering initiatives with broader business objectives.
- Work closely with the IT & Security DS and other cross-functional DS teams on design, reference architecture, and Data Strategy.
- Experience setting a journey driven by enterprise business outcomes, modernization, and stakeholder alignment.
- Excellent communication skills with technical and non-technical audiences.
- Aptitude for budgeting, vendor negotiation, and operational efficiency.

Qualifications:
- Strong data infrastructure and data architecture skills.
- A proven track record of leading and scaling large data teams.
- Strong operational skills to drive efficiency and speed.
- Strong project management leadership.
- Strong vision for how Data Engineering can proactively improve companies.
- 12+ years of experience in Analytics, BI, and Data Warehousing.
- Experience scaling and managing 20+ person teams and in managing managers.
- Experience leading in a highly cross-functional environment, likely collaborating closely with Engineering, Product Management, and/or Data Science.
- Communication and leadership experience, with experience initiating and driving large-scale strategic initiatives.
- Data architecture experience.
- Experience in SQL or similar languages.
- Development experience in at least one object-oriented language (Python, Java, etc.).
- Solid understanding of how experimentation (A/B testing) works.

Requirements:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- Proven experience (8+ years) in data engineering, with at least 5 years in a senior leadership role.
- Strong hands-on experience with cloud-based data engineering platforms preferred (e.g., GCP, Google BigQuery, Looker, Databricks, Snowflake, dbt Labs, Airflow, Fivetran).
- Proficiency in SQL, Python, and other relevant programming languages.
- Experience with data modeling, ETL processes, and data integration patterns.
- Excellent communication skills with the ability to articulate complex technical concepts to both technical and non-technical stakeholders.
- Strong leadership and team management skills, with the ability to inspire and develop a high-performing team.
- Familiarity with agile development methodologies and project management practices.
- Adept at problem-solving, critical thinking, and decision-making in a fast-paced environment.
- Practical experience with data governance and compliance frameworks.
- Hands-on proficiency with data streaming and real-time data processing technologies.

Benefits:
- Health, dental, and vision insurance
- 401(k)
- Wellness reimbursement
- Take-what-you-need vacation policy
- Generous parental leave for both primary and secondary caregivers

Base Salary Range: $265,000 - $294,000
The base salary range represents the minimum and maximum of the salary range for this position based at our San Francisco headquarters. The actual base salary offered for this position will depend on numerous factors, including individual proficiency, anticipated performance, and the location of the selected candidate. Our base salary is just one component of Ironclad’s competitive total rewards package, which also includes equity awards (a new hire grant, along with opportunities for additional awards throughout your tenure), competitive health and wellness benefits, and a commitment to career growth and development. Pursuant to the San Francisco Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.
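The qualifications above ask for a solid understanding of how experimentation (A/B testing) works. As a rough illustration only (not part of the listing; the conversion numbers are invented), the textbook check for whether a conversion lift is statistically significant is a two-proportion z-test:

```python
from math import erf, sqrt

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)              # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF, Phi(x) = 0.5*(1+erf(x/sqrt(2)))
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical experiment: 5.0% vs 6.5% conversion over 4,000 users per arm.
z, p = two_proportion_z_test(conv_a=200, n_a=4000, conv_b=260, n_b=4000)
print(z, p)
```

With these toy numbers the lift is significant at the usual 0.05 level; real experimentation platforms layer sequential testing and multiple-comparison corrections on top of this basic calculation.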
Data Engineer
Data Science & Analytics
Apply
August 4, 2025
Senior Data Engineer (Intent)
Demandbase
1001-5000
USD
0
165000
-
284000
United States
Full-time
Remote
true
Introduction to Demandbase:
Demandbase is the leading account-based GTM platform for B2B enterprises to identify and target the right customers, at the right time, with the right message. With a unified view of intent data, AI-powered insights, and prescriptive actions, go-to-market teams can seamlessly align and execute with confidence. Thousands of businesses depend on Demandbase to maximize revenue, minimize waste, and consolidate their data and technology stacks - all in one platform. As a company, we’re as committed to growing careers as we are to building world-class technology. We invest heavily in people, our culture, and the community around us. We have offices in the San Francisco Bay Area, Seattle, and India, as well as a team in the UK, and allow employees to work remotely. We have also been continuously recognized as one of the best places to work in the San Francisco Bay Area, including “Best Workplaces for Millennials” and “Best Workplaces for Parents”! We're committed to attracting, developing, retaining, and promoting a diverse workforce. By ensuring that every Demandbase employee is able to bring a diversity of talents to work, we're increasingly capable of living out our mission to transform how B2B goes to market. We encourage people from historically underrepresented backgrounds and all walks of life to apply. Come grow with us at Demandbase! The base compensation range for this position for candidates in the SF Bay Area is $165,000 - $284,000. For all other locations, the base compensation range is based on the primary work location of the candidate, as our ranges are location specific. Actual compensation packages are based on a wide array of factors unique to each candidate, including but not limited to skillset, years of experience, and depth of experience.
About the Role:
As a Senior Data Engineer, you will play a pivotal role in architecting and scaling our data platforms, building robust and scalable data pipelines, and enabling data-driven decision-making across the company. You’ll work with massive, complex datasets from a variety of third-party and internal sources — driving data infrastructure and platform evolution to support both real-time and batch processing needs. In this role, you’ll not only write code but also influence the data strategy, mentor junior engineers, and collaborate cross-functionally with product, analytics, and platform teams. This role supports our Intent product, where you will improve the core pipelines as well as design new processes that enable our data science team to test and deploy new ML/AI models. The product delivered by this team is integrated into the core product stack and is a critical component of Demandbase’s account intelligence platform. This is a high-impact individual contributor role for someone who combines deep technical knowledge with strategic thinking and a bias for action.

What You’ll Be Doing:
- Design & Architect: Lead the end-to-end design and evolution of scalable, resilient data pipelines and infrastructure, driving architecture decisions that impact the company’s data platform long-term.
- Build & Scale: Develop and optimize large-scale data processing workflows (batch and streaming), using Spark and related technologies, ingesting data from diverse internal and external sources.
- Mentor & Lead: Provide technical leadership and mentorship to mid- and junior-level engineers. Review design docs, PRs, and contribute to engineering best practices across the team.
- Improve Reliability: Build fault-tolerant, observable systems with self-healing and robust monitoring using tools like Airflow, Datadog, or equivalent.
- Collaborate: Partner with cross-functional stakeholders in Product, Analytics, and Infrastructure to ensure data architecture aligns with business needs and SLAs.
- Own & Operate: Take full lifecycle ownership of key data pipelines and integrations—from design to deployment to production support.

What we’re looking for:
- Bachelor’s degree in computer science, engineering, mathematics, or related field
- 7+ years of experience in software/data engineering roles, with deep expertise in building and maintaining large-scale distributed data systems
- Scala experience required. Comfort with purely functional programming is a plus.
- Strong CS fundamentals, including algorithms, data structures, and system design
- Strong background in data modeling, performance tuning, and data integration best practices
- Experience owning end-to-end systems, including production monitoring, incident response, and system reliability engineering
- Proficiency in cloud-native data platforms (e.g., GCP or AWS), including managed services for analytics and orchestration
- Familiarity with real-time data processing, streaming architectures, and event-driven design
- Excellent verbal and written communication skills; comfortable explaining complex concepts to technical and non-technical stakeholders
- A strong sense of ownership, initiative, and accountability
- BS or MS in Computer Science required

Benefits:
We offer a comprehensive benefits package designed to support your health, well-being, and financial security. Our employees enjoy up to 100% paid premiums for Medical and Vision coverage, ensuring access to top-tier care for you and your loved ones. In addition, we provide a range of mental wellness resources, including access to Modern Health, to help support your emotional well-being.
We believe in a healthy work-life harmony, which is why we offer a flexible PTO policy, 15 paid holidays in 2025—including a three-day break around July 4th and a full week off for Thanksgiving—and No Internal Meetings Fridays to give you uninterrupted time to focus on what matters most. For your financial future, we offer a competitive 401(k) plan, short-term and long-term disability coverage, life insurance, and other valuable benefits to ensure your financial peace of mind. Our Commitment to Diversity, Equity, and Inclusion at Demandbase: At Demandbase, we believe in creating a workplace culture that values and celebrates diversity in all its forms. We recognize that everyone brings unique experiences, perspectives, and identities to the table, and we are committed to building a community where everyone feels valued, respected, and supported. Discrimination of any kind is not tolerated, and we strive to ensure that every individual has an equal opportunity to succeed and grow, regardless of their gender identity, sexual orientation, disability, race, ethnicity, background, marital status, genetic information, education level, veteran status, national origin, or any other protected status. We do not automatically disqualify applicants with criminal records and will consider each applicant on a case-by-case basis. We recognize that not all candidates will have every skill or qualification listed in this job description. If you feel you have the level of experience to be successful in the role, we encourage you to apply! We acknowledge that true diversity and inclusion requires ongoing effort, and we are committed to doing the work required to make our workplace a safe and equitable space for all. Join us in building a community where we can learn from each other, celebrate our differences, and work together. Personal information that you submit will be used by Demandbase for recruiting and other business purposes. 
Our Privacy Policy explains how we collect and use personal information.
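The role above centers on batch and streaming data processing workflows. As a hedged, single-machine sketch of the core streaming idea (in plain Python rather than the Spark stack the posting names; the events are invented), a tumbling event-time window assigns each event to a fixed-size time bucket before aggregating:

```python
from collections import defaultdict

def tumbling_window_counts(events, window_secs=60):
    """Assign each (timestamp, account_id) event to a tumbling event-time
    window and count events per (window_start, account) pair."""
    counts = defaultdict(int)
    for ts, account in events:
        window_start = ts - (ts % window_secs)   # bucket by event time, not arrival time
        counts[(window_start, account)] += 1
    return dict(counts)

# Toy intent events: (unix_seconds, account).
events = [(5, "acme"), (30, "acme"), (65, "globex"), (70, "acme")]
print(tumbling_window_counts(events))
```

A real streaming engine adds watermarks for late data and fault-tolerant state on top of this bucketing, but the windowing arithmetic is the same.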
Data Engineer
Data Science & Analytics
MLOps / DevOps Engineer
Data Science & Analytics
Apply
August 4, 2025
Freelance AI Agent Assistant
Mindrift
1001-5000
-
Mexico
Part-time
Remote
true
This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. At Mindrift, innovation meets opportunity. We believe in using the power of collective intelligence to ethically shape the future of AI.

What we do
The Mindrift platform connects specialists with AI projects from major tech innovators. Our mission is to unlock the potential of Generative AI by tapping into real-world expertise from across the globe.

About the Role
If you’re a professional experienced in AI data annotation and a confident user of LLMs, Mindrift offers a unique opportunity to apply your editing, annotating, fact-checking, and creative skills to an AI training project.

This is a freelance role for a project, and your typical tasks may include:
- Conduct high-quality web searches to verify facts, gather supporting data, and cross-check AI responses.
- Perform fact-checking and intent verification to ensure AI responses align with the user's goals.
- Carefully review and flag any inaccuracies, inconsistencies, or irrelevant answers.
- Provide structured feedback on AI-generated content to help improve model performance.
- Work effectively with large language models (LLMs), understanding their capabilities and limitations, and applying best practices when interacting with them.
- Generate prompts designed to elicit the best-quality results from LLMs.

How to get started
Simply apply to this post, qualify, and get the chance to contribute to projects aligned with your skills, on your own schedule. From creating training prompts to refining model responses, you’ll help shape the future of AI while ensuring technology benefits everyone.

Requirements
- You are currently enrolled in or have completed a Bachelor's degree or higher.
- You have professional and/or educational experience in data annotation, demonstrate a deeper-than-user-level interest in AI, and possess intellectual breadth and curiosity.
- You are skilled in web searching, fact-checking, and intent-checking, are able to work with LLMs, and have great attention to detail.
- Your level of English is upper-intermediate (B2) or above.
- You are ready to learn new methods, able to switch between tasks and topics quickly, and sometimes work with challenging, complex guidelines.
- Our freelance role is fully remote, so you just need a laptop, an internet connection, available time, and enthusiasm to take on a challenge.

Benefits
Why this freelance opportunity might be a great fit for you:
- Take part in a part-time, remote, freelance project that fits around your primary professional or academic commitments.
- Work on advanced AI projects and gain valuable experience that enhances your portfolio.
- Influence how future AI models understand and communicate in your field of expertise.
Data Engineer
Data Science & Analytics
Apply
August 3, 2025
Senior Data Engineer
SignalFire
51-100
USD
0
170000
-
270000
United States
Full-time
Remote
false
About SignalFire
SignalFire is the first VC firm built like a technology company to better serve the needs of founders as they build and scale their startups. With approximately $3B in assets under management, SignalFire invests in applied AI companies from pre-seed to Series B in key sectors, including healthcare, cybersecurity, infrastructure, consumer, and other enterprise verticals. The firm’s Beacon AI platform tracks over 650M employees and 80M organizations, giving the firm an unmatched data advantage in identifying and supporting world-class startups. Its sector-focused investors and a dedicated team of seasoned operators drive SignalFire at startup speed. They provide support across a company’s full lifecycle through data and resources tailored by growth stage, plus a diverse ecosystem of partners and customers. Notable investments include Frame.io, Grammarly, Grow Therapy, EvenUp, and Stampli. Learn more at www.signalfire.com

Role Description
SignalFire is looking for an exceptional engineer who is passionate about building robust data infrastructure that powers complex analytical workflows, machine learning processes, and data-intensive web applications. As our data engineer, you will work closely with our investment, portfolio, and talent teams, and the rest of our engineering organization, to build best-in-class tools that help us explore potential investment areas and support our portfolio companies.

What you’ll be doing:
- Building industry-standard backend architecture and data pipelines for our AI platform, Beacon, which helps our firm invest in the best startups in the market
- Procuring and evaluating potential new data sources in service of identifying, diligencing, and supporting startups
- Building and maintaining scalable, maintainable data transformation pipelines using Python, Spark, dbt, and Apache Airflow to power the whole company's data science and analytics needs
- Standing up new low-latency data processing infrastructure to provide clean data to machine learning models
- Improving workflow orchestration and automation processes for existing systems

Your background:
- 2 to 6 years of experience working as a data engineer
- A solid conceptual grasp of (and preferably hands-on experience with) orchestration frameworks like Airflow, Dagster, Prefect, or similar
- Experience with ETL, data processing, pipelining, and sanitizing
- Experience working on complex data pipelines using SQL, Redshift, Python, and dbt
- Experience building the backend infrastructure and architecture required to power frontend features
- Understanding of data modeling strategies for both transactional and data warehousing workloads
- Exceptional problem solving, analysis, decomposition, and communication skills applied within an agile development environment
- BS/MS in statistics, computer science, economics, or a similarly quantitative field
- Takes pride in working on large projects from scratch to completion
- San Francisco Bay Area or New York based

Why SignalFire?
In addition to working directly with our team at the forefront of a hybrid human + technology investment approach, you’ll also work closely with founders to help them solve major problems. Our culture is one that breeds creativity and impact and puts you in control of your own destiny.
- End-to-end ownership: Engineers at SignalFire own the full stack—from data acquisition and model development to infrastructure and product delivery. This extends beyond code; you'll work directly with the investment team, gaining insight into how your work shapes strategic decisions.
- Empowered decision-making: We’ve cut the bureaucracy of big companies but kept their resources. Have a strong idea? Pitch it. You’ll get budget and ownership. We move like a startup but execute with the discipline of a mature company.
- Versatile technical expertise: We value engineers who blend startup agility with enterprise depth. You'll wear many hats—tuning infrastructure one day, refining algorithms or designing features the next—building a skillset rarely found elsewhere.
- Creative autonomy: At SignalFire, creativity isn’t just valued—it’s essential. We tackle problems without clear precedents, where sharp judgment and innovation matter more than technical orthodoxy. Success means balancing analytical rigor with a bias to ship, learn, and iterate.

You can read more about life on the engineering team here: https://www.signalfire.com/blog/engineering-at-the-heart-of-venture-capital

Compensation
USD 170k-270k base salary/year based on experience + Carried Interest

Does this sound like you? If so, we'd love to hear from you!
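The background section above asks for a conceptual grasp of orchestration frameworks like Airflow, Dagster, or Prefect. Stripped of scheduling, retries, and operators, the shared core idea is executing tasks in dependency order over a DAG. This minimal stdlib sketch (not any framework's actual API; the task names are invented) shows that skeleton:

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline steps; in Airflow these would be task operators.
def extract():   return ["raw_a", "raw_b"]
def transform(): return "cleaned"
def load():      return "loaded"

tasks = {"extract": extract, "transform": transform, "load": load}

# Map each task to the set of tasks it depends on (its upstreams).
dag = {"transform": {"extract"}, "load": {"transform"}}

# Run every task after all of its dependencies have completed.
order = list(TopologicalSorter(dag).static_order())
results = {name: tasks[name]() for name in order}
print(order)
```

Real orchestrators add what this sketch omits: per-task retries, backfills, cross-run state, and parallel execution of independent branches.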
Data Engineer
Data Science & Analytics
Apply
August 1, 2025
Data Engineer
Krea
11-50
-
United States
Full-time
Remote
false
About Krea
At Krea, we are building next-generation AI creative tools. We are dedicated to making AI intuitive and controllable for creatives. Our mission is to build tools that empower human creativity, not replace it. We believe AI is a new medium that allows us to express ourselves through various formats—text, images, video, sound, and even 3D. We're building better, smarter, and more controllable tools to harness this medium.

This job
Data is one of the fundamental pieces of Krea. Huge amounts of data power our AI training pipelines, our analytics and observability, and many of the core systems that make Krea tick.

As a data engineer, you will…
- … build distributed systems to process gigantic (petabyte-scale) amounts of files of all kinds (images, video, and even 3D data). You should feel comfortable solving scaling problems as you go.
- … work closely with our research team to build ML pipelines and deploy models to make sense of raw data.
- … play with massive amounts of compute on huge Kubernetes GPU clusters - our main GPU cluster takes up an entire datacenter from our provider.
- … learn machine learning engineering (ML experience is a bonus, but you can also learn it on the job) from world-class researchers on a small yet highly effective tight-knit team.

Example projects
- Find clean scenes in millions of videos, running distributed data pipelines that detect shot boundaries and save timestamps of clips.
- Solve orchestration and scaling issues with a large-scale distributed GPU job processing system on Kubernetes.
- Build systems to deploy and combine different LLMs to caption massive amounts of multimedia data in a variety of different ways.
- Design multi-stage pipelines to turn petabytes of raw data into clean downstream datasets, with metadata, annotations, and filters.

Strong candidates may have experience with…
- Python, PyArrow, DuckDB, SQL, massive relational databases, PyTorch, Pandas, NumPy…
- Kubernetes
- Designing and implementing large-scale ETL systems
- Fundamental knowledge of containerization, operating systems, file systems, and networking
- Distributed systems design

About us
We’re building AI creative tooling. We’ve raised over $83M from the best investors in Silicon Valley. We’re a team of 12 with millions of active users, scaling aggressively.
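One of the example projects above is detecting shot boundaries in millions of videos. The underlying signal is simple: a hard cut shows up as a large jump between consecutive frame color histograms. This is a toy, single-machine sketch of that idea (the histograms and threshold are invented; a production pipeline would decode real frames and distribute the work):

```python
def shot_boundaries(frame_hists, fps=24.0, threshold=0.5):
    """Flag a shot boundary wherever the L1 distance between consecutive
    (normalized) frame histograms exceeds `threshold`; return boundary
    timestamps in seconds."""
    cuts = []
    for i in range(1, len(frame_hists)):
        dist = sum(abs(a - b) for a, b in zip(frame_hists[i - 1], frame_hists[i]))
        if dist > threshold:
            cuts.append(i / fps)   # frame index -> seconds
    return cuts

# Toy 2-bin histograms: a hard cut between frames 2 and 3.
hists = [[0.9, 0.1], [0.88, 0.12], [0.1, 0.9], [0.12, 0.88]]
print(shot_boundaries(hists, fps=24.0))
```

At Krea's scale the same per-pair comparison would run inside a distributed pipeline, with the detected clip timestamps written out as dataset metadata.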
Data Engineer
Data Science & Analytics
Machine Learning Engineer
Data Science & Analytics
Apply
July 31, 2025
Staff Data Engineer
Thoughtful
101-200
USD
0
190000
-
250000
United States
Full-time
Remote
true
Join Our Mission to Revolutionize Healthcare
Thoughtful is pioneering a new approach to automation for all healthcare providers! Our AI-powered Revenue Cycle Automation platform enables the healthcare industry to automate and improve its core business operations. We're looking for Staff Data Engineers to help scale and strengthen our data platform. Our data stack today consists of Aurora RDS, AWS Glue, Apache Iceberg, S3 (Parquet), Spark, and Athena - supporting a range of use cases from operational reporting to downstream services. We’re looking to grow the team with engineers who can help improve performance, increase reliability, and expand the platform's capabilities as our data volume and complexity continue to grow. You’ll work closely with other engineers to evolve our existing pipelines, improve observability and data quality, and enable faster, more flexible access to data across the company. The platform is deployed on AWS using OpenTofu, and we’re looking for engineers who bring strong cloud infrastructure fundamentals alongside deep experience in data engineering.

Your Role:
- Build: Develop and maintain data pipelines and transformations across the stack, from ingesting transactional data into the data lakehouse to refining data up the medallion architecture.
- Optimize: Tune performance, storage layout, and cost-efficiency across our data storage and query engines.
- Extend: Help design and implement new data ingestion patterns and improve platform observability and reliability.
- Collaborate: Partner with engineering, product, and operations teams to deliver well-structured, trustworthy data for diverse use cases.
- Contribute: Help establish and evolve best practices for our data infrastructure, from pipeline design to OpenTofu-managed resource provisioning.
- Secure: Help design and implement a data governance strategy to secure our data lakehouse.

Your Qualifications:
- 8-10+ years of experience building and maintaining data pipelines in production environments
- Strong knowledge of the data lakehouse ecosystem, with an emphasis on AWS data services - particularly Glue, S3, Athena/Trino/PrestoDB, and Aurora
- Proficiency in Python, Spark, and Athena/Trino/PrestoDB for data transformation and orchestration
- Experience managing infrastructure with OpenTofu/Terraform or other Infrastructure-as-Code tools
- Solid understanding of data modeling, partitioning strategies, schema evolution, and performance tuning
- Comfortable working with cloud-native data pipelines and batch processing (streaming experience is a plus but not required)

What Sets You Apart:
- Systems thinker - you understand the tradeoffs in data architecture and design for long-term stability and clarity
- Outcome-driven - you focus on building useful, maintainable systems that serve real business needs
- Strong collaborator - you're comfortable working across teams and surfacing data requirements early
- Practical and hands-on - able to dive into logs, schemas, and IAM policies when needed
- Thoughtful contributor - committed to improving code quality, developer experience, and documentation across the board

Why Thoughtful?
- Competitive compensation
- Equity participation: Employee Stock Options
- Health benefits: Comprehensive medical, dental, and vision insurance
- Time off: Generous leave policies and paid company holidays

California Salary Range: $190,000—$250,000 USD
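The "Build" responsibility above references the medallion architecture: raw (bronze) data is progressively refined into cleaned (silver) and aggregated (gold) layers. This is a loose, stdlib-only sketch of the pattern (the real pipelines here would use Spark, Glue, and Iceberg tables rather than Python dicts, and the claim records are invented):

```python
# Bronze: raw claim events as ingested (duplicates, strings, messy casing).
bronze = [
    {"claim_id": "c1", "payer": "Acme ", "amount": "100.0"},
    {"claim_id": "c1", "payer": "Acme ", "amount": "100.0"},   # duplicate row
    {"claim_id": "c2", "payer": "globex", "amount": "250.5"},
]

# Silver: deduplicated by claim_id, typed, and normalized.
silver_by_id = {}
for row in bronze:
    silver_by_id[row["claim_id"]] = {
        "claim_id": row["claim_id"],
        "payer": row["payer"].strip().lower(),
        "amount": float(row["amount"]),
    }
silver = list(silver_by_id.values())

# Gold: per-payer totals, ready for operational reporting.
gold = {}
for row in silver:
    gold[row["payer"]] = gold.get(row["payer"], 0.0) + row["amount"]
print(gold)
```

Each layer is persisted separately in a lakehouse so downstream consumers can pick the refinement level they need, and bad transformations can be replayed from the layer below.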
Data Engineer
Data Science & Analytics
Apply
July 28, 2025
Data Center Research & Development Engineer - Stargate
OpenAI
5000+
USD
240000
-
400000
United States
Full-time
Remote
false
About the Team:
OpenAI, in close collaboration with our capital partners, is embarking on a journey to build the world’s most advanced AI infrastructure ecosystem. The Data Center Engineering team is at the core of this mission. This team sets the infrastructure strategy, develops cutting-edge engineering solutions, partners with research teams to define the infrastructure performance requirements, and creates reference designs to enable rapid global expansion in collaboration with our partners. As a key member of this team, you will help design and deliver next-generation power, cooling, and hardware solutions for high-density rack deployments in some of the largest data centers in the world. You will work closely with stakeholders across research, site selection, design, construction, commissioning, hardware engineering, deployment, operations, and global partners to bring OpenAI’s infrastructure vision to life.

About the Role:
We’re seeking a seasoned data center R&D engineer with extensive experience in designing, performing validation testing, commissioning, and operating large-scale power, cooling, and high-performance computing systems. This role focuses on developing and validating new infrastructure and hardware, including high-voltage rectifiers, UPS systems, battery storage, transformers, DC to DC converters, and power supplies. The role will lead the design and buildout of a hardware validation lab, create detailed models and test procedures, and ensure hardware compatibility across edge-case data center operating conditions. Additionally, the role involves working closely with hardware vendors to assess manufacturing test protocols, throughput, and liquid-cooled GPU rack performance. A strong foundation in technical design, operational leadership, and vendor collaboration is critical, with an opportunity to lead high-impact infrastructure programs.

You Might Thrive in this Role:
- Oversee electrical, mechanical, controls, and telemetry design and operations for large-scale data centers, including review of building and MEP drawings across all project phases from concept design to permitting, construction, commissioning, and production.
- Develop, test, and implement operational procedures and workflows from design through commissioning and deployment.
- Perform validation testing of all critical equipment and hardware in partnership with equipment vendors and ODMs.
- Lead buildout of the R&D lab, including equipment selection, test infrastructure for high-density liquid-cooled racks, and staffing plans.
- Select and manage engineering tools (CAD, CFD, PLM, PDM, electrical, mechanical, power/network management).
- Collaborate with external vendors to select, procure, and manage critical infrastructure equipment (e.g., UPS, generators, transformers, DC to DC converters, chillers, VFDs).
- Ensure seamless integration of power, cooling, controls, networking, and construction systems into facility design.
- Provide technical direction to teams and vendors, ensuring safety, quality, and compliance with local codes, standards, and regulations.
- Manage vendor relationships and ensure adherence to safety, performance, and operational standards.

Qualifications:
- 20+ years of experience in data center design, operations, and critical systems maintenance.
- Proven leadership across design, commissioning, and operation of large-scale data center campuses.
- Deep expertise in infrastructure systems (power, cooling, controls, networking) and operational workflows.
- Hands-on experience with critical infrastructure equipment and testing protocols.
- Strong track record in lab development, equipment selection, and facility operations.
- Familiarity with engineering tools (CAD, CFD, PLM, etc.) and their integration across teams.
- Experience navigating regulatory environments and working with government agencies.
- Excellent cross-functional communication and stakeholder collaboration.
- Bachelor’s degree in engineering required; advanced degree and PE certification preferred.

Preferred Skills:
- Expertise in equipment design, agency certification, and validation testing.
- Experience in global, matrixed organizations and multi-site operations.
- Skilled in vendor negotiations and supply chain management.
- Familiarity with sustainable and energy-efficient data center design principles.

About OpenAI
OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity. We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristic. For additional information, please see OpenAI’s Affirmative Action and Equal Employment Opportunity Policy Statement. Qualified applicants with arrest or conviction records will be considered for employment in accordance with applicable law, including the San Francisco Fair Chance Ordinance, the Los Angeles County Fair Chance Ordinance for Employers, and the California Fair Chance Act.
For unincorporated Los Angeles County workers: we reasonably believe that criminal history may have a direct, adverse and negative relationship with the following job duties, potentially resulting in the withdrawal of a conditional offer of employment: protect computer hardware entrusted to you from theft, loss or damage; return all computer hardware in your possession (including the data contained therein) upon termination of employment or end of assignment; and maintain the confidentiality of proprietary, confidential, and non-public information. In addition, job duties require access to secure and protected information technology systems and related data security obligations.We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.OpenAI Global Applicant Privacy PolicyAt OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.
Data Engineer
Data Science & Analytics
Apply
July 17, 2025
Staff Data Engineer
Glean Work
1001-5000
-
India
Full-time
Remote
false
About Glean: Founded in 2019, Glean is an innovative AI-powered knowledge management platform designed to help organizations quickly find, organize, and share information across their teams. By integrating seamlessly with tools like Google Drive, Slack, and Microsoft Teams, Glean ensures employees can access the right knowledge at the right time, boosting productivity and collaboration. The company’s cutting-edge AI technology simplifies knowledge discovery, making it faster and more efficient for teams to leverage their collective intelligence. Glean was born from Founder & CEO Arvind Jain’s deep understanding of the challenges employees face in finding and understanding information at work. Seeing firsthand how fragmented knowledge and sprawling SaaS tools made it difficult to stay productive, he set out to build a better way - an AI-powered enterprise search platform that helps people quickly and intuitively access the information they need. Since then, Glean has evolved into the leading Work AI platform, combining enterprise-grade search, an AI assistant, and powerful application- and agent-building capabilities to fundamentally redefine how employees work.About the Role: Glean is building a world-class Data Organization composed of data science, applied science, data engineering and business intelligence groups. Our data engineering group is based in our Bangalore, India office. In this role, you will work on customer-facing and Glean employee-facing analytics initiatives: Customer-facing analytics initiatives: Customers rely on in-product dashboards and if they have the willingness and resources, self-serve data analytics to understand how Glean’s being used at their company in order to get a better sense of Glean’s ROI and partner with Glean to increase user adoption. 
You’re expected to partner with backend and data science to maintain and improve the data platform behind these operations reflect usage on new features reflect changes on the underlying product usage logs on existing features identify and close data quality issues, e.g. gaps with internal tracking, and backfill the changes triage issues customers report to us within appropriate SLAs help close customer-facing technical documentation gaps You will: Help improve the availability of high-value upstream raw data by channeling inputs from data science and business intelligence to identify biggest gaps in data foundations partnering with Go-to-Market & Finance operations groups to create streamlined data management processes in enterprise apps like Salesforce, Marketo and various accounting software partnering with Product Engineering teams as they craft product logging initiatives & processes Architect and implement key tables that transform structured and unstructured data into usable models by the data, operations, and engineering orgs. Ensure and maintain the quality and availability of internally used tables within reasonable SLAs Own and improve the reliability, efficiency and scalability of ETL tooling, including but not limited to dbt, BigQuery, Sigma. This includes identifying implementing and disseminating best practices as well. About you: You have 9+ yrs of work experience in software/data engineering (former is strongly preferred) as a bachelor degree holder. This requirement is 7+ for masters degree holders and 5+ for PhD Degree holders. You’ve served as a tech lead and mentored several data engineers before. Customer-facing analytics initiatives: You have experience in architecting, implementing and maintaining robust data platform solutions for external-facing data products. You have experience with implementing and maintaining large-scale data processing tools like Beam and Spark. 
- You have experience working with stakeholders and peers across time zones and roles, e.g. ENG, PM, data science, GTM, often as the main data engineering point of contact.

Internal-facing analytics initiatives:
- You have experience with full-cycle data warehousing projects, including requirements analysis, proof-of-concepts, design, development, testing, and implementation.
- You have experience with database design, architecture, and cost-efficient scaling.
- You have experience with cloud-based data tools like BigQuery and dbt.
- You have experience with data pipelining tools like Airbyte, Apache, Stitch, Hevo Data, and Fivetran.

General qualifications:
- You have a high degree of proficiency with SQL and are able to set best practices and up-level our growing SQL user base within the organization.
- You are proficient in at least one of Python, Java, and Golang.
- You are familiar with cloud computing services like GCP and/or AWS.
- You are concise and precise in written and verbal communication. Technical documentation is your strong suit.

You are a particularly good fit if:
- You have 1+ years of tech lead management experience. Note this is distinct from tech lead experience, and involves formally managing others.
- You have experience working with customers directly in a B2B setting.
- You have experience with Salesforce, Marketo, and Google Analytics.
- You have experience in distributed data processing & storage, e.g. HDFS.

Location: This role is hybrid (3 days a week in our Bangalore office).

We are a diverse bunch of people and we want to continue to attract and retain a diverse range of people into our organization. We're committed to an inclusive and diverse company. We do not discriminate based on gender, ethnicity, sexual orientation, religion, civil or family status, age, disability, or race.
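The "identify and close data quality issues ... and backfill the changes" responsibility above usually starts with finding the gaps. As a minimal sketch (illustrative only, not Glean's actual tooling), here is how missing days in a usage-events table can be detected and grouped into backfill windows:

```python
from datetime import date, timedelta

def find_missing_dates(observed, start, end):
    """Return the dates in [start, end] that have no usage events logged."""
    observed = set(observed)
    missing = []
    d = start
    while d <= end:
        if d not in observed:
            missing.append(d)
        d += timedelta(days=1)
    return missing

def backfill_windows(missing):
    """Group consecutive missing dates into (first, last) backfill ranges."""
    windows = []
    for d in sorted(missing):
        if windows and d - windows[-1][1] == timedelta(days=1):
            # Extend the current window by one day
            windows[-1] = (windows[-1][0], d)
        else:
            # Start a new window
            windows.append((d, d))
    return windows
```

Each `(first, last)` window is then a candidate range for a targeted backfill job, rather than reprocessing the whole table.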
Data Engineer
Data Science & Analytics
Apply
July 14, 2025
Head of Data
Moonvalley
101-200
0
0
-
0
United States
Full-time
Remote
true
About Us: Moonvalley is developing cutting-edge generative AI models designed to power Superbowl-worthy commercials and award-winning cinematic experiences. Our inaugural HD model, Marey, is built on exclusively licensed and owned data for professional use in Hollywood and enterprise applications. Our team is an unprecedented convergence of talent across industries. Our elite AI scientists from DeepMind, Google, Microsoft, Meta & Snap have decades of collective experience in machine learning and computational creativity. We have also established the first AI-enabled movie studio in Hollywood, filled with accomplished filmmakers and visionary creative talent. We work with the top producers, actors, and filmmakers in Hollywood as well as creative-driven global brands. So far we've raised over $70M from world-class investors including General Catalyst, Bessemer, Khosla Ventures & Y Combinator, and we're just getting started.

Role Summary: We're seeking a Head of Data to define and execute our comprehensive data strategy as we scale our next-generation generative video models. You will build and lead the data organization, ensuring Moonvalley maintains its competitive advantage through exclusive access to the highest-quality, legally compliant datasets in the industry. You'll establish data governance frameworks, drive acquisition partnerships with content creators and studios, oversee compliance and licensing, and manage the complete data lifecycle powering our AI models.
This role requires technical depth, strategic thinking, and executive leadership to navigate the complex landscape of data rights, quality, and scale in generative AI.

What you'll do:
- Define and execute comprehensive data strategy, roadmap, and organizational vision
- Build and scale the data team, managing data engineers, scientists, analysts, and compliance specialists
- Lead negotiations for large-scale data acquisition deals with studios, content creators, and media companies
- Establish data governance policies, compliance frameworks, and quality standards
- Oversee data infrastructure, pipelines, and processing systems architecture
- Partner with research, legal, and executive teams to align data initiatives with business objectives
- Manage data licensing, copyright compliance, and regulatory requirements
- Drive innovation in data processing, curation, and quality assessment methodologies

What we're looking for:
- 8+ years of data leadership experience building and scaling data organizations at high-growth technology companies
- Deep experience with large-scale data infrastructure, ML pipelines, and distributed systems
- Proven track record developing data strategies and managing cross-functional teams
- Experience with dataset licensing, intellectual property rights, and compliance frameworks
- Understanding of machine learning workflows and model training data requirements
- Strong business acumen with ability to negotiate complex partnerships and contracts
- Exceptional communication skills for executive presentations and external stakeholder management
- Experience with multi-modal datasets, particularly video, and associated technical challenges

Nice to haves:
- Experience at a generative AI company or with foundation model training datasets
- Background in the media/entertainment industry with existing industry relationships
- Experience with international data regulations and cross-border compliance
- Advanced degree in computer science, data science, or a related technical field

In our team, we approach our work
with dedication similar to that of Olympic athletes. Anticipate occasional late nights and weekends dedicated to our mission. We understand this level of commitment may not suit everyone, and we openly communicate this expectation. If you're motivated by deeply technical problems, a seemingly never-ending uphill battle, and the opportunity to build (and own) a generational technology company, we can give you what you're looking for. All business roles at Moonvalley are hybrid positions by default, with some fully remote depending on the job scope. We meet a few times every year as a company, usually in London, UK or North America (LA, Toronto). If you're excited about the opportunity to work on cutting-edge AI technology and help shape the future of media and entertainment, we encourage you to apply. We look forward to hearing from you! The statements contained in this job description reflect general details as necessary to describe the principal functions of this job, the level of knowledge and skill typically required, and the scope of responsibility. It should not be considered an all-inclusive listing of work requirements. Individuals may perform other duties as assigned, including work in other functional areas to cover absences, to equalize peak work periods, or to otherwise balance organizational workload. Moonvalley AI is proud to be an equal opportunity employer. We are committed to providing accommodations. If you require accommodation, we will work with you to meet your needs. Please be assured we'll treat any information you share with us with the utmost care, only use your information for recruitment purposes, and will never sell it to other companies for marketing purposes. Please review our privacy policy and job applicant privacy policy located here for further information.
Data Engineer
Data Science & Analytics
Machine Learning Engineer
Data Science & Analytics
Apply
July 10, 2025
Senior Data Engineer - Analytics
Evenup
501-1000
-
United States
Canada
Full-time
Remote
false
EvenUp is one of the fastest-growing generative AI startups in history, on a mission to level the playing field for personal injury victims, whose cases range from motor vehicle accidents to child abuse. Our products empower law firms to secure faster settlements, higher payouts, and better outcomes for those who need it most. The Analytics team at EvenUp plays a critical role in driving data-informed decision-making across the organization. We partner closely with Product, Engineering, Operations, and Executive teams to uncover insights, build scalable data solutions, and enable a culture of data-driven decision making and continuous improvement. Our team is responsible for everything from foundational data modeling and reporting to advanced analytics and forecasting.
We’re looking for a Senior Data Engineer - Analytics to join our fast-paced, data-driven team. You’ll play a key role in building and scaling the data foundations that enable fast, reliable, and actionable insights. You’ll work closely with partner teams to drive end-to-end analytics initiatives and work alongside Data Scientists, ML Engineers, Product Managers, Software Engineers and Operations.
Please Note: This is a hybrid role with the expectation of working at least 3 days a week from one of our office hubs in San Francisco OR Toronto, Canada.
What You'll Do:
- Partner with product, analytics, and operations teams to understand data needs and translate them into scalable models and pipelines.
- Build and maintain robust, well-documented ELT pipelines using tools like dbt, Airflow/Dagster, and Snowflake (or similar).
- Own the design and evolution of data models that support reporting, experimentation, and product feature instrumentation.
- Improve data quality and reliability by establishing and maintaining testing, monitoring, and alerting practices.
- Help drive our data architecture forward with a focus on usability, performance, strong data governance, and scalability.
- Mentor other engineers and analysts on data best practices, modeling standards, and tooling.
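The "testing, monitoring, and alerting" responsibility above is commonly handled with declarative checks like dbt's `not_null` and `unique` schema tests. As a rough sketch of the idea in plain Python (illustrative only; the function names and report shape are assumptions, not EvenUp's stack):

```python
def not_null(rows, column):
    """Like dbt's not_null test: return rows where `column` is missing."""
    return [r for r in rows if r.get(column) is None]

def unique(rows, column):
    """Like dbt's unique test: return values of `column` seen more than once."""
    seen, dupes = set(), set()
    for r in rows:
        v = r.get(column)
        if v in seen:
            dupes.add(v)
        seen.add(v)
    return sorted(d for d in dupes if d is not None)

def run_tests(rows, key):
    """Run both checks on a key column; an empty report means the table passes."""
    report = {}
    nulls = not_null(rows, key)
    if nulls:
        report["not_null"] = len(nulls)
    dupes = unique(rows, key)
    if dupes:
        report["unique"] = dupes
    return report
```

In a real pipeline these checks run after each model build, and a non-empty report pages the owning team instead of silently publishing a bad table.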
What We're Looking For:
- 4+ years of experience as a Data Engineer, Analytics Engineer, or Business Intelligence Engineer, ideally in product-focused, fast-paced environments
- Strong experience designing dimensional models and defining clear source-of-truth metrics
- Strong proficiency in SQL for data transformation, and comfort in at least one functional/OOP language such as Python or Scala for data wrangling or orchestration logic
- Expertise in creating compelling reporting and data visualization solutions using dashboarding tools (e.g., Looker, Tableau, Metabase)
- A self-starter, comfortable working in fast-paced and evolving environments, with a bias toward action
- Excellent communication skills and experience collaborating closely with product managers and business stakeholders
- Strong interpersonal skills, with the ability to build relationships and trust across functions and work collaboratively
- Strong attention to detail, structured thinking, and experience developing processes to reduce human error

Bonus Points:
- Experience with dbt / BigQuery / Dagster / Python
- Experience working as a product software engineer
- Experience working in the legal domain
- Experience working with document processing of unstructured data

Notice to Candidates: EvenUp has been made aware of fraudulent job postings and unaffiliated third parties posing as our recruiting team – please know that we have no affiliation or connection to these situations. We only post open roles on our career page (https://jobs.ashbyhq.com/evenup) or reputable job boards like our official LinkedIn or Indeed pages, and all official EvenUp recruitment emails will come from the domains @evenuplaw.com, @evenup.ai, @ext-evenuplaw.com or the no-reply@ashbyhq.com email address. If you receive communication from someone you believe is impersonating EvenUp, please report it to us by emailing talent-ops-team@evenuplaw.com. Examples of fraudulent email domains include “careers-evenuplaw.com” and “careers-evenuplaws.com”.
Benefits & Perks: Our goal is to empower every team member to contribute to our mission of fostering a more just world, regardless of their role, location, or level of experience. To that end, here is a preview of what we offer:
- Choice of medical, dental, and vision insurance plans for you and your family
- Flexible paid time off
- 10 US observed holidays, and Canadian statutory holidays by province
- A home office stipend
- 401(k) for US-based employees
- Paid parental leave
- Sabbatical program
- A meet-up program to get together in person with colleagues in your area
- Offices in San Francisco and Toronto

Please note the above benefits & perks are for full-time employees.

About EvenUp: EvenUp is on a mission to level the playing field in personal injury cases. EvenUp applies machine learning and its AI model known as Piai™ to reduce manual effort and maximize case outcomes across the personal injury value chain, combining in-house human legal expertise with proprietary AI and software to analyze records. The Claims Intelligence Platform™ provides rich business insights, AI workflow automation, and best-in-class document creation for injury law firms. EvenUp is the trusted partner of personal injury law firms. Backed by top VCs, including Bessemer Venture Partners, Bain Capital Ventures (BCV), SignalFire, NFX, DCM, and more, EvenUp’s customers range from top trial attorneys to America’s largest personal injury firms. EvenUp was founded in late 2019 and is headquartered in San Francisco. Learn more at www.evenuplaw.com. EvenUp is an equal opportunity employer. We are committed to diversity and inclusion in our company. We do not discriminate based on race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.
Data Engineer
Data Science & Analytics
Apply
July 10, 2025
Senior Analytics Engineer
Harvey
501-1000
USD
0
170000
-
200000
United States
Full-time
Remote
false
Why Harvey: Harvey is a secure AI platform for legal and professional services that augments productivity and automates complex workflows. Harvey uses algorithms with reasoning-adept LLMs that have been customized and developed by our expert team of lawyers, engineers, and research scientists. We’ve found product market fit and are scaling our team very quickly. Some reasons to join Harvey are:
- Exceptional product market fit: We have partnered with the largest law firms and professional service providers in the world, including Paul Weiss, A&O Shearman, Ashurst, O'Melveny & Myers, PwC, KKR, and many others.
- Strategic investors: Raised over $500 million from strategic investors including Sequoia, Google Ventures, Kleiner Perkins, and OpenAI.
- World-class team: Harvey is hiring the best talent from DeepMind, Google Brain, Stripe, FAIR, Tesla Autopilot, Glean, Superhuman, Figma, and more.
- Partnerships: Our engineers and researchers work directly with OpenAI to build the future of generative AI and redefine professional services.
- Performance: 4x ARR in 2024.
- Competitive compensation.

Role Overview: We’re looking for a versatile Senior Analytics Engineer to architect the data backbone that powers decision-making at Harvey. With product-market fit already proven and demand surging across diverse customer segments, you’ll design clean, reliable pipelines and semantic data models that turn raw events into immediately usable insights. As the first Analytics Engineer on our team, you’ll choose and implement the right data stack, champion best practices in testing and documentation, and collaborate closely with product, GTM, and leadership to ensure every team can answer its own questions with confidence.
If you combine engineering rigor with a love of storytelling through data, and want to shape analytics from the ground up, we’d love to meet you.

What You’ll Do:
- Design and build scalable data models and pipelines using dbt to transform raw data into clean, reliable assets that power company-wide analytics and decision-making.
- Define and implement a robust semantic layer (e.g. LookML/Omni) that standardizes key business metrics, dimensions, and data products, ensuring self-serve capabilities for stakeholders across teams.
- Partner cross-functionally with Product, GTM, Finance, and the Exec Team to deliver intuitive, consistent dashboards and analytical tools that surface real-time business health metrics.
- Establish and champion data modeling standards and best practices, guiding the organization in how to model data for accuracy, usability, and long-term maintainability.
- Collaborate with engineering to make key decisions on data architecture, co-design data schemas, and implement orchestration strategies that ensure reliability and performance of the data warehouse.
- Lead data governance initiatives, ensuring high standards of data quality, consistency, documentation, and access control across the analytics ecosystem.
- Empower stakeholders with data by making analytical assets easily discoverable, reliable, and well-documented, turning complex datasets into actionable insights for the business.
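The point of a semantic layer like the one described above is that each metric is defined exactly once, so every dashboard computes the same number. A toy sketch of that idea (the metric names and row shape are invented for illustration; real semantic layers like LookML or Omni are declarative, not Python):

```python
# Each metric is registered once, with its own aggregation logic.
METRICS = {
    "active_users": lambda rows: len(
        {r["user_id"] for r in rows if r["events"] > 0}
    ),
    "total_events": lambda rows: sum(r["events"] for r in rows),
}

def compute(metric, rows):
    """Resolve a metric by name; unknown names fail loudly, not silently."""
    if metric not in METRICS:
        raise KeyError(f"undefined metric: {metric}")
    return METRICS[metric](rows)
```

Consumers ask for `compute("active_users", rows)` rather than re-deriving the definition in each dashboard, which is what keeps "active users" consistent across Product, GTM, and Finance views.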
What You Have:
- 5+ years of experience in Analytics Engineering, Data Engineering, Data Science, or a similar field
- Deep expertise in SQL, dbt, Python, and modern BI/semantic layer tools like Looker or Omni
- Skilled at defining core business and product metrics, uncovering insights, and resolving data inconsistencies across complex systems
- Strong familiarity with version control (GitHub), CI/CD, and modern development workflows
- Bias for action: you prefer launching usable, iterative data models that deliver immediate value over waiting for perfect solutions
- Strong communicator who can build trusted partnerships across Product, GTM, Finance, and Exec stakeholders
- Comfortable working through ambiguity in fast-moving, cross-functional environments
- Balances big-picture thinking with precision in execution, knowing when to sweat the details and when to move quickly
- Experience operating in a B2B or commercial setting, with an understanding of customer lifecycle and revenue-driving metrics

Bonus:
- Early employee at a hyper-growth startup
- Experience with or knowledge of AI and LLMs
- Data engineering experience
- Experience managing a data warehouse (preferably Snowflake)
- Experience at world-class enterprise orgs (ex: Brex, Ramp, Stripe, Palantir)

Compensation Range: $170,000 - $200,000 USD

Please find our CA applicant privacy notice here. Harvey is an equal opportunity employer and does not discriminate on the basis of race, gender, sexual orientation, gender identity/expression, national origin, disability, age, genetic information, veteran status, marital status, pregnancy or related condition, or any other basis protected by law. We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made by emailing interview-help@harvey.ai.
Data Engineer
Data Science & Analytics
Data Scientist
Data Science & Analytics
Apply
July 3, 2025
Data Infrastructure Engineer
HeyGen
201-500
-
United States
Canada
Full-time
Remote
false
About HeyGen At HeyGen, our mission is to make visual storytelling accessible to all. Over the last decade, visual content has become the preferred method of information creation, consumption, and retention. But the ability to create such content, in particular videos, continues to be costly and challenging to scale. Our ambition is to build technology that equips more people with the power to reach, captivate, and inspire audiences.
Learn more at www.heygen.com. Visit our Mission and Culture doc here.

Position Summary: At HeyGen, we are at the forefront of developing applications powered by our cutting-edge AI research. As a Data Infrastructure Engineer, you will lead the development of fundamental data systems and infrastructure. These systems are essential for powering our innovative applications, including Avatar IV, Photo Avatar, Instant Avatar, Interactive Avatar, and Video Translation. Your role will be crucial in enhancing the efficiency and scalability of these systems, which are vital to HeyGen's success.

Key Responsibilities:
- Design, build, and maintain the data infrastructure and systems needed to support our AI applications. Examples include:
  - Large-scale data acquisition
  - Multi-modal data processing frameworks and applications
  - Storage and computation efficiency
  - AI model evaluation and productionization infrastructure
- Collaborate with data scientists and machine learning engineers to understand their computational and data needs and provide efficient solutions.
- Stay up-to-date with the latest industry trends in data infrastructure technologies and advocate for best practices and continuous improvement.
- Assist in budget planning and management of cloud resources and other infrastructure expenses.

Qualifications:
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field
- Proven experience managing infrastructure for large-scale AI or machine learning projects
- Excellent problem-solving skills and the ability to work independently or as part of a team
- Proficiency in Python
- Experience optimizing computational workflows
- Familiarity with AI and machine learning frameworks like TensorFlow or PyTorch

Preferred Qualifications:
- Experience with GPU computing
- Experience with distributed data processing systems
- Experience building large-scale batch inference systems
- Prior experience in a startup or fast-paced tech environment
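A recurring building block behind the batch inference systems mentioned above is chunking a large workload into fixed-size batches so accelerator memory use stays predictable. A minimal, generic sketch (not HeyGen's implementation):

```python
def batches(items, batch_size):
    """Yield fixed-size chunks of `items`; the last batch may be smaller.

    Fixed-size batching bounds per-step memory, which matters when each
    batch is sent to a GPU for inference.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]
```

Real systems layer retry, checkpointing, and dynamic batch sizing on top, but the partitioning step looks like this.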
What HeyGen Offers:
- Competitive salary and benefits package
- Dynamic and inclusive work environment
- Opportunities for professional growth and advancement
- Collaborative culture that values innovation and creativity
- Access to the latest technologies and tools

HeyGen is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.
Data Engineer
Data Science & Analytics
MLOps / DevOps Engineer
Data Science & Analytics
Apply
July 2, 2025
Senior Storage Engineer - Ceph
Lambda AI
501-1000
USD
0
380000
-
460000
United States
Full-time
Remote
false
Lambda is the #1 GPU Cloud for ML/AI teams training, fine-tuning and inferencing AI models, where engineers can easily, securely and affordably build, test and deploy AI products at scale. Lambda’s product portfolio includes on-prem GPU systems, hosted GPUs across public & private clouds and managed inference services – servicing government, researchers, startups and Enterprises world-wide.
If you'd like to build the world's best deep learning cloud, join us.
*Note: This position requires presence in our San Francisco office location 4 days per week; Lambda’s designated work-from-home day is currently Tuesday.
Engineering at Lambda is responsible for building and scaling our cloud offering. Our scope includes the Lambda website, cloud APIs and systems as well as internal tooling for system deployment, management and maintenance.
Most people are aware that the AI revolution is driven by data. What most people don’t know is that this data is hosted on large, high-performance storage arrays measured in petabytes. At Lambda, the Infrastructure Storage Team’s job is to ensure that the data powering AI is performant and available.

The Storage team is a multidisciplinary group of Storage Engineers, Software Engineers, and SREs. This close-knit team is motivated by a shared passion for delivering top-tier storage solutions to our customers. As both a product and operations team, we collaborate closely to accelerate development and deployment and, most importantly, build resilient storage offerings. We are looking for a Ceph subject matter expert to support our latest storage initiative. Starting with Object Storage, Ceph will form the backbone of our next-generation, differentiated storage solutions. The ideal candidate will train current Storage Engineers, advise on key Ceph decisions, and drive strategic Ceph initiatives.

What You’ll Do:
- Design, deploy, and maintain highly available 40PB+ Ceph clusters
- Perform cluster upgrades, expansions, and performance optimizations
- Configure and optimize RBD, CephFS, and RadosGW services
- Monitor cluster health, performance metrics, and capacity utilization
- Develop capacity planning models and growth projections
- Train others on Ceph

You Have:
- Bachelor's degree in Computer Science, Engineering, or equivalent experience
- 5+ years of experience in storage engineering or distributed systems
- 5+ years of hands-on experience with Ceph administration and troubleshooting
- Strong understanding of storage protocols (NFS, iSCSI, S3, Swift)
- Proficiency with Linux system administration and storage subsystems
- Experience with storage hardware (SSDs, HDDs, NVMe) and networking technologies
- Knowledge of monitoring tools (Prometheus, Grafana, Nagios) and log analysis
- Understanding of data protection concepts, backup strategies, and disaster recovery

Nice to Have:
- Ceph Certified Professional or equivalent certification
- Experience with other distributed storage systems (GlusterFS, HDFS, MinIO)
- Experience with public cloud storage services (AWS S3, Azure Blob, GCP)
- Familiarity with storage benchmarking tools (fio, rados bench, COSBench)
- Experience programming in Go (strongly preferred)
- Experience with configuration management tools (Ansible, Puppet, Chef)
- Understanding of network protocols and storage networking (RDMA, iSER)

Salary Range Information: Based on market data and other factors, the annual salary range for this position is $380,000-$460,000. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.
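The "capacity planning models and growth projections" responsibility above often starts with back-of-the-envelope math: with 3-way replication (Ceph's common default for replicated pools), every logical byte costs three raw bytes, and operators keep utilization below a target so the cluster can rebalance after failures. A simple sketch, where the replication factor and 85% utilization target are assumptions for illustration, not Lambda's actual parameters:

```python
def usable_pb(raw_pb, replication=3, target_utilization=0.85):
    """Usable logical capacity of a replicated cluster, in PB.

    Raw capacity is divided by the replication factor, then scaled by the
    utilization target that leaves headroom for recovery and rebalancing.
    """
    return raw_pb / replication * target_utilization

def months_until_full(usable, used, monthly_growth):
    """Linear growth projection: months until usable capacity is consumed."""
    if monthly_growth <= 0:
        raise ValueError("monthly_growth must be positive")
    return max(0.0, (usable - used) / monthly_growth)
```

For example, a 120 PB raw cluster yields 34 PB of planned usable capacity under these assumptions; real models would also account for erasure-coded pools, which trade lower overhead for reconstruction cost.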
About Lambda:
- Founded in 2012, ~350 employees (2024) and growing fast
- We offer generous cash & equity compensation
- Our investors include Andra Capital, SGW, Andrej Karpathy, ARK Invest, Fincadia Advisors, G Squared, In-Q-Tel (IQT), KHK & Partners, NVIDIA, Pegatron, Supermicro, Wistron, Wiwynn, US Innovative Technology, Gradient Ventures, Mercato Partners, SVB, 1517, Crescent Cove
- We are experiencing extremely high demand for our systems, with quarter-over-quarter, year-over-year profitability
- Our research papers have been accepted into top machine learning and graphics conferences, including NeurIPS, ICCV, SIGGRAPH, and TOG
- Health, dental, and vision coverage for you and your dependents
- Wellness and commuter stipends for select roles
- 401k Plan with 2% company match (USA employees)
- Flexible Paid Time Off Plan that we all actually use

A Final Note: You do not need to match all of the listed expectations to apply for this position. We are committed to building a team with a variety of backgrounds, experiences, and skills.

Equal Opportunity Employer: Lambda is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.
Data Engineer
Data Science & Analytics
MLOps / DevOps Engineer
Data Science & Analytics
Apply
June 30, 2025