
AI Product & Operation Jobs

Latest roles in AI Product & Operation, reviewed by real humans for quality and clarity.

MCP & Tools Python Developer - Agent Evaluation Infrastructure
Mindrift · up to $50/hour (USD) · United Kingdom · Part-time · Remote

This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of English proficiency.

At Mindrift, innovation meets opportunity. We believe in using the power of collective human intelligence to ethically shape the future of AI.

What we do
The Mindrift platform, launched and powered by Toloka, connects domain experts with cutting-edge AI projects from innovative tech clients. Our mission is to unlock the potential of GenAI by tapping into real-world expertise from across the globe.

Who we're looking for
Calling all security researchers, engineers, and penetration testers with a strong foundation in problem-solving, offensive security, and AI-related risk assessment. If you thrive on digging into complex systems, uncovering hidden vulnerabilities, and thinking creatively under constraints, join us! We're looking for someone who can bring a hands-on approach to technical challenges, whether breaking into systems to expose weaknesses or building secure tools and processes. We value contributors with a passion for continuous learning, experimentation, and adaptability.

About the project
We're on the hunt for hands-on Python engineers for a new project focused on developing Model Context Protocol (MCP) servers and internal tools for running and evaluating agent behavior. You'll implement base methods for agent action verification, integrate with internal and client infrastructures, and help fill tooling gaps across the team.

What you'll be doing:
- Developing and maintaining MCP-compatible evaluation servers
- Implementing logic to check agent actions against scenario definitions (see the sketch after this listing)
- Creating or extending tools that writers and QA use to test agents
- Working closely with infrastructure engineers to ensure compatibility
- Occasionally helping with test writing or debugging sessions when needed

Although we're only looking for experts for this current project, contributors with consistently high-quality submissions may receive an invitation for ongoing collaboration across future projects.

How to get started:
Apply to this post, qualify, and get the chance to contribute to a project aligned with your skills, on your own schedule. Shape the future of AI while building tools that benefit everyone.

Requirements
The ideal contributor will have:
- 4+ years of Python development experience, ideally in backend or tooling
- Solid experience building APIs, testing frameworks, or protocol-based interfaces
- Understanding of Docker, the Linux CLI, and HTTP-based communication
- Ability to integrate new tools into existing infrastructures
- Familiarity with how LLM agents are prompted, executed, and evaluated
- Clear documentation and communication skills; you'll work with QA and writers

We also value applicants who have:
- Experience with Model Context Protocol (MCP) or similar structured agent-server interfaces
- Knowledge of FastAPI or similar async web frameworks
- Experience working with LLM logs, scoring functions, or sandbox environments
- Ability to support dev environments (devcontainers, CI configs, linters)
- JavaScript experience

Benefits
- Get paid for your expertise, with rates that can go up to $50/hour depending on your skills, experience, and project needs.
- Take part in a flexible, remote, freelance project that fits around your primary professional or academic commitments.
- Participate in an advanced AI project and gain valuable experience to enhance your portfolio.
- Influence how future AI models understand and communicate in your field of expertise.
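
To make the core task concrete, here is a minimal illustrative sketch of the kind of check described above: verifying a logged agent run against a scenario definition. This is not Mindrift's actual code or schema; the field names (`tool`, `args`, `GoldStep`) and the in-order matching rule are assumptions invented for illustration.

```python
# Hypothetical sketch of agent-action verification against a scenario
# definition. The schema and matching rule are illustrative assumptions.
from dataclasses import dataclass, field


@dataclass
class GoldStep:
    tool: str                                  # tool the agent should call
    args: dict = field(default_factory=dict)  # arguments the call must include


def verify_run(gold_steps: list[GoldStep], actions: list[dict]) -> dict:
    """Require every gold step to appear in the agent's actions, in order.

    Extra agent actions are tolerated; a missing or out-of-order gold
    step fails the run.
    """
    cursor, matched = 0, 0
    for step in gold_steps:
        for i in range(cursor, len(actions)):
            action = actions[i]
            args_ok = all(
                action.get("args", {}).get(k) == v for k, v in step.args.items()
            )
            if action.get("tool") == step.tool and args_ok:
                matched += 1
                cursor = i + 1
                break
    return {"passed": matched == len(gold_steps),
            "matched": matched, "total": len(gold_steps)}


if __name__ == "__main__":
    gold = [GoldStep("search", {"query": "invoice 42"}),
            GoldStep("send_email", {"to": "billing@example.com"})]
    run = [
        {"tool": "search", "args": {"query": "invoice 42"}},
        {"tool": "open_doc", "args": {"id": 7}},   # extra step, tolerated
        {"tool": "send_email", "args": {"to": "billing@example.com"}},
    ]
    print(verify_run(gold, run))  # {'passed': True, 'matched': 2, 'total': 2}
```

An MCP-compatible evaluation server would expose a check like this behind the protocol's tool-calling interface; FastAPI, named in the nice-to-haves, is one plausible way to serve it over HTTP.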

Apply

MCP & Tools Python Developer - Agent Evaluation Infrastructure
Mindrift · up to $12/hour (USD) · Philippines · Part-time · Remote

Same role description, requirements, and benefits as the first MCP & Tools Python Developer listing above; only the location and rate differ.

Apply

MCP & Tools Python Developer - Agent Evaluation Infrastructure
Mindrift · up to $80/hour (USD) · United States · Part-time · Remote

Same role description, requirements, and benefits as the first MCP & Tools Python Developer listing above; only the location and rate differ.

Apply

MCP & Tools Python Developer - Agent Evaluation Infrastructure
Mindrift · up to $50/hour (USD) · Part-time · Remote

Same role description, requirements, and benefits as the first MCP & Tools Python Developer listing above; only the location and rate differ.

Apply

Evaluation Scenario Writer - AI Agent Testing Specialist
Mindrift · up to $50/hour (USD) · Denmark · Part-time · Remote

This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of English proficiency.

At Mindrift, innovation meets opportunity. We believe in using the power of collective human intelligence to ethically shape the future of AI.

What we do
The Mindrift platform connects specialists with AI projects from major tech innovators. Our mission is to unlock the potential of Generative AI by tapping into real-world expertise from across the globe.

About the Role
We're looking for someone who can design realistic, structured evaluation scenarios for LLM-based agents. You'll create test cases that simulate human-performed tasks and define gold-standard behavior to compare agent actions against, making sure each scenario is clearly defined, well scored, and easy to execute and reuse. You'll need a sharp analytical mindset, attention to detail, and an interest in how AI agents make decisions.

Although every project is unique, you might typically:
- Create structured test cases that simulate complex human workflows.
- Define gold-standard behavior and scoring logic to evaluate agent actions (see the sketch after this listing).
- Analyze agent logs, failure modes, and decision paths.
- Work with code repositories and test frameworks to validate your scenarios.
- Iterate on prompts, instructions, and test cases to improve clarity and difficulty.
- Ensure that scenarios are production-ready, easy to run, and reusable.

How to get started
Simply apply to this post, qualify, and get the chance to contribute to projects aligned with your skills, on your own schedule. From creating training prompts to refining model responses, you'll help shape the future of AI while ensuring technology benefits everyone.

Requirements
- Bachelor's and/or Master's degree in Computer Science, Software Engineering, Data Science / Data Analytics, Artificial Intelligence / Machine Learning, Computational Linguistics / Natural Language Processing (NLP), Information Systems, or another related field.
- Background in QA, software testing, data analysis, or NLP annotation.
- Good understanding of test design principles (e.g., reproducibility, coverage, edge cases).
- Strong written communication skills in English.
- Comfortable with structured formats like JSON/YAML for scenario descriptions.
- Able to define expected agent behaviors (gold paths) and scoring logic.
- Basic experience with Python and JavaScript.
- Curious and open to working with AI-generated content, agent logs, and prompt-based behavior.

Nice to have
- Experience writing manual or automated test cases.
- Familiarity with LLM capabilities and typical failure modes.
- Understanding of scoring metrics (precision, recall, coverage, reward functions).

Benefits
Contribute on your own schedule, from anywhere in the world. This opportunity allows you to:
- Get paid for your expertise, with rates that can go up to $50/hour depending on your skills, experience, and project needs.
- Take part in a flexible, remote, freelance project that fits around your primary professional or academic commitments.
- Participate in an advanced AI project and gain valuable experience to enhance your portfolio.
- Influence how future AI models understand and communicate in your field of expertise.
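
As a concrete (and entirely hypothetical) illustration of the structured formats, gold paths, and scoring metrics this listing mentions, here is what a small YAML scenario and a precision/recall scorer might look like. The schema, field names, and pass threshold are invented for this sketch, not a Mindrift format.

```python
# Hypothetical sketch: a YAML scenario with a gold path, plus set-based
# precision/recall scoring over tool names. All field names are invented.
import yaml  # pip install pyyaml

SCENARIO = """
id: expense-report-001
task: "File an expense report for a $40 team lunch."
gold_path:              # expected tool calls, in order
  - open_form
  - fill_amount
  - attach_receipt
  - submit
scoring:
  metric: precision_recall
  pass_threshold: 1.0   # every gold step must be recovered
"""


def score(gold: list[str], actual: list[str]) -> dict:
    """Order-insensitive precision/recall over the sets of tool names."""
    gold_set, actual_set = set(gold), set(actual)
    true_positives = len(gold_set & actual_set)
    return {
        "precision": true_positives / len(actual_set) if actual_set else 0.0,
        "recall": true_positives / len(gold_set) if gold_set else 0.0,
    }


if __name__ == "__main__":
    scenario = yaml.safe_load(SCENARIO)
    agent_calls = ["open_form", "fill_amount", "submit"]  # forgot the receipt
    print(score(scenario["gold_path"], agent_calls))
    # {'precision': 1.0, 'recall': 0.75}
```

In these terms, the scenario writer's job is to pick the task, pin down the gold path, and decide how strict the scoring should be (here, any recall below the 1.0 threshold would fail the run).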

Apply

Freelance AI Evaluation Scenario Writer
Mindrift · up to $50/hour (USD) · Spain · Part-time · Remote

Same role description, requirements, and benefits as the Evaluation Scenario Writer listing above; only the location and rate differ.

Apply

MCP & Tools Python Developer - Agent Evaluation Infrastructure
Mindrift · up to $17/hour (USD) · Argentina · Part-time · Remote

Same role description, requirements, and benefits as the first MCP & Tools Python Developer listing above; only the location and rate differ.

Apply

Evaluation Scenario Writer - AI Agent Testing Specialist
Mindrift · up to $30/hour (USD) · Part-time · Remote

Same role description, requirements, and benefits as the Evaluation Scenario Writer listing above; only the location and rate differ.

Apply

MCP & Tools Python Developer - Agent Evaluation Infrastructure
Mindrift · up to $45/hour (USD) · Australia · Part-time · Remote

Same role description, requirements, and benefits as the first MCP & Tools Python Developer listing above; only the location and rate differ.

Apply

Mathematician - Freelance AI Trainer
Mindrift · up to $17/hour (USD) · Mexico · Part-time · Remote

This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of English proficiency.

At Mindrift, innovation meets opportunity. We believe in using the power of collective intelligence to ethically shape the future of AI.

What we do
The Mindrift platform connects specialists with AI projects from major tech innovators. Our mission is to unlock the potential of Generative AI by tapping into real-world expertise from across the globe.

About the Role
GenAI models are improving very quickly, and one of our goals is to make them capable of addressing specialized questions and achieving complex reasoning skills. If you join the platform as an AI Tutor in Mathematics, you'll have the opportunity to collaborate on these projects.

Although every project is unique, you might typically:
- Generate prompts that challenge AI.
- Define comprehensive scoring criteria to evaluate the accuracy of the AI's answers.
- Correct the model's responses based on your domain-specific knowledge.

How to get started
Simply apply to this post, qualify, and get the chance to contribute to projects aligned with your skills, on your own schedule. From creating training prompts to refining model responses, you'll help shape the future of AI while ensuring technology benefits everyone.

Requirements
- You hold a Bachelor's, Master's, or PhD degree in Mathematics or a related area.
- You have at least 3 years of professional experience in relevant fields: algebra (pre-algebra through intermediate algebra), geometry, calculus (including pre-calculus), probability (counting and probability), and number theory.
- Your level of English is advanced (C1) or above.
- You are ready to learn new methods, able to switch between tasks and topics quickly, and comfortable occasionally working with challenging, complex guidelines.

Our freelance role is fully remote, so you just need a laptop, an internet connection, available time, and enthusiasm to take on a challenge.

Benefits
Why this freelance opportunity might be a great fit for you:
- Get paid for your expertise, with rates that can go up to $17/hour depending on your skills, experience, and project needs.
- Take part in a part-time, remote, freelance project that fits around your primary professional or academic commitments.
- Work on advanced AI projects and gain valuable experience that enhances your portfolio.
- Influence how future AI models understand and communicate in your field of expertise.

Apply

Evaluation Scenario Writer - AI Agent Testing Specialist
Mindrift · up to $45/hour (USD) · Australia · Part-time · Remote

Same role description, requirements, and benefits as the Evaluation Scenario Writer listing above; only the location and rate differ.

Apply

MCP & Tools Python Developer - Agent Evaluation Infrastructure
Mindrift · up to $50/hour (USD) · Germany · Part-time · Remote

Same role description, requirements, and benefits as the first MCP & Tools Python Developer listing above; only the location and rate differ.

Apply

MCP & Tools Python Developer - Agent Evaluation Infrastructure
Mindrift · up to $30/hour (USD) · Part-time · Remote

Same role description, requirements, and benefits as the first MCP & Tools Python Developer listing above; only the location and rate differ.

Apply

MCP & Tools Python Developer - Agent Evaluation Infrastructure
Mindrift · up to $12/hour (USD) · India · Part-time · Remote

Same role description, requirements, and benefits as the first MCP & Tools Python Developer listing above; only the location and rate differ.
MCP & Tools Python Developer - Agent Evaluation Infrastructure
Mindrift · Denmark · Part-time · Remote · up to USD 50/hour
Evaluation Scenario Writer - AI Agent Testing Specialist
Mindrift · India · Part-time · Remote · up to USD 12/hour
This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of English proficiency.

At Mindrift, innovation meets opportunity. We believe in using the power of collective human intelligence to ethically shape the future of AI.

What we do
The Mindrift platform connects specialists with AI projects from major tech innovators. Our mission is to unlock the potential of Generative AI by tapping into real-world expertise from across the globe.

About the Role
We're looking for someone who can design realistic, structured evaluation scenarios for LLM-based agents. You'll create test cases that simulate human-performed tasks and define gold-standard behavior to compare agent actions against. You'll work to ensure each scenario is clearly defined, well scored, and easy to execute and reuse. You'll need a sharp analytical mindset, attention to detail, and an interest in how AI agents make decisions.

Although every project is unique, you might typically:
- Create structured test cases that simulate complex human workflows.
- Define gold-standard behavior and scoring logic to evaluate agent actions.
- Analyze agent logs, failure modes, and decision paths.
- Work with code repositories and test frameworks to validate your scenarios.
- Iterate on prompts, instructions, and test cases to improve clarity and difficulty.
- Ensure that scenarios are production-ready, easy to run, and reusable.

How to get started
Simply apply to this post, qualify, and get the chance to contribute to projects aligned with your skills, on your own schedule. From creating training prompts to refining model responses, you'll help shape the future of AI while ensuring technology benefits everyone.

Requirements
- Bachelor's and/or Master's degree in Computer Science, Software Engineering, Data Science / Data Analytics, Artificial Intelligence / Machine Learning, Computational Linguistics / Natural Language Processing (NLP), Information Systems, or another related field.
- Background in QA, software testing, data analysis, or NLP annotation.
- Good understanding of test design principles (e.g., reproducibility, coverage, edge cases).
- Strong written communication skills in English.
- Comfortable with structured formats like JSON/YAML for scenario descriptions.
- Able to define expected agent behaviors (gold paths) and scoring logic (a hypothetical sketch of such a scenario follows this listing).
- Basic experience with Python and JavaScript.
- Curious and open to working with AI-generated content, agent logs, and prompt-based behavior.

Nice to Have
- Experience writing manual or automated test cases.
- Familiarity with LLM capabilities and typical failure modes.
- Understanding of scoring metrics (precision, recall, coverage, reward functions).

Benefits
Contribute on your own schedule, from anywhere in the world. This opportunity allows you to:
- Get paid for your expertise, with rates that can go up to $12/hour depending on your skills, experience, and project needs.
- Take part in a flexible, remote, freelance project that fits around your primary professional or academic commitments.
- Participate in an advanced AI project and gain valuable experience to enhance your portfolio.
- Influence how future AI models understand and communicate in your field of expertise.
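Since the role centers on structured scenario formats, gold paths, and scoring logic, here is a minimal sketch of what such an artifact and its scorer might look like. It is illustrative only: the field names (gold_path, forbidden), the tool names, and the exact-match precision/recall scoring are assumptions of this sketch, not a format the project prescribes.

```python
# Hypothetical sketch of a scenario definition with a gold path and
# scoring logic. Field names, tool names, and the scoring scheme are
# illustrative assumptions, not a format this project prescribes.

import json

SCENARIO = {
    "id": "refund-overcharged-order",
    "prompt": "A customer was double-charged for order 1042. Issue a refund.",
    "gold_path": [  # expected agent actions
        {"tool": "lookup_order", "args": {"order_id": "1042"}},
        {"tool": "verify_charge", "args": {"order_id": "1042"}},
        {"tool": "issue_refund", "args": {"order_id": "1042"}},
    ],
    "forbidden": ["delete_order"],  # any of these fails the run outright
}

def score(agent_actions: list[dict], scenario: dict) -> dict:
    """Compare an agent's action log against the scenario's gold path."""
    if any(a["tool"] in scenario["forbidden"] for a in agent_actions):
        return {"pass": False, "reason": "forbidden action taken"}
    # Canonicalise (tool, args) pairs so dict key ordering doesn't matter.
    gold = {(a["tool"], json.dumps(a["args"], sort_keys=True))
            for a in scenario["gold_path"]}
    seen = {(a["tool"], json.dumps(a["args"], sort_keys=True))
            for a in agent_actions}
    hits = len(gold & seen)
    precision = hits / len(seen) if seen else 0.0
    recall = hits / len(gold)
    return {"pass": recall == 1.0, "precision": precision, "recall": recall}

# Example: an agent that skipped the verification step.
log = [
    {"tool": "lookup_order", "args": {"order_id": "1042"}},
    {"tool": "issue_refund", "args": {"order_id": "1042"}},
]
print(score(log, SCENARIO))  # recall ~0.67 -> scenario not passed
```

Exact matching over (tool, arguments) pairs is the simplest possible scorer and ignores action ordering; a production scenario suite would likely layer on ordering constraints, partial credit, or reward functions, per the nice-to-have list above.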
Head of Forward Deployed Engineering
Lorikeet · United States · Full-time · Remote
About Lorikeet
Lorikeet is the most powerful customer support AI for complex businesses like fintechs, healthtechs, marketplaces, and delivery services.

We're building from the ground up on the premise that most support responses should be automated with transparent, customizable AI, and that support teams should spend their time managing automation and engaging with complex cases, not grinding through high volumes of simple tickets. Once teams are freed from reactive support, we want to help them tackle what's next: providing personalized concierge services to their customers.

To deliver this combination of powerful AI systems and well-designed tooling, we're leveraging Jamie's experience as an early member of Google's generative AI team and Steve's experience building for operational teams at Stripe, as well as the experience of our team, who've joined us from places like Stripe, Canva, Atlassian, Dropbox, and Dovetail.

We are growing fast, with paying customers, real revenue, an exciting roadmap, and a strong sales pipeline. We've raised over USD 50m from leading VCs and angel investors, including QED, Blackbird, Square Peg, Claire Hughes Johnson (ex-Stripe COO), Cristina Cordova (Linear COO), Bob Van Winden (Stripe Head of Support), and Cos Nicolaescu (Brex CTO).

Our global customers include:
- The largest telehealth company in Australia
- The largest bank for teens in the US
- One of the largest NFT marketplaces by trading volume
- The leading virtual specialty-care platform in the US
- One of the largest flexible rent-payment platforms in the US
- One of the largest Web3 gaming companies
... and a handful of other enterprise customers with over 1 million support tickets a year.

What's unique about this opportunity?
- Warm, mature, flexible culture. Low-ego, high-trust team. No tolerance for "talented jerks". We embrace (a) working efficiently and (b) working flexible hours to fit in life priorities outside of work. We're committed to building a diverse team and really encourage folks from underrepresented backgrounds to reach out: we value user obsession and eagerness to learn over traditional credentials.
- High pay, high expectations, high performance. We're building a small, great team. We aim to match unicorn/scale-up pay at base salary and offer a potentially life-changing equity stake in the business. Our team gets the same monthly updates we send to our investors, because they're investors and owners too.
- On the technical cutting edge. With our users, we're defining what an AI-first SaaS product looks like. No one has figured out what the UI/UX, capabilities, and data models of an AI-first company are; it's white space for us to invent. The AI agent problems we're solving are beyond the cutting edge at the biggest research labs. We're building on a modern tech stack, with TypeScript, React/Remix, Prisma ORM, NestJS, and some Python sprinkled in. Knowledge of that stack is nice, but we know good engineers will pick up new languages.
- No-nonsense recruitment process. The process is: 1) informal chats with Steve, Remy, and JJ to hear our pitch and understand your interests and goals, and 2) a roughly two-day paid work trial where you come in and ship with us. There's no better way for each of us to figure out if we like working together than to work together!

About the role and you
We are looking for a seasoned team lead for our growing team of Forward Deployed AI Engineers, which specialises in getting Lorikeet trained up and deployed for our global customers as fast as possible.

As the Head of Forward Deployed Engineering at Lorikeet, you'll own the bridge between Lorikeet's technology and our clients' success. You'll manage the team that serves as our customers' AI experts, integration experts, and problem solvers, guiding them in forging strong relationships and great outcomes for their customers. You'll empower the team to break new ground with our customers, ultimately ensuring quick integrations and a growing automation rate across our subscriber base. And finally, you'll be responsible for accelerating hiring globally for the team, technically ramping up new hires, and developing existing team members.

What you'll do
- Build and lead a high-performing team of Forward Deployed AI Engineers who serve as trusted technical partners to our subscribers.
- Develop the team's expertise in Lorikeet's AI platform and ensure they are equipped to confidently lead implementation projects from kickoff through deployment and ramp-up.
- Coach the team on how to deeply understand customer workflows and business goals, and configure Lorikeet's AI tools to drive meaningful outcomes.
- Foster a team culture of curiosity and proactive solutioning, empowering team members to recommend AI-driven strategies that extend beyond traditional support use cases.
- Ensure tight collaboration with the product team by helping the team capture and communicate customer feedback and insights to influence product direction.
- Support team members in navigating complex technical integrations with creativity, speed, and care.
- Create systems for ongoing skill development, customer empathy, and operational excellence across the solutions function.
- Serve as an advocate and unblocker for the team, ensuring they have the tools, clarity, and autonomy to build strong, trusted relationships with customers.

The right candidate
The ideal candidate has a technical background, perhaps as a former engineer, and has since learned the business side. You have experience scaling solutions engineering or similar customer-facing technical teams.

Candidly, this role is perfect for someone who is (1) deeply interested in AI, (2) excited to mentor and develop others, and (3) energized by operating at the nexus of technical AI implementation and customer success.

We need someone who is technically minded and excited to dive into the technical aspects of AI, loves building, learning, and tinkering, and, most importantly, is hyper-focused on customer impact. You'll have the opportunity to shape not just individual client implementations, but also our overall approach to customer deployment and success.

You might be a fit if you:
- Previously brought a technical foundation into client-facing roles, evolving from individual contributor to team lead within high-growth, cross-functional environments.
- Managed and developed Solutions Engineering teams responsible for implementation, integration, and post-sales technical support across a range of enterprise and startup clients.
- Have led daily team operations, including project oversight, prioritization, unblocking, and performance coaching.
- Served as the connective tissue between Product, Engineering, and Customer teams, ensuring consistent feedback loops and roadmap alignment.
- Hired and onboarded technical talent from both traditional (CS/engineering) and nontraditional (no-code, data, operations) backgrounds.
- Coached team members to grow in client communication, technical depth, and problem-solving autonomy.
- Built and refined internal processes for implementation timelines, stakeholder management, and scaling high-touch support in a startup context.
- Have operated in a data-driven manner, setting, tracking, and reporting progress against key business metrics.
- Have actively explored and deployed AI tools to enhance team workflows, solution delivery, and internal enablement.

Applicants must be based in the United States and eligible to work in the US without sponsorship (remote work supported).

If you don't quite match this and are from an under-represented background, we strongly encourage you to reach out. We know first-hand that diverse teams are higher performing, and we're proud that our team reflects a broad spectrum of identities and lived experiences.

Lorikeet uses Automated Employment Decision Tools (AEDT) to assist in the candidate screening process. These systems help us efficiently review a high volume of applications by analyzing qualifications, skills, and experience against the requirements of the job description. The use of AI is intended to enhance, not replace, human judgment: final hiring decisions are made by human recruiters and hiring managers who review all relevant information, including AI-generated assessments, to ensure a fair and comprehensive evaluation. Our AI systems are regularly audited and monitored to prevent and mitigate bias, and we are committed to ensuring that our technology-assisted processes comply with all federal, state, and local anti-discrimination laws.

If you require a reasonable accommodation to participate in our application process, please contact us at: people@lorikeetcx.ai
Software Engineer, Agent Studio
Sierra · United States · Full-time · Remote · USD 230,000 - 390,000
About us
At Sierra, we're creating a platform to help businesses build better, more human customer experiences with AI. We are primarily an in-person company based in San Francisco, with growing offices in Atlanta, New York, London, France, Singapore, and Japan.

We are guided by a set of values that are at the core of our actions and define our culture: Trust, Customer Obsession, Craftsmanship, Intensity, and Family. These values are the foundation of our work, and we are committed to upholding them in everything we do.

Our co-founders are Bret Taylor and Clay Bavor. Bret currently serves as Board Chair of OpenAI. Previously, he was co-CEO of Salesforce (which had acquired the company he founded, Quip) and CTO of Facebook. Bret was also one of Google's earliest product managers and co-creator of Google Maps. Before founding Sierra, Clay spent 18 years at Google, where he most recently led Google Labs. Earlier, he started and led Google's AR/VR effort, Project Starline, and Google Lens. Before that, Clay led the product and design teams for Google Workspace.

What you'll do
Sierra's engineering team has ~40 mostly senior engineers, including Mihai, Belinda, Arya, and Wei. We work in small, autonomous teams oriented around customer problems. Here are some examples of what you'll work on:
- Simulation & Benchmarking: How can we craft a simulation platform to test AI agents against every real-world scenario imaginable? (See 𝜏-Bench; a rough sketch of this kind of simulation loop follows this listing.)
- Content Management: How do we create intuitive, no-code tools that allow anyone to guide and test AI agents?
- Agent Development Lifecycle: How do we adapt traditional software development methodologies to accommodate AI agents' non-deterministic behavior, natural language interactions, and reliance on large language models? (See Sierra's ADLC.)
- Generative Agent Development: How do we accelerate agent creation using generative tools like Cursor and Claude Code? Can we build self-improving systems based on real-world interactions, customer-driven feedback, and self-play?

What you'll bring
- A passion for being on the frontier of AI products.
- Motivation and high agency to drive outcomes: we have a high-autonomy culture, and each team member has lots of agency to overcome obstacles and achieve customer impact.
- Alignment with our company values throughout your work, notably finding a balance between Craftsmanship, Customer Obsession, and Competitive Intensity.
- 4+ years of hands-on experience building production products and systems.
- Experience and comfort building and shipping full-stack solutions.
- Degree in Computer Science or a related field, or equivalent professional experience.

Even better...
- Experience building AI-powered products.
- A sharp eye for design.
- Experience with Go and TypeScript.
- Experience building developer tooling, programming languages, or databases.
- Leadership experience on technical projects or teams.

Our values
- Trust: We build trust with our customers through our accountability, empathy, quality, and responsiveness. We build trust in AI by making it more accessible, safe, and useful. We build trust with each other by showing up for each other professionally and personally, creating an environment that enables all of us to do our best work.
- Customer Obsession: We deeply understand our customers' business goals and relentlessly focus on driving outcomes, not just technical milestones. Everyone at the company knows and spends time with our customers. When our customer is having an issue, we drop everything and fix it.
- Craftsmanship: We get the details right, from the words on the page to the system architecture. We have good taste. When we notice something isn't right, we take the time to fix it. We are proud of the products we produce. We continuously self-reflect to continuously self-improve.
- Intensity: We know we don't have the luxury of patience. We play to win. We care about our product being the best, and when it isn't, we fix it. When we fail, we talk about it openly and without blame so we succeed the next time.
- Family: We know that balance and intensity are compatible, and we model it in our actions and processes. We are the best technology company for parents. We support and respect each other and celebrate each other's personal and professional achievements.

What we offer
We want our benefits to reflect our values and offer the following to full-time employees:
- Flexible (unlimited) paid time off
- Medical, dental, and vision benefits for you and your family
- Life insurance and disability benefits
- Retirement plan (e.g., 401K, pension) with Sierra match
- Parental leave
- Fertility and family-building benefits through Carrot
- Lunch, as well as delicious snacks and coffee to keep you energized
- Discretionary benefit stipend giving people the ability to spend where it matters most
- Free alphorn lessons

These benefits are further detailed in Sierra's policies and are subject to change at any time, consistent with the terms of any applicable compensation or benefits plans. Eligible full-time employees can participate in Sierra's equity plans subject to the terms of the applicable plans and policies.

Be you, with us
We're working to bring the transformative power of AI to every organization in the world. To do so, it is important to us that the diversity of our employees represents the diversity of our customers. We believe that our work and culture are better when we encourage, support, and respect the different skills and experiences represented within our team. We encourage you to apply even if your experience doesn't precisely match the job description. We strive to evaluate all applicants consistently, without regard to race, color, religion, gender, national origin, age, disability, veteran status, pregnancy, gender expression or identity, sexual orientation, citizenship, or any other legally protected class.
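The Simulation & Benchmarking bullet above is the listing's most concrete technical pointer, so here is a rough sketch of what a 𝜏-Bench-style episode loop can look like: a scripted "user" converses with an agent, the agent acts on a small environment, and the episode passes if a goal predicate holds on the final state. Every class below (ToyRefundAgent, ScriptedUser, Scenario) is a hypothetical stand-in for illustration, not Sierra's actual code or 𝜏-Bench's API.

```python
# Rough, hypothetical sketch of a tau-bench-style episode: a scripted
# "user" talks to an agent, the agent mutates a tiny environment, and
# the run passes if a goal predicate holds on the final state.
# Every class here is an illustrative stand-in, not Sierra's API.

from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Environment:
    state: dict = field(default_factory=dict)

@dataclass
class Scenario:
    opening_message: str
    goal_check: Callable[[dict], bool]  # predicate over the final state
    max_turns: int = 10

class ScriptedUser:
    """Replays a fixed script, then signals the conversation is over."""
    def __init__(self, lines: list[str]):
        self.lines = list(lines)

    def respond(self, agent_reply: str) -> Optional[str]:
        return self.lines.pop(0) if self.lines else None

class ToyRefundAgent:
    """Stand-in agent: 'refunds' any order number it is told about."""
    def respond(self, message: str, env: Environment) -> str:
        for token in message.split():
            token = token.strip(".,!?")
            if token.isdigit():
                env.state[f"refund:{token}"] = True
                return f"Refund issued for order {token}."
        return "Which order should I refund?"

def run_episode(agent, user, env: Environment, scenario: Scenario) -> bool:
    """Alternate agent/user turns, then evaluate the goal predicate."""
    message = scenario.opening_message
    for _ in range(scenario.max_turns):
        reply = agent.respond(message, env)  # may mutate env.state
        message = user.respond(reply)
        if message is None:                  # simulated user is done
            break
    return scenario.goal_check(env.state)

scenario = Scenario(
    opening_message="I was double-charged on order 1042.",
    goal_check=lambda s: s.get("refund:1042", False),
)
passed = run_episode(ToyRefundAgent(), ScriptedUser([]), Environment(), scenario)
print("pass" if passed else "fail")  # -> pass
```

A benchmark score is then just the pass rate over many such scenarios; the hard engineering the posting alludes to lies in making the simulated users and environments faithful to real-world traffic.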