AI Creative & Design Jobs
Latest roles in AI Creative & Design, reviewed by real humans for quality and clarity.
Showing 61 – 79 of 79 jobs
MCP & Tools Python Developer - Agent Evaluation Infrastructure
Mindrift · 1001-5000 employees
Rate: up to $40/hour (USD)
Saudi Arabia · Part-time · Remote
This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of English proficiency.

At Mindrift, innovation meets opportunity. We believe in using the power of collective human intelligence to ethically shape the future of AI.

What we do
The Mindrift platform, launched and powered by Toloka, connects domain experts with cutting-edge AI projects from innovative tech clients. Our mission is to unlock the potential of GenAI by tapping into real-world expertise from across the globe.

Who we're looking for
Calling all security researchers, engineers, and penetration testers with a strong foundation in problem-solving, offensive security, and AI-related risk assessment. If you thrive on digging into complex systems, uncovering hidden vulnerabilities, and thinking creatively under constraints, join us! We're looking for someone who can bring a hands-on approach to technical challenges, whether breaking into systems to expose weaknesses or building secure tools and processes. We value contributors with a passion for continuous learning, experimentation, and adaptability.

About the project
We're on the hunt for hands-on Python engineers for a new project focused on developing Model Context Protocol (MCP) servers and internal tools for running and evaluating agent behavior. You'll implement base methods for agent action verification, integrate with internal and client infrastructures, and help fill tooling gaps across the team.

What you'll be doing:
- Developing and maintaining MCP-compatible evaluation servers
- Implementing logic to check agent actions against scenario definitions
- Creating or extending tools that writers and QAs use to test agents
- Working closely with infrastructure engineers to ensure compatibility
- Occasionally helping with test writing or debug sessions when needed

Although we're only looking for experts for this current project, contributors with consistently high-quality submissions may receive an invitation for ongoing collaboration on future projects.

How to get started
Apply to this post, qualify, and get the chance to contribute to a project aligned with your skills, on your own schedule. Shape the future of AI while building tools that benefit everyone.

Requirements
The ideal contributor will have:
- 4+ years of Python development experience, ideally in backend or tooling
- Solid experience building APIs, testing frameworks, or protocol-based interfaces
- Understanding of Docker, the Linux CLI, and HTTP-based communication
- Ability to integrate new tools into existing infrastructures
- Familiarity with how LLM agents are prompted, executed, and evaluated
- Clear documentation and communication skills (you'll work with QA and writers)

We also value applicants who have:
- Experience with Model Context Protocol (MCP) or similar structured agent-server interfaces
- Knowledge of FastAPI or similar async web frameworks
- Experience working with LLM logs, scoring functions, or sandbox environments
- Ability to support dev environments (devcontainers, CI configs, linters)
- JavaScript experience

Benefits
- Get paid for your expertise, with rates that can go up to $40/hour depending on your skills, experience, and project needs.
- Take part in a flexible, remote, freelance project that fits around your primary professional or academic commitments.
- Participate in an advanced AI project and gain valuable experience to enhance your portfolio.
- Influence how future AI models understand and communicate in your field of expertise.
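To give a concrete sense of what an MCP-compatible evaluation tool can look like, here is a minimal sketch using FastMCP from the official `mcp` Python SDK; the server name, the `verify_action` tool, and its matching logic are illustrative assumptions, not part of this listing.

```python
# Hedged sketch of an MCP-style evaluation server, assuming the official
# `mcp` Python SDK (pip install mcp). Tool name and logic are hypothetical.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("agent-eval")  # hypothetical server name

@mcp.tool()
def verify_action(action: str, expected_step: str) -> bool:
    """Return True if an agent's action matches the scenario's expected step."""
    return action.strip().lower() == expected_step.strip().lower()

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio by default
```

A real server of this kind would expose richer verification methods that compare whole action traces against scenario definitions rather than a single expected step.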
January 6, 2026
MCP & Tools Python Developer - Agent Evaluation Infrastructure
Mindrift · 1001-5000 employees
Rate: up to $21/hour (USD)
Mexico · Part-time · Remote
This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of English proficiency.

At Mindrift, innovation meets opportunity. We believe in using the power of collective human intelligence to ethically shape the future of AI.

What we do
The Mindrift platform, launched and powered by Toloka, connects domain experts with cutting-edge AI projects from innovative tech clients. Our mission is to unlock the potential of GenAI by tapping into real-world expertise from across the globe.

Who we're looking for
Calling all security researchers, engineers, and penetration testers with a strong foundation in problem-solving, offensive security, and AI-related risk assessment. If you thrive on digging into complex systems, uncovering hidden vulnerabilities, and thinking creatively under constraints, join us! We're looking for someone who can bring a hands-on approach to technical challenges, whether breaking into systems to expose weaknesses or building secure tools and processes. We value contributors with a passion for continuous learning, experimentation, and adaptability.

About the project
We're on the hunt for hands-on Python engineers for a new project focused on developing Model Context Protocol (MCP) servers and internal tools for running and evaluating agent behavior. You'll implement base methods for agent action verification, integrate with internal and client infrastructures, and help fill tooling gaps across the team.

What you'll be doing:
- Developing and maintaining MCP-compatible evaluation servers
- Implementing logic to check agent actions against scenario definitions
- Creating or extending tools that writers and QAs use to test agents
- Working closely with infrastructure engineers to ensure compatibility
- Occasionally helping with test writing or debug sessions when needed

Although we're only looking for experts for this current project, contributors with consistently high-quality submissions may receive an invitation for ongoing collaboration on future projects.

How to get started
Apply to this post, qualify, and get the chance to contribute to a project aligned with your skills, on your own schedule. Shape the future of AI while building tools that benefit everyone.

Requirements
The ideal contributor will have:
- 4+ years of Python development experience, ideally in backend or tooling
- Solid experience building APIs, testing frameworks, or protocol-based interfaces
- Understanding of Docker, the Linux CLI, and HTTP-based communication
- Ability to integrate new tools into existing infrastructures
- Familiarity with how LLM agents are prompted, executed, and evaluated
- Clear documentation and communication skills (you'll work with QA and writers)

We also value applicants who have:
- Experience with Model Context Protocol (MCP) or similar structured agent-server interfaces
- Knowledge of FastAPI or similar async web frameworks
- Experience working with LLM logs, scoring functions, or sandbox environments
- Ability to support dev environments (devcontainers, CI configs, linters)
- JavaScript experience

Benefits
- Get paid for your expertise, with rates that can go up to $21/hour depending on your skills, experience, and project needs.
- Take part in a flexible, remote, freelance project that fits around your primary professional or academic commitments.
- Participate in an advanced AI project and gain valuable experience to enhance your portfolio.
- Influence how future AI models understand and communicate in your field of expertise.
January 6, 2026
Evaluation Scenario Writer - AI Agent Testing Specialist
Mindrift · 1001-5000 employees
Rate: up to $30/hour (USD)
Italy · Part-time · Remote
This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of English.

At Mindrift, innovation meets opportunity. We believe in using the power of collective human intelligence to ethically shape the future of AI.

What we do
The Mindrift platform connects specialists with AI projects from major tech innovators. Our mission is to unlock the potential of Generative AI by tapping into real-world expertise from across the globe.

About the Role
We're looking for someone who can design realistic, structured evaluation scenarios for LLM-based agents. You'll create test cases that simulate human-performed tasks and define gold-standard behavior to compare agent actions against. You'll work to ensure each scenario is clearly defined, well scored, and easy to execute and reuse. You'll need a sharp analytical mindset, attention to detail, and an interest in how AI agents make decisions.

Although every project is unique, you might typically:
- Create structured test cases that simulate complex human workflows.
- Define gold-standard behavior and scoring logic to evaluate agent actions.
- Analyze agent logs, failure modes, and decision paths.
- Work with code repositories and test frameworks to validate your scenarios.
- Iterate on prompts, instructions, and test cases to improve clarity and difficulty.
- Ensure that scenarios are production-ready, easy to run, and reusable.

How to get started
Simply apply to this post, qualify, and get the chance to contribute to projects aligned with your skills, on your own schedule. From creating training prompts to refining model responses, you'll help shape the future of AI while ensuring technology benefits everyone.

Requirements
- Bachelor's and/or Master's degree in Computer Science, Software Engineering, Data Science / Data Analytics, Artificial Intelligence / Machine Learning, Computational Linguistics / Natural Language Processing (NLP), Information Systems, or another related field.
- Background in QA, software testing, data analysis, or NLP annotation.
- Good understanding of test design principles (e.g., reproducibility, coverage, edge cases).
- Strong written communication skills in English.
- Comfortable with structured formats like JSON/YAML for scenario descriptions.
- Able to define expected agent behaviors (gold paths) and scoring logic.
- Basic experience with Python and JavaScript.
- Curious and open to working with AI-generated content, agent logs, and prompt-based behavior.

Nice to have
- Experience writing manual or automated test cases.
- Familiarity with LLM capabilities and typical failure modes.
- Understanding of scoring metrics (precision, recall, coverage, reward functions).

Benefits
Contribute on your own schedule, from anywhere in the world. This opportunity allows you to:
- Get paid for your expertise, with rates that can go up to $30/hour depending on your skills, experience, and project needs.
- Take part in a flexible, remote, freelance project that fits around your primary professional or academic commitments.
- Participate in an advanced AI project and gain valuable experience to enhance your portfolio.
- Influence how future AI models understand and communicate in your field of expertise.
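To make "structured formats like JSON/YAML", "gold paths", and "scoring logic" concrete, here is a hedged sketch of one way a scenario definition and its scorer might look; the schema and field names are illustrative assumptions, not Mindrift's actual format.

```python
# Hypothetical scenario schema plus a toy scorer; requires PyYAML (pip install pyyaml).
import yaml

SCENARIO = """
id: refund-001
description: Agent processes a refund request for a delayed order.
gold_path:                 # expected agent actions, in order
  - look_up_order
  - check_refund_policy
  - issue_refund
scoring:
  must_include: [issue_refund]
  forbidden: [close_ticket_without_reply]
"""

def score(agent_actions: list[str], scenario: dict) -> float:
    """Fraction of gold-path steps performed; zero if any hard rule is broken."""
    rules = scenario["scoring"]
    if any(a in rules["forbidden"] for a in agent_actions):
        return 0.0
    if not all(m in agent_actions for m in rules["must_include"]):
        return 0.0
    gold = scenario["gold_path"]
    return sum(step in agent_actions for step in gold) / len(gold)

scenario = yaml.safe_load(SCENARIO)
print(score(["look_up_order", "issue_refund"], scenario))  # ≈ 0.67
```

Gold-path coverage here is a crude stand-in for the precision/recall-style metrics the nice-to-haves mention.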
January 6, 2026
MCP & Tools Python Developer - Agent Evaluation Infrastructure
Mindrift · 1001-5000 employees
Rate: up to $24/hour (USD)
South Africa · Part-time · Remote
This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of English proficiency.

At Mindrift, innovation meets opportunity. We believe in using the power of collective human intelligence to ethically shape the future of AI.

What we do
The Mindrift platform, launched and powered by Toloka, connects domain experts with cutting-edge AI projects from innovative tech clients. Our mission is to unlock the potential of GenAI by tapping into real-world expertise from across the globe.

Who we're looking for
Calling all security researchers, engineers, and penetration testers with a strong foundation in problem-solving, offensive security, and AI-related risk assessment. If you thrive on digging into complex systems, uncovering hidden vulnerabilities, and thinking creatively under constraints, join us! We're looking for someone who can bring a hands-on approach to technical challenges, whether breaking into systems to expose weaknesses or building secure tools and processes. We value contributors with a passion for continuous learning, experimentation, and adaptability.

About the project
We're on the hunt for hands-on Python engineers for a new project focused on developing Model Context Protocol (MCP) servers and internal tools for running and evaluating agent behavior. You'll implement base methods for agent action verification, integrate with internal and client infrastructures, and help fill tooling gaps across the team.

What you'll be doing:
- Developing and maintaining MCP-compatible evaluation servers
- Implementing logic to check agent actions against scenario definitions
- Creating or extending tools that writers and QAs use to test agents
- Working closely with infrastructure engineers to ensure compatibility
- Occasionally helping with test writing or debug sessions when needed

Although we're only looking for experts for this current project, contributors with consistently high-quality submissions may receive an invitation for ongoing collaboration on future projects.

How to get started
Apply to this post, qualify, and get the chance to contribute to a project aligned with your skills, on your own schedule. Shape the future of AI while building tools that benefit everyone.

Requirements
The ideal contributor will have:
- 4+ years of Python development experience, ideally in backend or tooling
- Solid experience building APIs, testing frameworks, or protocol-based interfaces
- Understanding of Docker, the Linux CLI, and HTTP-based communication
- Ability to integrate new tools into existing infrastructures
- Familiarity with how LLM agents are prompted, executed, and evaluated
- Clear documentation and communication skills (you'll work with QA and writers)

We also value applicants who have:
- Experience with Model Context Protocol (MCP) or similar structured agent-server interfaces
- Knowledge of FastAPI or similar async web frameworks
- Experience working with LLM logs, scoring functions, or sandbox environments
- Ability to support dev environments (devcontainers, CI configs, linters)
- JavaScript experience

Benefits
- Get paid for your expertise, with rates that can go up to $24/hour depending on your skills, experience, and project needs.
- Take part in a flexible, remote, freelance project that fits around your primary professional or academic commitments.
- Participate in an advanced AI project and gain valuable experience to enhance your portfolio.
- Influence how future AI models understand and communicate in your field of expertise.
January 6, 2026
Freelance AI Evaluation Scenario Writer
Mindrift · 1001-5000 employees
Rate: up to $45/hour (USD)
Canada · Part-time · Remote
This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of English.

At Mindrift, innovation meets opportunity. We believe in using the power of collective human intelligence to ethically shape the future of AI.

What we do
The Mindrift platform connects specialists with AI projects from major tech innovators. Our mission is to unlock the potential of Generative AI by tapping into real-world expertise from across the globe.

About the Role
We're looking for someone who can design realistic, structured evaluation scenarios for LLM-based agents. You'll create test cases that simulate human-performed tasks and define gold-standard behavior to compare agent actions against. You'll work to ensure each scenario is clearly defined, well scored, and easy to execute and reuse. You'll need a sharp analytical mindset, attention to detail, and an interest in how AI agents make decisions.

Although every project is unique, you might typically:
- Create structured test cases that simulate complex human workflows.
- Define gold-standard behavior and scoring logic to evaluate agent actions.
- Analyze agent logs, failure modes, and decision paths.
- Work with code repositories and test frameworks to validate your scenarios.
- Iterate on prompts, instructions, and test cases to improve clarity and difficulty.
- Ensure that scenarios are production-ready, easy to run, and reusable.

How to get started
Simply apply to this post, qualify, and get the chance to contribute to projects aligned with your skills, on your own schedule. From creating training prompts to refining model responses, you'll help shape the future of AI while ensuring technology benefits everyone.

Requirements
- Bachelor's and/or Master's degree in Computer Science, Software Engineering, Data Science / Data Analytics, Artificial Intelligence / Machine Learning, Computational Linguistics / Natural Language Processing (NLP), Information Systems, or another related field.
- Background in QA, software testing, data analysis, or NLP annotation.
- Good understanding of test design principles (e.g., reproducibility, coverage, edge cases).
- Strong written communication skills in English.
- Comfortable with structured formats like JSON/YAML for scenario descriptions.
- Able to define expected agent behaviors (gold paths) and scoring logic.
- Basic experience with Python and JavaScript.
- Curious and open to working with AI-generated content, agent logs, and prompt-based behavior.

Nice to have
- Experience writing manual or automated test cases.
- Familiarity with LLM capabilities and typical failure modes.
- Understanding of scoring metrics (precision, recall, coverage, reward functions).

Benefits
Contribute on your own schedule, from anywhere in the world. This opportunity allows you to:
- Get paid for your expertise, with rates that can go up to $45/hour depending on your skills, experience, and project needs.
- Take part in a flexible, remote, freelance project that fits around your primary professional or academic commitments.
- Participate in an advanced AI project and gain valuable experience to enhance your portfolio.
- Influence how future AI models understand and communicate in your field of expertise.
January 6, 2026
MCP & Tools Python Developer - Agent Evaluation Infrastructure
Mindrift · 1001-5000 employees
Rate: up to $45/hour (USD)
Canada · Part-time · Remote
This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of English proficiency.

At Mindrift, innovation meets opportunity. We believe in using the power of collective human intelligence to ethically shape the future of AI.

What we do
The Mindrift platform, launched and powered by Toloka, connects domain experts with cutting-edge AI projects from innovative tech clients. Our mission is to unlock the potential of GenAI by tapping into real-world expertise from across the globe.

Who we're looking for
Calling all security researchers, engineers, and penetration testers with a strong foundation in problem-solving, offensive security, and AI-related risk assessment. If you thrive on digging into complex systems, uncovering hidden vulnerabilities, and thinking creatively under constraints, join us! We're looking for someone who can bring a hands-on approach to technical challenges, whether breaking into systems to expose weaknesses or building secure tools and processes. We value contributors with a passion for continuous learning, experimentation, and adaptability.

About the project
We're on the hunt for hands-on Python engineers for a new project focused on developing Model Context Protocol (MCP) servers and internal tools for running and evaluating agent behavior. You'll implement base methods for agent action verification, integrate with internal and client infrastructures, and help fill tooling gaps across the team.

What you'll be doing:
- Developing and maintaining MCP-compatible evaluation servers
- Implementing logic to check agent actions against scenario definitions
- Creating or extending tools that writers and QAs use to test agents
- Working closely with infrastructure engineers to ensure compatibility
- Occasionally helping with test writing or debug sessions when needed

Although we're only looking for experts for this current project, contributors with consistently high-quality submissions may receive an invitation for ongoing collaboration on future projects.

How to get started
Apply to this post, qualify, and get the chance to contribute to a project aligned with your skills, on your own schedule. Shape the future of AI while building tools that benefit everyone.

Requirements
The ideal contributor will have:
- 4+ years of Python development experience, ideally in backend or tooling
- Solid experience building APIs, testing frameworks, or protocol-based interfaces
- Understanding of Docker, the Linux CLI, and HTTP-based communication
- Ability to integrate new tools into existing infrastructures
- Familiarity with how LLM agents are prompted, executed, and evaluated
- Clear documentation and communication skills (you'll work with QA and writers)

We also value applicants who have:
- Experience with Model Context Protocol (MCP) or similar structured agent-server interfaces
- Knowledge of FastAPI or similar async web frameworks
- Experience working with LLM logs, scoring functions, or sandbox environments
- Ability to support dev environments (devcontainers, CI configs, linters)
- JavaScript experience

Benefits
- Get paid for your expertise, with rates that can go up to $45/hour depending on your skills, experience, and project needs.
- Take part in a flexible, remote, freelance project that fits around your primary professional or academic commitments.
- Participate in an advanced AI project and gain valuable experience to enhance your portfolio.
- Influence how future AI models understand and communicate in your field of expertise.
January 6, 2026
Evaluation Scenario Writer - AI Agent Testing Specialist
Mindrift · 1001-5000 employees
Rate: up to $12/hour (USD)
Philippines · Part-time · Remote
This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of English.

At Mindrift, innovation meets opportunity. We believe in using the power of collective human intelligence to ethically shape the future of AI.

What we do
The Mindrift platform connects specialists with AI projects from major tech innovators. Our mission is to unlock the potential of Generative AI by tapping into real-world expertise from across the globe.

About the Role
We're looking for someone who can design realistic, structured evaluation scenarios for LLM-based agents. You'll create test cases that simulate human-performed tasks and define gold-standard behavior to compare agent actions against. You'll work to ensure each scenario is clearly defined, well scored, and easy to execute and reuse. You'll need a sharp analytical mindset, attention to detail, and an interest in how AI agents make decisions.

Although every project is unique, you might typically:
- Create structured test cases that simulate complex human workflows.
- Define gold-standard behavior and scoring logic to evaluate agent actions.
- Analyze agent logs, failure modes, and decision paths.
- Work with code repositories and test frameworks to validate your scenarios.
- Iterate on prompts, instructions, and test cases to improve clarity and difficulty.
- Ensure that scenarios are production-ready, easy to run, and reusable.

How to get started
Simply apply to this post, qualify, and get the chance to contribute to projects aligned with your skills, on your own schedule. From creating training prompts to refining model responses, you'll help shape the future of AI while ensuring technology benefits everyone.

Requirements
- Bachelor's and/or Master's degree in Computer Science, Software Engineering, Data Science / Data Analytics, Artificial Intelligence / Machine Learning, Computational Linguistics / Natural Language Processing (NLP), Information Systems, or another related field.
- Background in QA, software testing, data analysis, or NLP annotation.
- Good understanding of test design principles (e.g., reproducibility, coverage, edge cases).
- Strong written communication skills in English.
- Comfortable with structured formats like JSON/YAML for scenario descriptions.
- Able to define expected agent behaviors (gold paths) and scoring logic.
- Basic experience with Python and JavaScript.
- Curious and open to working with AI-generated content, agent logs, and prompt-based behavior.

Nice to have
- Experience writing manual or automated test cases.
- Familiarity with LLM capabilities and typical failure modes.
- Understanding of scoring metrics (precision, recall, coverage, reward functions).

Benefits
Contribute on your own schedule, from anywhere in the world. This opportunity allows you to:
- Get paid for your expertise, with rates that can go up to $12/hour depending on your skills, experience, and project needs.
- Take part in a flexible, remote, freelance project that fits around your primary professional or academic commitments.
- Participate in an advanced AI project and gain valuable experience to enhance your portfolio.
- Influence how future AI models understand and communicate in your field of expertise.
January 6, 2026
MCP & Tools Python Developer - Agent Evaluation Infrastructure
Mindrift · 1001-5000 employees
Rate: up to $30/hour (USD)
Poland · Part-time · Remote
This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of English proficiency.

At Mindrift, innovation meets opportunity. We believe in using the power of collective human intelligence to ethically shape the future of AI.

What we do
The Mindrift platform, launched and powered by Toloka, connects domain experts with cutting-edge AI projects from innovative tech clients. Our mission is to unlock the potential of GenAI by tapping into real-world expertise from across the globe.

Who we're looking for
Calling all security researchers, engineers, and penetration testers with a strong foundation in problem-solving, offensive security, and AI-related risk assessment. If you thrive on digging into complex systems, uncovering hidden vulnerabilities, and thinking creatively under constraints, join us! We're looking for someone who can bring a hands-on approach to technical challenges, whether breaking into systems to expose weaknesses or building secure tools and processes. We value contributors with a passion for continuous learning, experimentation, and adaptability.

About the project
We're on the hunt for hands-on Python engineers for a new project focused on developing Model Context Protocol (MCP) servers and internal tools for running and evaluating agent behavior. You'll implement base methods for agent action verification, integrate with internal and client infrastructures, and help fill tooling gaps across the team.

What you'll be doing:
- Developing and maintaining MCP-compatible evaluation servers
- Implementing logic to check agent actions against scenario definitions
- Creating or extending tools that writers and QAs use to test agents
- Working closely with infrastructure engineers to ensure compatibility
- Occasionally helping with test writing or debug sessions when needed

Although we're only looking for experts for this current project, contributors with consistently high-quality submissions may receive an invitation for ongoing collaboration on future projects.

How to get started
Apply to this post, qualify, and get the chance to contribute to a project aligned with your skills, on your own schedule. Shape the future of AI while building tools that benefit everyone.

Requirements
The ideal contributor will have:
- 4+ years of Python development experience, ideally in backend or tooling
- Solid experience building APIs, testing frameworks, or protocol-based interfaces
- Understanding of Docker, the Linux CLI, and HTTP-based communication
- Ability to integrate new tools into existing infrastructures
- Familiarity with how LLM agents are prompted, executed, and evaluated
- Clear documentation and communication skills (you'll work with QA and writers)

We also value applicants who have:
- Experience with Model Context Protocol (MCP) or similar structured agent-server interfaces
- Knowledge of FastAPI or similar async web frameworks
- Experience working with LLM logs, scoring functions, or sandbox environments
- Ability to support dev environments (devcontainers, CI configs, linters)
- JavaScript experience

Benefits
- Get paid for your expertise, with rates that can go up to $30/hour depending on your skills, experience, and project needs.
- Take part in a flexible, remote, freelance project that fits around your primary professional or academic commitments.
- Participate in an advanced AI project and gain valuable experience to enhance your portfolio.
- Influence how future AI models understand and communicate in your field of expertise.
January 6, 2026
Evaluation Scenario Writer - AI Agent Testing Specialist
Mindrift · 1001-5000 employees
Rate: up to $50/hour (USD)
Location not specified · Part-time · Remote
This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of English.

At Mindrift, innovation meets opportunity. We believe in using the power of collective human intelligence to ethically shape the future of AI.

What we do
The Mindrift platform connects specialists with AI projects from major tech innovators. Our mission is to unlock the potential of Generative AI by tapping into real-world expertise from across the globe.

About the Role
We're looking for someone who can design realistic, structured evaluation scenarios for LLM-based agents. You'll create test cases that simulate human-performed tasks and define gold-standard behavior to compare agent actions against. You'll work to ensure each scenario is clearly defined, well scored, and easy to execute and reuse. You'll need a sharp analytical mindset, attention to detail, and an interest in how AI agents make decisions.

Although every project is unique, you might typically:
- Create structured test cases that simulate complex human workflows.
- Define gold-standard behavior and scoring logic to evaluate agent actions.
- Analyze agent logs, failure modes, and decision paths.
- Work with code repositories and test frameworks to validate your scenarios.
- Iterate on prompts, instructions, and test cases to improve clarity and difficulty.
- Ensure that scenarios are production-ready, easy to run, and reusable.

How to get started
Simply apply to this post, qualify, and get the chance to contribute to projects aligned with your skills, on your own schedule. From creating training prompts to refining model responses, you'll help shape the future of AI while ensuring technology benefits everyone.

Requirements
- Bachelor's and/or Master's degree in Computer Science, Software Engineering, Data Science / Data Analytics, Artificial Intelligence / Machine Learning, Computational Linguistics / Natural Language Processing (NLP), Information Systems, or another related field.
- Background in QA, software testing, data analysis, or NLP annotation.
- Good understanding of test design principles (e.g., reproducibility, coverage, edge cases).
- Strong written communication skills in English.
- Comfortable with structured formats like JSON/YAML for scenario descriptions.
- Able to define expected agent behaviors (gold paths) and scoring logic.
- Basic experience with Python and JavaScript.
- Curious and open to working with AI-generated content, agent logs, and prompt-based behavior.

Nice to have
- Experience writing manual or automated test cases.
- Familiarity with LLM capabilities and typical failure modes.
- Understanding of scoring metrics (precision, recall, coverage, reward functions).

Benefits
Contribute on your own schedule, from anywhere in the world. This opportunity allows you to:
- Get paid for your expertise, with rates that can go up to $50/hour depending on your skills, experience, and project needs.
- Take part in a flexible, remote, freelance project that fits around your primary professional or academic commitments.
- Participate in an advanced AI project and gain valuable experience to enhance your portfolio.
- Influence how future AI models understand and communicate in your field of expertise.
January 6, 2026
Evaluation Scenario Writer - AI Agent Testing Specialist
Mindrift · 1001-5000 employees
Rate: up to $24/hour (USD)
South Africa · Part-time · Remote
This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of English.

At Mindrift, innovation meets opportunity. We believe in using the power of collective human intelligence to ethically shape the future of AI.

What we do
The Mindrift platform connects specialists with AI projects from major tech innovators. Our mission is to unlock the potential of Generative AI by tapping into real-world expertise from across the globe.

About the Role
We're looking for someone who can design realistic, structured evaluation scenarios for LLM-based agents. You'll create test cases that simulate human-performed tasks and define gold-standard behavior to compare agent actions against. You'll work to ensure each scenario is clearly defined, well scored, and easy to execute and reuse. You'll need a sharp analytical mindset, attention to detail, and an interest in how AI agents make decisions.

Although every project is unique, you might typically:
- Create structured test cases that simulate complex human workflows.
- Define gold-standard behavior and scoring logic to evaluate agent actions.
- Analyze agent logs, failure modes, and decision paths.
- Work with code repositories and test frameworks to validate your scenarios.
- Iterate on prompts, instructions, and test cases to improve clarity and difficulty.
- Ensure that scenarios are production-ready, easy to run, and reusable.

How to get started
Simply apply to this post, qualify, and get the chance to contribute to projects aligned with your skills, on your own schedule. From creating training prompts to refining model responses, you'll help shape the future of AI while ensuring technology benefits everyone.

Requirements
- Bachelor's and/or Master's degree in Computer Science, Software Engineering, Data Science / Data Analytics, Artificial Intelligence / Machine Learning, Computational Linguistics / Natural Language Processing (NLP), Information Systems, or another related field.
- Background in QA, software testing, data analysis, or NLP annotation.
- Good understanding of test design principles (e.g., reproducibility, coverage, edge cases).
- Strong written communication skills in English.
- Comfortable with structured formats like JSON/YAML for scenario descriptions.
- Able to define expected agent behaviors (gold paths) and scoring logic.
- Basic experience with Python and JavaScript.
- Curious and open to working with AI-generated content, agent logs, and prompt-based behavior.

Nice to have
- Experience writing manual or automated test cases.
- Familiarity with LLM capabilities and typical failure modes.
- Understanding of scoring metrics (precision, recall, coverage, reward functions).

Benefits
Contribute on your own schedule, from anywhere in the world. This opportunity allows you to:
- Get paid for your expertise, with rates that can go up to $24/hour depending on your skills, experience, and project needs.
- Take part in a flexible, remote, freelance project that fits around your primary professional or academic commitments.
- Participate in an advanced AI project and gain valuable experience to enhance your portfolio.
- Influence how future AI models understand and communicate in your field of expertise.
January 6, 2026
Mathematician - Freelance AI Trainer
Mindrift · 1001-5000 employees
Rate: up to $30/hour (USD)
Poland · Part-time · Remote
This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of English.

At Mindrift, innovation meets opportunity. We believe in using the power of collective intelligence to ethically shape the future of AI.

What we do
The Mindrift platform connects specialists with AI projects from major tech innovators. Our mission is to unlock the potential of Generative AI by tapping into real-world expertise from across the globe.

About the Role
GenAI models are improving very quickly, and one of our goals is to make them capable of addressing specialized questions and achieving complex reasoning skills. If you join the platform as an AI Tutor in Mathematics, you'll have the opportunity to collaborate on these projects.

Although every project is unique, you might typically:
- Generate prompts that challenge AI.
- Define comprehensive scoring criteria to evaluate the accuracy of the AI's answers.
- Correct the model's responses based on your domain-specific knowledge.

How to get started
Simply apply to this post, qualify, and get the chance to contribute to projects aligned with your skills, on your own schedule. From creating training prompts to refining model responses, you'll help shape the future of AI while ensuring technology benefits everyone.

Requirements
- You hold a Bachelor's, Master's, or PhD degree in Mathematics or a related area.
- You have at least 3 years of professional experience in relevant fields: algebra (pre-algebra, algebra, intermediate algebra), geometry, calculus (pre-calculus), probability (counting and probability), and number theory.
- Your level of English is advanced (C1) or above.
- You are ready to learn new methods, able to switch between tasks and topics quickly, and comfortable at times working with challenging, complex guidelines.

This freelance role is fully remote, so you just need a laptop, an internet connection, available time, and enthusiasm to take on a challenge.

Benefits
Why this freelance opportunity might be a great fit for you:
- Get paid for your expertise, with rates that can go up to $30/hour depending on your skills, experience, and project needs.
- Take part in a part-time, remote, freelance project that fits around your primary professional or academic commitments.
- Work on advanced AI projects and gain valuable experience that enhances your portfolio.
- Influence how future AI models understand and communicate in your field of expertise.
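As an illustration of what "scoring criteria" can compile down to in practice, a criterion like "the model's final expression must be algebraically equal to the reference answer" might be automated as below; this is a hedged sketch using sympy, not Mindrift's actual tooling.

```python
# Hedged sketch: automated check for one math scoring criterion, using sympy.
import sympy as sp

def answers_match(model_answer: str, reference: str) -> bool:
    """True if the two expressions are symbolically equal."""
    return sp.simplify(sp.sympify(model_answer) - sp.sympify(reference)) == 0

print(answers_match("2*x + 2*x", "4*x"))  # True
print(answers_match("x**2", "x*x*x"))     # False
```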
January 6, 2026
Mathematician - Freelance AI Trainer
Mindrift · 1001-5000 employees
Rate: up to $42/hour (USD)
Germany · Part-time · Remote
This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of English.

At Mindrift, innovation meets opportunity. We believe in using the power of collective intelligence to ethically shape the future of AI.

What we do
The Mindrift platform connects specialists with AI projects from major tech innovators. Our mission is to unlock the potential of Generative AI by tapping into real-world expertise from across the globe.

About the Role
GenAI models are improving very quickly, and one of our goals is to make them capable of addressing specialized questions and achieving complex reasoning skills. If you join the platform as an AI Tutor in Mathematics, you'll have the opportunity to collaborate on these projects.

Although every project is unique, you might typically:
- Generate prompts that challenge AI.
- Define comprehensive scoring criteria to evaluate the accuracy of the AI's answers.
- Correct the model's responses based on your domain-specific knowledge.

How to get started
Simply apply to this post, qualify, and get the chance to contribute to projects aligned with your skills, on your own schedule. From creating training prompts to refining model responses, you'll help shape the future of AI while ensuring technology benefits everyone.

Requirements
- You hold a Bachelor's, Master's, or PhD degree in Mathematics or a related area.
- You have at least 3 years of professional experience in relevant fields: algebra (pre-algebra, algebra, intermediate algebra), geometry, calculus (pre-calculus), probability (counting and probability), and number theory.
- Your level of English is advanced (C1) or above.
- You are ready to learn new methods, able to switch between tasks and topics quickly, and comfortable at times working with challenging, complex guidelines.

This freelance role is fully remote, so you just need a laptop, an internet connection, available time, and enthusiasm to take on a challenge.

Benefits
Why this freelance opportunity might be a great fit for you:
- Get paid for your expertise, with rates that can go up to $42/hour depending on your skills, experience, and project needs.
- Take part in a part-time, remote, freelance project that fits around your primary professional or academic commitments.
- Work on advanced AI projects and gain valuable experience that enhances your portfolio.
- Influence how future AI models understand and communicate in your field of expertise.
January 6, 2026
Mathematician - Freelance AI Trainer
Mindrift · 1001-5000 employees
Rate: up to $30/hour (USD)
Spain · Part-time · Remote
This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of English.

At Mindrift, innovation meets opportunity. We believe in using the power of collective intelligence to ethically shape the future of AI.

What we do
The Mindrift platform connects specialists with AI projects from major tech innovators. Our mission is to unlock the potential of Generative AI by tapping into real-world expertise from across the globe.

About the Role
GenAI models are improving very quickly, and one of our goals is to make them capable of addressing specialized questions and achieving complex reasoning skills. If you join the platform as an AI Tutor in Mathematics, you'll have the opportunity to collaborate on these projects.

Although every project is unique, you might typically:
- Generate prompts that challenge AI.
- Define comprehensive scoring criteria to evaluate the accuracy of the AI's answers.
- Correct the model's responses based on your domain-specific knowledge.

How to get started
Simply apply to this post, qualify, and get the chance to contribute to projects aligned with your skills, on your own schedule. From creating training prompts to refining model responses, you'll help shape the future of AI while ensuring technology benefits everyone.

Requirements
- You hold a Bachelor's, Master's, or PhD degree in Mathematics or a related area.
- You have at least 3 years of professional experience in relevant fields: algebra (pre-algebra, algebra, intermediate algebra), geometry, calculus (pre-calculus), probability (counting and probability), and number theory.
- Your level of English is advanced (C1) or above.
- You are ready to learn new methods, able to switch between tasks and topics quickly, and comfortable at times working with challenging, complex guidelines.

This freelance role is fully remote, so you just need a laptop, an internet connection, available time, and enthusiasm to take on a challenge.

Benefits
Why this freelance opportunity might be a great fit for you:
- Get paid for your expertise, with rates that can go up to $30/hour depending on your skills, experience, and project needs.
- Take part in a part-time, remote, freelance project that fits around your primary professional or academic commitments.
- Work on advanced AI projects and gain valuable experience that enhances your portfolio.
- Influence how future AI models understand and communicate in your field of expertise.
January 6, 2026
MCP & Tools Python Developer - Agent Evaluation Infrastructure
Mindrift · 1001-5000 employees
Rate: $17/hour (USD)
Brazil · Part-time · Remote
This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of English proficiency.

At Mindrift, innovation meets opportunity. We believe in using the power of collective human intelligence to ethically shape the future of AI.

What we do
The Mindrift platform, launched and powered by Toloka, connects domain experts with cutting-edge AI projects from innovative tech clients. Our mission is to unlock the potential of GenAI by tapping into real-world expertise from across the globe.

Who we're looking for
Calling all security researchers, engineers, and penetration testers with a strong foundation in problem-solving, offensive security, and AI-related risk assessment. If you thrive on digging into complex systems, uncovering hidden vulnerabilities, and thinking creatively under constraints, join us! We're looking for someone who can bring a hands-on approach to technical challenges, whether breaking into systems to expose weaknesses or building secure tools and processes. We value contributors with a passion for continuous learning, experimentation, and adaptability.

About the project
We're on the hunt for hands-on Python engineers for a new project focused on developing Model Context Protocol (MCP) servers and internal tools for running and evaluating agent behavior. You'll implement base methods for agent action verification, integrate with internal and client infrastructures, and help fill tooling gaps across the team.

What you'll be doing:
- Developing and maintaining MCP-compatible evaluation servers
- Implementing logic to check agent actions against scenario definitions
- Creating or extending tools that writers and QAs use to test agents
- Working closely with infrastructure engineers to ensure compatibility
- Occasionally helping with test writing or debug sessions when needed

Although we're only looking for experts for this current project, contributors with consistently high-quality submissions may receive an invitation for ongoing collaboration on future projects.

How to get started
Apply to this post, qualify, and get the chance to contribute to a project aligned with your skills, on your own schedule. Shape the future of AI while building tools that benefit everyone.

Requirements
The ideal contributor will have:
- 4+ years of Python development experience, ideally in backend or tooling
- Solid experience building APIs, testing frameworks, or protocol-based interfaces
- Understanding of Docker, the Linux CLI, and HTTP-based communication
- Ability to integrate new tools into existing infrastructures
- Familiarity with how LLM agents are prompted, executed, and evaluated
- Clear documentation and communication skills (you'll work with QA and writers)

We also value applicants who have:
- Experience with Model Context Protocol (MCP) or similar structured agent-server interfaces
- Knowledge of FastAPI or similar async web frameworks
- Experience working with LLM logs, scoring functions, or sandbox environments
- Ability to support dev environments (devcontainers, CI configs, linters)
- JavaScript experience

Benefits
- Get paid for your expertise, with rates that can go up to $17/hour depending on your skills, experience, and project needs.
- Take part in a flexible, remote, freelance project that fits around your primary professional or academic commitments.
- Participate in an advanced AI project and gain valuable experience to enhance your portfolio.
- Influence how future AI models understand and communicate in your field of expertise.
January 6, 2026
MCP & Tools Python Developer - Agent Evaluation Infrastructure
Mindrift · 1001-5000 employees
Rate: up to $80/hour (USD)
United States · Part-time · Remote
This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of English proficiency.

At Mindrift, innovation meets opportunity. We believe in using the power of collective human intelligence to ethically shape the future of AI.

What we do
The Mindrift platform, launched and powered by Toloka, connects domain experts with cutting-edge AI projects from innovative tech clients. Our mission is to unlock the potential of GenAI by tapping into real-world expertise from across the globe.

Who we're looking for
Calling all security researchers, engineers, and penetration testers with a strong foundation in problem-solving, offensive security, and AI-related risk assessment. If you thrive on digging into complex systems, uncovering hidden vulnerabilities, and thinking creatively under constraints, join us! We're looking for someone who can bring a hands-on approach to technical challenges, whether breaking into systems to expose weaknesses or building secure tools and processes. We value contributors with a passion for continuous learning, experimentation, and adaptability.

About the project
We're on the hunt for hands-on Python engineers for a new project focused on developing Model Context Protocol (MCP) servers and internal tools for running and evaluating agent behavior. You'll implement base methods for agent action verification, integrate with internal and client infrastructures, and help fill tooling gaps across the team.

What you'll be doing:
- Developing and maintaining MCP-compatible evaluation servers
- Implementing logic to check agent actions against scenario definitions
- Creating or extending tools that writers and QAs use to test agents
- Working closely with infrastructure engineers to ensure compatibility
- Occasionally helping with test writing or debug sessions when needed

Although we're only looking for experts for this current project, contributors with consistently high-quality submissions may receive an invitation for ongoing collaboration on future projects.

How to get started
Apply to this post, qualify, and get the chance to contribute to a project aligned with your skills, on your own schedule. Shape the future of AI while building tools that benefit everyone.

Requirements
The ideal contributor will have:
- 4+ years of Python development experience, ideally in backend or tooling
- Solid experience building APIs, testing frameworks, or protocol-based interfaces
- Understanding of Docker, the Linux CLI, and HTTP-based communication
- Ability to integrate new tools into existing infrastructures
- Familiarity with how LLM agents are prompted, executed, and evaluated
- Clear documentation and communication skills (you'll work with QA and writers)

We also value applicants who have:
- Experience with Model Context Protocol (MCP) or similar structured agent-server interfaces
- Knowledge of FastAPI or similar async web frameworks
- Experience working with LLM logs, scoring functions, or sandbox environments
- Ability to support dev environments (devcontainers, CI configs, linters)
- JavaScript experience

Benefits
- Get paid for your expertise, with rates that can go up to $80/hour depending on your skills, experience, and project needs.
- Take part in a flexible, remote, freelance project that fits around your primary professional or academic commitments.
- Participate in an advanced AI project and gain valuable experience to enhance your portfolio.
- Influence how future AI models understand and communicate in your field of expertise.
January 6, 2026
MCP & Tools Python Developer - Agent Evaluation Infrastructure
Mindrift · 1001-5000 employees
Rate: up to $50/hour (USD)
United Kingdom · Part-time · Remote
This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of English proficiency.

At Mindrift, innovation meets opportunity. We believe in using the power of collective human intelligence to ethically shape the future of AI.

What we do
The Mindrift platform, launched and powered by Toloka, connects domain experts with cutting-edge AI projects from innovative tech clients. Our mission is to unlock the potential of GenAI by tapping into real-world expertise from across the globe.

Who we're looking for
Calling all security researchers, engineers, and penetration testers with a strong foundation in problem-solving, offensive security, and AI-related risk assessment. If you thrive on digging into complex systems, uncovering hidden vulnerabilities, and thinking creatively under constraints, join us! We're looking for someone who can bring a hands-on approach to technical challenges, whether breaking into systems to expose weaknesses or building secure tools and processes. We value contributors with a passion for continuous learning, experimentation, and adaptability.

About the project
We're on the hunt for hands-on Python engineers for a new project focused on developing Model Context Protocol (MCP) servers and internal tools for running and evaluating agent behavior. You'll implement base methods for agent action verification, integrate with internal and client infrastructures, and help fill tooling gaps across the team.

What you'll be doing:
- Developing and maintaining MCP-compatible evaluation servers
- Implementing logic to check agent actions against scenario definitions
- Creating or extending tools that writers and QAs use to test agents
- Working closely with infrastructure engineers to ensure compatibility
- Occasionally helping with test writing or debug sessions when needed

Although we're only looking for experts for this current project, contributors with consistently high-quality submissions may receive an invitation for ongoing collaboration on future projects.

How to get started
Apply to this post, qualify, and get the chance to contribute to a project aligned with your skills, on your own schedule. Shape the future of AI while building tools that benefit everyone.

Requirements
The ideal contributor will have:
- 4+ years of Python development experience, ideally in backend or tooling
- Solid experience building APIs, testing frameworks, or protocol-based interfaces
- Understanding of Docker, the Linux CLI, and HTTP-based communication
- Ability to integrate new tools into existing infrastructures
- Familiarity with how LLM agents are prompted, executed, and evaluated
- Clear documentation and communication skills (you'll work with QA and writers)

We also value applicants who have:
- Experience with Model Context Protocol (MCP) or similar structured agent-server interfaces
- Knowledge of FastAPI or similar async web frameworks
- Experience working with LLM logs, scoring functions, or sandbox environments
- Ability to support dev environments (devcontainers, CI configs, linters)
- JavaScript experience

Benefits
- Get paid for your expertise, with rates that can go up to $50/hour depending on your skills, experience, and project needs.
- Take part in a flexible, remote, freelance project that fits around your primary professional or academic commitments.
- Participate in an advanced AI project and gain valuable experience to enhance your portfolio.
- Influence how future AI models understand and communicate in your field of expertise.
January 6, 2026
MCP & Tools Python Developer - Agent Evaluation Infrastructure
Mindrift · 1001-5000 employees
Rate: up to $12/hour (USD)
Philippines · Part-time · Remote
This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of English proficiency.At Mindrift, innovation meets opportunity. We believe in using the power of collective human intelligence to ethically shape the future of AI. What we doThe Mindrift platform, launched and powered by Toloka, connects domain experts with cutting-edge AI projects from innovative tech clients. Our mission is to unlock the potential of GenAI by tapping into real-world expertise from across the globe. Who we're looking forCalling all security researchers, engineers, and penetration testers with a strong foundation in problem-solving, offensive security, and AI-related risk assessment.If you thrive on digging into complex systems, uncovering hidden vulnerabilities, and thinking creatively under constraints, join us! We’re looking for someone who can bring a hands-on approach to technical challenges, whether breaking into systems to expose weaknesses or building secure tools and processes. We value contributors with a passion for continuous learning, experimentation, and adaptability. About the projectWe’re on the hunt for hands-on Python engineers for a new project focused on developing Model Context Protocol (MCP) servers and internal tools for running and evaluating agent behavior. You’ll implement base methods for agent action verification, integrate with internal and client infrastructures, and help fill tooling gaps across the team. What you’ll be doing:Developing and maintaining MCP-compatible evaluation serversImplementing logic to check agent actions against scenario definitionsCreating or extending tools that writers and QAs use to test agentsWorking closely with infrastructure engineers to ensure compatibilityOccasionally helping with test writing or debug sessions when neededAlthough we’re only looking for experts for this current project, contributors with consistent high-quality submissions may receive an invitation for ongoing collaboration across future projects. How to get started:Apply to this post, qualify, and get the chance to contribute to a project aligned with your skills, on your own schedule. 
Shape the future of AI while building tools that benefit everyone.

Requirements
The ideal contributor will have:
- 4+ years of Python development experience, ideally in backend or tools
- Solid experience building APIs, testing frameworks, or protocol-based interfaces
- Understanding of Docker, the Linux CLI, and HTTP-based communication
- Ability to integrate new tools into existing infrastructures
- Familiarity with how LLM agents are prompted, executed, and evaluated
- Clear documentation and communication skills, since you’ll work with QA and writers

We also value applicants who have:
- Experience with Model Context Protocol (MCP) or similar structured agent-server interfaces
- Knowledge of FastAPI or similar async web frameworks (see the endpoint sketch below)
- Experience working with LLM logs, scoring functions, or sandbox environments
- Ability to support dev environments (devcontainers, CI configs, linters)
- JavaScript experience

Benefits
- Get paid for your expertise, with rates of up to $12/hour depending on your skills, experience, and project needs.
- Take part in a flexible, remote, freelance project that fits around your primary professional or academic commitments.
- Participate in an advanced AI project and gain valuable experience to enhance your portfolio.
- Influence how future AI models understand and communicate in your field of expertise.
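Since FastAPI appears among the valued skills, here is one way the verification logic above might be exposed over HTTP, in the spirit of the “MCP-compatible evaluation servers” the project describes. This is a hedged sketch, not the MCP wire protocol itself: the endpoint path, payload shape, and models are assumptions, and it reuses the ScenarioStep, AgentAction, and verify_actions definitions from the previous sketch.

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class EvalRequest(BaseModel):
    scenario: list[dict]   # serialized ScenarioStep objects
    actions: list[dict]    # serialized AgentAction objects

class EvalResponse(BaseModel):
    passed: bool
    failures: list[str]

@app.post("/evaluate", response_model=EvalResponse)
def evaluate(req: EvalRequest) -> EvalResponse:
    # Rehydrate the illustrative dataclasses and run the check.
    steps = [ScenarioStep(**s) for s in req.scenario]
    actions = [AgentAction(**a) for a in req.actions]
    failures = verify_actions(steps, actions)
    return EvalResponse(passed=not failures, failures=failures)
```

Run it with, for example, `uvicorn eval_server:app` (module name assumed) and POST a JSON body with "scenario" and "actions" arrays to /evaluate.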
January 6, 2026
MCP & Tools Python Developer - Agent Evaluation Infrastructure
Mindrift · 1,001-5,000 employees · up to $80/hour (USD) · United States · Part-time · Remote
January 6, 2026
MCP & Tools Python Developer - Agent Evaluation Infrastructure
Mindrift · 1,001-5,000 employees · up to $50/hour (USD) · location not specified · Part-time · Remote
January 6, 2026