Top Software Engineer Job Openings in 2025

Looking for Software Engineer opportunities? This curated list features the latest Software Engineer job openings from AI-native companies. Whether you're an experienced professional or just entering the field, you'll find roles that match your expertise, from startups to global tech leaders. Updated every day.


MCP & Tools Python Developer - Agent Evaluation Infrastructure

Mindrift
Part-time · Remote · Rate: up to $30/hour (USD)
This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of English proficiency.

At Mindrift, innovation meets opportunity. We believe in using the power of collective human intelligence to ethically shape the future of AI.

What we do
The Mindrift platform, launched and powered by Toloka, connects domain experts with cutting-edge AI projects from innovative tech clients. Our mission is to unlock the potential of GenAI by tapping into real-world expertise from across the globe.

Who we're looking for
Calling all security researchers, engineers, and penetration testers with a strong foundation in problem-solving, offensive security, and AI-related risk assessment. If you thrive on digging into complex systems, uncovering hidden vulnerabilities, and thinking creatively under constraints, join us! We're looking for someone who can bring a hands-on approach to technical challenges, whether breaking into systems to expose weaknesses or building secure tools and processes. We value contributors with a passion for continuous learning, experimentation, and adaptability.

About the project
We're on the hunt for hands-on Python engineers for a new project focused on developing Model Context Protocol (MCP) servers and internal tools for running and evaluating agent behavior. You'll implement base methods for agent action verification, integrate with internal and client infrastructures, and help fill tooling gaps across the team.

What you'll be doing:
- Developing and maintaining MCP-compatible evaluation servers
- Implementing logic to check agent actions against scenario definitions (see the illustrative sketch after this listing's description)
- Creating or extending tools that writers and QAs use to test agents
- Working closely with infrastructure engineers to ensure compatibility
- Occasionally helping with test writing or debug sessions when needed

Although we're only looking for experts for this current project, contributors with consistent high-quality submissions may receive an invitation for ongoing collaboration across future projects.

How to get started:
Apply to this post, qualify, and get the chance to contribute to a project aligned with your skills, on your own schedule. Shape the future of AI while building tools that benefit everyone.

Requirements
The ideal contributor will have:
- 4+ years of Python development experience, ideally in backend or tools
- Solid experience building APIs, testing frameworks, or protocol-based interfaces
- Understanding of Docker, Linux CLI, and HTTP-based communication
- Ability to integrate new tools into existing infrastructures
- Familiarity with how LLM agents are prompted, executed, and evaluated
- Clear documentation and communication skills - you'll work with QA and writers

We also value applicants who have:
- Experience with Model Context Protocol (MCP) or similar structured agent-server interfaces
- Knowledge of FastAPI or similar async web frameworks
- Experience working with LLM logs, scoring functions, or sandbox environments
- Ability to support dev environments (devcontainers, CI configs, linters)
- JS experience

Benefits
- Get paid for your expertise, with rates that can go up to $30/hour depending on your skills, experience, and project needs.
- Take part in a flexible, remote, freelance project that fits around your primary professional or academic commitments.
- Participate in an advanced AI project and gain valuable experience to enhance your portfolio.
- Influence how future AI models understand and communicate in your field of expertise.
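To make the "agent action verification" responsibility above concrete, here is a minimal, hedged sketch of the kind of evaluation endpoint this role might build. It is not Mindrift's actual code or scenario format: the Scenario and AgentAction shapes, the /verify route, and every field name are invented for illustration, and FastAPI appears only because the posting lists it as a valued skill.

from typing import List

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="toy-agent-evaluation-server")  # hypothetical name

class AgentAction(BaseModel):
    tool: str        # which tool the agent invoked
    arguments: dict  # the arguments it passed (unused in this toy check)

class Scenario(BaseModel):
    allowed_tools: List[str]      # tools the scenario permits the agent to call
    required_sequence: List[str]  # tool calls that must appear, in order

class VerificationRequest(BaseModel):
    scenario: Scenario
    actions: List[AgentAction]

@app.post("/verify")
def verify(req: VerificationRequest) -> dict:
    # Flag any action that uses a tool outside the scenario's allow-list.
    disallowed = [a.tool for a in req.actions
                  if a.tool not in req.scenario.allowed_tools]
    # Check that the required calls appear as an in-order subsequence of the
    # trace: each `step in trace` advances the iterator past its match.
    trace = iter(a.tool for a in req.actions)
    sequence_ok = all(step in trace for step in req.scenario.required_sequence)
    return {
        "passed": not disallowed and sequence_ok,
        "disallowed_tools": disallowed,
        "required_sequence_satisfied": sequence_ok,
    }

A trace could then be checked with a plain HTTP call (run the app with uvicorn and POST a JSON body containing a scenario and a list of actions). A production evaluation server would speak the Model Context Protocol itself rather than this ad-hoc REST shape; the sketch only shows the verification logic in isolation.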
Apply

MCP & Tools Python Developer - Agent Evaluation Infrastructure

Mindrift
Spain · Part-time · Remote · Rate: up to $30/hour (USD)
Same role and description as the first listing above.
Apply

MCP & Tools Python Developer - Agent Evaluation Infrastructure

Mindrift
Italy · Part-time · Remote · Rate: up to $30/hour (USD)
Same role and description as the first listing above.
Apply

MCP & Tools Python Developer - Agent Evaluation Infrastructure

Mindrift
Singapore · Part-time · Remote · Rate: up to $40/hour (USD)
Same role and description as the first listing above.
Apply

MCP & Tools Python Developer - Agent Evaluation Infrastructure

Mindrift
France · Part-time · Remote · Rate: up to $50/hour (USD)
Same role and description as the first listing above.
Apply

MCP & Tools Python Developer - Agent Evaluation Infrastructure

Mindrift
Saudi Arabia · Part-time · Remote · Rate: up to $40/hour (USD)
Same role and description as the first listing above.
Apply

MCP & Tools Python Developer - Agent Evaluation Infrastructure

Mindrift
Mexico · Part-time · Remote · Rate: up to $21/hour (USD)
Same role and description as the first listing above.
Apply

MCP & Tools Python Developer - Agent Evaluation Infrastructure

Mindrift
South Africa · Part-time · Remote · Rate: up to $24/hour (USD)
Same role and description as the first listing above.
Apply

MCP & Tools Python Developer - Agent Evaluation Infrastructure

Mindrift
Canada · Part-time · Remote · Rate: up to $45/hour (USD)
Same role and description as the first listing above.
Apply

MCP & Tools Python Developer - Agent Evaluation Infrastructure

Mindrift
Brazil · Part-time · Remote · Rate: $17/hour (USD)
Same role and description as the first listing above.
Apply

MCP & Tools Python Developer - Agent Evaluation Infrastructure

Mindrift
United Kingdom · Part-time · Remote · Rate: up to $50/hour (USD)
Same role and description as the first listing above.
Apply

MCP & Tools Python Developer - Agent Evaluation Infrastructure

Mindrift
Philippines · Part-time · Remote · Rate: up to $12/hour (USD)
Same role and description as the first listing above.
Apply

MCP & Tools Python Developer - Agent Evaluation Infrastructure

Mindrift
United States · Part-time · Remote · Rate: up to $80/hour (USD)
Same role and description as the first listing above.
Apply

MCP & Tools Python Developer - Agent Evaluation Infrastructure

Mindrift
Argentina · Part-time · Remote · Rate: up to $17/hour (USD)
Same role and description as the first listing above.
Apply

MCP & Tools Python Developer - Agent Evaluation Infrastructure

Mindrift
Australia · Part-time · Remote · Rate: up to $45/hour (USD)
Same role and description as the first listing above.
Apply
Mindrift.jpg

MCP & Tools Python Developer - Agent Evaluation Infrastructure

Mindrift
USD
80
0
-
80
US.svg
United States
Part-time
Remote
false
This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of English proficiency.At Mindrift, innovation meets opportunity. We believe in using the power of collective human intelligence to ethically shape the future of AI. What we doThe Mindrift platform, launched and powered by Toloka, connects domain experts with cutting-edge AI projects from innovative tech clients. Our mission is to unlock the potential of GenAI by tapping into real-world expertise from across the globe. Who we're looking forCalling all security researchers, engineers, and penetration testers with a strong foundation in problem-solving, offensive security, and AI-related risk assessment.If you thrive on digging into complex systems, uncovering hidden vulnerabilities, and thinking creatively under constraints, join us! We’re looking for someone who can bring a hands-on approach to technical challenges, whether breaking into systems to expose weaknesses or building secure tools and processes. We value contributors with a passion for continuous learning, experimentation, and adaptability. About the projectWe’re on the hunt for hands-on Python engineers for a new project focused on developing Model Context Protocol (MCP) servers and internal tools for running and evaluating agent behavior. You’ll implement base methods for agent action verification, integrate with internal and client infrastructures, and help fill tooling gaps across the team. What you’ll be doing:Developing and maintaining MCP-compatible evaluation serversImplementing logic to check agent actions against scenario definitionsCreating or extending tools that writers and QAs use to test agentsWorking closely with infrastructure engineers to ensure compatibilityOccasionally helping with test writing or debug sessions when neededAlthough we’re only looking for experts for this current project, contributors with consistent high-quality submissions may receive an invitation for ongoing collaboration across future projects. How to get started:Apply to this post, qualify, and get the chance to contribute to a project aligned with your skills, on your own schedule. 
Shape the future of AI while building tools that benefit everyone.

Requirements
The ideal contributor will have:
- 4+ years of Python development experience, ideally in backend or tools
- Solid experience building APIs, testing frameworks, or protocol-based interfaces
- Understanding of Docker, Linux CLI, and HTTP-based communication
- Ability to integrate new tools into existing infrastructures
- Familiarity with how LLM agents are prompted, executed, and evaluated
- Clear documentation and communication skills - you'll work with QA and writers

We also value applicants who have:
- Experience with Model Context Protocol (MCP) or similar structured agent-server interfaces
- Knowledge of FastAPI or similar async web frameworks
- Experience working with LLM logs, scoring functions, or sandbox environments
- Ability to support dev environments (devcontainers, CI configs, linters)
- JS experience

Benefits
- Get paid for your expertise, with rates that can go up to $80/hour depending on your skills, experience, and project needs.
- Take part in a flexible, remote, freelance project that fits around your primary professional or academic commitments.
- Participate in an advanced AI project and gain valuable experience to enhance your portfolio.
- Influence how future AI models understand and communicate in your field of expertise.
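To make the project's core task concrete: checking agent actions against scenario definitions amounts to replaying an agent's tool-call trace against a scenario's rules. The sketch below is a minimal illustration only; the Scenario and AgentAction schemas and the /verify route are invented here, not Mindrift's actual MCP interfaces. It uses FastAPI simply because the listing names it:

```python
# Minimal sketch: a FastAPI service that checks an agent's tool-call trace
# against a scenario definition. All names here (Scenario, AgentAction,
# /verify) are hypothetical, invented for illustration.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class AgentAction(BaseModel):
    tool: str        # name of the tool the agent invoked
    arguments: dict  # arguments it passed to that tool

class Scenario(BaseModel):
    required_tools: list[str]        # must appear in the trace, in this order
    forbidden_tools: list[str] = []  # must never appear in the trace

class VerificationResult(BaseModel):
    passed: bool
    failures: list[str]

@app.post("/verify", response_model=VerificationResult)
def verify(scenario: Scenario, actions: list[AgentAction]) -> VerificationResult:
    failures: list[str] = []
    used = [a.tool for a in actions]

    # Rule 1: forbidden tools must not appear anywhere in the trace.
    for tool in scenario.forbidden_tools:
        if tool in used:
            failures.append(f"forbidden tool used: {tool}")

    # Rule 2: required tools must appear as an ordered subsequence
    # (in order, gaps allowed). The shared iterator enforces ordering.
    remaining = iter(used)
    for tool in scenario.required_tools:
        if not any(t == tool for t in remaining):
            failures.append(f"required tool missing or out of order: {tool}")

    return VerificationResult(passed=not failures, failures=failures)
```

A real evaluation server would layer richer rules on top (argument validation, environment-state checks, scoring), but serving scenario checks over HTTP like this is the shape of the work the listing describes.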
MCP & Tools Python Developer - Agent Evaluation Infrastructure
Mindrift · Germany · Part-time · Remote · up to USD 50/hour
This listing is otherwise identical to the United States listing above: the same role description, requirements, and benefits apply, with rates of up to $50/hour for contributors in Germany.
Software Engineer, Agent Studio
Sierra · United States · Full-time · USD 230,000-390,000
About us
At Sierra, we're creating a platform to help businesses build better, more human customer experiences with AI. We are primarily an in-person company based in San Francisco, with growing offices in Atlanta, New York, London, France, Singapore, and Japan.

We are guided by a set of values that are at the core of our actions and define our culture: Trust, Customer Obsession, Craftsmanship, Intensity, and Family. These values are the foundation of our work, and we are committed to upholding them in everything we do.

Our co-founders are Bret Taylor and Clay Bavor. Bret currently serves as Board Chair of OpenAI. Previously, he was co-CEO of Salesforce (which had acquired the company he founded, Quip) and CTO of Facebook. Bret was also one of Google's earliest product managers and co-creator of Google Maps. Before founding Sierra, Clay spent 18 years at Google, where he most recently led Google Labs. Earlier, he started and led Google's AR/VR effort, Project Starline, and Google Lens. Before that, Clay led the product and design teams for Google Workspace.

What you'll do
Sierra's engineering team has ~40 mostly senior engineers, including Mihai, Belinda, Arya, and Wei. We work in small, autonomous teams oriented around customer problems. Here are some examples of what you'll work on (a toy sketch of the first area follows this listing):
- Simulation & Benchmarking: How can we craft a simulation platform to test AI agents against every real-world scenario imaginable? (See 𝜏-Bench)
- Content Management: How do we create intuitive, no-code tools that allow anyone to guide and test AI agents?
- Agent Development Lifecycle: How do we adapt traditional software development methodologies to accommodate AI agents' non-deterministic behavior, natural language interactions, and reliance on large language models? (See Sierra's ADLC)
- Generative Agent Development: How do we accelerate agent creation using generative tools like Cursor and Claude Code? Can we build self-improving systems based on real-world interactions, customer-driven feedback, and self-play?

What you'll bring
- A passion for being on the frontier of AI products.
- Motivation and high agency to drive outcomes - we have a high-autonomy culture, and each team member has lots of agency to overcome obstacles and achieve customer impact.
- Alignment to our company values throughout your work, notably finding a balance between Craftsmanship, Customer Obsession, and Competitive Intensity.
- 4+ years of hands-on experience building production products and systems.
- Experience and comfort building and shipping full-stack solutions.
- A degree in Computer Science or a related field, or equivalent professional experience.

Even better...
- Experience building AI-powered products.
- A sharp eye for design.
- Experience with Go and TypeScript.
- Experience building developer tooling, programming languages, or databases.
- Leadership experience on technical projects or teams.

Our values
- Trust: We build trust with our customers with our accountability, empathy, quality, and responsiveness. We build trust in AI by making it more accessible, safe, and useful. We build trust with each other by showing up for each other professionally and personally, creating an environment that enables all of us to do our best work.
- Customer Obsession: We deeply understand our customers' business goals and relentlessly focus on driving outcomes, not just technical milestones. Everyone at the company knows and spends time with our customers. When our customer is having an issue, we drop everything and fix it.
- Craftsmanship: We get the details right, from the words on the page to the system architecture. We have good taste. When we notice something isn't right, we take the time to fix it. We are proud of the products we produce. We continuously self-reflect to continuously self-improve.
- Intensity: We know we don't have the luxury of patience. We play to win. We care about our product being the best, and when it isn't, we fix it. When we fail, we talk about it openly and without blame so we succeed the next time.
- Family: We know that balance and intensity are compatible, and we model it in our actions and processes. We are the best technology company for parents. We support and respect each other and celebrate each other's personal and professional achievements.

What we offer
We want our benefits to reflect our values and offer the following to full-time employees:
- Flexible (unlimited) paid time off
- Medical, dental, and vision benefits for you and your family
- Life insurance and disability benefits
- Retirement plan (e.g., 401(k), pension) with Sierra match
- Parental leave
- Fertility and family-building benefits through Carrot
- Lunch, as well as delicious snacks and coffee to keep you energized
- Discretionary benefit stipend giving people the ability to spend where it matters most
- Free alphorn lessons

These benefits are further detailed in Sierra's policies and are subject to change at any time, consistent with the terms of any applicable compensation or benefits plans. Eligible full-time employees can participate in Sierra's equity plans subject to the terms of the applicable plans and policies.

Be you, with us
We're working to bring the transformative power of AI to every organization in the world. To do so, it is important to us that the diversity of our employees represents the diversity of our customers. We believe that our work and culture are better when we encourage, support, and respect the different skills and experiences represented within our team. We encourage you to apply even if your experience doesn't precisely match the job description. We strive to evaluate all applicants consistently, without regard to race, color, religion, gender, national origin, age, disability, veteran status, pregnancy, gender expression or identity, sexual orientation, citizenship, or any other legally protected class.
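Of the four problem areas above, Simulation & Benchmarking is the easiest to sketch in code. Because agents are non-deterministic, a benchmark typically runs each scenario many times and reports an aggregate pass rate rather than a single verdict. The toy harness below is purely illustrative, assuming an invented Scenario type and a stand-in flaky agent; it is not Sierra's platform or the real 𝜏-Bench:

```python
# Toy sketch of scenario-based agent benchmarking. Scenario, pass_rate, and
# flaky_agent are hypothetical names invented for this illustration.
import random
from dataclasses import dataclass
from typing import Callable

@dataclass
class Scenario:
    name: str
    prompt: str
    check: Callable[[str], bool]  # did the agent's final answer succeed?

def pass_rate(agent: Callable[[str], str], scenario: Scenario, trials: int = 8) -> float:
    """Run a non-deterministic agent several times on one scenario and
    report the fraction of successful runs."""
    successes = sum(scenario.check(agent(scenario.prompt)) for _ in range(trials))
    return successes / trials

# Stand-in "agent" that succeeds about 70% of the time.
def flaky_agent(prompt: str) -> str:
    return "refund issued" if random.random() < 0.7 else "sorry, I can't help"

scenario = Scenario(
    name="refund-request",
    prompt="Customer asks for a refund on order #123.",
    check=lambda answer: "refund issued" in answer,
)

print(f"{scenario.name}: pass rate = {pass_rate(flaky_agent, scenario):.2f}")
```

A production harness layers tool-call and state checks, seeded runs, and stricter aggregate metrics on top of this run-many-times-and-aggregate loop; 𝜏-Bench, for instance, reports pass^k, the probability that all k of k attempts at a task succeed.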
Software Engineering Manager
Mirage · New York, NY, United States (in-person) · Full-time · USD 250,000-350,000
Mirage is the leading AI short-form video company. We're building full-stack foundation models and products that redefine video creation, production, and editing. Over 20 million creators and businesses use Mirage's products to reach their full creative and commercial potential.

We are a rapidly growing team of ambitious, experienced, and devoted engineers, researchers, designers, marketers, and operators based in NYC. As an early member of our team, you'll have an opportunity to have an outsized impact on our products and our company's culture.

Our Products
- Captions
- Mirage Studio

Our Technology
- AI Research @ Mirage
- Mirage Model Announcement
- Seeing Voices (white paper)

Press Coverage
- TechCrunch
- Lenny's Podcast
- Forbes AI 50
- Fast Company

Our Investors
We're very fortunate to have some of the best investors and entrepreneurs backing us, including Index Ventures, Kleiner Perkins, Sequoia Capital, Andreessen Horowitz, Uncommon Projects, Kevin Systrom, Mike Krieger, Lenny Rachitsky, Antoine Martin, Julie Zhuo, Ben Rubin, Jaren Glover, SVAngel, 20VC, Ludlow Ventures, Chapter One, and more.

Please note that all of our roles require you to be in-person at our NYC HQ (located in Union Square). We do not work with third-party recruiting agencies; please do not contact us.

Software Engineering Manager
Engineering at Mirage isn't narrowly defined. As a small, collaborative team, engineers are trusted with a high degree of ownership and agency. As an Engineering Manager, you'll lead and grow a world-class team of software engineers while providing clear direction and stability. You'll be responsible for driving technical excellence, fostering growth and collaboration, and building foundational capabilities across product, infrastructure, and core platforms.

Responsibilities
- Oversee the design and operation of our core platform (third-party providers, storage, billing, observability, security, API)
- Provide technical leadership for various product and platform features
- Improve developer experience so the whole team ships faster
- Guide efforts that bridge AI research to production across all modalities (video, audio, image, text)
- Understand the capabilities and limitations of SOTA AI models and how best to leverage them in products
- Partner with product, design, and research to keep our development tightly aligned with user needs and business objectives

What Makes You a Great Fit
- A track record of building, mentoring, and managing high-performing engineering teams
- Experience shipping high-impact systems, platforms, and products in production
- Excellent judgment: you balance technical discernment with a strong sense of practicality and time management
- Able to operate effectively in an extremely fast-paced environment while setting your team up for sustainable success
- A strong communicator who can inspire and align a multidisciplinary team

Even Better If...
- You have worked extensively with LLMs and generative media models, and have a pulse on where the technology is going
- You are grounded, collaborative, and willing to do whatever it takes to help the team win
- You have been a startup founder or an early engineer at one

Benefits
- Comprehensive medical, dental, and vision plans
- 401(k) with employer match
- Commuter benefits
- Catered lunch multiple days per week
- Dinner stipend every night if you're working late and want a bite
- Grubhub subscription
- Health & wellness perks (Talkspace, Kindbody, One Medical subscription, HealthAdvocate, Teladoc)
- Multiple team offsites per year, with team events every month
- Generous PTO policy

Captions provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state, or local laws.

Please note that benefits apply to full-time employees only.