Freelance Agent Evaluation Engineer
Experience: 5+ years
Function: Engineering
Work mode: Remote, India
Company: Tier 2
What you will work on
Mindrift is looking for freelance engineers to evaluate AI coding agents by building challenging tasks and test criteria. You will work on creating realistic simulated developer environments using Python, Docker, and full-stack tools. The role requires deep experience in testing complex systems and understanding model failure modes. This is a project-based, remote opportunity for experienced software engineers.
TAL's take
This is a freelance, project-based gig with low compensation, despite requiring significant engineering seniority.
The job description is highly specific about the responsibilities, the tech stack, and the unusual nature of the work, which is evaluating AI agents.
Watchouts
- project-based
- not permanent employment
- freelance
- low compensation
Must haves
- 5+ years in software development
- Proficiency in Python (FastAPI, pytest, async/await)
- Experience in full-stack development with React
- Experience writing functional and integration tests
- Familiarity with Docker and infrastructure tools
- B2 level English proficiency
About the company
Mindrift is an emerging platform for AI-related project work, categorized as a mid-tier entity in the AI services space.