Summary
Agents can exploit the benchmark by accessing git history to retrieve original function implementations instead of writing them from scratch.
Description
The Commit0 benchmark works by providing repositories where code has been stripped out and agents are instructed to implement the functions. However, the current setup clones repositories with full git history:
# In commit0/harness/spec.py, line 117
f"git clone -o origin https://github.com/{repo} {self.repo_directory}"
Although git remote remove origin is called later (line 122), the full git history has already been cloned locally by that point. This allows agents to:
- Run git log to see the commit history
- Run git diff or git show to see the original implementations that were removed
- Copy-paste the original code instead of implementing it from scratch
Evidence
This was observed by @fjzzq2002, who found the following behavior from an agent on the "portalocker" repository:
- At turn 121, the agent ran git log in one-line format
- The agent's reasoning included checking git history to restore the original implementation
- The agent then used git history to retrieve and restore the original implementations
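Spotting this behavior currently means reading trajectories like the turn-121 example above. As a rough sketch, assuming the harness keeps a log of each shell command an agent executes (an assumption; the issue does not describe such a log), commands that read git history could be flagged automatically. The names HISTORY_SUBCOMMANDS, git_subcommand and flags_history_access are hypothetical, and the check is only a heuristic: it does not see through wrappers such as env or sudo, or through git aliases.

```python
import re
import shlex
from typing import Optional

# Heuristic set of git subcommands that can read earlier commits. This list is
# an assumption for illustration, not an exhaustive or official one.
HISTORY_SUBCOMMANDS = {"log", "show", "diff", "reflog", "blame", "cat-file"}

# Global git options that take their value as the next token (e.g. git -C dir log).
OPTIONS_WITH_VALUE = {"-C", "-c"}


def git_subcommand(segment: str) -> Optional[str]:
    """Return the git subcommand run by one simple shell command, if any."""
    try:
        tokens = shlex.split(segment)
    except ValueError:
        return None  # unbalanced quotes: skip instead of guessing
    if not tokens or tokens[0] != "git":
        return None
    i = 1
    # Skip global options such as --no-pager or -C <dir> before the subcommand.
    while i < len(tokens) and tokens[i].startswith("-"):
        i += 2 if tokens[i] in OPTIONS_WITH_VALUE else 1
    return tokens[i] if i < len(tokens) else None


def flags_history_access(command: str) -> bool:
    """True if any part of a shell command line runs a history-reading git subcommand."""
    # Split on &&, ||, ;, | and newlines; '||' is tried before '|'.
    segments = re.split(r"&&|\|\||[;|\n]", command)
    return any(git_subcommand(s) in HISTORY_SUBCOMMANDS for s in segments)
```

Because git diff is also used legitimately to review an agent's own edits, a hit from this check is a reason to inspect the trajectory, not proof of reward hacking.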
Impact
This undermines the validity of Commit0 benchmark results since agents can achieve high scores by exploiting git history rather than demonstrating actual code implementation capabilities.
Suggested Fix
Use a shallow clone with --depth 1 to prevent access to git history:
f"git clone --depth 1 -o origin https://github.com/{repo} {self.repo_directory}"
This should be applied in both Commit0Spec.make_repo_script_list() (line 117) and SWEBenchSpec.make_repo_script_list() (line 221).
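A minimal sketch of the fix together with a post-clone check follows. make_clone_command, history_length and is_shallow are hypothetical helpers, not part of commit0: make_clone_command only mirrors the f-string above with the depth made explicit, and the two checks assume they are run against the cloned repo_directory with git on the PATH (is_shallow relies on git rev-parse --is-shallow-repository, available since Git 2.15).

```python
import shlex
import subprocess
from pathlib import Path


def make_clone_command(repo: str, repo_directory: str, depth: int = 1) -> str:
    """Build a git clone command that fetches only the newest `depth` commits.

    Hypothetical helper mirroring the f-string in make_repo_script_list().
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    # shlex.quote leaves ordinary paths untouched but protects spaces and quotes.
    return (
        f"git clone --depth {depth} -o origin "
        f"https://github.com/{repo} {shlex.quote(repo_directory)}"
    )


def history_length(repo_dir: Path) -> int:
    """Number of commits reachable from HEAD in an already-cloned repo."""
    out = subprocess.run(
        ["git", "rev-list", "--count", "HEAD"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return int(out.stdout.strip())


def is_shallow(repo_dir: Path) -> bool:
    """True if the repo was cloned shallowly (needs Git 2.15 or later)."""
    out = subprocess.run(
        ["git", "rev-parse", "--is-shallow-repository"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return out.stdout.strip() == "true"
```

With --depth 1, history_length(repo_dir) == 1 is a simple assertion the harness could run after cloning: git log then shows only the stripped commit. Note that --depth implies --single-branch, so if a harness later checks out a branch or commit other than the clone's tip, that ref would have to be fetched explicitly. Removing origin afterwards (line 122) also leaves git fetch --unshallow with no configured remote to fetch from.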
Related
- OpenHands benchmarks fix: "fix(commit0): use shallow clone to prevent reward hacking" (OpenHands/benchmarks#422)
Thanks
Thanks to @fjzzq2002 (Ziqian Zhong) for discovering and reporting this vulnerability to the OpenHands team.