Bug Report
Summary
The AutoForge server (uvicorn) crashes intermittently on Linux 6.x due to an unhandled OSError: [Errno 22] Invalid argument from psutil's wait_pid_pidfd_open(). This happens when psutil tries to wait on agent child processes that have already exited.
Environment
- OS: Ubuntu 24.04, Linux 6.14.0-27-generic
- Python: 3.12.3
- psutil: 7.2.2 (installed via requirements-prod.txt)
- AutoForge: 0.1.3 (npm)
Traceback
  File "psutil/_psposix.py", line 304, in wait_pid
    return wait_pid_pidfd_open(pid, timeout)
  File "psutil/_psposix.py", line 182, in wait_pid_pidfd_open
    pidfd = os.pidfd_open(pid, 0)
OSError: [Errno 22] Invalid argument
INFO: Application shutdown complete.
INFO: Finished server process [1451407]
Root Cause
This is a kernel race condition in pidfd_open() on Linux 6.x. When a child agent process exits, its kernel task struct can be detached before psutil calls pidfd_open() on it, causing EINVAL. psutil 7.2.2 does not handle this errno, so the exception propagates up and crashes the uvicorn server.
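To make the failure mode concrete, here is a minimal sketch of opening a pidfd while tolerating the race. The helper name open_pidfd_or_none and its opener parameter are hypothetical (the parameter exists only so the behaviour can be exercised without a real race); this is not psutil's code. It assumes EINVAL can be treated like ESRCH, meaning "no pidfd could be obtained, use another wait strategy".

```python
import errno
import os


def open_pidfd_or_none(pid, opener=None):
    """Return a pidfd for `pid`, or None if one cannot be obtained.

    Hypothetical helper. ESRCH means the process is gone; EINVAL is
    assumed here to be the exit race described in this report. Any
    other errno is re-raised unchanged.
    """
    if opener is None:
        opener = os.pidfd_open  # Linux 5.3+, Python 3.9+
    try:
        return opener(pid, 0)
    except OSError as exc:
        if exc.errno in (errno.ESRCH, errno.EINVAL):
            return None  # caller should fall back to another wait method
        raise
```

A caller would close the returned descriptor with os.close() once it has finished polling it.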
The bug is in psutil — I've filed it upstream: giampaolo/psutil#2715
This same race affects systemd and crun:
- pidfd-util: introduce pidfd_open_safe() systemd/systemd#36982
- failed to delete container with error open pidfd: Invalid argument containers/crun#709
Impact
The server crashes completely, killing all running agents. The user must manually restart AutoForge, losing any in-flight work in the agent's current session.
Workaround
Patch psutil's _psposix.py in the autoforge venv to handle EINVAL by falling back to wait_pid_posix():
# In wait_pid_pidfd_open(), add after the ESRCH check:
if err.errno == errno.EINVAL:
    debug(f"pidfd_open() got EINVAL ({err!r}); use fallback")
    return wait_pid_posix(pid, timeout)
An automated patch script can be placed at ~/.autoforge/patches/psutil-pidfd-einval.py and hooked into lib/cli.js to auto-apply after venv setup.
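For context on what a non-pidfd fallback involves, the sketch below polls os.waitpid() with WNOHANG until a child exits or a timeout expires. It is an illustration only, not psutil's wait_pid_posix() implementation; the function name wait_child_polling and the interval parameter are hypothetical, and it only works for direct children of the calling process.

```python
import os
import time


def wait_child_polling(pid, timeout=None, interval=0.05):
    """Wait for child `pid` by polling; return its exit code.

    Returns None if the pid is not (or is no longer) our child, in
    which case the exit code cannot be determined. Raises TimeoutError
    if the child is still running after `timeout` seconds.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        try:
            reaped, status = os.waitpid(pid, os.WNOHANG)
        except ChildProcessError:
            return None  # ECHILD: already reaped or never our child
        if reaped == pid:
            # Exit code if it exited normally, -signum if killed by a signal.
            return os.waitstatus_to_exitcode(status)
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError(f"pid {pid} still running after {timeout}s")
        time.sleep(interval)
```

Polling costs a little latency compared with a pidfd, but it does not depend on pidfd_open() at all, so it is unaffected by the EINVAL race.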
Suggested Fix for AutoForge
Until psutil fixes this upstream, AutoForge could either:
1. Pin psutil to a version before the pidfd_open change (< 7.0)
2. Include a post-install patch in the CLI setup
3. Add a try/except wrapper around the psutil.wait_procs() / Process.wait() calls in process_utils.py
Option 3 is probably the most robust since it doesn't depend on psutil internals:
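# In process_utils.py kill_process_tree() (needs "import errno" at module top):
try:
    gone, still_alive = psutil.wait_procs(children, timeout=timeout)
except OSError as e:
    if e.errno == errno.EINVAL:  # errno 22, from the pidfd_open race
        # Exit state is unknown, so conservatively report all children as still alive.
        gone, still_alive = [], children
    else:
        raise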
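Since option 3 covers both psutil.wait_procs() and Process.wait(), the guard could be factored into one small helper rather than repeated at each call site. The sketch below assumes a hypothetical helper named wait_tolerating_einval. It takes the wait call as a callable, so it makes no assumptions about psutil internals.

```python
import errno


def wait_tolerating_einval(wait_fn, fallback):
    """Run wait_fn(); if it raises OSError(EINVAL), return `fallback`.

    Hypothetical helper: only EINVAL (errno 22, the pidfd_open race)
    is swallowed. Every other exception propagates unchanged.
    """
    try:
        return wait_fn()
    except OSError as exc:
        if exc.errno != errno.EINVAL:
            raise
        return fallback


# Possible call sites (psutil names as used in this report):
#   gone, still_alive = wait_tolerating_einval(
#       lambda: psutil.wait_procs(children, timeout=timeout),
#       ([], children),
#   )
#   exit_code = wait_tolerating_einval(lambda: proc.wait(timeout=timeout), None)
```

Keeping the fallback value explicit at each call site makes it clear what the caller sees when the race is hit: "all children still alive" for wait_procs(), and "exit code unknown" for Process.wait().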