Server crashes with psutil OSError EINVAL from pidfd_open on Linux 6.x #167

@DRCubix

Bug Report

Summary

The AutoForge server (uvicorn) crashes intermittently on Linux 6.x due to an unhandled OSError: [Errno 22] Invalid argument from psutil's wait_pid_pidfd_open(). This happens when psutil tries to wait on agent child processes that have already exited.

Environment

  • OS: Ubuntu 24.04, Linux 6.14.0-27-generic
  • Python: 3.12.3
  • psutil: 7.2.2 (installed via requirements-prod.txt)
  • AutoForge: 0.1.3 (npm)

Traceback

File "psutil/_psposix.py", line 304, in wait_pid
    return wait_pid_pidfd_open(pid, timeout)
File "psutil/_psposix.py", line 182, in wait_pid_pidfd_open
    pidfd = os.pidfd_open(pid, 0)
OSError: [Errno 22] Invalid argument
INFO:     Application shutdown complete.
INFO:     Finished server process [1451407]

Root Cause

This is a kernel race condition in pidfd_open() on Linux 6.x. When a child agent process exits, its kernel task struct can be detached before psutil calls pidfd_open() on it, causing EINVAL. psutil 7.2.2 does not handle this errno, so the exception propagates up and crashes the uvicorn server.
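To make the failure mode concrete, here is a minimal sketch of a defensive pidfd_open() call that treats EINVAL the same way as ESRCH and signals the caller to fall back. This is not psutil's code: try_pidfd_open and its opener parameter are hypothetical names introduced for illustration, and the sketch assumes a None return means "use a non-pidfd wait path".

```python
import errno
import os


def try_pidfd_open(pid, opener=None):
    """Return a pidfd for `pid`, or None when the caller should fall back.

    `opener` is a hypothetical injection point used only to exercise the
    error paths; by default the real os.pidfd_open is used.
    """
    if opener is None:
        opener = os.pidfd_open  # Linux-only; available in Python 3.9+
    try:
        return opener(pid, 0)
    except OSError as err:
        # ESRCH: no such process. EINVAL: the case reported in this issue.
        if err.errno in (errno.ESRCH, errno.EINVAL):
            return None
        raise  # anything else (e.g. EPERM, EMFILE) is a genuine error
```

Any other errno is re-raised on purpose, so unrelated failures are not silently swallowed.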

The bug is in psutil — I've filed it upstream: giampaolo/psutil#2715

This same race also affects systemd and crun.

Impact

The server crashes completely, killing all running agents. The user must manually restart AutoForge, losing any in-flight work in the agent's current session.

Workaround

Patch psutil's _psposix.py in the autoforge venv to handle EINVAL by falling back to wait_pid_posix():

# In wait_pid_pidfd_open(), add after the ESRCH check:
if err.errno == errno.EINVAL:
    debug(f"pidfd_open() got EINVAL ({err!r}); use fallback")
    return wait_pid_posix(pid, timeout)

An automated patch script can be placed at ~/.autoforge/patches/psutil-pidfd-einval.py and hooked into lib/cli.js to auto-apply after venv setup.

Suggested Fix for AutoForge

Until psutil fixes this upstream, AutoForge could either:

  1. Pin psutil to a version before the pidfd_open change (< 7.0)
  2. Include a post-install patch in the CLI setup
  3. Add a try/except wrapper around the psutil.wait_procs() / Process.wait() calls in process_utils.py

Option 3 is probably the most robust since it doesn't depend on psutil internals:

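# In process_utils.py kill_process_tree():
import errno  # place with the other module-level imports

try:
    gone, still_alive = psutil.wait_procs(children, timeout=timeout)
except OSError as e:
    if e.errno == errno.EINVAL:  # EINVAL from pidfd_open race
        # Exit could not be confirmed, so treat every child as still alive
        gone, still_alive = [], children
    else:
        raise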
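One limitation of the wrapper above is that a single EINVAL marks every child as still alive. A possible refinement, sketched here under assumptions, waits on each process individually and uses is_running() to classify a process whose wait() hit EINVAL. wait_each and timeout_exc are hypothetical names; with real psutil.Process objects, timeout_exc would need to be psutil.TimeoutExpired. Note that the timeout applies per process, so the total wait can reach len(procs) * timeout.

```python
import errno


def wait_each(procs, timeout, timeout_exc=TimeoutError):
    """Wait on each process separately; return (gone, still_alive).

    Each item in `procs` is expected to offer wait(timeout=...) and
    is_running(), as psutil.Process does. `timeout_exc` is the exception
    type wait() raises on timeout (an assumption the caller supplies).
    """
    gone, still_alive = [], []
    for proc in procs:
        try:
            proc.wait(timeout=timeout)
            gone.append(proc)
        except timeout_exc:
            # Checked before OSError because TimeoutError subclasses it.
            still_alive.append(proc)
        except OSError as e:
            if e.errno != errno.EINVAL:
                raise
            # EINVAL from the pidfd_open race: ask the process directly.
            if proc.is_running():
                still_alive.append(proc)
            else:
                gone.append(proc)
    return gone, still_alive
```

Hypothetical usage inside kill_process_tree() would be `gone, still_alive = wait_each(children, timeout, timeout_exc=psutil.TimeoutExpired)`.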
