Preserving hardware memory during cuvid decoding, exporting/importing via dlpack. #2155

Open
caffeinism wants to merge 8 commits into PyAV-Org:main from caffeinism:dlpack

@caffeinism
Contributor

caffeinism commented Feb 4, 2026

#2148

Hello, I'm a user with limited knowledge of libav, DLPack, and Cython. However, since I see this as a necessary feature, I drafted this PR with the help of an LLM.

Motivation

If an application decodes video, performs GPU operations on the frames, and then re-encodes them, PyAV currently incurs a significant number of memory copies: GPU (cuvid) -> CPU (PyAV) -> GPU (Torch, etc.) -> CPU (PyAV) -> GPU (nvenc). However, if frames decoded by cuvid could be exported via DLPack while staying on the GPU, they would not need to pass through CPU memory at all.
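
To put a rough number on the copy chain above, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that each of the four host/device hops carries one 8-bit NV12 frame; in practice the Torch stage may use a different layout or dtype, so treat the result as an order-of-magnitude estimate. The function and constant names are made up for this example.

```python
def nv12_frame_bytes(width: int, height: int) -> int:
    """Bytes in one 8-bit NV12 frame: a full-size Y plane plus an
    interleaved UV plane at half resolution in each dimension."""
    return width * height * 3 // 2


# Host<->device hops in GPU -> CPU -> GPU -> CPU -> GPU.
HOPS_CURRENT = 4
# Assumption: with frames kept in device memory, no hop crosses the bus.
HOPS_WITH_DLPACK = 0


def transfer_bytes_per_second(width: int, height: int, fps: int, hops: int) -> int:
    """Bytes moved between host and device per second of video."""
    return nv12_frame_bytes(width, height) * hops * fps


if __name__ == "__main__":
    per_frame = nv12_frame_bytes(1920, 1080)
    per_second = transfer_bytes_per_second(1920, 1080, 24, HOPS_CURRENT)
    print(f"{per_frame} bytes per frame, {per_second} bytes/s over the bus")
```

For 1080p at 24 fps this works out to about 3.1 MB per frame and roughly 300 MB/s of host/device traffic under the stated assumption.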

All existing tests pass, but with such extensive modifications, it is difficult for a beginner like me to catch every detail. Since most of the changes add features rather than modify existing ones, I hope this PR serves as a good starting point.

Usage example

import av
import torch
from av.codec.hwaccel import HWAccel

# Placeholder paths; substitute real files.
from_video_filename = "input.mp4"
to_video_filename = "output.mp4"

hwaccel = HWAccel(
    device_type="cuda",
    device=0,
    allow_software_fallback=False,
    output_format="hw",  # preserve hw memory
)

# decode using cuvid
with av.open(from_video_filename, "r", hwaccel=hwaccel) as c:
    frame = next(c.decode(video=0))
    y = torch.from_dlpack(frame.planes[0])  # device(type='cuda', index=0), torch.uint8, torch.Size([H, W])
    uv = torch.from_dlpack(frame.planes[1])  # device(type='cuda', index=0), torch.uint8, torch.Size([H/2, W/2])

# some operation on the GPU, then wrap the tensors back into a frame
f = av.VideoFrame.from_dlpack(((y * 0.5).to(torch.uint8), uv))

with av.open(to_video_filename, "w") as c:
    s = c.add_stream("h264_nvenc", rate=24)  # encode using nvenc
    for it in s.encode(f):
        c.mux(it)
    for it in s.encode(None):  # flush the encoder
        c.mux(it)
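
For context on what `torch.from_dlpack` relies on in the example above: the DLPack Python protocol asks the producer to implement `__dlpack__` and `__dlpack_device__`, where the latter returns a `(device_type, device_id)` pair. The sketch below only inspects that device pair. `FakeCudaPlane` is a hypothetical stand-in, not PyAV's plane class, and it does not export real memory; whether this PR's planes implement the protocol in exactly this way is an assumption.

```python
from enum import IntEnum


class DLDeviceType(IntEnum):
    """A few device type codes as defined in dlpack.h."""
    CPU = 1
    CUDA = 2
    CUDA_HOST = 3


def dlpack_device(obj):
    """Return (DLDeviceType, device_id) for an object that implements
    the DLPack protocol. Raises TypeError if the protocol methods are
    missing, and ValueError for device codes not listed above."""
    if not (hasattr(obj, "__dlpack__") and hasattr(obj, "__dlpack_device__")):
        raise TypeError(f"{type(obj).__name__} does not implement the DLPack protocol")
    device_type, device_id = obj.__dlpack_device__()
    return DLDeviceType(int(device_type)), int(device_id)


class FakeCudaPlane:
    """Hypothetical stand-in for a hardware plane on CUDA device 0."""

    def __dlpack_device__(self):
        return (int(DLDeviceType.CUDA), 0)

    def __dlpack__(self, *, stream=None):
        # A real producer would return a PyCapsule wrapping a DLManagedTensor.
        raise NotImplementedError("stand-in: no device memory to export")


if __name__ == "__main__":
    print(dlpack_device(FakeCudaPlane()))
```

A check like this could be one way to assert that a decoded plane reports a CUDA device before any consumer tries to import it.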

@WyattBlue added the "needs tests" label (This PR needs a test) Feb 4, 2026
@caffeinism
Contributor Author

@WyattBlue If I add tests, is it acceptable for them to run only on a CUDA machine? I don't think they will work in the GitHub workflow.
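
One common way to handle tests that need specific hardware is to gate them so they are collected everywhere but only executed where the hardware is found. The sketch below uses the standard library's `unittest.skipUnless`. The detection helper `cuda_driver_present` is a hypothetical heuristic: it only looks for the CUDA driver library, which does not guarantee a usable GPU or an FFmpeg build with cuvid/nvenc. The GPU test body is left as a placeholder.

```python
import ctypes.util
import unittest


def cuda_driver_present() -> bool:
    """Heuristic: True if a CUDA driver library can be located
    (libcuda on Linux, nvcuda on Windows)."""
    return any(ctypes.util.find_library(name) for name in ("cuda", "nvcuda"))


class TestInterface(unittest.TestCase):
    """Runs everywhere, including hosted CI without a GPU."""

    def test_gate_returns_bool(self):
        self.assertIsInstance(cuda_driver_present(), bool)


@unittest.skipUnless(cuda_driver_present(), "CUDA driver not found")
class TestCudaRoundTrip(unittest.TestCase):
    """Collected everywhere, executed only where a CUDA driver is found."""

    def test_decode_export_encode(self):
        # Placeholder: decode -> DLPack export -> encode assertions go here.
        self.skipTest("GPU assertions are not written in this sketch")


if __name__ == "__main__":
    unittest.main()
```

On a runner without CUDA the gated class is reported as skipped rather than failed, so the workflow stays green while the interface tests still run.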

@WyattBlue
Member

WyattBlue commented Feb 4, 2026

You need to test the interface. For example, hw_format does not have a .pyi interface, and writing a test would catch that.
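
One possible reading of this comment: if the test suite is type-checked, a test that touches a new attribute fails as soon as the attribute is missing from the `.pyi` stub, and no GPU is needed for that. As an illustration of the same idea in plain Python, the sketch below compares a class's public attributes with the names declared in a stub. `FakeCodecContext`, `FAKE_STUB`, and both helper functions are hypothetical stand-ins, not PyAV code or an existing PyAV test utility.

```python
import ast


def stub_members(stub_source: str, class_name: str) -> set:
    """Names declared in the body of `class_name` in a .pyi source string."""
    for node in ast.parse(stub_source).body:
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            names = set()
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    names.add(item.name)
                elif isinstance(item, ast.AnnAssign) and isinstance(item.target, ast.Name):
                    names.add(item.target.id)
            return names
    raise LookupError(f"class {class_name!r} not found in stub")


def missing_from_stub(cls, stub_source: str) -> set:
    """Public attributes defined on `cls` that the stub does not declare."""
    public = {name for name in vars(cls) if not name.startswith("_")}
    return public - stub_members(stub_source, cls.__name__)


class FakeCodecContext:
    """Hypothetical runtime class with one attribute the stub forgot."""
    is_hwaccel = False
    hw_format = None

    def decode(self, packet=None):
        return []


FAKE_STUB = '''
class FakeCodecContext:
    is_hwaccel: bool
    def decode(self, packet=...) -> list: ...
'''

if __name__ == "__main__":
    print(missing_from_stub(FakeCodecContext, FAKE_STUB))
```

Here the check reports `hw_format` as undeclared, which is the kind of gap an interface test is meant to surface.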

@WyattBlue
Member

av/hwcontext.pxd should be merged with include/avutil. *.pxd files should otherwise not be free radicals, i.e., they should have a corresponding real .py file.

@caffeinism
Contributor Author

You need to test the interface. For example, hw_format does not have a .pyi interface, and writing a test would catch that.

Could you please explain it in a bit more detail?

av/hwcontext.pxd should be merged with include/avutil. *.pxd files should otherwise not be free radicals, i.e., they should have a corresponding real .py file.

In this case, how should dlpack.pxd be handled? Should this also be moved to the include directory?
