Please star this repo if you find it useful. Thank you!
MatchAttention is a contiguous and differentiable sliding-window attention mechanism that enables long-range connections, explicit matching, and linear complexity. Applied to stereo matching and optical flow, it achieves real-time, state-of-the-art performance.
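For intuition, the sketch below implements one simplified sliding-window matching step under our own reading of the idea; it is not the repository's implementation, and all names, shapes, and the update rule are illustrative assumptions. Each query pixel attends over a small window in the other view centered at its current relative position, and the attention-weighted offset then refines that position, which keeps the cost linear in the number of pixels.

```python
# Minimal, simplified sketch of a sliding-window matching-attention step
# (illustrative only; not the repository's API or exact formulation).
import torch
import torch.nn.functional as F

def match_attention_step(q, kv_feat, rel_pos, win=5):
    """q: (B, C, H, W) query features; kv_feat: (B, C, H, W) features of the
    other view; rel_pos: (B, 2, H, W) current (x, y) relative positions.
    Returns attended features and refined relative positions."""
    B, C, H, W = q.shape
    # Pixel coordinates of each query.
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    base = torch.stack((xs, ys), 0).float().to(q.device)            # (2, H, W)
    # Window offsets around the matched position.
    r = win // 2
    dy, dx = torch.meshgrid(torch.arange(-r, r + 1),
                            torch.arange(-r, r + 1), indexing="ij")
    offs = torch.stack((dx, dy), 0).float().to(q.device).view(2, -1)  # (2, K)
    K = offs.shape[1]
    # Sample a K-point window in the other view for every query pixel.
    ctr = base.unsqueeze(0) + rel_pos                                # (B, 2, H, W)
    pts = ctr.unsqueeze(2) + offs.view(1, 2, K, 1, 1)                # (B, 2, K, H, W)
    grid = pts.permute(0, 2, 3, 4, 1).reshape(B, K * H, W, 2)
    norm = torch.tensor([W - 1, H - 1], device=q.device, dtype=grid.dtype)
    grid = 2 * grid / norm - 1                                       # normalize to [-1, 1]
    win_feat = F.grid_sample(kv_feat, grid, align_corners=True)      # (B, C, K*H, W)
    win_feat = win_feat.view(B, C, K, H, W)
    # Attention weights over the window (scaled dot product, softmax).
    attn = (q.unsqueeze(2) * win_feat).sum(1) / C ** 0.5             # (B, K, H, W)
    attn = attn.softmax(dim=1)
    out = (attn.unsqueeze(1) * win_feat).sum(2)                      # (B, C, H, W)
    # Refine the relative position by the attention-weighted offset.
    delta = (attn.unsqueeze(1) * offs.view(1, 2, K, 1, 1)).sum(2)    # (B, 2, H, W)
    return out, rel_pos + delta
```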
- Strong zero-shot generalization on real-world datasets when trained on the FSD Mix datasets.
- High-resolution inference with fine-grained details.
- Real-time inference (MatchStereo-T @ 1280x720 on an RTX 4060 Ti GPU).
Figure: the top row shows the color image and GT occlusion mask from the Middlebury dataset (Playtable, 1852 x 2720); the bottom row shows the cross relative position.
- MatchStereo-B ranked 1st in average error on the public Middlebury benchmark (2025-05-10).
- State-of-the-art performance on four real-world benchmarks.

| Model | Params | Resolution | FLOPs | GPU Mem | Latency | Checkpoint |
|---|---|---|---|---|---|---|
| MatchStereo-T | 8.78M | 1536x1536 | 0.34T | 1.45G | 38ms | Hugging Face |
| MatchStereo-S | 25.2M | 1536x1536 | 0.98T | 1.73G | 45ms | Hugging Face |
| MatchStereo-B | 75.5M | 1536x1536 | 3.59T | 2.94G | 75ms | Hugging Face |
| MatchFlow-B | 75.5M | 1536x1536 | 3.60T | 3.22G | 77ms | Hugging Face |
GPU memory and latency are measured on a single RTX 5090 GPU with `torch.compile` enabled and FP16 precision.
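For reference, latency numbers of this kind are typically measured as below. This is a generic timing sketch, not the repository's benchmark script; the model loading and the `model(left, right)` call signature are assumptions.

```python
import time
import torch

@torch.no_grad()
def measure_latency(model, h=1536, w=1536, iters=50):
    """Time one forward pass in ms, with torch.compile and FP16 autocast."""
    model = torch.compile(model.cuda().eval())
    left = torch.randn(1, 3, h, w, device="cuda")
    right = torch.randn(1, 3, h, w, device="cuda")
    with torch.autocast("cuda", dtype=torch.float16):
        for _ in range(10):              # warm-up; also triggers compilation
            model(left, right)
        torch.cuda.synchronize()
        t0 = time.perf_counter()
        for _ in range(iters):
            model(left, right)
        torch.cuda.synchronize()
    return (time.perf_counter() - t0) / iters * 1e3  # ms per forward pass
```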
```bash
git clone https://github.com/TingmanYan/MatchAttention
cd MatchAttention
```

## Create environment

```bash
conda create -n matchstereo python=3.10
conda activate matchstereo
## For pytorch 2.5.1+cu124
conda install -c "nvidia/label/cuda-12.4.0" cuda-toolkit
conda install pytorch==2.5.1 torchvision==0.20.1 pytorch-cuda=12.4 -c pytorch -c nvidia
## For pytorch 2.7.1+cu128
conda install nvidia/label/cuda-12.8.1::cuda-toolkit
pip install torch==2.7.1 torchvision==0.22.1 --index-url https://download.pytorch.org/whl/cu128
## other dependencies
pip install -r requirements.txt
## (Optional) Install the CUDA implementation of MatchAttention
cd models
bash compile.sh
```

## Download model weights to ./checkpoints

**Note**: PyTorch 2.0+ is required for `torch.compile`.
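A quick sanity check of the environment (standard PyTorch introspection only):

```python
import torch

print(torch.__version__, torch.version.cuda)  # e.g. 2.5.1 12.4
print(torch.cuda.is_available())              # True if a CUDA GPU is visible
print(hasattr(torch, "compile"))              # torch.compile requires PyTorch 2.0+
```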
```bash
# on custom images
# stereo
python run_img.py --img0_dir images/left/ --img1_dir images/right/ --output_path outputs --checkpoint_path checkpoints/matchstereo_tiny_fsd.pth --no_compile
# flow
python run_img.py --img0_dir images/frame1/ --img1_dir images/frame2/ --output_path outputs --variant base --checkpoint_path checkpoints/matchflow_base_sintel.pth --mode flow --no_compile
```
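If metric depth is needed from a predicted disparity map, the standard pinhole-stereo relation depth = f * B / d applies once the focal length (in pixels) and the stereo baseline are known. The sketch below is generic NumPy, not a repository utility; the calibration values are placeholders.

```python
import numpy as np

def disparity_to_depth(disp, focal_px, baseline_m, eps=1e-6):
    """Standard stereo relation: depth = f * B / d (f in pixels, B in meters)."""
    return focal_px * baseline_m / np.maximum(disp, eps)

# Placeholder calibration and a synthetic disparity map for illustration.
disp = np.random.uniform(1.0, 64.0, size=(720, 1280)).astype(np.float32)
depth = disparity_to_depth(disp, focal_px=1400.0, baseline_m=0.12)
```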
```bash
# test on Middlebury
python run_img.py --middv3_dir images/MiddEval3/ --variant tiny --checkpoint_path checkpoints/matchstereo_tiny_fsd.pth --test_inference_time --inference_size 1536 1536 --mat_impl pytorch --precision fp16
python run_img.py --middv3_dir images/MiddEval3/ --variant small --checkpoint_path checkpoints/matchstereo_small_fsd.pth --test_inference_time --inference_size 2176 3840 --mat_impl cuda --precision fp16
python run_img.py --middv3_dir images/MiddEval3/ --variant base --checkpoint_path checkpoints/matchstereo_base_fsd.pth --mat_impl cuda --low_res_init --no_compile
```
```bash
# test on ETH3D
python run_img.py --eth3d_dir images/ETH3D/ --checkpoint_path checkpoints/matchstereo_tiny_fsd.pth --inference_size 416 832 --mat_impl pytorch --precision fp16 --device_id -1  # run on CPU
```

```bash
# launch the Gradio demo
python gradio_app.py

# live capture with a ZED stereo camera
python zed_capture.py --checkpoint_path checkpoints/matchstereo_tiny_fsd.pth
```
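Optical-flow predictions are commonly inspected with an HSV color-wheel rendering. The sketch below is generic OpenCV code, not the repository's plotting utility; it assumes the flow is an (H, W, 2) float32 array of pixel offsets.

```python
import numpy as np
import cv2

def flow_to_color(flow):
    """Map (H, W, 2) float32 flow to BGR: hue = direction, value = magnitude."""
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hsv = np.zeros((*flow.shape[:2], 3), dtype=np.uint8)
    hsv[..., 0] = (ang * 180 / np.pi / 2).astype(np.uint8)  # hue in [0, 180)
    hsv[..., 1] = 255                                       # full saturation
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
```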
Please cite our paper if you find it useful:

```bibtex
@article{yan2025matchattention,
  title={MatchAttention: Matching the Relative Positions for High-Resolution Cross-View Matching},
  author={Tingman Yan and Tao Liu and Xilian Yang and Qunfei Zhao and Zeyang Xia},
  journal={arXiv preprint arXiv:2510.14260},
  year={2025}
}
```
We would like to thank the authors of UniMatch, RAFT-Stereo, MetaFormer, and TransNeXt for their code releases. Thanks to the authors of FoundationStereo for the release of the FSD dataset.
Please reach out to Tingman Yan with any questions.


