StreamSplat is a fully feed-forward framework that instantly transforms uncalibrated video streams of arbitrary length into dynamic 3D Gaussian Splatting representations in an online manner.
- Feed-forward inference: No per-scene optimization required
- Camera-free: Works directly with uncalibrated monocular videos
- Dynamic scene modeling: Handles both static and dynamic scene elements through polynomial motion modeling
- Probabilistic Gaussian prediction: Models Gaussian positions with a truncated Gaussian distribution for robustness (a rough sketch of both ideas follows this list)
- Two-stage training: Stage 1 trains the static encoder, Stage 2 trains the dynamic decoder
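To make the two modeling choices above concrete, here is a minimal sketch of polynomial motion and truncated-Gaussian position sampling; the polynomial degree, parameterization, and truncation bounds are illustrative assumptions, not the paper's exact formulation:

```python
import torch

def polynomial_positions(coeffs: torch.Tensor, t: float) -> torch.Tensor:
    """Evaluate per-Gaussian center trajectories as polynomials in time.

    coeffs: (N, K, 3) tensor of polynomial coefficients; coeffs[:, k]
    multiplies t**k, so K - 1 is the polynomial degree.
    Returns (N, 3) Gaussian centers at normalized time t.
    """
    K = coeffs.shape[1]
    powers = torch.tensor([t ** k for k in range(K)], dtype=coeffs.dtype)
    return (coeffs * powers.view(1, K, 1)).sum(dim=1)

def sample_truncated_gaussian(mu: torch.Tensor, sigma: torch.Tensor,
                              lo: float = -1.0, hi: float = 1.0) -> torch.Tensor:
    """Sample positions from per-axis truncated Gaussians via inverse-CDF sampling.

    mu, sigma: (N, 3) predicted means and scales; samples stay within [lo, hi].
    """
    normal = torch.distributions.Normal(0.0, 1.0)
    a = normal.cdf((lo - mu) / sigma)   # CDF mass below the lower bound
    b = normal.cdf((hi - mu) / sigma)   # CDF mass below the upper bound
    u = a + (b - a) * torch.rand_like(mu)
    return mu + sigma * normal.icdf(u.clamp(1e-6, 1 - 1e-6))

# Example: 1024 Gaussians with quadratic motion, sampled at mid-stream (t=0.5).
coeffs = torch.randn(1024, 3, 3) * 0.1
centers = polynomial_positions(coeffs, t=0.5)
positions = sample_truncated_gaussian(centers, torch.full_like(centers, 0.05))
```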
Demo videos: re10k-1.mp4 and re10k-2.mp4 (RealEstate10K), DAVIS.mp4 (DAVIS), vos.mp4 (YouTube-VOS).
- Create the conda environment:

```bash
conda env create -f environment.yml
conda activate StreamSplat
```

- Build the differentiable Gaussian rasterizer:

```bash
cd submodules/diff-gaussian-rasterization-orth
pip install .
```

- Download the pretrained depth model: download the Depth Anything V2 checkpoint and place it in the `checkpoints/` directory:

```bash
mkdir -p checkpoints
# Download depth_anything_v2_vitl.pth from https://github.com/DepthAnything/Depth-Anything-V2
# and place it at checkpoints/depth_anything_v2_vitl.pth
```

StreamSplat supports training on multiple datasets. All datasets require depth maps pre-computed with Depth Anything V2.
| Dataset | Type | Description |
|---|---|---|
| RealEstate10K | Static | Real-estate walkthrough videos |
| CO3Dv2 | Static | Object-centric multi-view videos |
| DAVIS | Dynamic | High-quality videos of dynamic scenes |
| YouTube-VOS | Dynamic | Large-scale collection of dynamic videos |
Use the provided script to pre-compute depth maps for DAVIS (similar scripts can be adapted for the other datasets; see the sketch below):

```bash
python preprocess_depth_davis.py --root_path /path/to/davis
```
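For the other datasets, a preprocessing loop along the following lines should work. The model construction and `infer_image` call follow the Depth Anything V2 README, but the frame glob pattern and the one-`.npy`-per-frame output layout are assumptions here, so match whatever format the provided DAVIS script actually writes:

```python
import glob
import os

import cv2
import numpy as np
import torch
from depth_anything_v2.dpt import DepthAnythingV2  # from the Depth-Anything-V2 repo

device = 'cuda' if torch.cuda.is_available() else 'cpu'
# ViT-L configuration from the Depth Anything V2 README.
model = DepthAnythingV2(encoder='vitl', features=256,
                        out_channels=[256, 512, 1024, 1024])
model.load_state_dict(torch.load('checkpoints/depth_anything_v2_vitl.pth',
                                 map_location='cpu'))
model = model.to(device).eval()

for frame_path in sorted(glob.glob('/path/to/dataset/**/*.jpg', recursive=True)):
    raw_img = cv2.imread(frame_path)    # BGR HxWx3 image, as infer_image expects
    depth = model.infer_image(raw_img)  # HxW float32 depth map (numpy)
    np.save(os.path.splitext(frame_path)[0] + '_depth.npy', depth)
```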
Edit `configs/options.py` and `configs/options_decoder.py` to set the dataset paths:

```python
root_path_re10k: str = "/path/to/re10k"
root_path_co3d: str = "/path/to/co3d"
root_path_davis: str = "/path/to/davis"
root_path_vos: str = "/path/to/youtube-vos"
```

Create an accelerate config file (or use the provided `acc_configs/gpu8.yaml`):

```bash
accelerate config
```
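For reference, an 8-GPU config in the spirit of `acc_configs/gpu8.yaml` might look like the following; the exact values in the provided file may differ (the mixed-precision setting in particular is an assumption):

```yaml
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
num_machines: 1
num_processes: 8        # one process per GPU
gpu_ids: all
machine_rank: 0
mixed_precision: bf16   # assumption; check the provided file
main_training_function: main
```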
Train the static encoder on the combined datasets:

```bash
accelerate launch --config_file acc_configs/gpu8.yaml train.py combined \
    --workspace /path/to/workspace/encoder_exp
```

After Stage 1 completes, train the dynamic decoder with the encoder frozen:
```bash
accelerate launch --config_file acc_configs/gpu8.yaml train_decoder.py combined_rcvd \
    --workspace /path/to/workspace/decoder_exp \
    --encoder_path /path/to/workspace/encoder_exp/model.safetensors
```
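`--encoder_path` points Stage 2 at the Stage 1 weights. For intuition, loading and freezing an encoder from a safetensors file generally looks like the sketch below (this illustrates the pattern, not the repo's actual loading code):

```python
import torch
from safetensors.torch import load_file

def load_frozen_encoder(encoder: torch.nn.Module, path: str) -> torch.nn.Module:
    """Load Stage 1 weights into `encoder` and freeze it for Stage 2."""
    encoder.load_state_dict(load_file(path))
    encoder.eval()
    for p in encoder.parameters():
        p.requires_grad_(False)  # only the dynamic decoder stays trainable
    return encoder
```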
Training progress is logged to Weights & Biases, so set up wandb before training:

```bash
wandb login
```

Checkpoints are saved every 10 epochs and every 30 minutes to `checkpoint_latest/`.
If you find this work useful, please cite:
```bibtex
@article{wu2025streamsplat,
  title={StreamSplat: Towards Online Dynamic 3D Reconstruction from Uncalibrated Video Streams},
  author={Zike Wu and Qi Yan and Xuanyu Yi and Lele Wang and Renjie Liao},
  journal={arXiv preprint arXiv:2506.08862},
  year={2025},
}
```

This project builds upon several excellent works:
- 3D Gaussian Splatting for the differentiable rasterization
- diff-gaussian-rasterization for the depth & alpha rendering
- DINOv2 for vision features
- Depth Anything V2 for monocular depth estimation
- Gamba and MVGamba for the codebase and training framework
- Nutworld for orthographic rasterization
- edm for data augmentation
