
StreamSplat: Towards Online Dynamic 3D Reconstruction from Uncalibrated Video Streams

arXiv: https://arxiv.org/abs/2506.08862

Overview

StreamSplat is a fully feed-forward framework that instantly transforms uncalibrated video streams of arbitrary length into dynamic 3D Gaussian Splatting representations in an online manner.

  • Feed-forward inference: No per-scene optimization required
  • Camera-free: Works directly with uncalibrated monocular videos
  • Dynamic scene modeling: Handles both static and dynamic scene elements through polynomial motion modeling (illustrated in the sketch after this list)
  • Probabilistic Gaussian prediction: Predicts Gaussian positions with a truncated Gaussian distribution for robustness
  • Two-stage training: Stage 1 trains the static encoder, Stage 2 trains the dynamic decoder
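
The two modeling ideas above can be illustrated with a short Python sketch. This is an illustration only, not the repository's implementation: the polynomial degree, the time normalization, and the truncation bound below are all assumptions.

import torch

def gaussian_centers_at(t, coeffs):
    """Polynomial motion modeling (sketch): each Gaussian's center is a
    polynomial in time, center_i(t) = sum_k coeffs[i, k] * t**k.
    coeffs: (N, K, 3) per-Gaussian coefficients; t: scalar time in [0, 1]."""
    K = coeffs.shape[1]
    powers = torch.tensor([t ** k for k in range(K)])       # (K,)
    return (coeffs * powers.view(1, K, 1)).sum(dim=1)       # (N, 3)

def sample_positions(mu, sigma, bound=2.0):
    """Probabilistic position prediction (sketch): draw Gaussian centers
    from a truncated normal. Clamping is a crude stand-in for exact
    truncated sampling but keeps positions within bound std devs."""
    eps = torch.randn_like(mu).clamp(-bound, bound)
    return mu + sigma * eps

Static elements are the degenerate case of this parameterization: all coefficients beyond the constant term are zero, so the center does not move over time.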

Videos

Demo clips included in the repository: re10k-1.mp4 and re10k-2.mp4 (RealEstate10K), DAVIS.mp4 (DAVIS), and vos.mp4 (YouTube-VOS).

Environment Setup

  1. Create conda environment:
conda env create -f environment.yml
conda activate StreamSplat
  2. Build the differentiable Gaussian rasterizer:
cd submodules/diff-gaussian-rasterization-orth
pip install .
  3. Download the pretrained depth model:

Download the Depth Anything V2 checkpoint and place it in the checkpoints/ directory:

mkdir -p checkpoints
# Download depth_anything_v2_vitl.pth from https://github.com/DepthAnything/Depth-Anything-V2
# Place it in checkpoints/depth_anything_v2_vitl.pth
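
After these steps, a quick sanity check confirms that the rasterizer built and the checkpoint is in place. The Python module name below is an assumption based on the submodule's directory name; adjust it if the package installs under a different name.

import os

# The training code expects the checkpoint at this exact path.
assert os.path.exists("checkpoints/depth_anything_v2_vitl.pth"), "depth checkpoint missing"

# Module name assumed from the submodule directory
# (diff-gaussian-rasterization-orth); importing verifies the CUDA build.
import diff_gaussian_rasterization
print("environment OK")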

Dataset Preparation

StreamSplat supports training on multiple datasets. All of them require depth maps pre-computed with Depth Anything V2.

Supported Datasets

| Dataset       | Type    | Description                      |
|---------------|---------|----------------------------------|
| RealEstate10K | Static  | Real estate videos               |
| CO3Dv2        | Static  | Object-centric multi-view videos |
| DAVIS         | Dynamic | High-quality dynamic videos      |
| YouTube-VOS   | Dynamic | Large-scale dynamic videos       |

Preprocessing Depth Maps

Use the provided script to preprocess depth maps for DAVIS (similar scripts can be adapted for other datasets):

python preprocess_depth_davis.py --root_path /path/to/davis
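
For the other datasets, a preprocessing pass can follow the same pattern. The sketch below uses the inference API published in the Depth Anything V2 repository; the dataset layout and the choice to save one .npy file per frame are assumptions, so match whatever layout the provided DAVIS script produces.

import glob, os
import cv2
import numpy as np
import torch
from depth_anything_v2.dpt import DepthAnythingV2

# ViT-L model configuration, as documented in the Depth Anything V2 repo.
model = DepthAnythingV2(encoder="vitl", features=256,
                        out_channels=[256, 512, 1024, 1024])
model.load_state_dict(torch.load("checkpoints/depth_anything_v2_vitl.pth",
                                 map_location="cpu"))
model = model.to("cuda").eval()

root = "/path/to/dataset"  # hypothetical root; one folder of frames per video
for img_path in sorted(glob.glob(os.path.join(root, "**", "*.jpg"), recursive=True)):
    raw = cv2.imread(img_path)            # BGR image, as infer_image expects
    depth = model.infer_image(raw)        # (H, W) relative depth map
    np.save(os.path.splitext(img_path)[0] + "_depth.npy", depth)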

Configure Dataset Paths

Edit configs/options.py and configs/options_decoder.py to set dataset paths:

root_path_re10k: str = "/path/to/re10k"
root_path_co3d: str = "/path/to/co3d"
root_path_davis: str = "/path/to/davis"
root_path_vos: str = "/path/to/youtube-vos"

Training

Configure Accelerate

Create an accelerate config file (or use the provided acc_configs/gpu8.yaml):

accelerate config
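
The exact contents of acc_configs/gpu8.yaml are not reproduced here; for reference, a minimal single-machine, 8-GPU config of the kind the accelerate config command generates looks roughly like this (all values are assumptions to adapt to your hardware):

compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
num_machines: 1
num_processes: 8        # one process per GPU
gpu_ids: all
mixed_precision: 'no'   # or fp16/bf16, depending on hardware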

Stage 1: Train Static Encoder

Train the static encoder on the combined datasets:

accelerate launch --config_file acc_configs/gpu8.yaml train.py combined \
    --workspace /path/to/workspace/encoder_exp

Stage 2: Train Dynamic Decoder

After Stage 1 completes, train the dynamic decoder with the frozen encoder:

accelerate launch --config_file acc_configs/gpu8.yaml train_decoder.py combined_rcvd \
    --workspace /path/to/workspace/decoder_exp \
    --encoder_path /path/to/workspace/encoder_exp/model.safetensors

Monitoring Training

Training progress is logged to Weights & Biases. Set up wandb before training:

wandb login

Checkpoints are saved every 10 epochs and every 30 minutes to checkpoint_latest/.
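
Because Stage 2 consumes the Stage 1 weights as a .safetensors file, the encoder state can also be inspected or loaded manually with the safetensors library; the path below follows the Stage 2 command above.

from safetensors.torch import load_file

# Load the Stage 1 encoder weights saved by training.
state_dict = load_file("/path/to/workspace/encoder_exp/model.safetensors")
print(len(state_dict), "tensors; sample keys:", list(state_dict)[:5])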

Citation

If you find this work useful, please cite:

@article{wu2025streamsplat,
    title={StreamSplat: Towards Online Dynamic 3D Reconstruction from Uncalibrated Video Streams}, 
    author={Zike Wu and Qi Yan and Xuanyu Yi and Lele Wang and Renjie Liao},
    journal={arXiv preprint arXiv:2506.08862},
    year={2025},
}

Acknowledgments

This project builds upon several excellent prior works, including Depth Anything V2 and the diff-gaussian-rasterization submodule used above.
