[Feature] Add accept length simulator for QwenVL #279
Lihui-Gu wants to merge 2 commits into sgl-project:main from
Conversation
|
Fantastic work! This is really good for researchers trying new model architectures. Did you align the accept length that your script outputs with sglang?
|
Can you rebase your code onto the latest main branch and apply pre-commit formatting?
|
I tested your repo's code. I prepared the data using the command below:
python scripts/prepare_data.py --dataset allava4v --sample-size 50
Then I tested your code with the following command:
CHECKPOINT_PATH=/disk3/wjp/pretrained_models/qwen2.5-vl-7b-eagle3-sgl
torchrun \
--standalone \
--nproc_per_node 1 \
$ROOT_DIR/scripts/eval_eagle3.py \
--target-model-path /disk3/wjp/pretrained_models/Qwen2.5-VL-7B-Instruct \
--draft-model-config $ROOT_DIR/configs/qwen2-5-vl-7b-eagle3.json \
--checkpoint-path $CHECKPOINT_PATH \
--eval-data-path $ROOT_DIR/cache/dataset/allava4v_train.jsonl \
--max-length 8192 \
--dist-timeout 360 \
--chat-template qwen2-vl \
--attention-backend sdpa \
--cache-dir $ROOT_DIR/cache \
--embedding-key model.embed_tokens.weight \
--tp-size 1 \
--batch-size 1 \
--is-vlm \
--min-pixels 50176 \
--max-pixels 802816 \
--verbose
It waits for a long time. Is this normal? The log output is:
Missing validation function mapping in `ROPE_VALIDATION_FUNCTIONS` for 'rope_type'='mrope'
`torch_dtype` is deprecated! Use `dtype` instead!
Loading checkpoint shards: 100%|██████████| 5/5 [00:00<00:00, 121.57it/s]
`torch_dtype` is deprecated! Use `dtype` instead!
Missing validation function mapping in `ROPE_VALIDATION_FUNCTIONS` for 'rope_type'='mrope'
The image processor of type `Qwen2VLImageProcessor` is now loaded as a fast processor by default, even if the model checkpoint was saved with a slow processor. This is a breaking change and may produce slightly different outputs. To continue using the slow processor, instantiate this class with `use_fast=False`. Note that this behavior will be extended to all models in a future release.
dataset is cached at /disk3/wjp/pr_test/SpecForge/cache/processed_dataset/d991d1e3003e5d690f29e50af46d5a13.pkl
Map (num_proc=8): 0%| | 0/24 [00:00<?, ? examples/s] |
Force-pushed from 73fdb87 to f389849
I encountered the same issue and resolved it by removing the |
see #102 (comment) |
Motivation
The simulator uses a pre-prepared test set in JSONL format, where each sample contains a system prompt, the user input, an image input (if applicable), and a pre-sampled assistant response from the target model (an illustrative record layout is sketched below).
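A minimal sketch of what one line of such a JSONL test set could look like; the field names below are illustrative assumptions, not the actual schema produced by scripts/prepare_data.py:

import json

# Hypothetical record layout: one pre-sampled conversation per line.
record = {
    "system": "You are a helpful assistant.",   # system prompt
    "user": "Describe this image.",             # user input
    "image": "images/000001.jpg",               # image path; omitted for text-only samples
    "assistant": "The image shows ...",         # pre-sampled response from the target model
}

# Each sample is serialized as a single JSON line.
with open("simulator_testset.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")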
Modifications
Related Issues
Naive brainstorm: accept length simulator: #63
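For context, in EAGLE-style speculative decoding the accept length is roughly the number of draft tokens the target model accepts per verification step, plus the bonus token the target contributes itself. A simulator can estimate it offline by replaying draft proposals against the target model's pre-sampled responses. The sketch below only illustrates that idea under greedy acceptance; it is not the code added in this PR, and all names are hypothetical.

def simulate_accept_length(draft_proposals, target_tokens):
    """Estimate mean accept length under greedy acceptance.

    draft_proposals: per-step proposals, each a list of draft token ids
    target_tokens: the target model's pre-sampled continuation (token ids)
    """
    pos = 0
    step_lengths = []
    for proposal in draft_proposals:
        accepted = 0
        # Greedy acceptance: keep draft tokens while they match the target continuation.
        for tok in proposal:
            if pos + accepted < len(target_tokens) and tok == target_tokens[pos + accepted]:
                accepted += 1
            else:
                break
        # The target model always emits one extra ("bonus") token after verification.
        step_len = accepted + 1
        step_lengths.append(step_len)
        pos += step_len
        if pos >= len(target_tokens):
            break
    return sum(step_lengths) / max(len(step_lengths), 1)

# Example: 2 of 3 draft tokens accepted in step one, 0 in step two -> mean accept length 2.0
# simulate_accept_length([[5, 7, 9], [1, 2, 3]], [5, 7, 8, 4, 6])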
Accuracy Test
Benchmark & Profiling
Checklist