
add resume for dflash training script #460

Open
tianhaoz95 wants to merge 7 commits into sgl-project:main from tianhaoz95:feature/dflash-resume




tianhaoz95 commented Feb 4, 2026

Motivation

The current dflash training script saves checkpoints but has no logic to resume training. Since dflash tends to be trained for more epochs than eagle3, the ability to resume will be helpful.

Modifications

  • Change checkpoint saving to epoch granularity, since there is no logic to restore optimizer state.
  • When loading the draft model, load it from the last checkpoint if resuming is requested with --resume and a checkpoint is available.
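A minimal sketch of the resume decision in the second bullet, assuming checkpoints are written as epoch_<N> directories under the output directory (the naming that appears in the log under Accuracy Test). The helpers find_last_checkpoint and resolve_draft_init_path and the --output-dir flag are hypothetical names for illustration, not SpecForge's actual API; only --resume comes from this PR.

```python
import argparse
import os
import re
from typing import List, Optional

# HYPOTHETICAL naming convention: one checkpoint directory per epoch, "epoch_<N>".
_CKPT_PATTERN = re.compile(r"epoch_(\d+)")


def find_last_checkpoint(output_dir: str) -> Optional[str]:
    """Return the epoch_<N> subdirectory with the largest N, or None if none exists."""
    if not os.path.isdir(output_dir):
        return None
    best_epoch, best_path = -1, None
    for name in os.listdir(output_dir):
        match = _CKPT_PATTERN.fullmatch(name)
        path = os.path.join(output_dir, name)
        # Ignore unrelated entries and plain files that happen to match the pattern.
        if match and os.path.isdir(path):
            epoch = int(match.group(1))  # compare numerically, not as strings
            if epoch > best_epoch:
                best_epoch, best_path = epoch, path
    return best_path


def resolve_draft_init_path(output_dir: str, resume: bool) -> Optional[str]:
    """Checkpoint to initialize the draft model from, or None to train from scratch.

    A checkpoint is used only if --resume was passed AND one is available.
    """
    if not resume:
        return None
    return find_last_checkpoint(output_dir)


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    parser.add_argument("--output-dir", required=True)  # hypothetical flag name
    parser.add_argument(
        "--resume",
        action="store_true",
        help="resume from the last epoch checkpoint if one exists",
    )
    return parser.parse_args(argv)
```

Comparing the parsed epoch numbers, rather than sorting directory names as strings, keeps epoch_10 ahead of epoch_9.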

Related Issues

NA

Accuracy Test

The checkpoint is loaded and training proceeds:

INFO:specforge.utils:/tmp/tianhaoz/longcat_dflash_training/outputs
INFO:specforge.utils:Last checkpoint detected: /tmp/tianhaoz/longcat_dflash_training/outputs/epoch_4
INFO:specforge.utils:Loaded draft config from checkpoint: /tmp/tianhaoz/longcat_dflash_training/outputs/epoch_4
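The log above detects epoch_4 as the last checkpoint. A rough sketch of how epoch-granular checkpointing and resuming could fit together, assuming (hypothetically) that epoch_<N> is written after finishing 0-indexed epoch N, so a resumed run continues at N + 1. run_training, start_epoch_from_checkpoint, and the train_one_epoch/save_checkpoint callbacks are illustrative names, not the script's real functions. Because optimizer state is not restored, a resumed run would start with a freshly initialized optimizer.

```python
import os
import re
from typing import Callable, List, Optional


def start_epoch_from_checkpoint(ckpt_path: Optional[str]) -> int:
    # ASSUMPTION: epoch_<N> is saved after finishing 0-indexed epoch N,
    # so a resumed run continues at epoch N + 1.
    if ckpt_path is None:
        return 0
    name = os.path.basename(os.path.normpath(ckpt_path))
    match = re.fullmatch(r"epoch_(\d+)", name)
    if match is None:
        raise ValueError(f"not an epoch checkpoint: {ckpt_path}")
    return int(match.group(1)) + 1


def run_training(
    num_epochs: int,
    resume_from: Optional[str],
    train_one_epoch: Callable[[int], None],
    save_checkpoint: Callable[[int], None],
) -> List[int]:
    """Run the remaining epochs, checkpointing only at epoch boundaries.

    No optimizer state is restored here; saving once per epoch means a
    resumed run always starts at the beginning of an epoch.
    """
    epochs_run = []
    for epoch in range(start_epoch_from_checkpoint(resume_from), num_epochs):
        train_one_epoch(epoch)
        save_checkpoint(epoch)  # e.g. writes <output_dir>/epoch_<epoch>
        epochs_run.append(epoch)
    return epochs_run
```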

Benchmark & Profiling

This change is unrelated to performance.

Checklist

zhoutianhao03 added 2 commits February 4, 2026 08:13
