Add probabilistic pretrain + GRPO RL pipeline with pluggable rewards and tracking (backward‑compatible) #1246
hcsolakoglu wants to merge 39 commits into SWivid:main
Conversation
I have several ideas on how to initialize the probabilistic output head, so I will be implementing and testing multiple approaches. This is still a work in progress, but I have made significant headway. If anyone would like to guide the direction, feel free to run tests and share your feedback. @SWivid
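For context, one way such a head could be initialized is sketched below. This is an illustrative assumption, not necessarily one of the approaches being tested in this PR; `GaussianHead`, `from_deterministic`, and `init_log_sigma` are made-up names. The idea is to copy the existing deterministic projection into the mean branch and bias the log-sigma branch toward a small constant, so the warmup stage starts out behaving like the pretrained checkpoint.

```python
import torch
import torch.nn as nn

class GaussianHead(nn.Module):
    """Illustrative probabilistic output head: predicts a per-dimension
    mean and log-sigma. Not the PR's actual implementation."""

    def __init__(self, hidden_dim: int, out_dim: int, init_log_sigma: float = -3.0):
        super().__init__()
        self.mean = nn.Linear(hidden_dim, out_dim)
        self.log_sigma = nn.Linear(hidden_dim, out_dim)
        # Start log-sigma near a small constant so the head is almost
        # deterministic at the beginning of the NLL warmup stage.
        nn.init.zeros_(self.log_sigma.weight)
        nn.init.constant_(self.log_sigma.bias, init_log_sigma)

    @classmethod
    def from_deterministic(cls, linear: nn.Linear, init_log_sigma: float = -3.0):
        """Copy weights from an existing deterministic projection so the
        mean branch reproduces the pretrained checkpoint's outputs."""
        head = cls(linear.in_features, linear.out_features, init_log_sigma)
        with torch.no_grad():
            head.mean.weight.copy_(linear.weight)
            head.mean.bias.copy_(linear.bias)
        return head

    def forward(self, h: torch.Tensor):
        return self.mean(h), self.log_sigma(h)
```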
@hcsolakoglu The main issue is that the multiple samples from the rollout stage lack diversity. There is no difference in prosody or voice tone between them; it just sounds like the same audio with some random noise added. When you finished the first-stage prob model training, did you see anything like this on your end?
Honestly, I was planning to run a hyperparameter search and test different methods for this (head init, stage 1 and stage 2), but I haven't had the chance to do training runs beyond 100–200 steps. I'm actively working on multiple projects at the same time; when I have the time, I'll run a proper hyperparameter search and share more detailed information here. In the meantime, everyone is free to test it. The code is largely complete; only detailed testing and ablation studies remain.
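Regarding the diversity question above: if the learned sigmas collapse toward zero during the warmup stage, group rollouts will differ from the mean only by low-amplitude noise, which would sound exactly like the same audio with faint noise added. Below is a minimal sketch of Gaussian-head rollout sampling, with a temperature knob and a sigma floor as hypothetical levers to probe this; these are not options exposed by the PR.

```python
import torch

def sample_rollouts(mean: torch.Tensor, log_sigma: torch.Tensor,
                    group_size: int = 4, temperature: float = 1.0,
                    min_sigma: float = 1e-3) -> torch.Tensor:
    """Draw `group_size` samples per input from a Gaussian head.

    If sigma has collapsed, all samples are nearly identical to the mean;
    raising `temperature` or `min_sigma` is one way to test whether the
    lack of diversity comes from the head or from the rollout procedure.
    """
    sigma = log_sigma.exp().clamp_min(min_sigma) * temperature
    eps = torch.randn(group_size, *mean.shape, device=mean.device)
    # Result shape: (group_size, *mean.shape)
    return mean.unsqueeze(0) + sigma.unsqueeze(0) * eps
```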
With this PR, I'm integrating the F5R RL workflow into F5-TTS while keeping the default deterministic behavior and checkpoint compatibility. The goal is to enable a two‑stage pipeline (Gaussian NLL warmup + GRPO RL fine‑tuning) with a modular reward system and opt‑in robustness improvements, without changing the default training or inference paths.
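To make the two stages concrete, here is a hedged sketch of the objectives involved, assuming a per-frame Gaussian head and one scalar reward per rollout. Function names and shapes are illustrative, not the PR's exact API:

```python
import torch
import torch.nn.functional as F

def gaussian_nll_warmup_loss(mean: torch.Tensor, log_sigma: torch.Tensor,
                             target: torch.Tensor) -> torch.Tensor:
    """Stage 1: Gaussian negative log-likelihood against the same regression
    target that the deterministic model fits with MSE."""
    return F.gaussian_nll_loss(mean, target, log_sigma.exp().pow(2))

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Stage 2: group-relative advantages in the GRPO style. `rewards` has
    shape (batch, group_size); each rollout's reward is normalized by the
    mean and std of its own group before weighting the policy-gradient term."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)
```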
Key changes:
Notes on compatibility:
- Defaults remain deterministic (output_dist=deterministic, objective=mse), so existing training/inference and checkpoints work unchanged (see the configuration sketch after this list).
- All deviations from F5R behavior are opt‑in and documented in README_RL.md.
- README_RL.md updated with a concise RL runbook, dataset prep, reward model fetch, and recommended opt‑ins.
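For orientation, the compatibility notes imply an opt-in configuration along these lines. Only output_dist=deterministic and objective=mse are named in this PR description; the non-default values below are assumptions for illustration, so check README_RL.md for the actual option names.

```python
# Defaults: unchanged behavior, compatible with existing checkpoints.
default_cfg = dict(output_dist="deterministic", objective="mse")

# Hypothetical opt-ins for the two-stage pipeline; the exact values and any
# additional keys are assumptions here -- see README_RL.md for the real ones.
stage1_cfg = dict(output_dist="gaussian", objective="nll")   # probabilistic warmup
stage2_cfg = dict(output_dist="gaussian", objective="grpo")  # GRPO RL fine-tuning
```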