
Conversation

@surajyadav-research

What does this PR do?

This PR fixes LoRA integration for LongCatImagePipeline so that load_lora_weights() properly applies the adapter during inference and unload_lora_weights() cleanly restores the base (non-LoRA) behavior.
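Intended usage, as a minimal sketch (the checkpoint and LoRA paths are placeholders):

from diffusers import LongCatImagePipeline

pipe = LongCatImagePipeline.from_pretrained("path/to/longcat-image")  # placeholder checkpoint
pipe.load_lora_weights("path/to/longcat-lora")  # placeholder; adapter now applied at inference
image = pipe("a cat in a spacesuit").images[0]
pipe.unload_lora_weights()  # restores the base (non-LoRA) behavior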

It also adds a slow regression test (sketched after this list) that:

  1. runs the pipeline without LoRA (baseline),
  2. loads a LoRA and verifies the output changes,
  3. unloads the LoRA and verifies the output returns close to the baseline.
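
A rough sketch of that flow (not the exact test file; the model checkpoint and LoRA repo paths below are placeholders):

import numpy as np
import torch

from diffusers import LongCatImagePipeline

def test_lora_load_unload_roundtrip():
    pipe = LongCatImagePipeline.from_pretrained("path/to/longcat-image")  # placeholder checkpoint

    # 1. baseline without LoRA
    generator = torch.manual_seed(0)
    baseline = pipe("a cat", generator=generator, output_type="np").images[0]

    # 2. load the LoRA and verify the output changes
    pipe.load_lora_weights("path/to/longcat-lora")  # placeholder LoRA repo
    generator = torch.manual_seed(0)
    with_lora = pipe("a cat", generator=generator, output_type="np").images[0]
    assert not np.allclose(baseline, with_lora, atol=1e-3)

    # 3. unload the LoRA and verify the output returns close to the baseline
    pipe.unload_lora_weights()
    generator = torch.manual_seed(0)
    restored = pipe("a cat", generator=generator, output_type="np").images[0]
    assert np.allclose(baseline, restored, atol=1e-3)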

Why?

Addresses the reported LoRA load/unload issue for LongCat.

Tests

  • RUN_SLOW=yes pytest -q tests/pipelines/longcat_image/test_longcat_lora.py

@surajyadav-research
Author

Hi @sayakpaul
CI is currently “awaiting approval from a maintainer”. Could you please approve the workflow runs?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@surajyadav-research
Author

Hi @sayakpaul,
I’ve fixed the issue in the test file and pushed the update. Would appreciate your review when convenient.



- class LongCatImagePipeline(DiffusionPipeline, FromSingleFileMixin):
+ class LongCatImagePipeline(DiffusionPipeline, FluxLoraLoaderMixin, FromSingleFileMixin):
Member


This seems quite incorrect to me.

Flux has two LoRA loadable modules:

_lora_loadable_modules = ["transformer", "text_encoder"]

For LongCat, it uses a different text encoder (Flux, for comparison, uses two text encoders), and the rest of its components also seem to differ from Flux:

def __init__(
    self,
    scheduler: FlowMatchEulerDiscreteScheduler,
    vae: AutoencoderKL,
    text_encoder: Qwen2_5_VLForConditionalGeneration,
    tokenizer: Qwen2Tokenizer,
    text_processor: Qwen2VLProcessor,
    transformer: LongCatImageTransformer2DModel,
):

So, could you please explain how using the FluxLoraLoaderMixin is appropriate here?

Instead, I suggest we write a dedicated LoRA loader mixin class for LongCat -- LongCatLoraLoaderMixin. You can refer to

class QwenImageLoraLoaderMixin(LoraBaseMixin):

as an example.

@@ -0,0 +1,107 @@
# Copyright 2025 The HuggingFace Team.
Member

@sayakpaul Dec 27, 2025


This is not needed. Please consult the existing testing structure for pipeline-level LoRA testing, cf. https://github.com/huggingface/diffusers/tree/main/tests/lora/

@surajyadav-research
Author

Thanks for the clarification @sayakpaul
I agree the current inheritance from FluxLoraLoaderMixin is misleading here.

While implementing this, my reasoning was that FluxLoraLoaderMixin mainly reuses the generic LoRA load/fuse path and simply assumes the loadable entry points are ["transformer", "text_encoder"]. But since LongCat’s text_encoder is Qwen2_5_VLForConditionalGeneration and the transformer is LongCatImageTransformer2DModel (i.e., not Flux’s text-encoder setup), this can easily lead to subtle issues like incorrect key routing, mismatched target modules, or silent partial loads.

So I’ll switch to a dedicated LongCatLoraLoaderMixin to make the intent explicit and handle any LongCat-specific routing cleanly.
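
A skeleton of what I have in mind, modeled on QwenImageLoraLoaderMixin (method bodies elided; it assumes only the transformer is a LoRA target and that LoraBaseMixin provides the shared machinery):

from diffusers.loaders.lora_base import LoraBaseMixin

class LongCatLoraLoaderMixin(LoraBaseMixin):
    r"""Load LoRA layers into LongCatImageTransformer2DModel."""

    # Assumption: only the transformer takes LoRA weights; the
    # Qwen2.5-VL text encoder is not a LoRA target here.
    _lora_loadable_modules = ["transformer"]
    transformer_name = "transformer"

    @classmethod
    def lora_state_dict(cls, pretrained_model_name_or_path_or_dict, **kwargs):
        # Fetch and normalize the LoRA state dict (body elided).
        ...

    def load_lora_weights(self, pretrained_model_name_or_path_or_dict, adapter_name=None, **kwargs):
        # Route the state dict into self.transformer (body elided).
        ...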
