convert_to_quant

Convert safetensors weights to quantized formats (FP8, INT8, NVFP4, MXFP8) with learned rounding optimization for ComfyUI inference.



Installation

pip install convert_to_quant

Or install from source:

git clone https://github.com/silveroxides/convert_to_quant.git
cd convert_to_quant
pip install -e .

Requirements Summary

| Feature            | Requirement                                            |
|--------------------|--------------------------------------------------------|
| Minimum (FP8/INT8) | Python 3.10+, PyTorch 2.8+, CUDA 12.8+                 |
| Full (NVFP4/MXFP8) | Python 3.12+, PyTorch 2.10+, CUDA 13.0+, comfy-kitchen |
| INT8 Kernels       | Triton (Linux native, Windows via triton-windows)      |

Important: PyTorch must be installed manually with the correct CUDA version for your GPU. This package does not install PyTorch automatically, to prevent environment conflicts.


Detailed Installation (GPU-Specific)

1. Install PyTorch

Visit pytorch.org to get the correct install command.

Examples:

# CUDA 13.0 (Required for Blackwell NVFP4/MXFP8)
pip install torch --index-url https://download.pytorch.org/whl/cu130

# CUDA 12.8 (Stable)
pip install torch --index-url https://download.pytorch.org/whl/cu128

# CPU only
pip install torch --index-url https://download.pytorch.org/whl/cpu
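
To confirm the install picked up the right build, a quick sanity check (plain PyTorch, nothing specific to this package):

# Verify torch version, CUDA build, and GPU visibility
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"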

2. Optional: Triton (needed for blockwise INT8)

# Linux
pip install -U triton

# Windows (Example for torch>=2.9)
pip install -U "triton-windows<3.6"

Quick Start

# Basic FP8 quantization with ComfyUI metadata (recommended)
convert_to_quant -i model.safetensors --comfy_quant

# INT8 Block-wise with SVD optimization
convert_to_quant -i model.safetensors --int8 --block_size 128 --comfy_quant

# Blackwell NVFP4 (4-bit)
convert_to_quant -i model.safetensors --nvfp4 --comfy_quant

Load the output .safetensors file in ComfyUI like any other model.


Supported Quantization Formats

| Format           | CLI Flag                     | Hardware    | Optimization            |
|------------------|------------------------------|-------------|-------------------------|
| FP8 (E4M3)       | (default)                    | Ada/Hopper+ | Learned Rounding (SVD)  |
| INT8 Block-wise  | --int8                       | Any GPU     | Learned Rounding (SVD)  |
| INT8 Tensor-wise | --int8 --scaling_mode tensor | Any GPU     | High-perf _scaled_mm    |
| NVFP4 (4-bit)    | --nvfp4                      | Blackwell   | Dual-scale optimization |
| MXFP8            | --mxfp8                      | Blackwell   | Microscaling (E8M0)     |

For a deep dive into how these formats work, see FORMATS.md.
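
For example, tensor-wise INT8 combines the two flags from the table; pairing it with --comfy_quant (as in Quick Start) is shown here for illustration:

# INT8 with per-tensor scaling (high-performance _scaled_mm path)
convert_to_quant -i model.safetensors --int8 --scaling_mode tensor --comfy_quant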


Model-Specific Presets

| Model         | Flag      | Notes                                                       |
|---------------|-----------|-------------------------------------------------------------|
| Flux.2        | --flux2   | Keeps modulation/guidance/time/final layers high-precision  |
| T5-XXL        | --t5xxl   | Decoder removed                                             |
| Hunyuan Video | --hunyuan | Attention norms excluded                                    |
| WAN Video     | --wan     | Time embeddings excluded                                    |

(See --help-filters for the full list of presets.)
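
Presets compose with the format flags above; for example (input filename illustrative):

# Quantize a T5-XXL text encoder with its preset exclusions applied
convert_to_quant -i t5xxl.safetensors --t5xxl --comfy_quant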


Documentation

  • 📖 MANUAL.md - Complete usage guide with examples and troubleshooting
  • 📚 FORMATS.md - Technical reference for quantization formats
  • 🧪 DEVELOPMENT.md - Changelog and research notes
  • 📋 AGENTS.md - Developer guide & registry architecture

Key Features

  • Learned Rounding: SVD-based optimization minimizes quantization error.
  • Bias Correction: Automatic bias adjustment using synthetic calibration data.
  • Model-Specific Support: Exclusion lists for sensitive layers (norms, embeddings).
  • Three-Tier Quantization: Mix different formats per layer using --custom-layers.

Advanced Usage

Layer Config JSON

Define per-layer settings with regex patterns:

convert_to_quant -i model.safetensors --layer-config layers.json --comfy_quant
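
The exact schema is documented in MANUAL.md. As a rough sketch only (the key names and patterns below are illustrative assumptions, not the tool's actual schema), such a file maps regex patterns to per-layer settings:

{
  "layers": [
    { "pattern": "blocks\\..*\\.mlp\\..*", "format": "int8" },
    { "pattern": ".*norm.*",               "format": "none" }
  ]
}

A layer matching no pattern would presumably fall back to the format selected on the command line.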

Scaling Modes

# Block-wise scaling for better accuracy
convert_to_quant -i model.safetensors --scaling-mode block --block_size 64 --comfy_quant

License

MIT License
