FP8 vs MXFP8 vs BF16: Choosing the Right LTX 2.3 Quantization Format
A practical guide to LTX 2.3 quantization formats for ComfyUI: when to use FP8 scaled, MXFP8 block-32, or full BF16 precision based on your GPU and use case.
By ltx workflow
Editor's Note: This guide explains the three quantization formats available for LTX 2.3 in ComfyUI — FP8 scaled, MXFP8 block-32, and BF16 — and helps you choose the right one based on your GPU and use case.
What Is Quantization?
When running large AI video models locally, quantization reduces memory usage by storing model weights in lower-precision formats. LTX 2.3's 22B-parameter transformer is too large to fit in 16GB VRAM at full precision (BF16), so Kijai and Lightricks provide quantized variants that trade a small amount of quality for dramatically lower VRAM requirements.
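As a back-of-the-envelope calculation (a sketch only; real checkpoint sizes differ somewhat because of scale tensors, the text encoder, the VAE, and GB-vs-GiB rounding), the weight memory scales directly with bytes per parameter:

```python
# Rough weight-memory estimate for a 22B-parameter transformer.
# Actual checkpoint sizes differ a bit (scale tensors, non-transformer components).
params = 22e9
bytes_per_param = {"BF16": 2.0, "FP8": 1.0}  # FP8 stores 1 byte per weight plus small scale overhead

for fmt, b in bytes_per_param.items():
    gib = params * b / 1024**3
    print(f"{fmt}: ~{gib:.0f} GiB of weights")
# BF16: ~41 GiB of weights
# FP8:  ~20 GiB of weights
```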
LTX 2.3 currently offers three weight formats for the transformer:
- BF16 — full precision, official weights
- FP8 scaled — 8-bit float, standard quantization
- MXFP8 block-32 — microscaling FP8, block-level quantization
BF16 — Full Precision
BF16 (Brain Float 16) is the native format used by Lightricks when training LTX 2.3. It has a wide dynamic range (same exponent bits as FP32) and no quantization error. This is the reference quality.
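You can see the dynamic-range difference directly in PyTorch (a quick illustration: BF16 keeps FP32's 8 exponent bits, while FP16 trades range for extra mantissa precision):

```python
import torch

# BF16 shares FP32's exponent width, so its representable range is nearly identical to FP32;
# FP16 has a much smaller range (max ~65504) but more mantissa bits.
for dtype in (torch.float32, torch.bfloat16, torch.float16):
    info = torch.finfo(dtype)
    print(f"{str(dtype):18s} max={info.max:.3e}  smallest normal={info.tiny:.3e}")
```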
Files:
- ltx-2.3-22b-distilled-1.1.safetensors (Official, ~46GB)
- ltx-2.3-22b-distilled-1.1_transformer_only_bf16.safetensors (Kijai, transformer only)
- ltx-2.3-22b-dev_transformer_only_bf16.safetensors (Kijai, dev)
Requirements: 32GB+ VRAM. Not suitable for 16GB cards.
When to use: You have a 32GB GPU (RTX 4090, A6000, etc.) and want maximum quality. Also required for LoRA training — train on BF16 dev, not quantized models.
FP8 Scaled — Standard 8-bit Float
FP8 scaled uses 8-bit floating point with a per-tensor or per-channel scale factor to preserve dynamic range. It roughly halves the memory footprint compared to BF16, bringing a 42GB+ model down to ~25GB, which is runnable on a 16GB card (ComfyUI offloads what doesn't fit into system RAM).
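To make the "scale factor" idea concrete, here is a minimal per-tensor FP8 quantize/dequantize sketch in PyTorch. It illustrates the general technique, not the exact recipe behind the released checkpoints; torch.float8_e4m3fn requires PyTorch 2.1 or newer:

```python
import torch

def quantize_fp8_per_tensor(w: torch.Tensor):
    """Quantize a BF16/FP32 weight tensor to FP8 (e4m3) with one scale for the whole tensor."""
    fp8_max = torch.finfo(torch.float8_e4m3fn).max        # 448.0 for e4m3fn
    scale = w.abs().max().clamp(min=1e-12) / fp8_max       # map the largest weight to fp8_max
    w_fp8 = (w.float() / scale).to(torch.float8_e4m3fn)    # stored 8-bit weights
    return w_fp8, scale

def dequantize_fp8(w_fp8: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an approximate high-precision tensor at compute time."""
    return w_fp8.to(torch.bfloat16) * scale

w = torch.randn(4096, 4096, dtype=torch.bfloat16)
w_fp8, scale = quantize_fp8_per_tensor(w)
err = (dequantize_fp8(w_fp8, scale) - w).abs().max()
print(f"max abs round-trip error: {err.item():.4f}")
```

A single scale per tensor is cheap, but outlier channels force the scale up and squeeze the rest of the weights into fewer effective levels, which is exactly the problem block-level scaling addresses below.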
Files (Kijai):
- ltx-2.3-22b-distilled-1.1_transformer_only_fp8_scaled.safetensors ← recommended for 16GB
- ltx-2.3-22b-dev_transformer_only_fp8_scaled.safetensors
- ltx-2.3-22b-dev_transformer_only_fp8_input_scaled.safetensors
Files (Official Lightricks FP8):
- ltx-2.3-22b-distilled-fp8.safetensors (29.5GB)
- ltx-2.3-22b-dev-fp8.safetensors (29.1GB)
Requirements: 16GB+ VRAM. Requires RTX 40-series (Ada Lovelace) or newer for hardware FP8 matrix multiplication. On older cards, ComfyUI falls back to software emulation, which is much slower.
When to use: You have an RTX 4070/4080/4090 with 16–24GB VRAM and want the best speed-to-quality ratio. This is the most widely used format in the community.
Quality: Visually very close to BF16 in most generations. The small quality gap is rarely noticeable in practice.
MXFP8 Block-32 — Microscaling FP8
MXFP8 (Microscaling FP8) is a newer quantization standard co-developed by NVIDIA, Microsoft, AMD, and Intel. Instead of a single scale factor per tensor (like standard FP8), MXFP8 applies independent scale factors to blocks of 32 elements. This finer-grained scaling reduces quantization error in layers where weights have high variance across channels.
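A minimal sketch of the block-32 idea follows. Per the OCP MX specification, the shared scales are power-of-two values, one per 32 consecutive elements; this illustrates the concept only and is not the exact packing used in Kijai's files:

```python
import torch

BLOCK = 32

def quantize_mx_block32(w: torch.Tensor):
    """Quantize with one power-of-two scale per 32-element block (MX-style)."""
    flat = w.float().reshape(-1, BLOCK)                        # assumes numel is a multiple of 32
    fp8_max = torch.finfo(torch.float8_e4m3fn).max             # 448.0 for e4m3fn
    amax = flat.abs().amax(dim=1, keepdim=True).clamp(min=1e-12)
    scale = torch.exp2(torch.ceil(torch.log2(amax / fp8_max)))  # power-of-two shared scale per block
    w_fp8 = (flat / scale).to(torch.float8_e4m3fn)
    return w_fp8, scale

def dequantize_mx_block32(w_fp8, scale, shape):
    return (w_fp8.to(torch.float32) * scale).reshape(shape)

w = torch.randn(4096, 4096)
w_fp8, scale = quantize_mx_block32(w)
err = (dequantize_mx_block32(w_fp8, scale, w.shape) - w).abs().max()
print(f"max abs round-trip error with block-32 scaling: {err.item():.4f}")
```

Because each block of 32 weights gets its own scale, an outlier only degrades precision within its own block rather than across the whole tensor.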
Files (Kijai):
- ltx-2.3-22b-distilled-1.1_transformer_only_mxfp8_block32.safetensors
- ltx-2.3-22b-dev_transformer_only_mxfp8_block32.safetensors
- ltx-2.3-22b-distilled_transformer_only_mxfp8_block32.safetensors
Requirements: 16GB+ VRAM. Native hardware support is available on NVIDIA Blackwell (RTX 50-series, B100, B200). On Ada Lovelace (RTX 40xx), it runs via software emulation.
When to use:
- Standard FP8 causes errors or artifacts on your specific GPU
- You have an RTX 50-series card with native MXFP8 support
- You want to experiment with an alternative quantization format
Note: On RTX 40xx GPUs, MXFP8 may be slower than standard FP8 since it lacks hardware acceleration for block-level scaling. Benchmark both on your system if performance matters.
Quick Comparison Table
| Format | VRAM | GPU Requirement | Quality | Speed |
|---|---|---|---|---|
| BF16 | 32GB+ | Any CUDA GPU | Reference | Slowest |
| FP8 scaled | 16GB | RTX 40xx+ recommended | ~98% of BF16 | Fast |
| MXFP8 block-32 | 16GB | RTX 50xx for hw accel | ~98% of BF16 | Varies |
Practical Decision Guide
I have a 32GB GPU (RTX 4090, A6000):
→ Use official BF16 ltx-2.3-22b-distilled-1.1.safetensors. No reason to quantize.
I have a 16–24GB GPU with RTX 40xx:
→ Start with ltx-2.3-22b-distilled-1.1_transformer_only_fp8_scaled.safetensors. This is the community standard.
FP8 scaled crashes or shows artifacts on my GPU:
→ Try ltx-2.3-22b-distilled-1.1_transformer_only_mxfp8_block32.safetensors instead.
I have an RTX 50-series GPU:
→ MXFP8 block-32 has native hardware support. Test both FP8 and MXFP8 and benchmark.
I want to train a LoRA:
→ Use the BF16 dev model. Do not train on quantized models. For inference with a trained LoRA on 16GB, load the FP8 dev model + your LoRA.
I have an RTX 30xx (no hardware FP8):
→ FP8 and MXFP8 will use software emulation, which is noticeably slower. Consider using the 24GB sequential offloading approach with the official BF16 model if you have 24GB, or check if a GGUF variant exists for your use case.
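The guide above can be condensed into a small helper. This is just the decision table expressed in code: the compute-capability thresholds (Ada Lovelace = SM 8.9, Blackwell = SM 10.x and up) are standard NVIDIA values, and the filenames are the ones listed earlier; treat it as a heuristic, not an oracle.

```python
import torch

def recommend_ltx_checkpoint() -> str:
    """Suggest an LTX 2.3 transformer weight file based on the decision guide above."""
    if not torch.cuda.is_available():
        return "No CUDA GPU detected - LTX 2.3 is not practical on CPU."

    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1024**3
    major, minor = torch.cuda.get_device_capability(0)

    if vram_gb >= 32:
        return "ltx-2.3-22b-distilled-1.1.safetensors (BF16, reference quality)"
    if (major, minor) >= (10, 0):   # Blackwell / RTX 50xx: native MXFP8, benchmark both formats
        return "ltx-2.3-22b-distilled-1.1_transformer_only_mxfp8_block32.safetensors"
    if (major, minor) >= (8, 9):    # Ada Lovelace / RTX 40xx: hardware FP8
        return "ltx-2.3-22b-distilled-1.1_transformer_only_fp8_scaled.safetensors"
    # Older cards (RTX 30xx and earlier): FP8/MXFP8 run via slow software emulation
    return "FP8/MXFP8 will be emulated on this GPU - expect slow generation."

print(recommend_ltx_checkpoint())
```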
File Placement
All checkpoint variants (BF16, FP8, MXFP8) go in the same location:
ComfyUI/models/checkpoints/
No workflow changes needed when switching between quantization formats of the same model — ComfyUI's checkpoint loader handles the format automatically.
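If you fetch weights with huggingface_hub, you can download straight into that folder. The repo_id below is a placeholder, not a verified repository name; check the actual Kijai or Lightricks page on Hugging Face:

```python
from huggingface_hub import hf_hub_download

# NOTE: repo_id is a placeholder - substitute the real Kijai or Lightricks repo name.
hf_hub_download(
    repo_id="Kijai/LTX-2.3-comfy",  # placeholder, not verified
    filename="ltx-2.3-22b-distilled-1.1_transformer_only_fp8_scaled.safetensors",
    local_dir="ComfyUI/models/checkpoints",
)
```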