SCFM: Shortcutting Pre-trained Flow Matching Diffusion Models is Almost Free Lunch
Xu Cai∗†, Yang Wu‡, Qianli Chen∗, Haoran Wu∗, Lichuan Xiang§, Hongkai Wen§
NeurIPS 2025
🚀 TL;DR
We introduce SCFM — a highly efficient post-training distillation method that converts any pre-trained flow matching diffusion model (e.g., Flux, SD3) into a 3–8 step sampler in <1 A100 day.
💡 Key Contributions
- Velocity-space self-distillation: Operates directly on the velocity field to enforce linear trajectory consistency across timesteps (see the sketch after this list).
- No step-size conditioning needed: Unlike Shortcut Models (Frans et al., 2025), SCFM works on standard pre-trained FM models out of the box.
- Few-shot distillation: Achieves competitive results with as few as 10 training images — the first successful few-shot distillation for 10B+ parameter diffusion models.
- Ultra-fast training: Distills Flux.1-Dev (12B) into a 3-step sampler in under 24 GPU hours using LoRA.
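The core idea is to match one large Euler step of the student against two smaller teacher steps, directly in velocity space. Below is a minimal PyTorch sketch of that idea; `student`, `teacher` (velocity networks taking `(x, t)`), and the choice of `t` and `dt` are illustrative assumptions, not the paper's exact training recipe.

```python
import torch

def scfm_self_distill_loss(student, teacher, x1, t, dt):
    """Velocity-space self-distillation loss (illustrative sketch only).

    The student's velocity for one large step over [t, t + 2*dt] is matched
    to the average of the teacher's velocities over two half steps, which
    enforces a locally linear trajectory. `student` and `teacher` are
    velocity networks v(x, t); `t` must broadcast against `x1`.
    """
    x0 = torch.randn_like(x1)          # noise endpoint of the FM path
    xt = (1.0 - t) * x0 + t * x1       # linear (rectified-flow) interpolant

    with torch.no_grad():
        v1 = teacher(xt, t)            # teacher velocity at t
        x_mid = xt + dt * v1           # Euler half step
        v2 = teacher(x_mid, t + dt)    # teacher velocity at t + dt
        v_target = 0.5 * (v1 + v2)     # averaged two-step velocity

    v_pred = student(xt, t)            # student's single large-step velocity
    return torch.mean((v_pred - v_target) ** 2)
```

Because the target lives in velocity space, no extra step-size conditioning input is required, which is why the method applies to off-the-shelf FM checkpoints.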
📊 Main Results (Flux.1-Dev → 3 Steps)
| Method | Steps | Latency (A100, s) | \|ΔFID\| ↓ | FID ↓ | CLIP ↑ |
|---|---|---|---|---|---|
| Flux-HyperSD | 3 | 1.33 | 1.52 | 9.65 | 31.95 |
| Flux-TDD | 3 | 1.33 | 4.46 | 8.26 | 31.38 |
| **Flux-SCFM (Ours)** | 3 | 1.33 | **1.01** | **6.34** | **33.10** |
| Flux-Schnell (Official) | 3 | 1.33 | 6.58 | 7.06 | 33.06 |
SCFM achieves the best FID and CLIP scores among 3-step distilled models — and does so without adversarial distillation (ADD/LADD).
🖼️ Visual Comparison
📦 Get Started
- ✅ Compatible with any pre-trained flow matching model (Flux, SD3, etc.)
- ✅ Uses LoRA for parameter-efficient fine-tuning
- ✅ No need for large (self-generated) datasets — works in the few-shot regime
- ✅ Inference uses standard flow-matching ODE solvers — no architectural changes; see the usage sketch below
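A distilled checkpoint should plug into a standard Hugging Face `diffusers` pipeline. The sketch below assumes a hypothetical LoRA file path (`scfm_flux_lora.safetensors`), not an official release artifact.

```python
import torch
from diffusers import FluxPipeline

# Load the pre-trained base model (bfloat16 keeps the 12B params manageable).
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Attach the distilled SCFM LoRA; this path is a placeholder.
pipe.load_lora_weights("scfm_flux_lora.safetensors")

# 3-step sampling with the pipeline's standard flow-matching scheduler.
image = pipe(
    "a photo of an astronaut riding a horse on the moon",
    num_inference_steps=3,
    guidance_scale=3.5,
).images[0]
image.save("scfm_3step.png")
```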
🔗 Resources
📬 Contact
Questions? Reach out to the corresponding author: caitreex@gmail.com