LTX23_audio_vae_bf16.safetensors
LTX 2.3 Audio VAE (Kijai)
Audio VAE by Kijai for LTX 2.3's joint audio-video generation. Encodes/decodes the audio latent stream so a workflow can produce sound synchronized with the video. Place in models/vae/.
Released 2026-05-27 · Source: Kijai/LTX2.3_comfy (HuggingFace) — Added in Kijai's late-May component batch to enable audio-conditioned video generation workflows.
Download LTX23_audio_vae_bf16.safetensors
Direct HuggingFace download. ~1 GB · Free.
Technical details
LTX23_audio_vae_bf16.safetensors is the audio half of LTX 2.3's audio-video pipeline. Where the video VAE handles visual latents, this VAE encodes and decodes the audio latent stream — it is what lets a workflow emit a soundtrack (speech, ambient, lip-synced dialogue) that is generated jointly with the frames rather than dubbed on afterward.
It is a BF16 component at roughly 1 GB and goes in ComfyUI/models/vae/ — the same directory as the video VAE and taeltx2_3, not a separate audio folder. Audio-video workflows reference both this file and a video VAE; loading only one leaves the other modality's node unsatisfied.
This is not a replacement for taeltx2_3.safetensors. taeltx2_3 (or the BF16 video VAE) still decodes the picture; the audio VAE only adds the sound path. A silent T2V/I2V workflow never loads it.
When to choose LTX23_audio_vae_bf16.safetensors
Download this only if your workflow generates audio — talking-head / lip-sync pipelines, audio-conditioned video, or any graph that has an audio VAE node. For those, it is required; without it the audio branch errors or produces silence.
For standard silent text-to-video or image-to-video, skip it entirely — taeltx2_3.safetensors is the only VAE you need. Adding the audio VAE to a silent workflow does nothing but consume disk.
If you are building audio workflows, pair this with LTX23_video_vae_bf16.safetensors (the joint pipeline expects the full BF16 video VAE alongside it) rather than the tiny taeltx2_3.
Will this run on my GPU?
Minimum: 2 GB VRAM.
Recommendation: Required only for audio-conditioned / audio-to-video workflows. Pair it with the video VAE — the joint pipeline loads both. Not needed for silent T2V or I2V.
How to use LTX23_audio_vae_bf16.safetensors
- Download the file from HuggingFace.
- Place it in the models/vae/ folder inside your ComfyUI directory (ComfyUI/models/vae/).
- Restart ComfyUI (or refresh the model list from the menu).
- Load a compatible workflow — see below.
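The placement step can be scripted. The following is a minimal sketch, assuming the file has already been downloaded somewhere on disk; `install_audio_vae` and its arguments are hypothetical names used for illustration, not part of ComfyUI.

```python
import shutil
from pathlib import Path

FILENAME = "LTX23_audio_vae_bf16.safetensors"


def install_audio_vae(downloaded: Path, comfy_root: Path) -> Path:
    """Copy the downloaded file into <comfy_root>/models/vae/ and return the new path."""
    if downloaded.name != FILENAME:
        # ComfyUI matches on the exact filename, so refuse anything else.
        raise ValueError(f"expected {FILENAME}, got {downloaded.name}")
    vae_dir = comfy_root / "models" / "vae"
    vae_dir.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    target = vae_dir / FILENAME
    shutil.copy2(downloaded, target)
    return target
```

Call it as `install_audio_vae(Path("Downloads") / FILENAME, Path("ComfyUI"))`, adjusting both paths to your setup, then restart ComfyUI or refresh the model list.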
ComfyUI says it can't find LTX23_audio_vae_bf16.safetensors?
Some published workflow JSONs reference this file under a custom subdirectory or a renamed filename. If ComfyUI shows a "cannot find model" error, check whether your workflow references one of these variants:
- vae/LTX23_audio_vae_bf16.safetensors
- ltx23_audio_vae_bf16_kj.safetensors
- ltx23\LTX23_audio_vae_bf16.safetensors
In the first and third entries, the part before the slash or backslash is a subdirectory the workflow author used. The second is a renamed copy (see Common issues). The actual file is the same LTX23_audio_vae_bf16.safetensors in every case. For the path-prefixed variants you have two fixes:
- Create the matching subdirectory inside ComfyUI/models/vae/ and place the file there. Example: if the workflow references vae/LTX23_audio_vae_bf16.safetensors, create ComfyUI/models/vae/vae/ and put LTX23_audio_vae_bf16.safetensors inside it.
- Or open the workflow JSON in a text editor and replace the prefixed string with just LTX23_audio_vae_bf16.safetensors. ComfyUI then resolves it directly from ComfyUI/models/vae/.
On Windows the separator is \; on macOS/Linux it is /. Both refer to the same nested folder regardless of platform.
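Editing the workflow JSON by hand works, but the replacement can also be done programmatically. The sketch below walks any parsed JSON structure and rewrites prefixed or renamed references to the plain filename. It makes no assumption about the workflow schema; `normalize_refs` and `ALIASES` are hypothetical names, and the alias set only contains the renamed variant mentioned on this page.

```python
import json
import re

FILENAME = "LTX23_audio_vae_bf16.safetensors"
# Renamed variants seen in workflows (lowercase). Extend as needed.
ALIASES = {"ltx23_audio_vae_bf16_kj.safetensors"}


def normalize_refs(node):
    """Recursively replace prefixed or aliased references with the plain filename."""
    if isinstance(node, dict):
        return {key: normalize_refs(value) for key, value in node.items()}
    if isinstance(node, list):
        return [normalize_refs(value) for value in node]
    if isinstance(node, str):
        # Keep only the text after the last "/" or "\" separator.
        base = re.split(r"[\\/]", node)[-1]
        if base == FILENAME or base.lower() in ALIASES:
            return FILENAME
    return node  # numbers, booleans, None, and unrelated strings pass through


def rewrite_workflow(src_path, dst_path):
    """Load a workflow JSON, normalize its references, and write a new file."""
    with open(src_path, encoding="utf-8") as f:
        data = json.load(f)
    with open(dst_path, "w", encoding="utf-8") as f:
        json.dump(normalize_refs(data), f, indent=2)
```

Writing to a separate output path keeps the original workflow untouched in case the rewrite needs to be undone. Note that any string ending in the filename is rewritten, so a full URL to the file would be shortened too.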
Common issues
VAELoader / audio node: 'LTX23_audio_vae_bf16.safetensors not in list'
File placed in ComfyUI/models/ root or in checkpoints/ instead of vae/. Fix: Move it to ComfyUI/models/vae/LTX23_audio_vae_bf16.safetensors and click refresh on the loader node so ComfyUI re-scans the directory.
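To see where the file actually ended up, a small search under the models folder can help. This is a diagnostic sketch only; `find_copies` and `is_correctly_placed` are hypothetical helper names, and `comfy_root` is whatever directory your ComfyUI install lives in.

```python
from pathlib import Path

FILENAME = "LTX23_audio_vae_bf16.safetensors"


def find_copies(comfy_root: Path) -> list[Path]:
    """Return every copy of the file under <comfy_root>/models, relative to models/."""
    models = comfy_root / "models"
    return sorted(p.relative_to(models) for p in models.rglob(FILENAME))


def is_correctly_placed(comfy_root: Path) -> bool:
    """True if the file sits directly in <comfy_root>/models/vae/."""
    return (comfy_root / "models" / "vae" / FILENAME).is_file()
```

If `find_copies` reports a path such as `checkpoints/LTX23_audio_vae_bf16.safetensors` while `is_correctly_placed` returns `False`, move that copy into `models/vae/` and refresh the loader node.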
Video generates but there is no sound / the audio track is silent
The audio VAE is missing, or the workflow loaded a video-only VAE for both branches. Audio-video pipelines need this file specifically on the audio path. Fix: Confirm the audio VAE node points at LTX23_audio_vae_bf16.safetensors, and that the video path uses a video VAE. Both must be loaded for joint audio-video output.
Workflow references 'ltx23_audio_vae_bf16_kj.safetensors' but my file has a different name
Some workflow authors rename the file with a '_kj' suffix to mark it as Kijai's. The file is identical; only the referenced string differs. Fix: Either rename your local copy to match the workflow, or edit the audio VAE node and reselect LTX23_audio_vae_bf16.safetensors from the dropdown.
Decoded audio is noise / garbled
Partial download — HF's xet protocol can leave a truncated file that still has a valid .safetensors header. Fix: Delete and redownload. Prefer `huggingface-cli download` or `aria2c` with retry over a browser, and verify the size matches HuggingFace's reported ~1 GB.
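A truncated file can be detected without loading it. A .safetensors file starts with an 8-byte little-endian length, followed by a JSON header of that length, followed by the tensor data. Each tensor entry records its `data_offsets` within the data section, so the header itself says how large the file should be. The sketch below compares that expected size with the size on disk; the function names are illustrative.

```python
import json
import os
import struct


def expected_safetensors_size(path):
    """Return the total byte size the file's own header says it should have."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian unsigned 64-bit
        header = json.loads(f.read(header_len))
    # data_offsets = [begin, end], relative to the start of the data section.
    data_end = max(
        (entry["data_offsets"][1]
         for name, entry in header.items()
         if name != "__metadata__"),  # optional metadata entry has no offsets
        default=0,
    )
    return 8 + header_len + data_end


def is_complete(path):
    """True if the size on disk matches the size implied by the header."""
    return os.path.getsize(path) == expected_safetensors_size(path)
```

This only catches truncation. It does not detect corrupted bytes inside a full-length file, for which comparing a checksum against the one HuggingFace lists is the stronger check.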
ComfyUI doesn't see the file after I downloaded it
Make sure the file is directly in ComfyUI/models/vae/, not in a subfolder (unless your workflow references one, as described above). Restart ComfyUI fully, because the menu refresh sometimes misses new files. The filename must match exactly: LTX23_audio_vae_bf16.safetensors.
CUDA out of memory error when loading the model
LTX23_audio_vae_bf16.safetensors needs ~2 GB VRAM minimum. If you're hitting OOM:
- Enable Sequential Offloading in ComfyUI settings.
- Lower the resolution (768×512 instead of 1280×704). Both dimensions must be divisible by 32.
- Reduce the frame count (65 frames instead of 161). The count must be of the form 8n+1.
- Use a smaller variant (see Related models below).
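The two constraints above (dimensions divisible by 32, frame count of the form 8n+1) are easy to get wrong when picking smaller values by hand. A small sketch that rounds arbitrary values down to the nearest valid ones, using hypothetical helper names:

```python
def snap_resolution(width: int, height: int, multiple: int = 32) -> tuple[int, int]:
    """Round each dimension down to the nearest multiple of 32 (never below 32)."""
    def snap(value: int) -> int:
        return max(multiple, (value // multiple) * multiple)
    return snap(width), snap(height)


def snap_frames(frames: int) -> int:
    """Round down to the nearest frame count of the form 8n + 1 (never below 1)."""
    return max(1, ((frames - 1) // 8) * 8 + 1)
```

For example, `snap_resolution(1000, 700)` gives `(992, 672)` and `snap_frames(100)` gives `97`, while values that are already valid, such as 768×512 or 65 frames, pass through unchanged.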
How do I apply this LoRA in ComfyUI?
This file is a VAE, not a LoRA, so it does not go in a 'LoraLoader' node and has no strength value to set. Select it in the workflow's audio VAE node, which reads from ComfyUI/models/vae/.
Related models
ltx-2.3-22b-distilled-1.1_lora-dynamic_fro09_avg_rank_111_bf16.safetensors
ltx-2.3-22b-distilled-lora-1.1_fro90_ceil72_condsafe.safetensors
LTX-2.3-OmniNFT-RL-Lora_bf16.safetensors
ltx-2.3-22b-distilled-lora-384-1.1.safetensors
ltx-2.3-22b-distilled-lora-384.safetensors
ltx-2.3-22b-distilled-lora-dynamic_fro09_avg_rank_105_bf16.safetensors