LTX 2.3 Complete ComfyUI Workflow: T2V, I2V, Talking Avatar & Audio
A comprehensive walkthrough of all major LTX 2.3 workflow types: text-to-video, image-to-video, talking avatar with audio sync, and native audio-video generation.
By ltx workflow
📹 Video Tutorial
Editor's Note: This YouTube tutorial from March 2026 provides the most comprehensive walkthrough of LTX 2.3 in ComfyUI, covering all four major workflow types in one video.
Workflow Types Covered
Text-to-Video (T2V)
Generate video from a text prompt. With the Distilled v1.1 model, use 8 steps, CFG=1, and the euler sampler. With the Dev model, use 20–30 steps at CFG=3.5.
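The two sets of T2V settings can be kept side by side as data, which makes it harder to accidentally run the Distilled model with Dev-style settings. This is a minimal sketch: `SamplerPreset`, `T2V_PRESETS`, and the key names are hypothetical and not part of ComfyUI or LTX. In ComfyUI, `euler` is chosen as the sampler (`sampler_name`). The guide names no sampler for the Dev model, so none is assumed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SamplerPreset:
    """Recommended T2V sampler settings for one LTX 2.3 model variant."""
    min_steps: int
    max_steps: int
    cfg: float
    sampler_name: Optional[str] = None  # None = the guide does not specify one

    def validate_steps(self, steps: int) -> None:
        """Raise ValueError if `steps` falls outside the recommended range."""
        if not self.min_steps <= steps <= self.max_steps:
            raise ValueError(
                f"{steps} steps is outside the recommended "
                f"{self.min_steps}-{self.max_steps} range"
            )


# Hypothetical preset table; the keys are illustrative labels, not ComfyUI identifiers.
T2V_PRESETS = {
    "distilled-v1.1": SamplerPreset(min_steps=8, max_steps=8, cfg=1.0, sampler_name="euler"),
    "dev": SamplerPreset(min_steps=20, max_steps=30, cfg=3.5),
}
```

For example, `T2V_PRESETS["dev"].validate_steps(25)` passes silently, while `T2V_PRESETS["distilled-v1.1"].validate_steps(20)` raises. Storing a step range rather than a single value keeps the Dev guidance (20–30 steps) intact instead of hard-coding one point in it.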
Image-to-Video (I2V)
Animate a still image. Use LTXVConditioning with an image input. Set the start-frame strength to 1.0 for tight adherence to the image, or 0.7–0.9 to allow more creative motion.
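The start-frame strength guidance reduces to a small lookup. The helper below is hypothetical (it is not a ComfyUI node or an LTX API). It only encodes the two recommendations given above, and its default creative value of 0.8 is an arbitrary midpoint of the 0.7–0.9 range.

```python
def start_frame_strength(mode: str, creative_value: float = 0.8) -> float:
    """Return a start-frame strength following the guide's recommendations.

    mode="tight"    -> 1.0, for close adherence to the input image
    mode="creative" -> creative_value, which must lie in [0.7, 0.9]
    """
    if mode == "tight":
        return 1.0
    if mode == "creative":
        if not 0.7 <= creative_value <= 0.9:
            raise ValueError(
                f"creative strength should be between 0.7 and 0.9, got {creative_value}"
            )
        return creative_value
    raise ValueError(f"unknown mode {mode!r}; expected 'tight' or 'creative'")
```

Rejecting out-of-range values, rather than silently clamping them, makes it obvious when a workflow drifts away from the recommended settings.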
Talking Avatar (Audio-Synced I2V)
Provide a portrait image and a speech audio clip to produce a lip-synced video.
Required extra files:
- LTX23_audio_vae_bf16.safetensors → models/vae/
- LTX23_video_vae_bf16.safetensors → models/vae/
- ltx-2.3_text_projection_bf16.safetensors → models/text_encoders/
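Before loading the talking-avatar workflow, it helps to confirm that the three extra files are where ComfyUI expects them. This sketch assumes the standard ComfyUI layout, with `models/vae` and `models/text_encoders` under the install root. `missing_avatar_files` is a hypothetical helper, and the root you pass in is your own install directory.

```python
from __future__ import annotations

from pathlib import Path

# Extra files the talking-avatar workflow needs, relative to the ComfyUI root.
REQUIRED_AVATAR_FILES = [
    "models/vae/LTX23_audio_vae_bf16.safetensors",
    "models/vae/LTX23_video_vae_bf16.safetensors",
    "models/text_encoders/ltx-2.3_text_projection_bf16.safetensors",
]


def missing_avatar_files(comfyui_root: str | Path) -> list[str]:
    """Return the required files that are not present under comfyui_root."""
    root = Path(comfyui_root)
    return [rel for rel in REQUIRED_AVATAR_FILES if not (root / rel).is_file()]
```

For example, `missing_avatar_files("/path/to/ComfyUI")` returns an empty list when all three files are in place. Otherwise it returns the relative paths that still need to be downloaded.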
Use clean mono speech audio sampled at 16 kHz, and keep clips under 5 seconds for the best lip sync.
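You can check an audio clip against these recommendations before a run. This sketch uses only Python's standard `wave` module, so it reads uncompressed PCM WAV files only. `check_avatar_audio` is a hypothetical helper, and it treats a clip of exactly 5 seconds as too long, which is one reading of "under 5 seconds".

```python
import wave
from typing import List


def check_avatar_audio(path: str, max_seconds: float = 5.0) -> List[str]:
    """Return a list of problems with a WAV file meant for talking-avatar sync.

    Checks the recommendations above: mono, 16 kHz, shorter than max_seconds.
    An empty list means the clip meets all three.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    if channels != 1:
        problems.append(f"expected mono audio, found {channels} channels")
    if rate != 16000:
        problems.append(f"expected a 16 kHz sample rate, found {rate} Hz")
    if duration >= max_seconds:
        problems.append(f"clip is {duration:.2f} s; keep it under {max_seconds} s")
    return problems
```

If the check reports problems, a command such as `ffmpeg -i input.wav -ac 1 -ar 16000 output.wav` downmixes to mono and resamples to 16 kHz. Trim long clips as a separate step.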
Native Audio-Video Generation
Generate video and audio together from a single text prompt. Results are best when the prompt describes speech; ambient audio results are less consistent.
Model Selection
- 16 GB VRAM (FP8): ltx-2.3-22b-distilled-1.1_transformer_only_fp8_scaled.safetensors
- 32 GB VRAM (BF16): ltx-2.3-22b-distilled-1.1.safetensors
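The choice between the two checkpoints follows from how much GPU memory you have. This sketch assumes the 16 GB and 32 GB figures are minimum VRAM tiers. `CHECKPOINT_TIERS` and `pick_checkpoint` are hypothetical names. The guide does not cover cards below 16 GB, so the function returns `None` for them.

```python
from typing import Optional

# (minimum VRAM in GB, checkpoint filename), ordered from the highest tier down.
CHECKPOINT_TIERS = [
    (32, "ltx-2.3-22b-distilled-1.1.safetensors"),  # BF16
    (16, "ltx-2.3-22b-distilled-1.1_transformer_only_fp8_scaled.safetensors"),  # FP8
]


def pick_checkpoint(vram_gb: float) -> Optional[str]:
    """Return the highest-tier checkpoint whose VRAM requirement fits, else None."""
    for min_gb, filename in CHECKPOINT_TIERS:
        if vram_gb >= min_gb:
            return filename
    return None  # below 16 GB: not covered by the guide
```

Ordering the tiers from highest to lowest means the first match is the higher-precision file that still fits. For example, a 24 GB card gets the FP8 checkpoint.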