LTX 2.3 Complete ComfyUI Workflow: T2V, I2V, Talking Avatar & Audio

A comprehensive walkthrough of all major LTX 2.3 workflow types — text-to-video, image-to-video, talking avatar with audio sync, and native audio-video generation.

By ltx workflow

📹 Video Tutorial

Editor's Note: This YouTube tutorial from March 2026 provides the most comprehensive walkthrough of LTX 2.3 in ComfyUI, covering all four major workflow types in one video.

Workflow Types Covered

Text-to-Video (T2V)

Generate video from a text prompt. With the Distilled v1.1 model, use 8 steps, CFG 1, and the euler sampler. With the Dev model, use 20–30 steps and CFG 3.5.
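The recommended settings can be kept in one place so a run is easy to check against them. The following is a minimal sketch: the names `SamplerPreset`, `PRESETS`, and `check_settings` are hypothetical helpers invented for illustration and are not part of ComfyUI or the LTX nodes. Only the numbers come from the text above.

```python
from typing import List, NamedTuple, Optional, Tuple


class SamplerPreset(NamedTuple):
    steps: Tuple[int, int]     # inclusive (min, max) recommended step count
    cfg: float                 # recommended CFG value
    sampler: Optional[str]     # None where the article names no sampler


# Values copied from the article; the dictionary keys are illustrative labels.
PRESETS = {
    "distilled-1.1": SamplerPreset(steps=(8, 8), cfg=1.0, sampler="euler"),
    "dev": SamplerPreset(steps=(20, 30), cfg=3.5, sampler=None),
}


def check_settings(model: str, steps: int, cfg: float) -> List[str]:
    """Return warnings for settings that differ from the article's values."""
    preset = PRESETS[model]
    low, high = preset.steps
    warnings = []
    if not low <= steps <= high:
        warnings.append(f"steps={steps} is outside the recommended {low}-{high}")
    if cfg != preset.cfg:
        warnings.append(f"cfg={cfg} differs from the recommended {preset.cfg}")
    return warnings
```

An empty list means the settings match the article, so `check_settings("distilled-1.1", 8, 1.0)` returns no warnings, while passing the distilled settings to the `"dev"` preset flags both the step count and the CFG value.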

Image-to-Video (I2V)

Animate a still image. Use LTXVConditioning with an image input. Set the start frame strength to 1.0 for tight adherence to the source image, or to 0.7–0.9 to allow more creative motion.
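One way to find a suitable strength for a given image is to render the same seed at several values in the range mentioned above and compare the results. The sketch below only produces the list of values to try. `strength_sweep` is a hypothetical helper, and the 0.1 increment is an assumption, not a recommendation from the tutorial.

```python
def strength_sweep(low=0.7, high=1.0, step=0.1):
    """Return start-frame strength values from low to high, inclusive."""
    # Count steps with rounding so floating-point drift does not drop the end value.
    count = int(round((high - low) / step)) + 1
    return [round(low + i * step, 2) for i in range(count)]


if __name__ == "__main__":
    for strength in strength_sweep():
        print(f"queue a run with start frame strength {strength}")
```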

Talking Avatar (Audio-Synced I2V)

Provide a portrait image and a speech audio clip to produce a lip-synced video.

Required extra files:

  • LTX23_audio_vae_bf16.safetensors → models/vae/
  • LTX23_video_vae_bf16.safetensors → models/vae/
  • ltx-2.3_text_projection_bf16.safetensors → models/text_encoders/
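Before loading the workflow, it can help to confirm that the three files are in the folders listed above. This is a minimal sketch using only the standard library. `missing_files` is a hypothetical helper, and the ComfyUI root path in the usage lines is a placeholder to replace with your own install location.

```python
from pathlib import Path

# File name -> folder relative to the ComfyUI root, as listed in the article.
REQUIRED_FILES = {
    "LTX23_audio_vae_bf16.safetensors": "models/vae",
    "LTX23_video_vae_bf16.safetensors": "models/vae",
    "ltx-2.3_text_projection_bf16.safetensors": "models/text_encoders",
}


def missing_files(comfy_root):
    """Return the relative paths of required files not found under comfy_root."""
    root = Path(comfy_root)
    return [
        f"{folder}/{name}"
        for name, folder in REQUIRED_FILES.items()
        if not (root / folder / name).is_file()
    ]


if __name__ == "__main__":
    # "ComfyUI" is a placeholder path, not a required location.
    for path in missing_files("ComfyUI"):
        print(f"missing: {path}")
```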

Use clean mono speech audio sampled at 16 kHz. Keep clips under 5 seconds for the best lip sync.
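The audio requirements above can be checked before a clip is fed into the workflow. The sketch below uses Python's standard `wave` module, so it assumes the clip is an uncompressed WAV file. `check_speech_wav` is a hypothetical helper, and it checks format and length only, not how clean the recording is.

```python
import wave


def check_speech_wav(path, max_seconds=5.0):
    """Return a list of problems with a WAV clip; an empty list means it passes."""
    with wave.open(str(path), "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        duration = wav.getnframes() / rate

    issues = []
    if channels != 1:
        issues.append(f"expected mono, found {channels} channels")
    if rate != 16000:
        issues.append(f"expected 16000 Hz, found {rate} Hz")
    if duration >= max_seconds:
        issues.append(f"clip is {duration:.2f} s; keep it under {max_seconds:g} s")
    return issues
```

A clip that fails any check can be converted in an audio editor or a command-line tool before use. The function reports every problem it finds rather than stopping at the first.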

Native Audio-Video Generation

Generate both video and audio simultaneously from a text prompt. Best for speech-referenced scenes; ambient audio results vary.

Model Selection

  • 16 GB (FP8): ltx-2.3-22b-distilled-1.1_transformer_only_fp8_scaled.safetensors
  • 32 GB (BF16): ltx-2.3-22b-distilled-1.1.safetensors
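The choice between the two checkpoints can be written as a simple lookup. This sketch assumes the sizes above refer to available GPU memory and treats them as minimums. Both are interpretations, since the list gives only the two figures. `pick_checkpoint` is a hypothetical helper name.

```python
from typing import Optional

# (minimum memory in GB, checkpoint file name), largest requirement first.
CHECKPOINTS = [
    (32, "ltx-2.3-22b-distilled-1.1.safetensors"),                             # BF16
    (16, "ltx-2.3-22b-distilled-1.1_transformer_only_fp8_scaled.safetensors"),  # FP8
]


def pick_checkpoint(memory_gb: float) -> Optional[str]:
    """Return the largest listed checkpoint that fits, or None below 16 GB."""
    for minimum_gb, filename in CHECKPOINTS:
        if memory_gb >= minimum_gb:
            return filename
    return None  # the article lists no option for smaller cards


if __name__ == "__main__":
    print(pick_checkpoint(24))
```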

#ltx-2.3 #comfyui #text-to-video #image-to-video #talking-avatar #audio #workflow