NVIDIA RTX Video Generation Guide for LTX-2.3 — Settings, Prompts, Troubleshooting
Official NVIDIA guide for running LTX-2.3 on RTX GPUs with the FirstFrame/LastFrame template and RTX Video Super Resolution. Includes recommended settings, prompt structure, and a sharp troubleshooting Q&A.
By ltx workflow
Editor's Note: NVIDIA's official guide for the LTX-2.3 + RTX Video Super Resolution workflow on RTX GPUs. The most useful parts for LTX users: recommended resolution/frame count, inference settings, prompt structure for I2V, and a sharp troubleshooting Q&A (black output, last-frame mismatch, subject drift). Excerpted below — LTX-related sections only.
Getting Started
The NVIDIA Video Generation workflow runs locally on your RTX GPU using Blender, ComfyUI, generative AI models like FLUX.1 from Black Forest Labs and LTX-2.3 from Lightricks, and the new RTX Video Super Resolution upscaler node available in ComfyUI.
The workflow is broken down into three steps:
- A blueprint for generating 3D objects from text prompts
- A blueprint for using those assets as depth shaders to control image-gen composition
- A workflow in ComfyUI that uses first and last frame images to generate video from text prompts and upscale the output using RTX Video
Creators can pick and choose which parts of the blueprint they want to use. For the full pipeline, work through each step before moving to the next to ensure full system resources are available.
Downloads
- 3D Object Generator blueprint — see NVIDIA-AI-Blueprints/3d-object-generation
- 3D Guided Generative AI blueprint — see NVIDIA-AI-Blueprints/3d-guided-genai-rtx
- LTX-2.3 FirstFrame/LastFrame + RTX Video upscaler ComfyUI template — via ComfyUI template browser (when available) or via GitHub
System requirements
- GPU: 16 GB of VRAM (NVIDIA GeForce RTX 5070 Ti or higher recommended)
- OS: Windows 11
- System RAM: 64 GB
LTX-2.3 settings (FAQ excerpts)
What resolution and frame count should I use for LTX-2.3?
Iterate at 1280×720 and keep sequences under 257 frames for the best balance of coherence and speed. Once a shot is working, try increasing to 1920×1080.
What inference settings should I use for LTX-2.3?
Use 20–30 steps when iterating and 40+ steps for final-quality renders. Set Guidance Scale to 3.0–3.5 for the best balance between prompt coherence and natural-looking motion.
How do I configure RTX Video Super Resolution?
Set Upscale Factor (1–4) based on input resolution and target output — for 720p → 4K, use 3. Set Quality Level to 4 for maximum edge sharpening and artifact removal.
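The 720p → 4K example works out to 2160 / 720 = 3. A minimal sketch of that arithmetic follows; the helper name `upscale_factor` is hypothetical and not part of the RTX Video node, and rounding up to the next whole factor is a choice made for this sketch, not documented behavior.

```python
import math


def upscale_factor(input_height: int, target_height: int) -> int:
    """Smallest whole factor that reaches target_height from input_height.

    Hypothetical helper for illustration only. The guide gives 1-4 as the
    valid range for Upscale Factor, so anything outside it is rejected.
    """
    factor = math.ceil(target_height / input_height)
    if not 1 <= factor <= 4:
        raise ValueError(f"factor {factor} is outside the 1-4 range")
    return factor


# 4K UHD is 3840x2160, so a 720p source needs a factor of 3.
print(upscale_factor(720, 2160))
```

When the ratio is not a whole number (for example 720p to 1080p), this sketch overshoots rather than undershoots the target, so the result may need a downscale afterwards.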
How do I write a prompt for LTX-2.3 image-to-video?
LTX-2.3 expects natural language, not tag lists. Your image already contains the visual information; the prompt should describe what happens.
A reliable structure:
- Shot framing — e.g. "medium close-up, slight upward tilt"
- Lighting — e.g. "golden hour, long shadows"
- Action as a time sequence — e.g. "the motorcycle accelerates forward, dust rising behind the rear wheel"
Front-load tone and quality words before subject nouns. Write 4–6 sentences. Don't repeat what's already visible in your keyframe — describe the change, not the static state.
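As a rough illustration of the structure above, the sketch below joins framing, lighting, and time-ordered action beats into one natural-language paragraph and checks the 4–6 sentence guideline. The function names are hypothetical, and the second action beat is an invented example, not text from the guide.

```python
def _sentence(text: str) -> str:
    """Capitalize the first letter and end the fragment with one period."""
    text = text.strip().rstrip(".")
    return text[0].upper() + text[1:] + "."


def build_i2v_prompt(framing: str, lighting: str, actions: list[str]) -> str:
    """Assemble framing, lighting, then actions in the order they happen."""
    parts = [framing, lighting, *actions]
    sentences = [_sentence(p) for p in parts if p.strip()]
    if not 4 <= len(sentences) <= 6:
        raise ValueError(f"expected 4-6 sentences, got {len(sentences)}")
    return " ".join(sentences)


prompt = build_i2v_prompt(
    "medium close-up, slight upward tilt",
    "golden hour, long shadows",
    [
        "the motorcycle accelerates forward, dust rising behind the rear wheel",
        # Invented second beat, included only to reach four sentences.
        "the rider leans into the turn as the camera tracks alongside",
    ],
)
print(prompt)
```

The actions list holds only what changes over time; anything already visible in the keyframe stays out of it.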
What should I put in my negative prompt?
Keep it focused. A reliable starting point: morphing, distortion, warping, flicker, jitter, blur, artifacts, glitch, overexposure, watermark, text, subtitles. Avoid building long lists. (Note: LTX-2.3 doesn't require a negative prompt.)
Troubleshooting
My last frame doesn't match the image I provided. How do I fix it?
Known issue. First, raise the last-frame strength value to 1.0 in your guide node. If that doesn't resolve it, try setting the last-frame position index to -12 instead of -1 — this gives the model a few frames of landing room before the end. End-frame adherence also degrades over longer clips, so keeping sequences to 5 seconds (121 frames) significantly improves results.
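The numbers above can be checked with a little arithmetic. The sketch below rests on two assumptions that the guide does not state outright: that 121 frames for 5 seconds implies 24 frames per second plus one starting frame, and that the negative position index counts back from the end the way Python sequence indices do. Both helper names are hypothetical.

```python
def frames_for_duration(seconds: float, fps: int = 24) -> int:
    """Frame count for a clip length, assuming fps frames per second plus one.

    The 24 fps default is inferred from "5 seconds (121 frames)".
    """
    return round(seconds * fps) + 1


def resolve_frame_index(index: int, total_frames: int) -> int:
    """Convert a possibly negative index into a 0-based frame position.

    Assumes Python-style semantics: -1 is the final frame.
    """
    if not -total_frames <= index < total_frames:
        raise ValueError(f"index {index} is out of range for {total_frames} frames")
    return index % total_frames


total = frames_for_duration(5)            # 121
print(resolve_frame_index(-1, total))     # 120, the final frame
print(resolve_frame_index(-12, total))    # 109, leaving 11 frames after the guide
```

Under those assumptions, moving the guide from -1 to -12 places it 11 frames before the end of a 121-frame clip.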
My output video is completely black. What do I check first?
Three things in order:
- Frame count must follow the (N×8)+1 rule — valid values: 49, 65, 97, 121, …
- FirstFrame/LastFrame workflow: add LTXVCropGuides before the VAE decode node. Without it, the guide frames corrupt the decode and produce black output.
- Text encoder must load correctly: a missing Gemma encoder means the model has no conditioning signal and will generate black or near-black frames.
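The (N×8)+1 rule from the first check is easy to verify before queuing a render. A minimal sketch with hypothetical helper names; it only tests the arithmetic and assumes N is at least 1.

```python
def is_valid_frame_count(frames: int) -> bool:
    """True when frames equals 8*N + 1 for some whole number N >= 1."""
    return frames >= 9 and (frames - 1) % 8 == 0


def nearest_valid_frame_count(frames: int) -> int:
    """Snap to the closest 8*N + 1 value, rounding halfway cases up."""
    n = max(1, (frames - 1 + 4) // 8)
    return 8 * n + 1


print(all(is_valid_frame_count(f) for f in (49, 65, 97, 121)))  # True
print(is_valid_frame_count(120))                                # False
print(nearest_valid_frame_count(120))                           # 121
```

Integer arithmetic is used for the rounding so that halfway cases always round up, which Python's built-in `round` does not do.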
My subject's appearance changes mid-video. How do I reduce this?
Subject drift is a model limitation, not a bug. Most effective mitigations:
- Keep clips to 5 seconds maximum
- Describe one clear motion at a time in your prompt
- Reduce CFG (the Guidance Scale setting) to 3.0–3.5
- For repeating characters, a LoRA trained on that subject significantly improves consistency across generations