Data Pipeline Overview
- Model Parameters: 14B (Transformer)
- Text Encoder: 11B (UMT5-XXL)
- Transformer Blocks: 40 (Attention + FFN)
Text Encoding Data Flow
WAN-Style Prompts: WanVideo was trained on detailed scene descriptions. Write complete paragraphs describing the subject, environment, lighting, and atmosphere.
UMT5-XXL Processing Details
- Input: Tokenized text sequences
- Output: Fixed [256, 4096] embeddings per prompt
- Memory per prompt: ~4.2 MB (FP32)
- Disk caching available for faster re-use
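The per-prompt memory figure follows from the embedding shape. A minimal sketch of the arithmetic, assuming 4 bytes per FP32 value and decimal megabytes (the function name is illustrative, not part of any node API):

```python
def embedding_bytes(tokens: int = 256, dim: int = 4096, bytes_per_value: int = 4) -> int:
    """Size in bytes of one [tokens, dim] prompt embedding."""
    return tokens * dim * bytes_per_value


# FP32 stores 4 bytes per value, FP16 stores 2.
fp32_bytes = embedding_bytes()
fp16_bytes = embedding_bytes(bytes_per_value=2)

print(f"FP32: {fp32_bytes / 1e6:.2f} MB")  # 256 * 4096 * 4 = 4,194,304 bytes
print(f"FP16: {fp16_bytes / 1e6:.2f} MB")
```

Storing cached embeddings in half precision would halve this footprint, at the cost of reduced numeric precision.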
Latent Space Transformation
Patchification Process
The latent space is divided into patches for transformer processing:
- VAE Stride: (4, 8, 8) for (T, H, W)
- Patch Size: (1, 2, 2)
- Each patch: 16 channels × 1 × 2 × 2 = 64 values
- Each 64-value patch is linearly projected to the 5,120-dimensional transformer hidden size
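The stride and patch size above determine how many tokens the transformer sees for a given clip. The sketch below works through that arithmetic. It assumes the first frame is encoded on its own and the remaining frames are grouped by the temporal stride, giving (frames - 1) / 4 + 1 latent frames; that rule and all function names are assumptions for illustration, not taken from this page.

```python
VAE_STRIDE = (4, 8, 8)   # (T, H, W) downsampling of the VAE
PATCH_SIZE = (1, 2, 2)   # (T, H, W) patch size in latent space
LATENT_CHANNELS = 16     # channels per latent position


def latent_shape(frames: int, height: int, width: int) -> tuple[int, int, int]:
    """Latent (T, H, W) for a video of the given pixel dimensions."""
    st, sh, sw = VAE_STRIDE
    # Assumption: first frame encoded alone, the rest in groups of `st`.
    return ((frames - 1) // st + 1, height // sh, width // sw)


def token_grid(frames: int, height: int, width: int) -> tuple[int, int, int]:
    """Number of patches along (T, H, W) after patchification."""
    t, h, w = latent_shape(frames, height, width)
    pt, ph, pw = PATCH_SIZE
    return (t // pt, h // ph, w // pw)


def values_per_patch() -> int:
    """Raw values in one patch before projection to the hidden size."""
    pt, ph, pw = PATCH_SIZE
    return LATENT_CHANNELS * pt * ph * pw


def sequence_length(frames: int, height: int, width: int) -> int:
    """Total number of tokens fed to the transformer blocks."""
    t, h, w = token_grid(frames, height, width)
    return t * h * w


print(latent_shape(81, 480, 832))     # (21, 60, 104)
print(values_per_patch())             # 64
print(sequence_length(81, 480, 832))  # 21 * 30 * 52 = 32760
```

Under these assumptions, an 81-frame 480 × 832 clip produces 32,760 tokens, each starting as 64 raw values.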
Transformer Block Data Flow
Block Architecture (40 Blocks Total)
Block Specialization
Memory Usage Analysis
Total Required = Model Memory + Activation Memory + Buffer/Cache (2.0 GB)
Memory Optimization Options
Model Optimizations:
- FP8 Quantization: Save ~75% of model memory relative to FP32 (~50% relative to FP16/BF16)
- Model Offloading: Keep only active blocks in VRAM
- LoRA: Use adapters instead of full model
Activation Optimizations:
- Flash Attention: Reduce quadratic memory
- Gradient Checkpointing: Trade compute for memory
- Lower Resolution: Sequence length scales with H × W, so halving both dimensions cuts the token count by 4×
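A rough sketch of how precision and resolution affect these numbers. It counts transformer weights only (no text encoder, VAE, or activations) and uses decimal gigabytes; the helper names are illustrative.

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}


def model_memory_gb(params: float = 14e9, dtype: str = "fp16") -> float:
    """Weight storage in GB for a model with `params` parameters."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


def token_scale(resolution_scale: float) -> float:
    """Relative token count when height and width are both scaled."""
    return resolution_scale ** 2


def naive_attention_scale(resolution_scale: float) -> float:
    """Relative size of a full attention score matrix (tokens squared)."""
    return token_scale(resolution_scale) ** 2


for dtype in ("fp32", "fp16", "fp8"):
    print(f"{dtype}: {model_memory_gb(dtype=dtype):.0f} GB")

# Halving height and width: 1/4 of the tokens, 1/16 of a naive score matrix.
print(token_scale(0.5), naive_attention_scale(0.5))
```

The 1/16 factor applies only to attention that materializes the full score matrix; memory-efficient kernels such as Flash Attention avoid storing it.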
Activation Editor Analysis
Critical Requirement
Main and injection prompts must differ by more than 50% (measured as embedding difference) for the injection to have a visible effect. Use WAN-style detailed descriptions (full paragraphs, not short phrases).
Injection Configuration
Block Selection
Block Patterns (General Tendencies)
| Range | Typical Processing | Testing Focus |
|---|---|---|
| Early (0-13) | Local patterns, colors, textures, edges | Material transformation, texture mixing |
| Middle (14-26) | Object boundaries, scene structure, motion | Object morphing, composition changes |
| Late (27-39) | Semantic relationships, global coherence | Concept blending, mood transformation |
Note: These are general patterns from transformer research, not rigid rules. Actual behavior depends on specific prompts and their embedding difference.
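When scripting experiments across blocks, the ranges in the table can be kept in a small lookup. This is a convenience sketch only; the names are hypothetical and the boundaries simply mirror the table.

```python
# Block index ranges copied from the table (upper bounds are exclusive here).
BLOCK_RANGES = {
    "early": range(0, 14),    # blocks 0-13
    "middle": range(14, 27),  # blocks 14-26
    "late": range(27, 40),    # blocks 27-39
}


def block_range(index: int) -> str:
    """Return the range name ('early', 'middle', 'late') for a block index."""
    for name, indices in BLOCK_RANGES.items():
        if index in indices:
            return name
    raise ValueError(f"block index {index} is outside 0-39")


def blocks_for(name: str) -> list[int]:
    """Return all block indices belonging to the named range."""
    return list(BLOCK_RANGES[name])


print(block_range(20))     # middle
print(blocks_for("late"))  # indices 27 through 39
```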
Prompt Difference Calculator
Estimate the semantic difference between your prompts (aim for >50%):
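For a quick offline estimate, a word-overlap measure can serve as a crude proxy. The sketch below uses Jaccard distance over lowercase word sets. This is an assumption chosen for illustration, not the metric the node computes; the actual embedding difference is only reported in the node's logs.

```python
import re


def word_set(prompt: str) -> set[str]:
    """Lowercase word tokens of a prompt, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9']+", prompt.lower()))


def prompt_difference(main: str, injection: str) -> float:
    """Jaccard distance between the two prompts' word sets, as a percentage."""
    a, b = word_set(main), word_set(injection)
    union = a | b
    if not union:
        return 0.0  # two empty prompts are treated as identical
    return 100.0 * (1 - len(a & b) / len(union))


main = "A red sports car parked on a wet city street at night"
injection = "A wooden sailing ship drifting across a calm sea at dawn"
print(f"{prompt_difference(main, injection):.0f}% different")
```

Word overlap ignores synonyms and word order, so treat the result as a lower-effort sanity check rather than a measurement.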
Debug Features
The WanVideoActivationEditor node includes runtime log level control:
- off: No debug output
- basic: Essential information only
- verbose: Detailed operation logs including percent changed
- trace: Full trace with stack information
Use "verbose" to see the actual embedding difference percentages during generation.
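For readers building similar tooling, here is one way a four-level control like this could map onto Python's standard `logging` module. It is a hypothetical sketch, not the node's implementation; the `TRACE` level and the `configure` helper are invented for illustration.

```python
import logging

# Custom level below DEBUG; the standard library defines no TRACE level.
TRACE = 5
logging.addLevelName(TRACE, "TRACE")

# Hypothetical mapping from the four option names to logging thresholds.
LEVELS = {
    "off": logging.CRITICAL + 1,  # above every standard level: nothing is emitted
    "basic": logging.INFO,
    "verbose": logging.DEBUG,
    "trace": TRACE,
}


def configure(logger: logging.Logger, level_name: str) -> int:
    """Set the logger threshold from an option name and return the numeric level."""
    try:
        level = LEVELS[level_name]
    except KeyError:
        raise ValueError(f"unknown log level: {level_name!r}") from None
    logger.setLevel(level)
    return level


log = logging.getLogger("activation_editor_demo")
configure(log, "verbose")
print(log.isEnabledFor(logging.DEBUG))  # True
print(log.isEnabledFor(TRACE))          # False
```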