Open-Source AI Video Generation Model
A 15-billion-parameter unified Transformer that jointly produces video and synchronized audio from text or image prompts, with cinematic 1080p quality and seven-language lip-sync.
- 40-layer self-attention Transformer with 4 modality-specific layers.
- Generates synchronized dialogue and ambient sound alongside the video track.
- Denoising reduced to just 8 sampling steps.
- Lip-sync languages include English, Mandarin, Japanese, Korean, German, and French.
- 5–8 second clips at 1080p.
- License permits commercial use.
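The headline specs above can be gathered into a single configuration sketch. This is purely illustrative, not the model's actual config schema; the names (`ModelSpec`, `n_layers`, and so on) are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical summary of the published specs; not the real config format."""
    n_params: int = 15_000_000_000  # 15B parameters
    n_layers: int = 40              # self-attention layers
    n_modality_layers: int = 4      # modality-specific layers
    denoising_steps: int = 8        # few-step denoising schedule
    clip_seconds: tuple = (5, 8)    # supported clip duration range
    resolution: str = "1080p"
    lipsync_languages: tuple = (
        "English", "Mandarin", "Japanese", "Korean", "German", "French",
    )

spec = ModelSpec()
print(f"{spec.n_params / 1e9:.0f}B params, {spec.n_layers} layers, "
      f"{spec.denoising_steps}-step denoising")
# → 15B params, 40 layers, 8-step denoising
```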
Ranked #1 globally on the Artificial Analysis Video Arena with an Elo of 1333.
| Model | Visual Quality | Alignment | Physical Realism | WER (%) ↓ |
|---|---|---|---|---|
| Happy Horse 1.0 | 4.80 | 4.18 | 4.52 | 14.60 |
| OVI 1.1 | 4.73 | 4.10 | 4.41 | 40.45 |
| LTX 2.3 | 4.76 | 4.12 | 4.56 | 19.23 |
Win rate: 80.0% vs OVI 1.1 · 60.9% vs LTX 2.3
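Elo ratings and head-to-head win rates are two views of the same model: under the standard Elo formula, an observed win rate implies a rating gap. A sketch of that conversion (the arena's exact rating method and the opponents' ratings are not given here, so the computed gaps are illustrative, not published figures):

```python
import math

def elo_expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

def rating_gap_from_win_rate(p: float) -> float:
    """Invert the Elo formula: rating gap implied by win rate p."""
    return 400.0 * math.log10(p / (1.0 - p))

# Equal ratings → 50% expected score.
print(elo_expected_score(1333.0, 1333.0))       # → 0.5
# An 80.0% win rate implies roughly a 240-point Elo gap.
print(round(rating_gap_from_win_rate(0.800)))   # → 241
# A 60.9% win rate implies a much smaller gap.
print(round(rating_gap_from_win_rate(0.609)))   # → 77
```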
Runs on NVIDIA H100 or A100 GPUs (≥48 GB VRAM recommended).
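The ≥48 GB recommendation follows from the parameter count alone: 15 billion parameters in 16-bit precision occupy 30 GB before activations, KV caches, and video/audio latents are accounted for. A back-of-the-envelope check (the precision and the overhead beyond weights are assumptions, not published figures):

```python
N_PARAMS = 15_000_000_000
BYTES_PER_PARAM = 2  # bf16/fp16 weights (assumed precision)

# Memory for the weights alone; runtime buffers push the real
# requirement well past this, toward the 48 GB recommendation.
weight_gb = N_PARAMS * BYTES_PER_PARAM / 1e9
print(f"weights alone: {weight_gb:.0f} GB")  # → weights alone: 30 GB
```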
```shell
# Clone & install
git clone https://github.com/happy-horse/happyhorse-1.git
cd happyhorse-1
pip install -r requirements.txt

# Download weights
bash download_weights.sh

# Generate
python demo_generate.py --prompt "a robot dancing on the moon" --duration 5
```