> For the complete documentation index, see [llms.txt](https://docs.qolaba.ai/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://docs.qolaba.ai/model-reference/video-models.md).

# Video Models

Qolaba provides access to 10+ video generation models — covering text-to-video, image-to-video, multi-reference guided generation, and AI-powered video editing. Credit costs vary by model, duration, and resolution. Use this page as a reference when selecting a model for your video generation task.

***

### How to Read This Page

<table><thead><tr><th width="222.83203125">Column</th><th>What It Means</th></tr></thead><tbody><tr><td><strong>Credits</strong></td><td>Credits consumed per video at the specified duration and resolution combination</td></tr><tr><td><strong>Duration</strong></td><td>Supported video lengths in seconds</td></tr><tr><td><strong>Resolution</strong></td><td>Supported output quality options</td></tr><tr><td><strong>Reference Images</strong></td><td>Whether the model accepts uploaded images as generation input</td></tr><tr><td><strong>Audio Support</strong></td><td>Whether the model supports AI-generated or uploaded audio</td></tr></tbody></table>

{% hint style="info" %}
Models marked with ⭐ are available on **paid plans only**.
{% endhint %}

***

### **Google Veo Models**

Google's flagship video generation models — delivering cinematic quality, physics-accurate motion, and highly detailed environments.

#### **Veo 3.1** ⭐

| Duration | 720p / 1080p | 4K    |
| -------- | ------------ | ----- |
| 4s       | 416          | —     |
| 6s       | 624          | —     |
| 8s       | 832          | 1,248 |

#### **Veo 3.1 Fast**

| Duration | 720p / 1080p | 4K  |
| -------- | ------------ | --- |
| 4s       | 156          | —   |
| 6s       | 234          | —   |
| 8s       | 312          | 728 |

| Feature                | Veo 3.1                                                           | Veo 3.1 Fast                                                 |
| ---------------------- | ----------------------------------------------------------------- | ------------------------------------------------------------ |
| **Input**              | Text-to-video, Image-to-video                                     | Text-to-video, Image-to-video                                |
| **Default resolution** | 720p                                                              | 720p                                                         |
| **Max generations**    | 4                                                                 | 4                                                            |
| **Best for**           | Cinematic quality, physics-accurate motion, detailed environments | Faster generation at lower cost — balanced speed and quality |

**Duration & Resolution Restrictions:**

| Condition                                 | Allowed Durations |
| ----------------------------------------- | ----------------- |
| 720p + text-to-video (no reference image) | 4s, 6s, 8s        |
| 720p + reference image (image-to-video)   | 8s only           |
| 1080p (any input)                         | 8s only           |
| 4K (any input)                            | 8s only           |

{% hint style="info" %}
4K is only available at 8 seconds duration for both Veo models.
{% endhint %}
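
The restrictions and credit tables above can be combined into a small pre-flight check. The following is a minimal sketch in Python, not part of any Qolaba SDK: the function names, the `"hd"` tier label, and the dictionary layout are assumptions made for illustration, while the credit values and duration rules are transcribed from the tables on this page.

```python
from typing import Dict, Tuple

# Credits per video, transcribed from the Veo pricing tables on this page.
# "hd" is a hypothetical label for the shared 720p / 1080p price tier.
VEO_CREDITS: Dict[str, Dict[Tuple[str, int], int]] = {
    "Veo 3.1": {("hd", 4): 416, ("hd", 6): 624, ("hd", 8): 832, ("4K", 8): 1248},
    "Veo 3.1 Fast": {("hd", 4): 156, ("hd", 6): 234, ("hd", 8): 312, ("4K", 8): 728},
}


def allowed_veo_durations(resolution: str, has_reference_image: bool) -> Tuple[int, ...]:
    """Return the durations (in seconds) the restriction table allows."""
    if resolution not in ("720p", "1080p", "4K"):
        raise ValueError(f"unsupported resolution: {resolution}")
    # Only 720p text-to-video may use the shorter 4s and 6s durations.
    if resolution == "720p" and not has_reference_image:
        return (4, 6, 8)
    return (8,)


def veo_credits(model: str, resolution: str, duration: int,
                has_reference_image: bool = False) -> int:
    """Look up the credit cost after checking the duration restriction."""
    if model not in VEO_CREDITS:
        raise ValueError(f"unknown Veo model: {model}")
    if duration not in allowed_veo_durations(resolution, has_reference_image):
        raise ValueError(f"{duration}s is not allowed at {resolution} for this input")
    tier = "4K" if resolution == "4K" else "hd"
    return VEO_CREDITS[model][(tier, duration)]
```

For example, `veo_credits("Veo 3.1 Fast", "720p", 4)` returns 156, while requesting 4 seconds at 1080p raises a `ValueError` because 1080p is limited to 8 seconds.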

***

### **Runway Models**

#### **Runway Gen-4.5** ⭐

Credits are calculated at 31.2 credits per second:

| Duration | Credits |
| -------- | ------- |
| 2s       | 63      |
| 5s       | 156     |
| 10s      | 312     |
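
The per-second rate reproduces the table rows if fractional totals are rounded up to the next whole credit (2 s × 31.2 = 62.4, listed as 63). The sketch below assumes that round-up rule and whole-second durations; neither is stated explicitly on this page, and the function name is hypothetical.

```python
# 31.2 credits per second, stored in tenths of a credit to avoid float rounding.
RUNWAY_TENTHS_PER_SECOND = 312


def runway_gen45_credits(duration_seconds: int) -> int:
    """Estimate credits for a Runway Gen-4.5 clip, rounding up to a whole credit."""
    if not 2 <= duration_seconds <= 10:
        raise ValueError("Runway Gen-4.5 supports 2-10 second clips")
    total_tenths = RUNWAY_TENTHS_PER_SECOND * duration_seconds
    # Ceiling division: assumed rounding rule, inferred from the 2s row (62.4 -> 63).
    return -(-total_tenths // 10)
```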

| Feature                 | Details                                                                           |
| ----------------------- | --------------------------------------------------------------------------------- |
| **Input**               | Text-to-video, Image-to-video                                                     |
| **Supported durations** | 2–10 seconds                                                                      |
| **Output resolution**   | 720p only                                                                         |
| **Frame rate**          | 24fps, 25fps                                                                      |
| **Max generations**     | 4                                                                                 |
| **Best for**            | Industry-leading production quality — reliable for commercial and branded content |

**Aspect Ratio Restrictions:**

| Input Mode     | Supported Aspect Ratios         |
| -------------- | ------------------------------- |
| Text-to-video  | 16:9 only                       |
| Image-to-video | 16:9, 9:16, 1:1, 4:3, 3:4, 21:9 |

***

### **ByteDance Models**

**Seedance 2.0** and **Seedance 2.0 Fast** are ByteDance's flagship video models — distinguished by their multi-reference input capability, allowing up to 12 reference files (images, videos, and audio) to guide a single generation.

***

#### **Seedance 2.0** ⭐

| Feature                   | Details                                                                                                          |
| ------------------------- | ---------------------------------------------------------------------------------------------------------------- |
| **Input**                 | Text-to-video, Image-to-video, Video-to-video                                                                    |
| **Supported durations**   | 4–15 seconds (or AI-determined if left blank)                                                                    |
| **Supported resolutions** | 480p, 720p, 1080p                                                                                                |
| **Aspect ratios**         | 16:9, 21:9, 9:16, 3:4, 1:1, 4:3                                                                                  |
| **Max generations**       | 4                                                                                                                |
| **Audio support**         | AI-generated audio or uploaded audio                                                                             |
| **Best for**              | Cinematic quality with multi-reference guidance — brand-consistent generation combining images, video, and audio |

#### **Seedance 2.0 Fast** ⭐

| Feature                   | Details                                                                  |
| ------------------------- | ------------------------------------------------------------------------ |
| **Input**                 | Text-to-video, Image-to-video, Video-to-video                            |
| **Supported durations**   | 4–15 seconds                                                             |
| **Supported resolutions** | 480p, 720p                                                               |
| **Max generations**       | 4                                                                        |
| **Audio support**         | AI-generated audio or uploaded audio                                     |
| **Best for**              | Faster generation at lower cost — rapid prototyping and quick iterations |

**Reference Media Support (Both Seedance Models):**

| Media Type      | Limit                   | Size Limit                              |
| --------------- | ----------------------- | --------------------------------------- |
| Images          | Up to 9                 | Max 30 MB each                          |
| Videos          | Up to 3                 | Combined 2–15 seconds, max 50 MB total  |
| Audio           | Up to 3                 | Combined max 15 seconds, max 15 MB each |
| **Total files** | Max 12 across all types | —                                       |

**How to Reference Uploads in Your Prompt:**

Use tags to tell the model exactly how to use each uploaded file:

* Images: `@Image1`, `@Image2`, etc.
* Videos: `@Video1`, `@Video2`, etc.
* Audio: `@Audio1`, `@Audio2`, etc.

**Example prompt:**

```
The person in @Image1 walks into a futuristic city
while @Audio1 plays softly in the background.
```

> **Note:** Audio cannot be uploaded without at least one image or video reference. Maximum 12 files total across all media types.
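
A client could check a set of uploads against these limits before submitting a generation. The sketch below is illustrative only: `ReferenceFile` and `validate_seedance_references` are hypothetical names, and treating 1 MB as 1024 × 1024 bytes is an assumption, since this page does not define the unit.

```python
from dataclasses import dataclass
from typing import List

MB = 1024 * 1024  # assumption: the page does not say whether MB means 10^6 or 2^20 bytes


@dataclass
class ReferenceFile:
    kind: str                      # "image", "video", or "audio"
    size_bytes: int
    duration_seconds: float = 0.0  # ignored for images


def validate_seedance_references(files: List[ReferenceFile]) -> List[str]:
    """Return a list of problems; an empty list means the set fits the documented limits."""
    problems = []
    images = [f for f in files if f.kind == "image"]
    videos = [f for f in files if f.kind == "video"]
    audio = [f for f in files if f.kind == "audio"]

    if len(images) + len(videos) + len(audio) != len(files):
        problems.append("unknown media type")
    if len(files) > 12:
        problems.append("more than 12 files in total")
    if len(images) > 9:
        problems.append("more than 9 images")
    if any(f.size_bytes > 30 * MB for f in images):
        problems.append("an image exceeds 30 MB")
    if len(videos) > 3:
        problems.append("more than 3 videos")
    if videos:
        if not 2 <= sum(f.duration_seconds for f in videos) <= 15:
            problems.append("combined video length must be 2-15 seconds")
        if sum(f.size_bytes for f in videos) > 50 * MB:
            problems.append("videos exceed 50 MB in total")
    if len(audio) > 3:
        problems.append("more than 3 audio files")
    if audio:
        if sum(f.duration_seconds for f in audio) > 15:
            problems.append("combined audio length exceeds 15 seconds")
        if any(f.size_bytes > 15 * MB for f in audio):
            problems.append("an audio file exceeds 15 MB")
        if not images and not videos:
            problems.append("audio needs at least one image or video reference")
    return problems
```

Returning every problem at once, rather than raising on the first one, lets a caller show the user all the limits they have exceeded in a single pass.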

***

### **Happy Horse Models**

Happy Horse models are purpose-built for strong character consistency and advanced video editing — the only models in Qolaba that support AI-powered editing of existing video footage.

***

#### **Happy Horse (Generation)** ⭐

| Feature                   | Details                                                                           |
| ------------------------- | --------------------------------------------------------------------------------- |
| **Input**                 | Text-to-video, Image-to-video                                                     |
| **Supported durations**   | 3–15 seconds (default 5s)                                                         |
| **Supported resolutions** | 720p, 1080p (default 1080p)                                                       |
| **Aspect ratios**         | 16:9, 9:16, 1:1, 4:3, 3:4                                                         |
| **Max generations**       | 4                                                                                 |
| **Max reference images**  | Up to 9 images                                                                    |
| **Best for**              | Multi-character video generation with consistent character identity across scenes |

**How Character References Work:** Upload images of your characters — the first uploaded image becomes `character1`, the second becomes `character2`, and so on up to `character9`. Reference them directly in your prompt:

```
A futuristic dance battle between character1 and character2
under neon lights.
```
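
The upload-order rule can be expressed as a simple mapping. This is a minimal sketch with a hypothetical helper name and example file names; it only mirrors the naming rule described above and does not call any Qolaba API.

```python
from typing import Dict, List


def character_tags(uploaded_images: List[str]) -> Dict[str, str]:
    """Map prompt tags (character1, character2, ...) to images in upload order."""
    if len(uploaded_images) > 9:
        raise ValueError("Happy Horse accepts up to 9 reference images")
    # The first upload becomes character1, the second character2, and so on.
    return {f"character{i}": name for i, name in enumerate(uploaded_images, start=1)}


tags = character_tags(["dancer_red.png", "dancer_blue.png"])
# tags["character1"] is "dancer_red.png"; tags["character2"] is "dancer_blue.png"
```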

***

#### **Happy Horse Video Edit** ⭐

Happy Horse Video Edit is a distinct capability from generation — it edits and transforms existing videos rather than creating new ones from scratch.

| Feature                       | Details                                                                           |
| ----------------------------- | --------------------------------------------------------------------------------- |
| **Input**                     | Source video (required) + optional style images                                   |
| **Supported resolutions**     | 720p, 1080p                                                                       |
| **Source video requirements** | MP4 or MOV, 3–60 seconds, under 100 MB, minimum 320px shortest side               |
| **Max style images**          | Up to 5                                                                           |
| **Audio options**             | Keep original audio or regenerate with AI                                         |
| **Max output length**         | 15 seconds (longer videos are automatically trimmed)                              |
| **Best for**                  | Style transfer, element replacement, re-texturing or re-lighting existing footage |

**Example prompt:**

```
Replace the person's jacket with the red leather jacket
shown in @Image1, and make the sky look like a sunset.
```

> **Important:** Even if you upload a 60-second source video, the model processes and returns a maximum of 15 seconds of edited footage.
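
The source requirements and the 15-second output cap can be checked before uploading. The sketch below uses hypothetical function names and assumes the caller already knows the file's container, duration, size, and dimensions; how the service chooses which 15 seconds to keep is not stated on this page, so only the resulting length is modeled.

```python
from typing import List

MAX_OUTPUT_SECONDS = 15


def check_source_video(container: str, duration_seconds: float,
                       size_mb: float, shortest_side_px: int) -> List[str]:
    """Return a list of problems with a source video; empty means it meets the requirements."""
    problems = []
    if container.lower().lstrip(".") not in ("mp4", "mov"):
        problems.append("source must be MP4 or MOV")
    if not 3 <= duration_seconds <= 60:
        problems.append("source must be 3-60 seconds long")
    if size_mb >= 100:
        problems.append("source must be under 100 MB")
    if shortest_side_px < 320:
        problems.append("shortest side must be at least 320px")
    return problems


def expected_output_seconds(source_duration_seconds: float) -> float:
    """Edited output is capped at 15 seconds regardless of source length."""
    return min(source_duration_seconds, MAX_OUTPUT_SECONDS)
```

For a valid 60-second source, `expected_output_seconds(60)` returns 15, matching the note above.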

***

### **Minimax Models**

#### **Hailuo 2.3 Pro** ⭐

| Duration | Resolution | Credits |
| -------- | ---------- | ------- |
| 6s       | 1080p      | 128     |
| 10s      | 768p       | —       |

| Feature              | Details                                                                         |
| -------------------- | ------------------------------------------------------------------------------- |
| **Input**            | Text-to-video, Image-to-video                                                   |
| **Default duration** | 6 seconds                                                                       |
| **Max generations**  | 4                                                                               |
| **Best for**         | Strong general-purpose video generation — reliable motion and scene consistency |

> **Note:** 10-second duration is only available at 768p resolution.

***

### **Kling Models**

*Kuaishou*

#### **Kling O3 Pro** ⭐

| Duration | Credits |
| -------- | ------- |
| 5s       | 183     |
| 10s      | 365     |
| 15s      | 547     |

#### **Kling V3 Pro** ⭐

| Duration | Credits |
| -------- | ------- |
| 5s       | 219     |
| 10s      | 437     |
| 15s      | 656     |

| Feature                 | Kling O3 Pro                                                                        | Kling V3 Pro                                                        |
| ----------------------- | ----------------------------------------------------------------------------------- | ------------------------------------------------------------------- |
| **Input**               | Text-to-video, Image-to-video                                                       | Text-to-video, Image-to-video                                       |
| **Supported durations** | 3–15 seconds                                                                        | 3–15 seconds                                                        |
| **Resolution**          | 720p / 1080p                                                                        | 720p / 1080p                                                        |
| **Max generations**     | 4                                                                                   | 4                                                                   |
| **Best for**            | High quality, strong motion — reliable for short-form social and commercial content | Latest Kling generation — improved motion realism and visual detail |

***

### **Vidu Models**

#### **Vidu Q3 Turbo**

| Duration | 360p / 540p | 720p / 1080p |
| -------- | ----------- | ------------ |
| 4s       | 37          | 81           |
| 8s       | 73          | 161          |
| 16s      | 146         | 321          |

| Feature                 | Details                                                                                                    |
| ----------------------- | ---------------------------------------------------------------------------------------------------------- |
| **Input**               | Text-to-video, Image-to-video                                                                              |
| **Supported durations** | 1–16 seconds                                                                                               |
| **Max generations**     | 4                                                                                                          |
| **Best for**            | Fast and cost-effective generation — quick cuts, looping visuals, short transitions, high-volume iteration |

***

### **Luma Models**

#### **Luma Ray 2**

| Duration | 540p | 720p | 1080p |
| -------- | ---- | ---- | ----- |
| 5s       | 130  | 260  | 520   |
| 9s       | 234  | 468  | 936   |

| Feature             | Details                                                                                             |
| ------------------- | --------------------------------------------------------------------------------------------------- |
| **Input**           | Text-to-video, Image-to-video                                                                       |
| **Max generations** | 4                                                                                                   |
| **Best for**        | Cinematic quality with excellent prompt adherence — smooth motion and strong narrative storytelling |

***

### **xAI Models**

#### **Grok Imagine Video** ⭐

| Duration | 480p | 720p |
| -------- | ---- | ---- |
| 5s       | 65   | 92   |
| 10s      | 130  | 183  |
| 15s      | 195  | 274  |

| Feature                 | Details                                                                                       |
| ----------------------- | --------------------------------------------------------------------------------------------- |
| **Input**               | Text-to-video, Image-to-video                                                                 |
| **Supported durations** | 1–15 seconds                                                                                  |
| **Max generations**     | 4                                                                                             |
| **Best for**            | Cost-effective generation across extended durations — creative and experimental video content |

***

### Model Comparison at a Glance

| Use Case                                 | Recommended Model                 | Reason                                                               |
| ---------------------------------------- | --------------------------------- | -------------------------------------------------------------------- |
| **Cinematic quality — maximum fidelity** | Veo 3.1 or Seedance 2.0 ⭐         | Google's flagship or ByteDance's premium model                       |
| **Fast generation — lower cost**         | Veo 3.1 Fast or Seedance 2.0 Fast ⭐ | Faster variants at reduced credit cost                               |
| **Multi-reference guided generation**    | Seedance 2.0 ⭐                    | Accepts up to 12 reference files (image, video, audio); Seedance 2.0 Fast shares the same reference limits |
| **Character-consistent generation**      | Happy Horse ⭐                     | Dedicated character reference system for multi-character scenes      |
| **AI video editing**                     | Happy Horse Video Edit ⭐          | Only model supporting AI-powered editing of existing video footage   |
| **Industry-leading production quality**  | Runway Gen-4.5 ⭐                  | Reliable for commercial and branded content                          |
| **Long-duration generation**             | Kling O3 Pro or Kling V3 Pro ⭐    | Supports up to 15 seconds                                            |
| **Flexible durations at low cost**       | Grok Imagine Video ⭐              | Cost-effective across 1–15 second durations                          |
| **Fast, low-cost iteration**             | Vidu Q3 Turbo                     | Lowest credit cost — good for drafts and quick concepts              |
| **Smooth cinematic motion**              | Luma Ray 2                        | Excellent prompt adherence with cinematic quality                    |
| **Strong general-purpose**               | Hailuo 2.3 Pro ⭐                  | Reliable motion and scene consistency                                |