Text-to-Image

Generate images from text prompts with Nano Banana Flash and Nano Banana Pro (Google Gemini-powered models that understand both text and images) or with the Imagen 4 family (Imagen 4, Imagen 4 Fast and Imagen 4 Ultra).

Text to image API

POST /api/v1/images/generate

Headers

| Name | Value |
|---|---|
| Content-Type | application/json |
| Authorization | Bearer <token> |
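Here is a minimal sketch of calling the endpoint using only the Python standard library. The host `https://api.example.com` is a placeholder. The response is returned as decoded JSON because this page does not document its shape. This page also does not say how the model is selected, so the examples assume a hypothetical `model` field in the request body.

```python
import json
import urllib.request

# Placeholder host (assumption): replace with the real API base URL.
BASE_URL = "https://api.example.com"
GENERATE_PATH = "/api/v1/images/generate"


def build_generate_request(token: str, payload: dict,
                           base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request that carries the two documented headers."""
    return urllib.request.Request(
        url=base_url + GENERATE_PATH,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


def generate(token: str, payload: dict, base_url: str = BASE_URL,
             timeout: float = 120.0) -> dict:
    """Send the request and return the decoded JSON body.

    The response schema is not documented here, so it is returned as-is.
    Non-2xx responses raise urllib.error.HTTPError.
    """
    request = build_generate_request(token, payload, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Example call: `generate("YOUR_TOKEN", {"model": "vertex/nano-banana-flash", "prompt": "A lighthouse at dusk"})`. Here `model` is the assumed field described above.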


Models at a Glance

| Model ID | Name | Engine | Speed | Max Resolution | Max Images per Call | Key Feature |
|---|---|---|---|---|---|---|
| vertex/nano-banana-flash | Nano Banana 2 | Gemini Flash | Fast | 4K | 10 | Search grounding |
| vertex/nano-banana-pro | Nano Banana Pro | Gemini Pro | Standard | 4K | 10 | Text rendering, character consistency |
| vertex/imagen-4 | Imagen 4 | Imagen 4.0 | Standard | 2K | 4 | Balanced quality + adherence |
| vertex/imagen-4-fast | Imagen 4 Fast | Imagen 4.0 Fast | Fastest | ~1K | 4 | Lowest cost, highest throughput |
| vertex/imagen-4-ultra | Imagen 4 Ultra | Imagen 4.0 Ultra | Slowest | 2K | 4 | Highest quality, complex prompts |
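Each call is capped at the model's Max Images value, so a larger set of images has to be spread across several requests. The sketch below shows that arithmetic. The caps are copied from the table above, and the helper name `plan_calls` is illustrative rather than part of the API.

```python
# Per-call image caps, transcribed from the "Models at a Glance" table.
MAX_IMAGES_PER_CALL = {
    "vertex/nano-banana-flash": 10,
    "vertex/nano-banana-pro": 10,
    "vertex/imagen-4": 4,
    "vertex/imagen-4-fast": 4,
    "vertex/imagen-4-ultra": 4,
}


def plan_calls(model_id: str, total_images: int) -> list[int]:
    """Return the num_images value to send on each call to reach total_images."""
    if total_images < 1:
        raise ValueError("total_images must be at least 1")
    cap = MAX_IMAGES_PER_CALL[model_id]  # KeyError for an unknown model ID
    full_calls, remainder = divmod(total_images, cap)
    return [cap] * full_calls + ([remainder] if remainder else [])
```

For example, `plan_calls("vertex/imagen-4-fast", 10)` returns `[4, 4, 2]`, which is three requests. The same ten images fit into a single Nano Banana Pro call.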


Nano Banana Flash — vertex/nano-banana-flash

Powered by Gemini 3.1 Flash. Best for fast generation, real-time search-grounded accuracy, high-volume workflows, and image-to-image editing.


With Search Grounding

Enable real-time Google Search so the model generates factually and visually accurate content — real architecture, real locations, real brand identities.
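Here is a sketch of a search-grounded request body. The fields `prompt`, `aspect_ratio` and `use_search_grounding` come from this model's parameter table. The `model` key is an assumption about how the model is selected, and the helper name is hypothetical.

```python
import json


def grounded_flash_payload(prompt: str, aspect_ratio: str = "16:9") -> dict:
    """Build a Nano Banana Flash request body with Search grounding turned on."""
    if not prompt or len(prompt) > 4000:
        raise ValueError("prompt is required and limited to 4000 characters")
    return {
        "model": "vertex/nano-banana-flash",  # assumption: model chosen via a body field
        "prompt": prompt,
        "aspect_ratio": aspect_ratio,
        "use_search_grounding": True,  # documented default is false
    }


# A prompt that names a real landmark, which is where grounding is meant to help.
print(json.dumps(grounded_flash_payload("The Sagrada Familia at golden hour, street-level view"), indent=2))
```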

Multiple Variations
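Here is a sketch of requesting several variations in one call. For this model, `num_images` accepts 1 to 10. Passing a fixed `seed` lets you re-run the same request later, based on the parameter table's reproducibility note. The `model` key and the helper name are assumptions.

```python
from typing import Optional

FLASH_MODEL = "vertex/nano-banana-flash"


def flash_variations_payload(prompt: str, num_images: int = 4,
                             seed: Optional[int] = None) -> dict:
    """Build a Nano Banana Flash body that requests several variations at once."""
    if not 1 <= num_images <= 10:
        raise ValueError("num_images must be between 1 and 10 for this model")
    payload = {
        "model": FLASH_MODEL,  # assumption: model chosen via a body field
        "prompt": prompt,
        "num_images": num_images,
    }
    if seed is not None:
        payload["seed"] = seed  # reuse the same seed to reproduce the request
    return payload
```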

Parameters

| Field | Type | Default | Options | Description |
|---|---|---|---|---|
| prompt | string | required | max 4000 chars | Text description of the image |
| aspect_ratio | string | 1:1 | See Aspect Ratio Support | Output aspect ratio |
| quality | string | 2K | 512, 1K, 2K, 4K | Output resolution. 4K is billed at a higher rate |
| num_images | integer | 1 | 1–10 | Number of variations per call |
| temperature | number | 1.0 | 0–2 | Creativity. Lower values stay closer to the prompt; higher values are more creative |
| seed | integer | | any integer | Reproducibility. The same seed and prompt produce the same image |
| use_search_grounding | boolean | false | | Adds real-time Google Search context for factual accuracy |
| celery | boolean | false | | If true, the request runs asynchronously and returns a task_id immediately |
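You can check a request body against this table on the client before sending it. The sketch below does that. The server remains the authority on what it accepts, and the function and constant names are hypothetical.

```python
from typing import Any

FLASH_QUALITIES = {"512", "1K", "2K", "4K"}
FLASH_RATIOS = {"1:1", "4:3", "3:4", "16:9", "9:16", "3:2", "2:3",
                "21:9", "1:4", "4:1", "5:4", "4:5"}
# Documented defaults, applied when a field is omitted.
FLASH_DEFAULTS = {"aspect_ratio": "1:1", "quality": "2K", "num_images": 1,
                  "temperature": 1.0, "use_search_grounding": False, "celery": False}


def _is_int(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_flash_params(params: dict[str, Any]) -> list[str]:
    """Check a Nano Banana Flash body against the table; [] means no problems found."""
    body = {**FLASH_DEFAULTS, **params}
    errors: list[str] = []
    prompt = body.get("prompt")
    if not isinstance(prompt, str) or not prompt:
        errors.append("prompt is required")
    elif len(prompt) > 4000:
        errors.append("prompt exceeds 4000 characters")
    if body["aspect_ratio"] not in FLASH_RATIOS:
        errors.append(f"unsupported aspect_ratio: {body['aspect_ratio']}")
    if body["quality"] not in FLASH_QUALITIES:
        errors.append(f"unsupported quality: {body['quality']}")
    if not _is_int(body["num_images"]) or not 1 <= body["num_images"] <= 10:
        errors.append("num_images must be an integer from 1 to 10")
    temp = body["temperature"]
    if isinstance(temp, bool) or not isinstance(temp, (int, float)) or not 0 <= temp <= 2:
        errors.append("temperature must be a number from 0 to 2")
    if "seed" in body and not _is_int(body["seed"]):
        errors.append("seed must be an integer")
    for flag in ("use_search_grounding", "celery"):
        if not isinstance(body[flag], bool):
            errors.append(f"{flag} must be a boolean")
    return errors
```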


Nano Banana Pro — vertex/nano-banana-pro

Powered by Gemini 3 Pro. Best for production output, multi-language text rendering in images, 4K exports, and consistent character/product depiction across generations.


With Text in Image

Nano Banana Pro renders text in images with high accuracy across multiple languages.
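Here is a sketch of a Nano Banana Pro request that asks for legible text. Quoting the exact wording inside the prompt is a common prompting habit, not a documented requirement. The lower `temperature` follows the table's note that lower values stay closer to the prompt. The `model` key and the helper name are assumptions.

```python
PRO_QUALITIES = {"1K", "2K", "4K"}


def text_in_image_payload(scene: str, text: str, quality: str = "2K",
                          temperature: float = 0.5) -> dict:
    """Build a Nano Banana Pro body whose prompt spells out the text to render."""
    if quality not in PRO_QUALITIES:
        raise ValueError(f"quality must be one of {sorted(PRO_QUALITIES)}")
    # Quoting the exact wording makes it unambiguous which text should appear.
    prompt = f'{scene}. Render the text "{text}" exactly as written, clearly legible.'
    if len(prompt) > 4000:
        raise ValueError("prompt is limited to 4000 characters")
    return {
        "model": "vertex/nano-banana-pro",  # assumption: model chosen via a body field
        "prompt": prompt,
        "quality": quality,
        "temperature": temperature,  # below the 1.0 default, i.e. more faithful
    }


sign = text_in_image_payload("A hand-painted bakery sign above a shop door", "Boulangerie du Coin")
```

The same helper works for other scripts, for example `text="パン屋"` for a Japanese sign.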

With Reference Images (Character / Product Consistency)
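Here is a sketch of building `reference_images` entries. Each entry has a `url` and an optional `description`. There can be at most 14 entries, and each description is limited to 500 characters. The example.com URLs are placeholders, the `model` key is an assumption, and the helper names are hypothetical.

```python
from typing import Optional

MAX_REFERENCE_IMAGES = 14
MAX_DESCRIPTION_CHARS = 500


def reference_image(url: str, description: Optional[str] = None) -> dict:
    """One reference_images entry: a url plus an optional description."""
    entry = {"url": url}
    if description is not None:
        if len(description) > MAX_DESCRIPTION_CHARS:
            raise ValueError("description is limited to 500 characters")
        entry["description"] = description
    return entry


def consistency_payload(prompt: str, references: list[dict]) -> dict:
    """Nano Banana Pro body that reuses the same character/product references."""
    if len(references) > MAX_REFERENCE_IMAGES:
        raise ValueError("at most 14 reference images are accepted")
    return {
        "model": "vertex/nano-banana-pro",  # assumption: model chosen via a body field
        "prompt": prompt,
        "reference_images": references,
    }


refs = [
    reference_image("https://example.com/mascot-front.png", "Our mascot, front view"),
    reference_image("https://example.com/mascot-side.png"),
]
payload = consistency_payload("The mascot riding a bicycle through a park", refs)
```

Sending the same `refs` list with different prompts keeps the character consistent across generations.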

Parameters

| Field | Type | Default | Options | Description |
|---|---|---|---|---|
| prompt | string | required | max 4000 chars | Text description of the image |
| aspect_ratio | string | 1:1 | See Aspect Ratio Support | Output aspect ratio |
| quality | string | 2K | 1K, 2K, 4K | Output resolution. 4K is billed at a higher rate |
| num_images | integer | 1 | 1–10 | Number of variations per call |
| temperature | number | 1.0 | 0–2 | Creativity. Lower values stay closer to the prompt; higher values are more creative |
| seed | integer | | any integer | Seed for reproducible output |
| reference_images | object[] | | max 14 | Visual references for consistency. Each object has a url and an optional description (max 500 chars) |
| celery | boolean | false | | If true, the request is processed asynchronously |
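The Pro table differs from the Flash table in two places. Pro has no `512` quality option, and it adds `reference_images`. The sketch below is a client-side check that mirrors the Pro table. The names are hypothetical, and the server's own validation is authoritative.

```python
from typing import Any

PRO_QUALITIES = {"1K", "2K", "4K"}  # no "512" option, unlike Flash
PRO_RATIOS = {"1:1", "4:3", "3:4", "16:9", "9:16", "3:2", "2:3",
              "21:9", "1:4", "4:1", "5:4", "4:5"}


def validate_pro_params(params: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means no issue was found."""
    errors: list[str] = []
    prompt = params.get("prompt")
    if not isinstance(prompt, str) or not prompt:
        errors.append("prompt is required")
    elif len(prompt) > 4000:
        errors.append("prompt exceeds 4000 characters")
    if params.get("aspect_ratio", "1:1") not in PRO_RATIOS:
        errors.append("unsupported aspect_ratio")
    if params.get("quality", "2K") not in PRO_QUALITIES:
        errors.append("unsupported quality")
    n = params.get("num_images", 1)
    if isinstance(n, bool) or not isinstance(n, int) or not 1 <= n <= 10:
        errors.append("num_images must be 1-10")
    t = params.get("temperature", 1.0)
    if isinstance(t, bool) or not isinstance(t, (int, float)) or not 0 <= t <= 2:
        errors.append("temperature must be 0-2")
    refs = params.get("reference_images", [])
    if len(refs) > 14:
        errors.append("at most 14 reference_images")
    for i, ref in enumerate(refs):
        if not ref.get("url"):
            errors.append(f"reference_images[{i}] needs a url")
        if len(ref.get("description", "")) > 500:
            errors.append(f"reference_images[{i}].description exceeds 500 characters")
    return errors
```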


Imagen 4 — vertex/imagen-4

Powered by Imagen 4.0. Best for general-purpose generation with strict prompt adherence, multilingual support, and configurable safety controls.


With All Safety Controls


Imagen 4 Fast — vertex/imagen-4-fast

Powered by Imagen 4.0 Fast. Best for bulk generation, rapid prototyping, and cost-sensitive workloads. Fixed to approximately 1K resolution.


Batch Generation
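Here is a sketch of batch generation with Imagen 4 Fast: one request body per prompt, sent concurrently.

Several details are assumptions:

- The `model` key, and the idea that `num_images` applies to Imagen models, are not documented in a parameter table on this page. The only source for the cap of 4 is the models table.
- `send` is any function you supply that posts one body and returns the decoded response, such as a wrapper around your HTTP client.
- The worker count is arbitrary. Rate limits are not documented here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

IMAGEN_FAST = "vertex/imagen-4-fast"
MAX_IMAGES_PER_CALL = 4  # Max Images for Imagen 4 Fast in the models table


def build_batch(prompts: list[str], images_per_prompt: int = 1) -> list[dict]:
    """One request body per prompt."""
    if not 1 <= images_per_prompt <= MAX_IMAGES_PER_CALL:
        raise ValueError("Imagen 4 Fast returns at most 4 images per call")
    return [
        {"model": IMAGEN_FAST, "prompt": p, "num_images": images_per_prompt}
        for p in prompts
    ]


def run_batch(bodies: list[dict], send: Callable[[dict], dict],
              max_workers: int = 4) -> list[dict]:
    """Send bodies concurrently; results come back in the same order as bodies."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send, bodies))
```

`ThreadPoolExecutor.map` preserves input order, so `results[i]` always corresponds to `prompts[i]` even when requests finish out of order.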


Imagen 4 Ultra — vertex/imagen-4-ultra

Powered by Imagen 4.0 Ultra. Best for the highest quality outputs, complex multi-element compositions, and strict instruction following. Use when quality is more important than speed or cost.


Maximum Quality


Aspect Ratio Support

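| Ratio | Nano Banana Flash | Nano Banana Pro | Imagen 4 / Fast / Ultra |
|---|---|---|---|
| 1:1 | Yes | Yes | Yes |
| 4:3 | Yes | Yes | Yes |
| 3:4 | Yes | Yes | Yes |
| 16:9 | Yes | Yes | Yes |
| 9:16 | Yes | Yes | Yes |
| 3:2 | Yes | Yes | No |
| 2:3 | Yes | Yes | No |
| 21:9 | Yes | Yes | No |
| 1:4 | Yes | Yes | No |
| 4:1 | Yes | Yes | No |
| 5:4 | Yes | Yes | No |
| 4:5 | Yes | Yes | No |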
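A client that lets users choose an aspect ratio can check the model and ratio combination before sending a request. The sketch below encodes the support table as sets. The names are illustrative and not part of the API.

```python
# Aspect ratios transcribed from the support table above.
NANO_BANANA_RATIOS = frozenset({
    "1:1", "4:3", "3:4", "16:9", "9:16", "3:2", "2:3",
    "21:9", "1:4", "4:1", "5:4", "4:5",
})
IMAGEN_RATIOS = frozenset({"1:1", "4:3", "3:4", "16:9", "9:16"})

SUPPORTED_RATIOS = {
    "vertex/nano-banana-flash": NANO_BANANA_RATIOS,
    "vertex/nano-banana-pro": NANO_BANANA_RATIOS,
    "vertex/imagen-4": IMAGEN_RATIOS,
    "vertex/imagen-4-fast": IMAGEN_RATIOS,
    "vertex/imagen-4-ultra": IMAGEN_RATIOS,
}


def supports_ratio(model_id: str, ratio: str) -> bool:
    """True if the table lists this aspect_ratio as supported for the model."""
    return ratio in SUPPORTED_RATIOS.get(model_id, frozenset())


def models_supporting(ratio: str) -> list[str]:
    """All model IDs that accept the given aspect_ratio, sorted alphabetically."""
    return sorted(m for m, ratios in SUPPORTED_RATIOS.items() if ratio in ratios)
```

For example, `models_supporting("21:9")` returns only the two Nano Banana model IDs.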
