Image-to-Image

Edit, transform, and compose images using Nano Banana Flash and Nano Banana Pro, two Google Gemini-powered models that understand text and images together in a single request.

Image-to-Image API

POST /api/v1/images/generate

| Name | Value |
| --- | --- |
| Content-Type | application/json |
| Authorization | Bearer <token> |
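As a minimal sketch of how the endpoint and headers above fit together, the following builds (but does not send) a request using only the Python standard library. The base URL `https://api.example.com`, the token, and the image URL are placeholders and not part of this documentation; substitute the host for your deployment.

```python
import json
import urllib.request

# Assumption: placeholder host. This page only documents the path.
BASE_URL = "https://api.example.com"
ENDPOINT = "/api/v1/images/generate"


def build_request(token: str, payload: dict) -> urllib.request.Request:
    """Build a POST request carrying the two documented headers."""
    return urllib.request.Request(
        url=BASE_URL + ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


req = build_request(
    "YOUR_TOKEN",
    {
        "model": "vertex/nano-banana-flash",
        "prompt": "Make the sky a warm sunset orange",
        "image_urls": ["https://example.com/photo.jpg"],
    },
)
# To actually send it: urllib.request.urlopen(req)
```

The response format is not described in this section, so the sketch stops at building the request.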

Model Overview

|  | vertex/nano-banana-flash | vertex/nano-banana-pro |
| --- | --- | --- |
| Underlying model | gemini-3.1-flash-image-preview | gemini-3-pro-image-preview |
| Provider | Google Gemini / Vertex AI | Google Gemini / Vertex AI |
| Speed | Faster | Slower |
| Quality | High | Excellent |
| Max resolution | 4K | 4K |
| Quality tiers | 512, 1K, 2K, 4K | 1K, 2K, 4K |
| Max source images (image_urls) | 14 | 14 |
| Max reference images | 14 | 14 |
| Search grounding | Yes | No |
| Text rendering | Good | Excellent |
| Subject consistency | Good | Excellent |
| Cost per image (2K) | ~$0.08 | ~$0.161 |


Capabilities

Both models support the following image-to-image operations:

| Capability | Description |
| --- | --- |
| Image editing | Modify specific parts of an image based on a text instruction |
| Style transfer | Apply the visual style of one image to another |
| Background replacement | Swap the background while keeping the subject |
| Multi-image composition | Combine elements from multiple source images into one |
| Reference-based consistency | Keep a character, product, or style consistent across generations |
| Variations | Generate alternative versions of an existing image |
| Object placement | Insert a logo, product, or object into a scene |
| Appearance change | Change clothing, color, texture, or features |
| Search-grounded editing | Edit using real-world knowledge (Flash only) |


Request Reference

Required Fields

| Field | Type | Description |
| --- | --- | --- |
| model | string | vertex/nano-banana-flash or vertex/nano-banana-pro |
| prompt | string (max 4000) | Editing instruction or description. |
| image_urls | string[] | One or more publicly accessible source image URLs. |
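A client-side check can catch malformed payloads before they reach the API. The sketch below validates only the three required fields. It assumes that "max 4000" means characters and that the limit of 14 source images from the Model Overview applies per request. The helper name `validate_payload` is hypothetical.

```python
ALLOWED_MODELS = {"vertex/nano-banana-flash", "vertex/nano-banana-pro"}
MAX_PROMPT_CHARS = 4000  # assumption: the documented "max 4000" counts characters
MAX_SOURCE_IMAGES = 14   # from the Model Overview table


def validate_payload(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the required fields look valid."""
    errors = []

    if payload.get("model") not in ALLOWED_MODELS:
        errors.append("model must be one of: " + ", ".join(sorted(ALLOWED_MODELS)))

    prompt = payload.get("prompt")
    if not isinstance(prompt, str) or not prompt:
        errors.append("prompt must be a non-empty string")
    elif len(prompt) > MAX_PROMPT_CHARS:
        errors.append(f"prompt exceeds {MAX_PROMPT_CHARS} characters")

    urls = payload.get("image_urls")
    if not isinstance(urls, list) or not urls:
        errors.append("image_urls must be a non-empty list")
    elif len(urls) > MAX_SOURCE_IMAGES:
        errors.append(f"image_urls accepts at most {MAX_SOURCE_IMAGES} entries")
    elif not all(isinstance(u, str) and u.startswith(("http://", "https://")) for u in urls):
        errors.append("image_urls entries must be http(s) URLs")

    return errors
```

An empty result only means the payload is well-formed locally. Whether each URL is publicly reachable can only be confirmed by the service.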

Valid aspect_ratio Values


Use Cases & Payloads


1. Edit a Specific Part of an Image

Change a targeted element while leaving everything else untouched.

Flash — fast iteration:

Pro — production output:
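The two variants above can share one payload and differ only in the model field. The following is an illustrative sketch using the required fields only. The image URL and prompt are placeholders, and `edit_payload` is a hypothetical helper.

```python
def edit_payload(model: str, image_url: str, prompt: str) -> dict:
    """Build a single-image edit payload from the three required fields."""
    return {"model": model, "prompt": prompt, "image_urls": [image_url]}


PROMPT = (
    "Change the color of the car to matte black. "
    "Keep the background, lighting, and reflections unchanged."
)
SOURCE = "https://example.com/car.jpg"  # placeholder URL

# Flash for quick iteration while tuning the prompt.
flash_payload = edit_payload("vertex/nano-banana-flash", SOURCE, PROMPT)

# Pro for the final render once the prompt is settled.
pro_payload = edit_payload("vertex/nano-banana-pro", SOURCE, PROMPT)
```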


2. Style Transfer

Apply the visual style, mood, or artistic look of a reference image to a source image.

Text-only style transfer (no reference image):
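When no style reference is available, the style can be described in the prompt alone. A hedged example payload, with a placeholder source URL:

```python
import json

# Text-only style transfer: one source image, style described in words.
style_payload = {
    "model": "vertex/nano-banana-flash",
    "prompt": (
        "Transform this photo into a watercolor painting with soft edges "
        "and a pastel palette. Keep the composition and subject the same."
    ),
    "image_urls": ["https://example.com/landscape.jpg"],  # placeholder URL
}

body = json.dumps(style_payload)  # JSON string to send as the request body
```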


3. Replace Background

See the dedicated Replace Background guide for full documentation.

Quick reference:
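A minimal background-replacement payload might look like the sketch below. It uses only the required fields. The URL is a placeholder, and any additional options described in the dedicated guide are not shown here.

```python
def replace_background_payload(image_url: str, new_background: str) -> dict:
    """Ask for a new background while telling the model to keep the subject."""
    return {
        "model": "vertex/nano-banana-flash",
        "prompt": (
            f"Replace the background with {new_background}. "
            "Keep the subject, its pose, and its edges exactly as they are."
        ),
        "image_urls": [image_url],
    }


bg_payload = replace_background_payload(
    "https://example.com/portrait.jpg",  # placeholder URL
    "a clean white studio backdrop with soft shadows",
)
```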


4. Multi-Image Composition

Combine elements from multiple source images into a single coherent output.

Merge two scenes:
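One way to sketch a two-scene merge is to pass both sources in image_urls and describe which parts of each to keep. The helper name `compose_payload` and the URLs are hypothetical. The limit of 14 comes from the Model Overview.

```python
MAX_SOURCE_IMAGES = 14  # from the Model Overview table


def compose_payload(model: str, image_urls: list[str], prompt: str) -> dict:
    """Build a multi-image payload, rejecting more sources than the documented limit."""
    if not 1 <= len(image_urls) <= MAX_SOURCE_IMAGES:
        raise ValueError(f"expected 1 to {MAX_SOURCE_IMAGES} source images")
    return {"model": model, "prompt": prompt, "image_urls": list(image_urls)}


merge_payload = compose_payload(
    "vertex/nano-banana-pro",
    [
        "https://example.com/mountain-lake.jpg",  # placeholder: first scene
        "https://example.com/city-skyline.jpg",   # placeholder: second scene
    ],
    "Combine the mountain lake from the first image with the city skyline "
    "from the second image into one scene with consistent lighting.",
)
```

Referring to sources as "the first image" and "the second image" is a prompting convention used here for clarity; this section does not specify how the model maps prompt text to entries in image_urls.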

Extract and composite a product:
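For compositing, the prompt can name what to lift from one image and where to place it in the other. This is an illustrative payload with placeholder URLs, assuming both images are passed as sources in image_urls:

```python
import json

composite_payload = {
    "model": "vertex/nano-banana-pro",
    "prompt": (
        "Take the sneaker from the first image and place it on the wooden "
        "table in the second image. Match the lighting and add a natural shadow."
    ),
    "image_urls": [
        "https://example.com/sneaker.jpg",       # placeholder: product shot
        "https://example.com/wooden-table.jpg",  # placeholder: target scene
    ],
}

body = json.dumps(composite_payload, indent=2)  # JSON string to send as the request body
```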


5. Character / Product Consistency with References

Keep a specific subject (person, character, product) visually consistent across multiple generations using reference_images.

Consistent character across scenes:
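The sketch below reuses the same references across several scene prompts so that every request sees the same character. It assumes reference_images takes a list of image URLs like image_urls does; this section names the field but does not give its type. All URLs are placeholders.

```python
MODEL = "vertex/nano-banana-pro"
CHARACTER_IMAGE = "https://example.com/character-front.jpg"  # placeholder

# Assumption: reference_images is a list of URLs, mirroring image_urls.
REFERENCE_IMAGES = [
    "https://example.com/character-side.jpg",
    "https://example.com/character-closeup.jpg",
]

SCENES = [
    "standing in a rainy neon-lit street at night",
    "sitting in a sunlit cafe reading a book",
    "hiking on a mountain trail at sunrise",
]


def scene_payload(scene: str) -> dict:
    """One request per scene, always with the same source and references."""
    return {
        "model": MODEL,
        "prompt": (
            f"Show the same character {scene}. Keep the face, hairstyle, "
            "and outfit identical to the reference images."
        ),
        "image_urls": [CHARACTER_IMAGE],
        "reference_images": REFERENCE_IMAGES,
    }


payloads = [scene_payload(scene) for scene in SCENES]
```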

Consistent product in a lifestyle scene:
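For a product shot, one plausible layout is to pass the scene as the source image and the product photos as references. The split between image_urls and reference_images shown here is an assumption, as is the list-of-URLs type for reference_images. URLs are placeholders.

```python
product_payload = {
    "model": "vertex/nano-banana-pro",
    "prompt": (
        "Place the water bottle from the reference images on the kitchen "
        "counter next to the window. Keep the bottle's shape, label, and "
        "colors exactly as in the references."
    ),
    # Assumption: the lifestyle scene is the source image.
    "image_urls": ["https://example.com/kitchen-scene.jpg"],
    # Assumption: product photos go in reference_images as a list of URLs.
    "reference_images": [
        "https://example.com/bottle-front.jpg",
        "https://example.com/bottle-back.jpg",
    ],
}

# Both lists are capped at 14 entries per the Model Overview.
within_limits = (
    len(product_payload["image_urls"]) <= 14
    and len(product_payload["reference_images"]) <= 14
)
```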


6. Image Upscale & Re-render

Re-render a low-quality or low-resolution image at higher quality with enhanced detail.

Upscale with artistic enhancement:
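A hedged example of a re-render request follows. This section lists quality tiers (up to 4K) but does not name the request field that selects one, so the sketch sets only the required fields and leaves output resolution at whatever the service defaults to. The URL is a placeholder.

```python
upscale_payload = {
    "model": "vertex/nano-banana-pro",
    "prompt": (
        "Re-render this image at higher quality with sharper fine detail, "
        "richer textures, and more dramatic lighting. "
        "Keep the composition and the subject unchanged."
    ),
    "image_urls": ["https://example.com/low-res-photo.jpg"],  # placeholder URL
}

# Only the three required fields are set; no resolution field is guessed.
fields_used = sorted(upscale_payload)
```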


7. Outfit & Appearance Change

Modify what a subject is wearing or change their visual appearance.

Change outfit:

Change hair color:

Product color variant:
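The three variants above differ only in source image and prompt, so they can be generated from a small table. This is an illustrative sketch with placeholder URLs and prompts; the variant names are hypothetical labels, not API values.

```python
MODEL = "vertex/nano-banana-flash"

# (label, source image URL, prompt) for each appearance change.
VARIANTS = [
    (
        "outfit",
        "https://example.com/model-casual.jpg",
        "Change the outfit to a navy business suit with a white shirt. "
        "Keep the face, pose, and background unchanged.",
    ),
    (
        "hair_color",
        "https://example.com/model-casual.jpg",
        "Change the hair color to platinum blonde. "
        "Keep the hairstyle, face, and clothing unchanged.",
    ),
    (
        "product_color",
        "https://example.com/backpack-black.jpg",
        "Change the backpack color from black to forest green. "
        "Keep the material texture, logo, and lighting unchanged.",
    ),
]

appearance_payloads = {
    label: {"model": MODEL, "prompt": prompt, "image_urls": [url]}
    for label, url, prompt in VARIANTS
}
```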


8. Logo / Object Placement into a Scene

Insert a product, logo, or object into an existing scene naturally.

Product mockup on billboard:
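A billboard mockup can be sketched as a two-image request: the scene plus the artwork to place. Passing both as sources in image_urls is an assumption, and the URLs are placeholders.

```python
def placement_payload(scene_url: str, object_url: str, surface: str) -> dict:
    """Place the object from the second image onto a named surface in the first."""
    return {
        "model": "vertex/nano-banana-pro",
        "prompt": (
            f"Place the artwork from the second image onto {surface} in the "
            "first image. Match the perspective, lighting, and surface "
            "texture. Keep the rest of the scene unchanged."
        ),
        "image_urls": [scene_url, object_url],
    }


billboard_payload = placement_payload(
    "https://example.com/street-billboard.jpg",  # placeholder: scene
    "https://example.com/product-ad.png",        # placeholder: artwork
    "the blank billboard",
)
```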


Prompt Writing Guide

State the edit clearly upfront

Start the prompt with what you want to change before describing the result.

Explicitly protect what should not change

The model won't know what to preserve unless you say so.
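As a small illustration of the two tips so far, the hypothetical helper below puts the edit first and then appends an explicit list of things to preserve.

```python
def join_items(items: list[str]) -> str:
    """Join items as 'a', 'a and b', or 'a, b and c'."""
    if len(items) == 1:
        return items[0]
    return ", ".join(items[:-1]) + " and " + items[-1]


def edit_first_prompt(edit: str, preserve: list[str]) -> str:
    """State the edit up front, then name what must stay unchanged."""
    prompt = edit.strip().rstrip(".") + "."
    if preserve:
        prompt += " Keep " + join_items(preserve) + " unchanged."
    return prompt


prompt = edit_first_prompt(
    "Replace the cloudy sky with a clear sunset sky",
    ["the building", "the street", "the people"],
)
```

Here `prompt` reads: "Replace the cloudy sky with a clear sunset sky. Keep the building, the street and the people unchanged."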

Separate the subject from the edit

Describe the original subject, then describe the change.
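A sketch of this pattern as a hypothetical helper: one sentence identifies the subject as it appears in the source image, and a separate sentence states the change.

```python
def subject_then_change(subject: str, change: str) -> str:
    """Describe the subject in its own sentence, then state the change."""
    return f"The image shows {subject.strip().rstrip('.')}. {change.strip()}"


prompt = subject_then_change(
    "a woman in a blue denim jacket standing on a beach",
    "Change the jacket to a red leather jacket.",
)
```

Keeping the two sentences separate makes it clearer which words identify the subject and which describe the edit.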
