How It Works

Step 1: Have a Conversation

Start a normal chat. Each assistant message is tagged with the model that generated it (e.g., "GPT-4.1", "Claude Sonnet 4.6"), so users always know which model they're interacting with.
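One way to support this tag is to store the generating model on every assistant message. The sketch below assumes a simple message record; the `Message` class, its field names, and the `label()` helper are hypothetical and are not taken from the product's actual schema.

```python
from dataclasses import dataclass
from typing import Literal, Optional


@dataclass
class Message:
    # Hypothetical message record; field names are illustrative assumptions.
    id: str
    role: Literal["user", "assistant"]
    content: str
    model: Optional[str] = None  # set only on assistant messages, e.g. "GPT-4.1"

    def label(self) -> str:
        """Return the model tag shown next to an assistant message."""
        if self.role != "assistant" or self.model is None:
            return ""  # user messages carry no model tag
        return self.model


reply = Message(id="m2", role="assistant", content="Hello!", model="Claude Sonnet 4.6")
print(reply.label())  # Claude Sonnet 4.6
```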

Step 2: Regenerate with a Different Model (Branching)

Users can regenerate any assistant response using a different model. This creates a branch: an alternative response to the same question. The branch appears inline alongside the original, so users can compare the two.

Users can generate up to 5 branches per message, building a collection of responses from different models. For example:

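| Branch   | Model             | Notes                     |
|----------|-------------------|---------------------------|
| Original | GPT-4.1           | Default response          |
| Branch 2 | Claude Sonnet 4.6 | Different writing style   |
| Branch 3 | Gemini 2.5 Pro    | Compare Google's approach |
| Branch 4 | DeepSeek R1       | Reasoning-focused model   |
| Branch 5 | Grok 4            | Another perspective       |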

Each branch is clearly labeled with its model for easy comparison.
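The branching rule can be sketched as a list of alternative responses per assistant turn with a hard cap. This is a minimal sketch, not the product's implementation: the `AssistantTurn` and `Response` names are hypothetical, and it assumes the original response counts toward the limit of 5, as the example above (Original plus Branches 2 to 5) suggests.

```python
from dataclasses import dataclass, field

# Documented limit; assumed here to include the original response.
MAX_BRANCHES = 5


@dataclass
class Response:
    model: str
    content: str


@dataclass
class AssistantTurn:
    """All alternative responses to one user message (hypothetical structure)."""
    branches: list[Response] = field(default_factory=list)

    def regenerate(self, model: str, content: str) -> int:
        """Add a branch from `model` and return its 1-based position.

        In a real system `content` would come from calling `model`; it is
        passed in here to keep the sketch self-contained.
        """
        if len(self.branches) >= MAX_BRANCHES:
            raise ValueError(f"limit of {MAX_BRANCHES} branches reached")
        self.branches.append(Response(model, content))
        return len(self.branches)


turn = AssistantTurn([Response("GPT-4.1", "Default response")])
for m in ["Claude Sonnet 4.6", "Gemini 2.5 Pro", "DeepSeek R1", "Grok 4"]:
    turn.regenerate(m, f"Answer from {m}")
print(len(turn.branches))  # 5
```

A sixth `regenerate` call on this turn raises `ValueError`, which is where a UI would disable the regenerate button.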

Step 3: Continue from a Branch (Forking)

After comparing branches, users can continue the conversation from any branch. This creates a fork: a new thread that picks up from the selected branch. The original thread remains intact, so users can return to it or explore other branches.

Forks can be nested up to 5 levels deep, enabling exploratory conversation trees.
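The depth limit can be enforced by counting how many parent threads sit above a thread before allowing a new fork. In this sketch the `Thread` class, its `parent` and `fork_branch` fields, and the `fork()` helper are hypothetical; it also assumes the root conversation is depth 0, so the fifth nested fork is the deepest one allowed.

```python
from dataclasses import dataclass
from typing import Optional

MAX_FORK_DEPTH = 5  # documented nesting limit


@dataclass
class Thread:
    id: str
    parent: Optional["Thread"] = None  # thread this one was forked from
    fork_branch: Optional[int] = None  # branch position the fork continues from

    @property
    def depth(self) -> int:
        """Number of forks between this thread and the root thread."""
        d, t = 0, self.parent
        while t is not None:
            d += 1
            t = t.parent
        return d


def fork(parent: Thread, new_id: str, branch_index: int) -> Thread:
    """Start a new thread from a chosen branch; the parent is left unchanged."""
    if parent.depth + 1 > MAX_FORK_DEPTH:
        raise ValueError(f"forks are limited to {MAX_FORK_DEPTH} levels")
    return Thread(id=new_id, parent=parent, fork_branch=branch_index)


root = Thread("t0")
t = root
for i in range(1, 6):
    t = fork(t, f"t{i}", branch_index=2)
print(t.depth)  # 5
```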

Per-Message Branch Indicators

Messages with branches show which response is currently displayed out of the total (e.g., "1 of 4"). Each response shows:

  • Model name (with logo).

  • Token usage (response cost).

  • Reasoning content (if supported by the model).
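A header combining these indicators could be assembled as follows. This is a minimal sketch under stated assumptions: `BranchInfo`, `render_header()`, the separate prompt and completion token counts, and the " | " separator are all hypothetical choices, and the model logo is omitted because it is a UI asset rather than text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BranchInfo:
    model: str
    prompt_tokens: int       # hypothetical token breakdown
    completion_tokens: int
    reasoning: Optional[str] = None  # only some models return reasoning


def render_header(branches: list[BranchInfo], position: int) -> str:
    """Build the header text for the response at 1-based `position`."""
    b = branches[position - 1]
    parts = []
    if len(branches) > 1:
        # The "N of M" indicator appears only when alternatives exist.
        parts.append(f"{position} of {len(branches)}")
    parts.append(b.model)
    parts.append(f"{b.prompt_tokens + b.completion_tokens} tokens")
    if b.reasoning:
        parts.append("reasoning available")
    return " | ".join(parts)


branches = [
    BranchInfo("GPT-4.1", 120, 340),
    BranchInfo("DeepSeek R1", 120, 510, reasoning="Step 1: ..."),
]
print(render_header(branches, 2))  # 2 of 2 | DeepSeek R1 | 630 tokens | reasoning available
```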

Thread Sidebar

Forked threads are hidden from the main list to keep the sidebar clean. They are accessible through the parent thread's conversation tree view.
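The sidebar rule amounts to a filter on whether a thread has a parent. The sketch assumes each thread summary stores an optional `parent_id` that is set only for forks; `ThreadSummary`, `sidebar_threads()`, and `forks_of()` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThreadSummary:
    id: str
    title: str
    parent_id: Optional[str] = None  # set only for forked threads


def sidebar_threads(threads: list[ThreadSummary]) -> list[ThreadSummary]:
    """Threads for the main sidebar list: forks are excluded."""
    return [t for t in threads if t.parent_id is None]


def forks_of(threads: list[ThreadSummary], parent_id: str) -> list[ThreadSummary]:
    """Direct forks of a thread, as reached from its conversation tree view."""
    return [t for t in threads if t.parent_id == parent_id]


threads = [
    ThreadSummary("a", "Trip plan"),
    ThreadSummary("b", "Trip plan (fork)", parent_id="a"),
    ThreadSummary("c", "Recipe ideas"),
]
print([t.id for t in sidebar_threads(threads)])  # ['a', 'c']
print([t.id for t in forks_of(threads, "a")])    # ['b']
```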

Full Conversation Tree

The /full endpoint displays the entire conversation tree, showing where branches and forks occurred.
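The response format of /full is not described here, so the following is only a sketch of how such a tree could be built. It assumes messages are stored as a flat list in which each entry has an `id` and a `parent_id`; the `build_tree()` and `render()` helpers are hypothetical. A node with more than one child marks a point where alternative branches were generated (shown with `*`), and messages that continue under a non-original branch correspond to a fork.

```python
from collections import defaultdict


def build_tree(messages: list[dict]) -> list[dict]:
    """Nest flat messages (each with 'id' and 'parent_id') into a tree."""
    children = defaultdict(list)
    for m in messages:
        children[m["parent_id"]].append(m)

    def node(m: dict) -> dict:
        kids = [node(c) for c in children[m["id"]]]
        return {
            "id": m["id"],
            "model": m.get("model"),
            "branch_point": len(kids) > 1,  # several alternatives exist here
            "children": kids,
        }

    # Root messages are the ones with no parent.
    return [node(m) for m in children[None]]


def render(nodes: list[dict], indent: int = 0) -> list[str]:
    """Flatten the tree into indented lines for display."""
    lines = []
    for n in nodes:
        label = n["id"] + (f" ({n['model']})" if n["model"] else "")
        if n["branch_point"]:
            label += " *"
        lines.append("  " * indent + label)
        lines.extend(render(n["children"], indent + 1))
    return lines


messages = [
    {"id": "u1", "parent_id": None},
    {"id": "a1", "parent_id": "u1", "model": "GPT-4.1"},
    {"id": "a2", "parent_id": "u1", "model": "Claude Sonnet 4.6"},
    {"id": "u2", "parent_id": "a2"},  # conversation continued from branch a2
]
tree = build_tree(messages)
print("\n".join(render(tree)))
```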
