How It Works
Step 1: Have a Conversation
Start a normal chat. Each assistant message is tagged with the model that generated it (e.g., "GPT-4.1", "Claude Sonnet 4.6"), so users always know which model they're interacting with.
Step 2: Regenerate with a Different Model (Branching)
Users can regenerate any assistant response using a different model. This creates a branch: an alternative response to the same question. The branch appears inline alongside the original, so the two can be compared directly.
Users can generate up to 5 branches per message, building a collection of responses from different models. For example:
Original: GPT-4.1 (default response)
Branch 2: Claude Sonnet 4.6 (different writing style)
Branch 3: Gemini 2.5 Pro (compare Google's approach)
Branch 4: DeepSeek R1 (reasoning-focused model)
Branch 5: Grok 4 (another perspective)
Each branch is clearly labeled with its model for easy comparison.
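The branching rule can be sketched as a small data model. This is an illustrative sketch only, not the product's implementation: the names `Response`, `AssistantTurn`, and `regenerate` are hypothetical, and it assumes the limit of 5 counts the original response, as the example above suggests.

```python
from dataclasses import dataclass, field

MAX_BRANCHES = 5  # assumption: the limit includes the original response


@dataclass
class Response:
    model: str  # label shown next to the response, e.g. "GPT-4.1"
    text: str


@dataclass
class AssistantTurn:
    """One assistant message slot: the original response plus its branches."""
    responses: list = field(default_factory=list)

    def regenerate(self, model: str, text: str) -> Response:
        # Refuse to go past the per-message branch limit.
        if len(self.responses) >= MAX_BRANCHES:
            raise ValueError(f"branch limit of {MAX_BRANCHES} reached")
        response = Response(model=model, text=text)
        self.responses.append(response)
        return response


turn = AssistantTurn()
for model in ["GPT-4.1", "Claude Sonnet 4.6", "Gemini 2.5 Pro", "DeepSeek R1", "Grok 4"]:
    turn.regenerate(model, f"(response from {model})")
```

Every response keeps its model label, so a sixth call to `regenerate` on the same turn raises an error instead of silently replacing an earlier branch.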
Step 3: Continue from a Branch (Forking)
After comparing branches, users can continue the conversation from any branch. This creates a fork: a new thread that picks up from the selected branch. The original thread remains intact, so users can return to it or explore other branches later.
Forks can be nested up to 5 levels deep, enabling exploratory conversation trees.
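The nesting limit can be illustrated with a thread that remembers its parent. This is a minimal sketch under assumptions: `Thread`, `fork`, and `forked_from_branch` are hypothetical names, and a root thread is assumed to sit at depth 0 so that five nested forks are allowed.

```python
from dataclasses import dataclass
from typing import Optional

MAX_FORK_DEPTH = 5  # nesting limit stated in the text


@dataclass
class Thread:
    thread_id: str
    parent: Optional["Thread"] = None
    forked_from_branch: Optional[int] = None  # index of the branch being continued

    @property
    def depth(self) -> int:
        # A root thread has depth 0; each fork adds one level.
        return 0 if self.parent is None else self.parent.depth + 1

    def fork(self, thread_id: str, branch_index: int) -> "Thread":
        # The parent is never modified, so the original thread stays intact.
        if self.depth >= MAX_FORK_DEPTH:
            raise ValueError(f"forks can be nested at most {MAX_FORK_DEPTH} levels deep")
        return Thread(thread_id, parent=self, forked_from_branch=branch_index)


root = Thread("root")
deepest = root
for level in range(1, MAX_FORK_DEPTH + 1):
    deepest = deepest.fork(f"fork-{level}", branch_index=0)
```

After the loop, `deepest` sits five levels below `root`, and a further `fork` call on it is rejected.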
Per-Message Branch Indicators
Messages with branches display an indicator showing which response is in view and how many exist (e.g., "1 of 4"). Each response shows:
Model name (with logo).
Token usage (response cost).
Reasoning content (if supported by the model).
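As a rough illustration, the indicator and per-response details could be assembled as below. The field names (`model`, `total_tokens`, `reasoning`) and the output format are assumptions made for this sketch, not the product's actual schema or UI text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResponseInfo:
    model: str                       # model name shown with its logo
    total_tokens: int                # token usage for this response
    reasoning: Optional[str] = None  # present only if the model exposes reasoning


def indicator_label(position: int, total: int) -> str:
    """Build a label like '1 of 4' from a zero-based position."""
    if not 0 <= position < total:
        raise IndexError("branch position out of range")
    return f"{position + 1} of {total}"


def summary_line(info: ResponseInfo, position: int, total: int) -> str:
    parts = [indicator_label(position, total), info.model, f"{info.total_tokens} tokens"]
    if info.reasoning is not None:
        parts.append("reasoning available")
    return " | ".join(parts)


print(summary_line(ResponseInfo("GPT-4.1", 312), 0, 4))
```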
Thread Sidebar
Forked threads are hidden from the main list to keep the sidebar clean. They are accessible through the parent thread's conversation tree view.
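One simple way to get this behavior is to filter on a parent reference. The sketch below assumes each thread record carries a hypothetical `parent_id` field that is `None` for top-level threads; the real storage format is not described here.

```python
def sidebar_threads(threads: list) -> list:
    """Keep only top-level threads; forks are reached through their parent."""
    return [t for t in threads if t["parent_id"] is None]


def forks_of(threads: list, parent_id: str) -> list:
    """List the direct forks of one thread, as a tree view might."""
    return [t for t in threads if t["parent_id"] == parent_id]


threads = [
    {"id": "t1", "title": "Trip planning", "parent_id": None},
    {"id": "t2", "title": "Trip planning (fork)", "parent_id": "t1"},
    {"id": "t3", "title": "Code review", "parent_id": None},
]

visible = sidebar_threads(threads)
```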
Full Conversation Tree
The /full endpoint displays the entire conversation tree, showing where branches and forks occurred.
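The response format of /full is not specified here, so the nested structure below is purely hypothetical (`id`, `kind`, and `children` are invented keys). The sketch only shows how a tree of threads, messages, branches, and forks could be walked and printed as an indented outline.

```python
def render_tree(node: dict, indent: int = 0) -> list:
    """Depth-first walk that returns one indented line per node."""
    lines = ["  " * indent + f'{node["id"]} ({node["kind"]})']
    for child in node.get("children", []):
        lines.extend(render_tree(child, indent + 1))
    return lines


# Hypothetical tree: one message with two branches, one of which was forked.
tree = {
    "id": "thread-1", "kind": "thread", "children": [
        {"id": "msg-2", "kind": "message", "children": [
            {"id": "GPT-4.1", "kind": "branch", "children": []},
            {"id": "Claude Sonnet 4.6", "kind": "branch", "children": [
                {"id": "thread-1a", "kind": "fork", "children": []},
            ]},
        ]},
    ],
}

outline = render_tree(tree)
print("\n".join(outline))
```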