OpenAI-Compatible Chat
OpenAI-compatible chat completions endpoint. Supports both standard (JSON) and streaming (SSE) responses. Rate-limited per API key.
Base URL
Supported endpoints
| Method | Path | Description |
| --- | --- | --- |
| GET | /v1/models | List all available models |
| GET | /v1/models/{model} | Retrieve a specific model |
| POST | /api/v1/chat/completions | Create chat completions (streaming, tools, multimodal) |
Chat Completions
POST /api/v1/chat/completions
Headers
| Header | Value |
| --- | --- |
| Content-Type | application/json |
| Authorization | Bearer <token> |
Body
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | The model ID to use for completion (e.g., "gpt-4o", "claude-3-5-sonnet"). Use GET /api/v1/chat/models to list available models. |
| messages | array | Yes | Ordered list of messages forming the conversation. Each object must have a role ("system", "user", or "assistant") and a content field (string or structured content array). Must contain at least one message. |
| stream | boolean | No | If true, the response is streamed as Server-Sent Events (SSE). Each event is a data: {...} line; the stream ends with data: [DONE]. Defaults to false. |
| temperature | number | No | Sampling temperature between 0 and 2. Higher values produce more random output; lower values are more deterministic. Defaults to 1. |
| max_tokens | integer | No | Maximum number of tokens to generate in the response. Must be a positive integer. If omitted, the model's default limit applies. |
| top_p | number | No | Nucleus sampling threshold between 0 and 1. The model considers only the tokens comprising the top top_p probability mass. An alternative to temperature; avoid adjusting both simultaneously. |
| frequency_penalty | number | No | Penalizes tokens based on how frequently they have appeared so far. Positive values reduce repetition. Typically between -2.0 and 2.0. |
| presence_penalty | number | No | Penalizes tokens that have appeared at all in the conversation so far. Positive values encourage the model to introduce new topics. Typically between -2.0 and 2.0. |
| stop | string \| string[] | No | Up to 4 sequences at which the model stops generating further tokens. The stop sequence itself is not included in the output. |
| tool_choice | string \| object | No | Controls which tool (if any) the model calls. Accepts "none", "auto", or an object selecting a specific tool. |
| stream_options | object | No | Additional options for streaming. Set { "include_usage": true } to receive a final SSE chunk containing token usage data. Automatically enabled when stream: true. |
Request
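A minimal request sketch in Python. The base URL and API key are placeholders for your own deployment values, and the model ID and messages are illustrative; the body fields follow the table above:

```python
import json

# Placeholder deployment values -- substitute your own.
BASE_URL = "https://api.example.com"
API_KEY = "sk-..."

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {API_KEY}",
}

# Request body built from the documented fields.
payload = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say hello in one sentence."},
    ],
    "temperature": 0.7,
    "max_tokens": 128,
    "stream": False,
}

body = json.dumps(payload)
# POST `body` with `headers` to f"{BASE_URL}/api/v1/chat/completions"
# using your HTTP client of choice (e.g. requests or httpx).
```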
Response (non-streaming)
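Since the endpoint is OpenAI-compatible, the non-streaming response follows the standard chat-completion shape. A sketch of extracting the assistant reply; the response dict here is illustrative, not captured from the live API:

```python
# Illustrative response body in the OpenAI-compatible shape;
# all field values below are made up for the example.
response = {
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "model": "gpt-4o",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello! How can I help?"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 20, "completion_tokens": 7, "total_tokens": 27},
}

# Pull out the assistant message and the stop reason.
reply = response["choices"][0]["message"]["content"]
finish = response["choices"][0]["finish_reason"]
```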
Response (streaming)
When stream: true, the response is a stream of SSE events:
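A sketch of consuming the stream: each event arrives as a data: {...} line, and data: [DONE] ends the stream. The sample lines below are illustrative, and the delta chunk shape assumes the OpenAI-compatible streaming format:

```python
import json

def parse_sse_stream(lines):
    """Yield parsed JSON chunks from `data: ...` SSE lines,
    stopping at the `data: [DONE]` sentinel."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        yield json.loads(data)

# Reassemble the reply from delta chunks (sample lines are illustrative).
sample = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo!"}}]}',
    "data: [DONE]",
]
text = "".join(
    chunk["choices"][0]["delta"].get("content", "")
    for chunk in parse_sse_stream(sample)
)
# text == "Hello!"
```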
Error Responses
| Status | Condition |
| --- | --- |
| 400 | model is missing |
| 400 | messages is missing, not an array, or empty |
| 400 | temperature is outside 0–2 |
| 400 | top_p is outside 0–1 |
| 400 | max_tokens is not a positive integer |
| 400 | stop array contains more than 4 sequences |
| 429 | Rate limit exceeded |
| 504 | Request timed out (3-minute upstream limit) |
| 502 | Upstream provider error or invalid response |
| 500 | Internal server error |
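Because the endpoint is rate-limited (429) and long requests can hit the upstream timeout (504), clients typically retry such statuses with exponential backoff. A minimal sketch with the HTTP call stubbed out; the retryable-status set is an assumption, not part of this API's contract:

```python
import time

# Assumed retryable statuses, drawn from the error table above.
RETRYABLE = {429, 502, 504}

def post_with_retry(send, max_attempts=4, base_delay=1.0):
    """Call `send()` (a callable returning an HTTP status code) until it
    returns a non-retryable status or attempts run out, sleeping with
    exponential backoff between tries."""
    for attempt in range(max_attempts):
        status = send()
        if status not in RETRYABLE:
            return status
        if attempt < max_attempts - 1:
            time.sleep(base_delay * (2 ** attempt))
    return status

# Stubbed transport: fails twice with 429, then succeeds.
responses = iter([429, 429, 200])
status = post_with_retry(lambda: next(responses), base_delay=0.0)
# status == 200
```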
Related Endpoints
GET /api/v1/chat/models – Returns the list of available models