OpenAI-Compatible Chat

OpenAI-compatible chat completions endpoint. Supports both standard (JSON) and streaming (SSE) responses. Rate-limited per API key.

Base URL

Supported endpoints

| Method | Path | Description |
| --- | --- | --- |
| GET | `/v1/models` | List all available models |
| GET | `/v1/models/{model}` | Retrieve a specific model |
| POST | `/api/v1/chat/completions` | Create chat completions (streaming, tools, multimodal) |
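As a quick sketch of the models endpoints above, the helper below fetches `/v1/models` and extracts the model IDs. The base URL and API key are placeholders, and the `"data"` wrapper around the model list is the usual OpenAI-compatible shape, assumed here rather than confirmed by this page:

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder; use your deployment's base URL
API_KEY = "YOUR_API_KEY"              # placeholder

def parse_model_ids(payload: dict) -> list[str]:
    """Extract model IDs from an OpenAI-compatible list response."""
    return [m["id"] for m in payload.get("data", [])]

def list_models(base_url: str = BASE_URL, api_key: str = API_KEY) -> list[str]:
    """GET /v1/models and return the available model IDs."""
    req = urllib.request.Request(
        f"{base_url}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return parse_model_ids(json.load(resp))
```

Keeping the parsing in its own function makes the response handling easy to test without a live endpoint.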

Chat Completions

POST /api/v1/chat/completions

Headers

| Name | Value |
| --- | --- |
| Content-Type | `application/json` |
| Authorization | `Bearer <token>` |

Body

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `model` | string | Yes | The model ID to use for completion (e.g., `"gpt-4o"`, `"claude-3-5-sonnet"`). Use `GET /api/v1/chat/models` to list available models. |
| `messages` | array | Yes | Ordered list of messages forming the conversation. Each object must have a `role` (`"system"`, `"user"`, or `"assistant"`) and a `content` field (string or structured content array). Must contain at least one message. |
| `stream` | boolean | No | If `true`, the response is streamed as Server-Sent Events (SSE). Each event is a `data: {...}` line; the stream ends with `data: [DONE]`. Defaults to `false`. |
| `temperature` | number | No | Sampling temperature between 0 and 2. Higher values produce more random output; lower values are more deterministic. Defaults to 1. |
| `max_tokens` | integer | No | Maximum number of tokens to generate in the response. Must be a positive integer. If omitted, the model's default limit applies. |
| `top_p` | number | No | Nucleus sampling threshold between 0 and 1. The model considers only the tokens comprising the top `top_p` probability mass. An alternative to temperature; avoid adjusting both simultaneously. |
| `frequency_penalty` | number | No | Penalizes tokens based on how frequently they have appeared so far. Positive values reduce repetition. Typically between -2.0 and 2.0. |
| `presence_penalty` | number | No | Penalizes tokens that have appeared at all in the conversation so far. Positive values encourage the model to introduce new topics. Typically between -2.0 and 2.0. |
| `stop` | string or string[] | No | Up to 4 sequences at which the model stops generating further tokens. The stop sequence itself is not included in the output. |
| `tool_choice` | string or object | No | Controls which tool (if any) the model calls. Accepts `"none"`, `"auto"`, or an object naming a specific tool to call. |
| `stream_options` | object | No | Additional options for streaming. Set `{ "include_usage": true }` to receive a final SSE chunk containing token usage data. Automatically enabled when `stream: true`. |

Request
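A minimal non-streaming request can be sketched as follows. The base URL, API key, and model ID are placeholders; the request shape follows the body parameters documented above:

```python
import json
import urllib.request

def build_chat_request(model: str, messages: list[dict], **options) -> dict:
    """Assemble a chat completions request body from the documented parameters."""
    body = {"model": model, "messages": messages}
    body.update(options)  # e.g. temperature, max_tokens, stream
    return body

def create_completion(base_url: str, api_key: str, body: dict) -> dict:
    """POST /api/v1/chat/completions with the required headers."""
    req = urllib.request.Request(
        f"{base_url}/api/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

body = build_chat_request(
    "gpt-4o",  # any model ID returned by the models endpoint
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    temperature=0.7,
    max_tokens=256,
)
# result = create_completion("https://api.example.com", "YOUR_API_KEY", body)
```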

Response (non-streaming)
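The exact response payload is not reproduced here; the snippet below shows the typical OpenAI-compatible shape (all field values are illustrative) and how to pull out the assistant's reply and token usage:

```python
import json

# Illustrative OpenAI-compatible response (values are made up, not from a live call)
sample = json.loads("""
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello! How can I help?"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 12, "completion_tokens": 7, "total_tokens": 19}
}
""")

# The reply lives under choices[0].message.content
content = sample["choices"][0]["message"]["content"]
total = sample["usage"]["total_tokens"]
```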

Response (streaming)

When stream: true, the response is a stream of SSE events:
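The events can be consumed with a small parser like the one below. The per-chunk shape (`choices[0].delta`) follows the common OpenAI streaming convention and is assumed here; each `data:` line carries one JSON chunk until the `data: [DONE]` sentinel:

```python
import json
from typing import Iterable, Iterator

def iter_sse_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content deltas from an OpenAI-style SSE stream.

    Each event arrives as a 'data: {...}' line; 'data: [DONE]' ends the stream.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:  # the first chunk may carry only the role
            yield delta["content"]

# Sample stream (chunk shapes are illustrative)
stream = [
    'data: {"choices":[{"delta":{"role":"assistant"},"index":0}]}',
    'data: {"choices":[{"delta":{"content":"Hel"},"index":0}]}',
    'data: {"choices":[{"delta":{"content":"lo!"},"index":0}]}',
    "data: [DONE]",
]
text = "".join(iter_sse_deltas(stream))
```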

Error Responses

| Status | Condition |
| --- | --- |
| 400 | `model` is missing |
| 400 | `messages` is missing, not an array, or empty |
| 400 | `temperature` is outside 0–2 |
| 400 | `top_p` is outside 0–1 |
| 400 | `max_tokens` is not a positive integer |
| 400 | `stop` array contains more than 4 sequences |
| 429 | Rate limit exceeded |
| 500 | Internal server error |
| 502 | Upstream provider error or invalid response |
| 504 | Request timed out (3-minute upstream limit) |
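Given the statuses above, a client may want to retry transient failures (429, 502, 504) while surfacing 400-level validation errors immediately. A sketch, with retryable statuses chosen from the table (treating 500 as non-retryable is a judgment call):

```python
import json
import time
import urllib.error
import urllib.request

def should_retry(status: int) -> bool:
    """Transient failures worth retrying; 4xx validation errors are not."""
    return status in {429, 502, 504}

def post_with_retry(url: str, body: dict, api_key: str, attempts: int = 3) -> dict:
    """POST the request, backing off exponentially on retryable statuses."""
    data = json.dumps(body).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    for attempt in range(attempts):
        req = urllib.request.Request(url, data=data, headers=headers, method="POST")
        try:
            with urllib.request.urlopen(req) as resp:
                return json.load(resp)
        except urllib.error.HTTPError as err:
            if should_retry(err.code) and attempt < attempts - 1:
                time.sleep(2 ** attempt)  # 1s, 2s, ... between attempts
                continue
            raise  # non-retryable, or out of attempts
```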


  • GET /api/v1/chat/models — Returns the list of available models

Run the API

To test this API, please use the following link:
