# OpenAI-Compatible Chat

OpenAI-compatible chat completions endpoint. Supports both standard (JSON) and streaming (SSE) responses. Rate-limited per API key.

**Base URL**

{% embed url="https://qolaba-server-b2b.up.railway.app" %}

**Supported endpoints**

| Method | Path                       | Description                                            |
| ------ | -------------------------- | ------------------------------------------------------ |
| `GET`  | `/api/v1/chat/models`          | List all available models                              |
| `GET`  | `/api/v1/chat/models/{model}`  | Retrieve a specific model                              |
| `POST` | `/api/v1/chat/completions` | Create chat completions (streaming, tools, multimodal) |

## Chat Completions

**POST** `/api/v1/chat/completions`

**Headers**

| Name          | Value              |
| ------------- | ------------------ |
| Content-Type  | `application/json` |
| Authorization | `Bearer <token>`   |

**Body**

| Parameter           | Type                 | Required | Description                                                                                                                                                                                                               |
| ------------------- | -------------------- | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `model`             | `string`             | **Yes**  | The model ID to use for completion (e.g., `"gpt-4o"`, `"claude-3-5-sonnet"`). Use `GET /api/v1/chat/models` to list available models.                                                                                     |
| `messages`          | `array`              | **Yes**  | Ordered list of messages forming the conversation. Each object must have a `role` (`"system"`, `"user"`, or `"assistant"`) and a `content` field (string or structured content array). Must contain at least one message. |
| `stream`            | `boolean`            | No       | If `true`, the response is streamed as Server-Sent Events (SSE). Each event is a `data: {...}` line; the stream ends with `data: [DONE]`. Defaults to `false`.                                                            |
| `temperature`       | `number`             | No       | Sampling temperature between `0` and `2`. Higher values produce more random output; lower values are more deterministic. Defaults to `1`.                                                                                 |
| `max_tokens`        | `integer`            | No       | Maximum number of tokens to generate in the response. Must be a positive integer. If omitted, the model's default limit applies.                                                                                          |
| `top_p`             | `number`             | No       | Nucleus sampling threshold between `0` and `1`. The model considers only the tokens comprising the top `top_p` probability mass. An alternative to `temperature`; avoid adjusting both simultaneously.                    |
| `frequency_penalty` | `number`             | No       | Penalizes tokens based on how frequently they have appeared so far. Positive values reduce repetition. Typically between `-2.0` and `2.0`.                                                                                |
| `presence_penalty`  | `number`             | No       | Penalizes tokens that have appeared at all in the conversation so far. Positive values encourage the model to introduce new topics. Typically between `-2.0` and `2.0`.                                                   |
| `stop`              | `string \| string[]` | No       | One or up to **4** sequences at which the model will stop generating further tokens. The stop sequence itself is not included in the output.                                                                              |
| `tool_choice`       | `string \| object`   | No       | Controls which tool (if any) the model calls: `"none"` disables tool calls, `"auto"` lets the model decide.                                                                                                               |
| `stream_options`    | `object`             | No       | Additional options for streaming. Set `{ "include_usage": true }` to receive a final SSE chunk containing token usage data; usage reporting is enabled automatically when `stream: true`.                                 |
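
Putting the constraints above together, a client can validate a request body before sending it. The sketch below is illustrative only; the helper name and the exact set of checks are not part of the API:

```python
def build_chat_request(model, messages, *, stream=False, temperature=None,
                       top_p=None, max_tokens=None, stop=None):
    """Build a chat-completions body, enforcing the documented constraints."""
    if not model:
        raise ValueError("model is required")
    if not isinstance(messages, list) or not messages:
        raise ValueError("messages must be a non-empty array")
    body = {"model": model, "messages": messages, "stream": stream}
    if temperature is not None:
        if not 0 <= temperature <= 2:
            raise ValueError("temperature must be between 0 and 2")
        body["temperature"] = temperature
    if top_p is not None:
        if not 0 <= top_p <= 1:
            raise ValueError("top_p must be between 0 and 1")
        body["top_p"] = top_p
    if max_tokens is not None:
        if not isinstance(max_tokens, int) or max_tokens <= 0:
            raise ValueError("max_tokens must be a positive integer")
        body["max_tokens"] = max_tokens
    if stop is not None:
        # stop accepts a single string or up to 4 sequences
        stops = [stop] if isinstance(stop, str) else list(stop)
        if len(stops) > 4:
            raise ValueError("stop accepts at most 4 sequences")
        body["stop"] = stops
    return body
```

Rejecting an invalid body locally avoids a round trip that would return a `400` (see the error table below).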

#### Request

```json
{
  "model": "google/gemini-2.5-flash",
  "messages": [
    {
      "role": "user",
      "content": "What is the capital of France?"
    }
  ],
  "stream": false
}
```

#### Response (non-streaming)

{% tabs %}
{% tab title="200" %}

```json
{
  "id": "chatcmpl-4dd9fded095b4d86bceed1e91042a",
  "object": "chat.completion",
  "created": 1774095579,
  "model": "google/gemini-2.5-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "total_credit": 0.25,
    "total_cost": 0.0012441,
    "breakdown": {
      "model": {
        "model": "google/gemini-2.5-flash",
        "credit": 0.25,
        "cost": 0.001244
      }
    }
  }
}
```

{% endtab %}
{% endtabs %}
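
A minimal sketch of reading the fields that matter from a parsed non-streaming response (plain dict access on the structure shown in the `200` example above; the helper name is illustrative):

```python
def read_completion(resp):
    """Return (assistant text, total_credit, total_cost) from a parsed response."""
    choice = resp["choices"][0]
    text = choice["message"]["content"]      # the assistant's reply
    usage = resp.get("usage", {})            # billing info, if present
    return text, usage.get("total_credit", 0.0), usage.get("total_cost", 0.0)
```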

#### Response (streaming)

When `stream: true`, the response is a stream of SSE events:
Each event is a `data: {...}` line carrying a `chat.completion.chunk` object; content arrives incrementally in `choices[0].delta`.

{% tabs %}
{% tab title="200" %}

```text
data: {"id":"chatcmpl-fed9ce60c8d6469aafb665a11d505","object":"chat.completion.chunk","created":1774095604,"model":"google/gemini-2.5-flash","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-fed9ce60c8d6469aafb665a11d505","object":"chat.completion.chunk","created":1774095604,"model":"google/gemini-2.5-flash","choices":[{"index":0,"delta":{"content":"Paris."},"finish_reason":null}]}

data: {"id":"chatcmpl-fed9ce60c8d6469aafb665a11d505","object":"chat.completion.chunk","created":1774095604,"model":"google/gemini-2.5-flash","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: {"usage":{"total_credit":0.25,"total_cost":0.0012441,"breakdown":{"model":{"model":"google/gemini-2.5-flash","credit":0.25,"cost":0.001244}}}}

data: [DONE]
```

{% endtab %}
{% endtabs %}
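
Client-side, a stream like the one above can be consumed by parsing each `data:` line, accumulating the `delta.content` pieces, and capturing the usage chunk if one appears. This is a hypothetical sketch (the function name is not part of the API), written defensively so it works whether the usage chunk arrives before or after `[DONE]`:

```python
import json

def consume_stream(lines):
    """Accumulate assistant text and the usage chunk from SSE data lines."""
    text_parts, usage = [], None
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue                          # skip blank keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            continue                          # stream terminator
        chunk = json.loads(payload)
        if "usage" in chunk and not chunk.get("choices"):
            usage = chunk["usage"]            # final billing chunk
            continue
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            text_parts.append(delta["content"])
    return "".join(text_parts), usage
```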

#### Error Responses

| Status | Condition                                     |
| ------ | --------------------------------------------- |
| `400`  | `model` is missing                            |
| `400`  | `messages` is missing, not an array, or empty |
| `400`  | `temperature` is outside `0–2`                |
| `400`  | `top_p` is outside `0–1`                      |
| `400`  | `max_tokens` is not a positive integer        |
| `400`  | `stop` array contains more than 4 sequences   |
| `429`  | Rate limit exceeded                           |
| `500`  | Internal server error                         |
| `502`  | Upstream provider error or invalid response   |
| `504`  | Request timed out (3-minute upstream limit)   |
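
Of the statuses above, `429`, `500`, `502`, and `504` are transient and worth retrying with backoff, while the `400`s indicate caller errors that will not succeed on retry. A sketch of that decision (the function name, attempt cap, and backoff constants are illustrative choices, not mandated by the API):

```python
def retry_delay(status, attempt, base=1.0, cap=30.0):
    """Return seconds to wait before retrying, or None if non-retryable."""
    if status in (429, 500, 502, 504) and attempt < 5:
        return min(cap, base * (2 ** attempt))  # capped exponential backoff
    return None  # 4xx caller errors, or retries exhausted
```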

***

#### Related Endpoints

* `GET /api/v1/chat/models` — Returns the list of available models

## Run the API

To test this API, please use the following link:

{% embed url="https://app.theneo.io/api-runner/qolaba/ml-apis/api-reference/chat-copy" %}
