# OpenAI-Compatible Chat

OpenAI-compatible chat completions endpoint. Supports both standard (JSON) and streaming (SSE) responses. Rate-limited per API key.

**Base URL**

{% embed url="https://qolaba-server-b2b.up.railway.app" %}

**Supported endpoints**

| Method | Path                       | Description                                            |
| ------ | -------------------------- | ------------------------------------------------------ |
| `GET`  | `/api/v1/chat/models`          | List all available models                              |
| `GET`  | `/api/v1/chat/models/{model}`  | Retrieve a specific model                              |
| `POST` | `/api/v1/chat/completions` | Create chat completions (streaming, tools, multimodal) |

## Chat Completions

**POST** `/api/v1/chat/completions`

**Headers**

| Name          | Value              |
| ------------- | ------------------ |
| Content-Type  | `application/json` |
| Authorization | `Bearer <token>`   |

**Body**

| Parameter           | Type                 | Required | Description                                                                                                                                                                                                               |
| ------------------- | -------------------- | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `model`             | `string`             | **Yes**  | The model ID to use for completion (e.g., `"gpt-4o"`, `"claude-3-5-sonnet"`). Use `GET /api/v1/chat/models` to list available models.                                                                                     |
| `messages`          | `array`              | **Yes**  | Ordered list of messages forming the conversation. Each object must have a `role` (`"system"`, `"user"`, or `"assistant"`) and a `content` field (string or structured content array). Must contain at least one message. |
| `stream`            | `boolean`            | No       | If `true`, the response is streamed as Server-Sent Events (SSE). Each event is a `data: {...}` line; the stream ends with `data: [DONE]`. Defaults to `false`.                                                            |
| `temperature`       | `number`             | No       | Sampling temperature between `0` and `2`. Higher values produce more random output; lower values are more deterministic. Defaults to `1`.                                                                                 |
| `max_tokens`        | `integer`            | No       | Maximum number of tokens to generate in the response. Must be a positive integer. If omitted, the model's default limit applies.                                                                                          |
| `top_p`             | `number`             | No       | Nucleus sampling threshold between `0` and `1`. The model considers only the tokens comprising the top `top_p` probability mass. An alternative to `temperature`; avoid adjusting both simultaneously.                    |
| `frequency_penalty` | `number`             | No       | Penalizes tokens based on how frequently they have appeared so far. Positive values reduce repetition. Typically between `-2.0` and `2.0`.                                                                                |
| `presence_penalty`  | `number`             | No       | Penalizes tokens that have appeared at all in the conversation so far. Positive values encourage the model to introduce new topics. Typically between `-2.0` and `2.0`.                                                   |
| `stop`              | `string \| string[]` | No       | One or up to **4** sequences at which the model will stop generating further tokens. The stop sequence itself is not included in the output.                                                                              |
| `tool_choice`       | `string \| object`   | No       | Controls which tool (if any) the model calls: `"none"` disables tool calls, `"auto"` lets the model decide.                                                                                                               |
| `stream_options`    | `object`             | No       | Additional options for streaming. Set `{ "include_usage": true }` to receive a final SSE chunk containing token usage data; usage reporting is enabled automatically when `stream: true`.                                 |
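
Putting the constraints above together, a client can validate a request body before sending it. The sketch below is illustrative only; the helper name and the exact set of checks are not part of the API:

```python
def build_chat_request(model, messages, *, stream=False, temperature=None,
                       top_p=None, max_tokens=None, stop=None):
    """Build a chat-completions body, enforcing the documented constraints."""
    if not model:
        raise ValueError("model is required")
    if not isinstance(messages, list) or not messages:
        raise ValueError("messages must be a non-empty array")
    body = {"model": model, "messages": messages, "stream": stream}
    if temperature is not None:
        if not 0 <= temperature <= 2:
            raise ValueError("temperature must be between 0 and 2")
        body["temperature"] = temperature
    if top_p is not None:
        if not 0 <= top_p <= 1:
            raise ValueError("top_p must be between 0 and 1")
        body["top_p"] = top_p
    if max_tokens is not None:
        if not isinstance(max_tokens, int) or max_tokens <= 0:
            raise ValueError("max_tokens must be a positive integer")
        body["max_tokens"] = max_tokens
    if stop is not None:
        # stop accepts a single string or up to 4 sequences
        stops = [stop] if isinstance(stop, str) else list(stop)
        if len(stops) > 4:
            raise ValueError("stop accepts at most 4 sequences")
        body["stop"] = stops
    return body
```

Rejecting an invalid body locally avoids a round trip that would return a `400` (see the error table below).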

#### Request

```json
{
  "model": "google/gemini-2.5-flash",
  "messages": [
    {
      "role": "user",
      "content": "What is the capital of France?"
    }
  ],
  "stream": false
}
```

#### Response (non-streaming)

{% tabs %}
{% tab title="200" %}

```json
{
  "id": "chatcmpl-4dd9fded095b4d86bceed1e91042a",
  "object": "chat.completion",
  "created": 1774095579,
  "model": "google/gemini-2.5-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "total_credit": 0.25,
    "total_cost": 0.0012441,
    "breakdown": {
      "model": {
        "model": "google/gemini-2.5-flash",
        "credit": 0.25,
        "cost": 0.001244
      }
    }
  }
}
```

{% endtab %}
{% endtabs %}
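
A minimal sketch of reading the fields that matter from a parsed non-streaming response (plain dict access on the structure shown in the `200` example above; the helper name is illustrative):

```python
def read_completion(resp):
    """Return (assistant text, total_credit, total_cost) from a parsed response."""
    choice = resp["choices"][0]
    text = choice["message"]["content"]      # the assistant's reply
    usage = resp.get("usage", {})            # billing info, if present
    return text, usage.get("total_credit", 0.0), usage.get("total_cost", 0.0)
```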

#### Response (streaming)

When `stream: true`, the response is a stream of SSE events:
Each event is a `data: {...}` line carrying a `chat.completion.chunk` object; content arrives incrementally in `choices[0].delta`.

{% tabs %}
{% tab title="200" %}

```text
data: {"id":"chatcmpl-fed9ce60c8d6469aafb665a11d505","object":"chat.completion.chunk","created":1774095604,"model":"google/gemini-2.5-flash","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-fed9ce60c8d6469aafb665a11d505","object":"chat.completion.chunk","created":1774095604,"model":"google/gemini-2.5-flash","choices":[{"index":0,"delta":{"content":"Paris."},"finish_reason":null}]}

data: {"id":"chatcmpl-fed9ce60c8d6469aafb665a11d505","object":"chat.completion.chunk","created":1774095604,"model":"google/gemini-2.5-flash","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: {"usage":{"total_credit":0.25,"total_cost":0.0012441,"breakdown":{"model":{"model":"google/gemini-2.5-flash","credit":0.25,"cost":0.001244}}}}

data: [DONE]
```

{% endtab %}
{% endtabs %}
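
Client-side, a stream like the one above can be consumed by parsing each `data:` line, accumulating the `delta.content` pieces, and capturing the usage chunk if one appears. This is a hypothetical sketch (the function name is not part of the API), written defensively so it works whether the usage chunk arrives before or after `[DONE]`:

```python
import json

def consume_stream(lines):
    """Accumulate assistant text and the usage chunk from SSE data lines."""
    text_parts, usage = [], None
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue                          # skip blank keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            continue                          # stream terminator
        chunk = json.loads(payload)
        if "usage" in chunk and not chunk.get("choices"):
            usage = chunk["usage"]            # final billing chunk
            continue
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            text_parts.append(delta["content"])
    return "".join(text_parts), usage
```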

#### Error Responses

| Status | Condition                                     |
| ------ | --------------------------------------------- |
| `400`  | `model` is missing                            |
| `400`  | `messages` is missing, not an array, or empty |
| `400`  | `temperature` is outside `0–2`                |
| `400`  | `top_p` is outside `0–1`                      |
| `400`  | `max_tokens` is not a positive integer        |
| `400`  | `stop` array contains more than 4 sequences   |
| `429`  | Rate limit exceeded                           |
| `500`  | Internal server error                         |
| `502`  | Upstream provider error or invalid response   |
| `504`  | Request timed out (3-minute upstream limit)   |
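
Of the statuses above, `429`, `500`, `502`, and `504` are transient and worth retrying with backoff, while the `400`s indicate caller errors that will not succeed on retry. A sketch of that decision (the function name, attempt cap, and backoff constants are illustrative choices, not mandated by the API):

```python
def retry_delay(status, attempt, base=1.0, cap=30.0):
    """Return seconds to wait before retrying, or None if non-retryable."""
    if status in (429, 500, 502, 504) and attempt < 5:
        return min(cap, base * (2 ** attempt))  # capped exponential backoff
    return None  # 4xx caller errors, or retries exhausted
```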

***

#### Related Endpoints

* `GET /api/v1/chat/models` — Returns the list of available models

## Run the API

To test this API, please use the following link:

{% embed url="https://app.theneo.io/api-runner/qolaba/ml-apis/api-reference/chat-copy" %}
