# OpenAI-Compatible Chat

An OpenAI-compatible chat completions endpoint. It supports both standard (JSON) and streaming (SSE) responses, and requests are rate-limited per API key.

**Base URL:**

{% embed url="<https://api.platform.qolaba.ai>" %}

**Supported endpoints**

| Method | Path                          | Description                                            |
| ------ | ----------------------------- | ------------------------------------------------------ |
| `GET`  | `/api/v1/chat/models`         | List all available models                              |
| `GET`  | `/api/v1/chat/models/{model}` | Retrieve a specific model                              |
| `POST` | `/api/v1/chat/completions`    | Create chat completions (streaming, tools, multimodal) |

## Chat Completions

**POST** `/api/v1/chat/completions`

**Headers**

| Name          | Value              |
| ------------- | ------------------ |
| Content-Type  | `application/json` |
| Authorization | `Bearer <token>`   |

**Body**

| Parameter           | Type                 | Required | Description                                                                                                                                                                                                               |
| ------------------- | -------------------- | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `model`             | `string`             | **Yes**  | The model ID to use for completion. Use `GET /api/v1/chat/models` to list available models.                                                                                                                               |
| `messages`          | `array`              | **Yes**  | Ordered list of messages forming the conversation. Each object must have a `role` (`"system"`, `"user"`, or `"assistant"`) and a `content` field (string or structured content array). Must contain at least one message. |
| `stream`            | `boolean`            | No       | If `true`, the response is streamed as Server-Sent Events (SSE). Each event is a `data: {...}` line; the stream ends with `data: [DONE]`. Defaults to `false`.                                                            |
| `temperature`       | `number`             | No       | Sampling temperature between `0` and `2`. Higher values produce more random output; lower values are more deterministic. Defaults to `1`.                                                                                 |
| `max_tokens`        | `integer`            | No       | Maximum number of tokens to generate in the response. Must be a positive integer. If omitted, the model's default limit applies.                                                                                          |
| `top_p`             | `number`             | No       | Nucleus sampling threshold between `0` and `1`. The model considers only the tokens comprising the top `top_p` probability mass. An alternative to `temperature`; avoid adjusting both simultaneously.                    |
| `frequency_penalty` | `number`             | No       | Penalizes tokens based on how frequently they have appeared so far. Positive values reduce repetition. Typically between `-2.0` and `2.0`.                                                                                |
| `presence_penalty`  | `number`             | No       | Penalizes tokens that have appeared at all in the conversation so far. Positive values encourage the model to introduce new topics. Typically between `-2.0` and `2.0`.                                                   |
| `stop`              | `string \| string[]` | No       | One or up to **4** sequences at which the model will stop generating further tokens. The stop sequence itself is not included in the output.                                                                              |
| `tool_choice`       | `string \| object`   | No       | Controls which tool (if any) the model calls. Accepts `"none"` or `"auto"`. |
| `stream_options`    | `object`             | No       | Additional options for streaming. Set `{ "include_usage": true }` to receive a final SSE chunk containing token usage data. Usage reporting is enabled automatically when `stream` is `true`. |
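The `stream` row above describes the SSE framing: each event is a `data: {...}` line and the stream ends with `data: [DONE]`. A minimal sketch of a parser for that framing follows. The function name `iter_sse_chunks` is hypothetical, and because this page does not document the shape of each streamed chunk, the sample lines use placeholder JSON payloads rather than real chunks.

```python
import json
from typing import Iterable, Iterator


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the parsed JSON payload of each `data:` line until `[DONE]`."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and any non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end-of-stream sentinel described in the table above
        yield json.loads(data)


# Placeholder payloads (assumption): real chunks will have a different shape.
sample_stream = [
    'data: {"n": 1}',
    "",
    'data: {"n": 2}',
    "",
    "data: [DONE]",
    'data: {"n": 3}',  # never reached, the parser stops at [DONE]
]
chunks = list(iter_sse_chunks(sample_stream))
print(chunks)
```

In a real client, `lines` would be the decoded lines of the HTTP response body for a request sent with `"stream": true`.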

#### Request

```json
{
  "model": "google/gemini-2.5-flash",
  "messages": [
    {
      "role": "user",
      "content": "What is the capital of France?"
    }
  ],
  "stream": false
}
```
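As an illustration, the request above could be assembled with Python's standard library roughly as follows. This is a sketch under stated assumptions: the helper name `build_chat_request` and the placeholder key `YOUR_API_KEY` are hypothetical, and the commented-out timeout is only a guess chosen to match the 3-minute upstream limit listed under error responses. The snippet builds the request without sending it.

```python
import json
import urllib.request

BASE_URL = "https://api.platform.qolaba.ai"  # base URL from this page
CHAT_PATH = "/api/v1/chat/completions"


def build_chat_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat completions endpoint."""
    return urllib.request.Request(
        url=BASE_URL + CHAT_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


payload = {
    "model": "google/gemini-2.5-flash",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "stream": False,
}
request = build_chat_request("YOUR_API_KEY", payload)  # placeholder key

# To send it (requires a valid key and network access):
# with urllib.request.urlopen(request, timeout=180) as response:
#     body = json.load(response)
```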

#### Response (non-streaming)

{% tabs %}
{% tab title="200" %}

```json
{
  "id": "chatcmpl-4dd9fded095b4d86bceed1e91042a",
  "object": "chat.completion",
  "created": 1774095579,
  "model": "google/gemini-2.5-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "total_credit": 0.25,
    "total_cost": 0.0012441,
    "breakdown": {
      "model": {
        "model": "google/gemini-2.5-flash",
        "credit": 0.25,
        "cost": 0.001244
      }
    }
  }
}
```

{% endtab %}
{% endtabs %}
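A short sketch of reading the fields from a non-streaming response body like the one above. The helper name `summarize_completion` is hypothetical, and the sample body is a trimmed copy of the documented 200 response.

```python
def summarize_completion(body: dict) -> dict:
    """Pull the reply text, finish reason, and credit usage from a response body."""
    choice = body["choices"][0]
    usage = body.get("usage", {})
    return {
        "content": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_credit": usage.get("total_credit"),
    }


# Trimmed copy of the documented 200 response.
example_body = {
    "object": "chat.completion",
    "model": "google/gemini-2.5-flash",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Paris."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"total_credit": 0.25},
}
summary = summarize_completion(example_body)
print(summary)
```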

#### Error Responses

| Status | Condition                                     |
| ------ | --------------------------------------------- |
| `400`  | `model` is missing                            |
| `400`  | `messages` is missing, not an array, or empty |
| `400`  | `temperature` is outside `0–2`                |
| `400`  | `top_p` is outside `0–1`                      |
| `400`  | `max_tokens` is not a positive integer        |
| `400`  | `stop` array contains more than 4 sequences   |
| `429`  | Rate limit exceeded                           |
| `500`  | Internal server error                         |
| `502`  | Upstream provider error or invalid response   |
| `504`  | Request timed out (3-minute upstream limit)   |
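The `400` conditions in the table can be checked on the client before a request is sent. The sketch below mirrors those conditions only. The function name `validate_chat_request` and the message strings are hypothetical, and the server's actual validation may differ in detail.

```python
def _is_number(value) -> bool:
    """True for ints and floats, excluding booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_chat_request(payload: dict) -> list:
    """Return a list of problems that would trigger a documented 400 response."""
    problems = []
    if not payload.get("model"):
        problems.append("model is missing")

    messages = payload.get("messages")
    if not isinstance(messages, list) or not messages:
        problems.append("messages is missing, not an array, or empty")

    temperature = payload.get("temperature")
    if temperature is not None and not (_is_number(temperature) and 0 <= temperature <= 2):
        problems.append("temperature is outside 0-2")

    top_p = payload.get("top_p")
    if top_p is not None and not (_is_number(top_p) and 0 <= top_p <= 1):
        problems.append("top_p is outside 0-1")

    max_tokens = payload.get("max_tokens")
    if max_tokens is not None and not (
        isinstance(max_tokens, int) and not isinstance(max_tokens, bool) and max_tokens > 0
    ):
        problems.append("max_tokens is not a positive integer")

    stop = payload.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        problems.append("stop array contains more than 4 sequences")

    return problems


problems = validate_chat_request({"messages": [], "temperature": 3})
print(problems)
```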

***

#### Related Endpoints

* `GET /api/v1/chat/models`: returns the list of available models

## Run the API

To test this API, please use the following link:

{% embed url="<https://app.theneo.io/api-runner/qolaba/ml-apis/api-reference/chat-copy>" %}


