Documentation

One OpenAI-compatible endpoint for every model. If you've used the OpenAI SDK, you already know VietToken — just change the base URL.

Overview

VietToken is a gateway that routes your requests to dozens of LLM providers behind a single API and key. It speaks the OpenAI Chat Completions format, streams tokens straight through for low latency, and fails over automatically across keys and providers.

Base URL https://api.viettoken.app/v1

Key ideas

One endpoint: switch models by changing the model string — no new SDK, no re-auth.
Streaming first: Server-Sent Events pass through unbuffered for fast first tokens.
Failover: if a model or key fails, traffic reroutes within the same group.

Quickstart

From sign-up to your first streamed token in under a minute.

Create an account

Add credits

Top up once and spend on any model. No subscription.

Create an API key

Dashboard → API Keys → Create. Copy it once and store it safely.

Make your first request

Point your OpenAI SDK at the VietToken base URL and call any model.

request.sh

# Chat completion
curl https://api.viettoken.app/v1/chat/completions \
  -H "Authorization: Bearer $VIETTOKEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role":"user","content":"Hello!"}]
  }'

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.viettoken.app/v1",
    api_key=os.environ["VIETTOKEN_API_KEY"],  # read the key from the environment
)

resp = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.viettoken.app/v1",
  apiKey: process.env.VIETTOKEN_API_KEY,
});

const resp = await client.chat.completions.create({
  model: "openai/gpt-4o-mini",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(resp.choices[0].message.content);

Authentication

Authenticate every request with a Bearer token in the Authorization header. Create and revoke keys in the dashboard. Treat keys like passwords — never ship them in client-side code.

header

Authorization: Bearer $VIETTOKEN_API_KEY

Tip: Store the key in an environment variable (e.g. VIETTOKEN_API_KEY) and load it at runtime.
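As a minimal sketch of the tip above, the key can be read from the environment and turned into request headers. The helper name `auth_headers` is hypothetical and not part of any SDK. It only illustrates failing early when the variable is missing.

```python
import os


def auth_headers(env=os.environ):
    """Build request headers from the VIETTOKEN_API_KEY environment variable.

    `auth_headers` is a hypothetical helper used for illustration only.
    """
    key = env.get("VIETTOKEN_API_KEY", "").strip()
    if not key:
        # Fail early instead of sending a request that would come back as 401.
        raise RuntimeError("VIETTOKEN_API_KEY is not set")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Passing a plain dict as `env` makes the helper easy to exercise in tests without touching the real environment.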

Chat completions

The core endpoint. Send a list of messages and a model id; get a completion back. Fully OpenAI-compatible — temperature, max_tokens, tools, JSON mode and more all work.

Parameter	Description
`model`	Model id, e.g. `anthropic/claude-sonnet-4`.
`messages`	Conversation as role/content objects.
`stream`	Set `true` to stream tokens via SSE.
`temperature`	Sampling randomness (0–2). Optional.
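The parameters in the table map directly onto the JSON request body. The sketch below assembles one in Python. `build_chat_request` is a hypothetical helper, and its checks are a simplification that covers plain text messages only.

```python
import json


def build_chat_request(model, messages, stream=False, temperature=None):
    """Assemble a Chat Completions request body from the documented parameters.

    Hypothetical helper: the validation here is illustrative, not the gateway's.
    """
    if not messages:
        raise ValueError("messages must contain at least one entry")
    for message in messages:
        if "role" not in message or "content" not in message:
            raise ValueError("each message needs 'role' and 'content'")
    body = {"model": model, "messages": messages}
    if stream:
        body["stream"] = True
    if temperature is not None:
        # The table above documents a 0-2 range for temperature.
        if not 0 <= temperature <= 2:
            raise ValueError("temperature must be between 0 and 2")
        body["temperature"] = temperature
    return body


body = build_chat_request(
    "anthropic/claude-sonnet-4",
    [{"role": "user", "content": "Hello!"}],
    temperature=0.2,
)
print(json.dumps(body, indent=2))
```

Optional fields are left out of the body unless they are set, so the request carries only what the caller chose.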

Streaming

Set stream: true to receive tokens as Server-Sent Events. VietToken passes the stream through unbuffered, so first tokens arrive fast.

stream.py

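# Reuses the `client` created in the quickstart example.
stream = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)
for chunk in stream:
    # Some chunks carry no choices (for example a final usage chunk); skip them.
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="", flush=True)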
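If you are not using an SDK, it helps to know what the stream looks like on the wire. OpenAI-style streams arrive as `data: <json>` lines and end with `data: [DONE]`. The sketch below assumes VietToken forwards that format unchanged, which the compatibility claim suggests but this page does not spell out. `iter_stream_text` and the sample lines are illustrative.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from OpenAI-style SSE lines (assumed wire format)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Hand-written sample lines, not captured output from the API.
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))  # prints Hello
```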

List models

Fetch every model available to your key. Use it to populate dropdowns or validate a model id before calling.

models.sh

curl https://api.viettoken.app/v1/models \
  -H "Authorization: Bearer $VIETTOKEN_API_KEY"

Prefer a visual list? Browse the full catalog with prices on the Models page.
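To validate a model id before calling, you can check it against the parsed response. This sketch assumes the response follows the OpenAI list shape (`{"data": [{"id": ...}]}`), which this page does not show. The helper names and the sample payload are hypothetical.

```python
def available_model_ids(models_response):
    """Collect model ids from an OpenAI-style list response (assumed shape)."""
    return {
        entry["id"]
        for entry in models_response.get("data", [])
        if "id" in entry
    }


def require_model(model, models_response):
    """Return `model` if it is listed, otherwise raise ValueError."""
    if model not in available_model_ids(models_response):
        raise ValueError(f"model {model!r} is not available to this key")
    return model


# Illustrative payload, not real output from the API.
sample = {
    "object": "list",
    "data": [
        {"id": "openai/gpt-4o-mini"},
        {"id": "anthropic/claude-sonnet-4"},
    ],
}
print(sorted(available_model_ids(sample)))
```

The same set can feed a dropdown directly, since it holds exactly the ids the key is allowed to call.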

Errors

VietToken uses standard HTTP status codes. Error bodies follow the OpenAI shape with a message and type.

Code	Meaning
`401`	Invalid or missing API key.
`402`	Insufficient credits — top up to continue.
`429`	Rate limited — retry with backoff.
`5xx`	Upstream issue. VietToken automatically retries with the next provider.
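One way to act on the table is to map each status to a next step and pull the message and type out of the body. The sketch assumes the OpenAI error shape, `{"error": {"message": ..., "type": ...}}`. The function names, the action labels, and the sample body are hypothetical.

```python
def parse_error(body):
    """Return (type, message) from an OpenAI-shaped error body, tolerating gaps."""
    err = body.get("error") if isinstance(body, dict) else None
    if not isinstance(err, dict):
        return ("unknown", "")
    return (err.get("type") or "unknown", err.get("message") or "")


def next_step(status):
    """Map a status code from the table to a suggested client action."""
    if status == 401:
        return "check_api_key"
    if status == 402:
        return "top_up_credits"
    if status == 429:
        return "retry_with_backoff"
    if 500 <= status <= 599:
        # Assumption: a 5xx that reaches the client is worth retrying later.
        return "retry_later"
    return "inspect_error"


# Illustrative body; the message and type values are made up.
body = {"error": {"message": "Insufficient credits", "type": "insufficient_quota"}}
print(next_step(402), parse_error(body))
```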

Custom providers

Bring your own model: add any OpenAI-compatible endpoint (self-hosted vLLM/Ollama, a private deployment, or another gateway) as a custom provider in the dashboard. Give it a base URL and key, then call it by its model id like any other.

Rate limits

Limits depend on your plan and balance. The gateway load-balances across multiple keys per provider and fails over on 429/5xx responses, so effective throughput is higher than a single upstream key would allow.
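When a 429 still reaches your client, retrying with exponential backoff and jitter is a common approach. The sketch below is one way to do it, not VietToken's own retry logic. `call_with_backoff`, its defaults, and the `send` callable (which returns a `(status, body)` tuple) are assumptions made for illustration.

```python
import random
import time


def call_with_backoff(send, max_retries=3, base=0.5, cap=8.0, sleep=time.sleep):
    """Call send() until it succeeds or retries run out.

    `send` is a hypothetical zero-argument callable returning (status, body).
    """
    for attempt in range(max_retries + 1):
        status, body = send()
        retryable = status == 429 or 500 <= status <= 599
        if not retryable or attempt == max_retries:
            return status, body
        # Exponential backoff with full jitter: wait between 0 and the capped delay.
        delay = min(cap, base * 2 ** attempt)
        sleep(random.uniform(0, delay))
```

Injecting `sleep` keeps the function testable: pass a list's `append` method to record the delays instead of waiting.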