Python SDK (`sociaro-ai`)

One OpenAI-compatible API for every model behind the gateway — LLM chat (with streaming), image generation/editing, async video — plus client-side fallback chains, a typed /gw management client, and typed errors.

pip install sociaro-ai

Requires Python 3.10+. This page is the complete reference: every constructor parameter, every resource method, the retry/fallback model, the response objects, and the full /gw management surface are documented here.

Quick start

from sociaro_ai import Sociaro

client = Sociaro(api_key="gw_live_...")   # or set SOCIARO_API_KEY env var

response = client.chat.completions.create(
    model="alibaba/qwen3.7-max",            # always a catalog slug "author/model"
    messages=[{"role": "user", "content": "What is the speed of light?"}],
)
print(response.choices[0].message.content)

model is a catalog slug "author/model" (see the model catalog) or an alias defined in the client’s models registry, which expands to catalog slugs (see Fallback chains and model aliases). The gateway’s doors resolve the slug and route to wherever inference runs. The author only selects the request format client-side: anthropic/... models are translated to the Anthropic Messages format and sent to POST /v1/messages; every other author (including ones the SDK has never heard of) uses the OpenAI format via POST /v1/chat/completions. Unknown slugs get the server’s 404/400; the SDK does not pre-validate them.
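The format selection described above can be sketched in a few lines. This is an illustration only: `select_endpoint` is a hypothetical helper, not part of the SDK, and it assumes the author is simply the text before the first `/`.

```python
def select_endpoint(slug: str) -> str:
    """Pick the door path from the author part of an "author/model" slug."""
    author, sep, _model = slug.partition("/")
    if not sep:
        raise ValueError(f"not a catalog slug: {slug!r}")
    # Only anthropic/... switches to the Messages format; every other
    # author, known or unknown, uses the OpenAI chat-completions shape.
    if author == "anthropic":
        return "/v1/messages"
    return "/v1/chat/completions"


print(select_endpoint("anthropic/claude-sonnet-4-6"))
print(select_endpoint("alibaba/qwen3.7-max"))
print(select_endpoint("newvendor/some-model"))
```

Note that the sketch never checks the slug against a catalog, mirroring the statement that unknown slugs are left for the server to reject.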

Client configuration

Sociaro (sync) and AsyncSociaro (async) take the same keyword arguments:

Parameter	Type	Default	Notes
`api_key`	`str`	`""` → `SOCIARO_API_KEY`	The gateway key. If empty, the client reads the `SOCIARO_API_KEY` environment variable. If neither is set, the constructor raises `AuthError` (status 401) immediately — before any request.
`base_url`	`str`	`https://api.sociaro.com`	Gateway base URL. Trailing slashes are stripped. Override for local development or a region endpoint.
`timeout`	`float`	`60.0`	Per-request timeout in seconds, applied to every HTTP attempt (connect + read).
`max_retries`	`int`	`2`	Transport-level retries for transient failures (see Retries and timeouts). Total attempts per chain entry are `max_retries + 1`.
`default_attribution`	`dict[str, str] | None`	`None`	Attribution tags applied to every request as `X-Attr-*` headers (see Attribution). Per-call `attribution=` is merged on top.
`models`	`dict[str, list[str]] | None`	`None`	Alias registry mapping a logical name to an ordered fallback chain of catalog slugs (see Fallback chains).
`fallback_on_rate_limit`	`bool`	`True`	Whether a `429` advances the fallback chain to the next entry. With it off, a `429` is terminal.

client = Sociaro(
    api_key="gw_live_...",
    base_url="http://localhost:8080",   # local gateway
    timeout=120.0,
    max_retries=4,
    default_attribution={"team": "product"},
    models={"smart": ["openai/gpt-4o", "anthropic/claude-sonnet-4-6"]},
    fallback_on_rate_limit=True,
)

The api_key never appears in any repr()/str() of the client, config, or transport.

Reading the key from the environment

import os
os.environ["SOCIARO_API_KEY"] = "gw_live_..."

client = Sociaro()   # reads SOCIARO_API_KEY automatically

Retries and timeouts

The SDK has two distinct retry layers — keep them separate when reasoning about behaviour:

Transport-level retries (max_retries) — applied per single chain entry, inside one HTTP call. The transport retries with exponential backoff of [0.5, 1.0, 2.0] seconds (the last value repeats if max_retries > 3).
- Retryable: connect errors, read timeouts, and HTTP 502 / 503 / 504.
- Non-retryable (raised immediately, no transport retry): 400, 401, 403, 404, 422, 429.
- Total attempts for one entry are max_retries + 1 (default: 3).
- If every attempt fails with a connect/timeout error, the transport surfaces it as an APIError with status_code == 0.
Chain fallback — only after a chain entry has exhausted its transport retries does the SDK decide whether to advance to the next entry in the alias chain (see Fallback chains).

So a 503 on openai/gpt-4o is first retried up to max_retries times against openai/gpt-4o itself; only if all of those fail does the chain advance to the next slug.
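As a rough illustration of the transport schedule, the sketch below lists the sleep taken before each retry. `backoff_delays` and `BACKOFF_SECONDS` are hypothetical names, and the assumption that one delay precedes each retry is inferred from the description above rather than taken from the SDK source.

```python
BACKOFF_SECONDS = (0.5, 1.0, 2.0)  # schedule quoted in the text above


def backoff_delays(max_retries: int) -> list[float]:
    """Return the sleep before each retry; the last table value repeats."""
    last = len(BACKOFF_SECONDS) - 1
    return [BACKOFF_SECONDS[min(i, last)] for i in range(max_retries)]


for retries in (2, 4):
    delays = backoff_delays(retries)
    # Total attempts for one chain entry are max_retries + 1.
    print(retries, delays, "attempts:", len(delays) + 1)
```

With the default `max_retries=2` an entry sleeps 0.5 s and then 1.0 s in the worst case before the chain is consulted.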

Streaming

for chunk in client.chat.completions.create(
    model="anthropic/claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Count to 5."}],
    stream=True,
):
    print(chunk.choices[0].delta.content or "", end="", flush=True)

Any extra OpenAI-compatible parameter (temperature, max_tokens, tools, response_format, …) is passed through as a keyword argument and forwarded to the door verbatim.

Images

images.generate goes through the POST /v1/images/generations door, which serves image-kind slugs by their catalog slug. Some upstreams are OpenAI-shaped pass-through (e.g. Seedream); others are translated to/from the provider’s native shape (e.g. Qwen-Image via DashScope) rather than rejected with a 400. Image generation is synchronous. A few image slugs whose upstream has no OpenAI-compatible images path are not reachable through the door (see media doors). Video generation (async) has its own helpers; see media doors and async jobs.

img = client.images.generate(
    model="bytedance/seedream-5-0",
    prompt="A red fox in snow",
    n=1,                       # number of images (default 1)
    size="1024x1024",         # optional; omitted if None
    # any extra OpenAI param is forwarded, e.g. quality="hd", response_format="b64_json"
)
print(img.data[0].url)

The response is an ImagesResponse. Each entry in data[] carries either a url or a b64_json string, depending on the provider / response_format:

import base64

img = client.images.generate(
    model="bytedance/seedream-5-0",
    prompt="A red fox in snow",
    response_format="b64_json",
)
raw = base64.b64decode(img.data[0].b64_json)
with open("fox.png", "wb") as f:   # the with block closes the file for us
    f.write(raw)

Image editing

images.edit is the one escape hatch: it stays on the native /{author}/v1/images/edits prefix (the doors do not accept multipart uploads) and supports openai/... models only. Note: no openai image model is in the live catalog yet, so images.edit is currently unusable in production — the examples below show the API shape only (the openai/<image-model> slug is a placeholder).

# Open both files in a with block so the handles are closed after the upload.
with open("original.png", "rb") as image_file, open("mask.png", "rb") as mask_file:
    edited = client.images.edit(
        model="openai/<image-model>",
        image=image_file,   # bytes, path str/Path, or binary file-like
        mask=mask_file,     # optional, same accepted input types
        prompt="Add a sunset",
        # extra params (e.g. n=1, size="1024x1024") are forwarded as form fields
    )
print(edited.data[0].url)

image and mask each accept raw bytes, a filesystem path (str or pathlib.Path), or an open binary file object — the SDK reads the bytes for you, so all of these work:

import pathlib

client.images.edit(model="openai/<image-model>", image=b"...png bytes...", prompt="...")
client.images.edit(model="openai/<image-model>", image="original.png", prompt="...")
client.images.edit(model="openai/<image-model>", image=pathlib.Path("original.png"), prompt="...")

Video (async jobs)

Video is called through the standard POST /v1/videos door by catalog slug author/model (e.g. bytedance/seedance-1-5-pro, alibaba/wan2.7-t2v) — the same slug convention as chat and images. The door resolves the slug and routes to wherever inference runs. The SDK drives the create→poll→result loop itself.

video.generate blocks until the job reaches a terminal state:

result = client.video.generate(
    model="bytedance/seedance-1-5-pro",     # catalog slug "author/model"
    prompt="A timelapse of cherry blossoms blooming",
    seconds=5,
    size="1280x720",  # OpenAI-style "WxH"; the door buckets by short side
    interval=2.0,     # initial poll interval in seconds (default 2.0)
    timeout=300.0,    # max wait in seconds (default 300)
    on_progress=lambda status: print("status:", status),
)
print(result.url)          # primary video URL
print(result.urls)         # full list of URLs
print(result.duration)
print(result.resolution)

The result is a VideoResult.

Image-to-video

Pass image= (a URL or data: URI string) to either generate or submit to do image-to-video. The SDK forwards it through the door to the model:

result = client.video.generate(
    model="bytedance/seedance-1-5-pro",
    prompt="Pan slowly across the scene",
    image="https://example.com/first-frame.png",
    seconds=5,
)

Poll loop

The poll loop backs off exponentially with jitter to avoid hammering the gateway on long jobs:

The first sleep is interval (default 2.0s), jittered to [0.5×, 1.0×].
Each subsequent sleep grows by a factor of 1.5, capped at 15.0s (max_interval), and is jittered the same way.
Polling stops when the job reaches a terminal state or timeout (default 300.0s) elapses — a timeout raises AsyncJobTimeout.
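The schedule above can be approximated as follows. This is a sketch under stated assumptions: `poll_sleeps` is a hypothetical generator, jitter is applied after the cap, and the time spent on the poll request itself is ignored. The SDK's internal ordering of those steps may differ.

```python
import itertools
import random


def poll_sleeps(interval=2.0, factor=1.5, max_interval=15.0,
                timeout=300.0, rng=random.random):
    """Yield jittered sleeps until the next one would exceed the timeout."""
    elapsed = 0.0
    base = interval
    while True:
        # Jitter scales the base into [0.5x, 1.0x].
        sleep = base * (0.5 + 0.5 * rng())
        if elapsed + sleep > timeout:
            return  # a real client would raise its timeout error here
        yield sleep
        elapsed += sleep
        # Grow by the factor, but never past the cap.
        base = min(base * factor, max_interval)


# Upper envelope of the schedule (jitter pinned to 1.0x).
print(list(itertools.islice(poll_sleeps(rng=lambda: 1.0), 6)))
```

Pinning `rng` makes the growth visible; with the default random source each value falls between half and all of the corresponding envelope value.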

Non-blocking `submit` and the `Job` handle

job = client.video.submit(
    model="bytedance/seedance-1-5-pro",
    prompt="Ocean waves",
    seconds=5,
)
print(job.task_id)

submit returns a Job (or AsyncJob on AsyncSociaro) with:

Method	Returns	Behaviour
`status()`	`str`	One poll. Returns the provider’s raw status string.
`wait(interval=2.0, timeout=300.0, on_progress=None)`	`VideoResult`	Polls to a terminal state, returns and caches the result. Idempotent — calling `wait()` again returns the cached result without re-polling. Raises `AsyncJobFailed` on a failed/cancelled job, `AsyncJobTimeout` on timeout.
`result()`	`VideoResult`	Returns the cached result. Raises `RuntimeError("job result not available yet — call wait() first")` if `wait()` has not completed. It never polls — it is cached-only.

result = job.wait(interval=2.0, timeout=300.0, on_progress=lambda s: print(s))
print(result.url)

# Later, without re-polling:
same = job.result()   # the cached VideoResult

Video fallback is intentionally limited: a multi-step job cannot fail over mid-job, so only the first slug in a chain is used for the CREATE step and the whole job runs against that provider.

Fallback chains and model aliases

Define logical model aliases backed by an ordered list of catalog slugs; the SDK advances down the chain when an entry fails with a retryable error:

client = Sociaro(
    api_key="gw_live_...",
    models={
        "smart": ["openai/gpt-4o", "anthropic/claude-sonnet-4-6", "alibaba/qwen3.7-max"],
        "fast":  ["openai/gpt-4o-mini", "alibaba/qwen3.5-flash"],
    },
)
response = client.chat.completions.create(model="smart", messages=[...])

Fallback policy — which errors advance the chain to the next entry:

Error	Advances the chain?
`ProviderUnavailableError` (`502` / `503` / `504`)	Yes — upstream is temporarily down.
Escaped connect / timeout error (surfaced as `APIError` with `status_code == 0`)	Yes.
`RateLimitError` (`429`)	Only when `fallback_on_rate_limit=True` (the default). With the knob off, `429` is terminal.
`BadRequestError` (`400` / `404` / `422`)	No — terminal, raises immediately.
`AuthError` (`401` / `403`)	No — terminal, raises immediately.
Any other `APIError` — including a plain `500` / `501`	No — terminal, raises immediately.

Note that a plain 500 is not retryable at the fallback layer: only 502 / 503 / 504 (mapped to ProviderUnavailableError) and escaped connect/timeout errors advance the chain. “5xx” is not a blanket rule.

If every entry of a multi-entry chain fails with a retryable error, AllProvidersFailed aggregates the per-entry errors (its .errors list holds one wrapped SociaroError per slug). A single-entry chain (an explicit slug, no alias) re-raises the original error unwrapped — there is nothing to aggregate. Streaming fallback happens only before the first chunk is yielded; once data has reached the consumer the entry is committed and later errors propagate unchanged.
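The policy table and the aggregation rule can be modelled with a small standalone sketch. `StatusError`, `ChainExhausted`, `advances_chain` and `run_chain` are hypothetical stand-ins written for this illustration; they are not the SDK's classes, and streaming is not modelled.

```python
class StatusError(Exception):
    """Stand-in for an API error carrying an HTTP status (0 = connect/timeout)."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


class ChainExhausted(Exception):
    """Stand-in for the aggregate raised when every entry failed."""

    def __init__(self, errors) -> None:
        super().__init__(f"{len(errors)} entries failed")
        self.errors = errors


def advances_chain(status_code: int, fallback_on_rate_limit: bool = True) -> bool:
    if status_code == 0 or status_code in (502, 503, 504):
        return True
    if status_code == 429:
        return fallback_on_rate_limit
    return False  # 400/401/403/404/422 and plain 500/501 are terminal


def run_chain(chain, call, fallback_on_rate_limit=True):
    errors = []
    for slug in chain:
        try:
            return call(slug)
        except StatusError as exc:
            retryable = advances_chain(exc.status_code, fallback_on_rate_limit)
            # Terminal errors, and any error on a single-entry chain,
            # propagate unwrapped.
            if not retryable or len(chain) == 1:
                raise
            errors.append((slug, exc))
    raise ChainExhausted(errors)


def flaky(slug):
    if slug == "openai/gpt-4o":
        raise StatusError(503)
    return f"answered by {slug}"


print(run_chain(["openai/gpt-4o", "anthropic/claude-sonnet-4-6"], flaky))
```

In this model a terminal error on a later entry also stops the walk immediately, which matches the "terminal, raises immediately" wording in the table.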

How `model` is resolved

resolve(model, models) turns the model argument into the ordered chain:

If model contains a / it is treated as an explicit catalog slug and used verbatim — the alias registry is ignored, even if a same-named alias exists. The author is the part before the first /.
Otherwise model is looked up in the models alias registry and expands to its configured chain.
A bare name that is neither a slug nor a known alias raises BadRequestError client-side (before any request).
An alias mapped to an empty list raises BadRequestError.
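A minimal sketch of these four rules, assuming nothing beyond what is listed above. `resolve_chain` is a hypothetical name, and `ValueError` stands in for the SDK's `BadRequestError` so the snippet runs without the package installed.

```python
from typing import Optional


def resolve_chain(model: str,
                  models: Optional[dict[str, list[str]]] = None) -> list[str]:
    """Turn the model argument into an ordered chain of catalog slugs."""
    if "/" in model:
        # Explicit slug: used verbatim, the alias registry is not consulted.
        return [model]
    chain = (models or {}).get(model)
    if chain is None:
        raise ValueError(f"unknown model alias: {model!r}")
    if not chain:
        raise ValueError(f"alias {model!r} maps to an empty chain")
    return list(chain)


ALIASES = {"smart": ["openai/gpt-4o", "anthropic/claude-sonnet-4-6"]}
print(resolve_chain("smart", ALIASES))
print(resolve_chain("openai/gpt-4o", ALIASES))
```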

Attribution

client = Sociaro(api_key="gw_live_...", default_attribution={"team": "product"})

client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[...],
    attribution={"feature": "chat"},   # → X-Attr-Feature header
)

Default and per-call attribution are merged (per-call wins on key collisions). Each pair becomes an X-Attr-<key>: <value> header. Tags show up as dimensions in GET /gw/stats.

Value constraints: attribution values must be ASCII-only and contain no CR or LF characters. The SDK validates this before any network I/O and raises ValueError on a violation — so a non-ASCII or newline-bearing tag fails fast, locally, rather than corrupting a header.
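The merge and validation steps might look like the sketch below. `attribution_headers` is a hypothetical helper; it keeps the key's casing as given (HTTP header names are case-insensitive), so the exact casing the SDK emits is not asserted here.

```python
from typing import Mapping, Optional


def attribution_headers(default: Optional[Mapping[str, str]] = None,
                        per_call: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Merge default and per-call tags, validate values, build headers."""
    merged = {**(default or {}), **(per_call or {})}  # per-call wins on collisions
    headers = {}
    for key, value in merged.items():
        # Reject anything that could corrupt a header line.
        if not value.isascii() or "\r" in value or "\n" in value:
            raise ValueError(f"attribution value for {key!r} must be ASCII without CR/LF")
        headers[f"X-Attr-{key}"] = value
    return headers


print(attribution_headers({"team": "product"}, {"feature": "chat"}))
```

Because validation happens before any header is returned, a bad tag fails the whole call locally rather than sending a partial set.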

Response objects

Every response is a Model — a thin dict subclass that exposes keys as attributes and as dict items. Both styles work, and the underlying object is a plain dict, so **resp round-tripping and JSON serialisation keep working:

resp = client.chat.completions.create(model="openai/gpt-4o", messages=[...])

resp.choices[0].message.content              # attribute access (OpenAI-like)
resp["choices"][0]["message"]["content"]     # dict-style access
resp.get("usage")                            # .get() returns None if the key is absent
{**resp}                                     # plain dict round-trip

Nested dicts/lists are wrapped lazily on access, so attribute access works at any depth. Accessing a missing attribute raises AttributeError.
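To make the dual access style concrete, here is a small dict subclass that behaves in the way described. `AttrDict` is a hypothetical illustration, not the SDK's `Model`; in particular it wraps nested values by copying them on access, which the real class may or may not do.

```python
import json


class AttrDict(dict):
    """A dict whose keys are also readable as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so dict
        # methods such as .get() and .items() keep working.
        try:
            return _wrap(self[name])
        except KeyError:
            raise AttributeError(name) from None


def _wrap(value):
    # Wrap lazily, at access time, so any depth supports attribute access.
    if isinstance(value, dict) and not isinstance(value, AttrDict):
        return AttrDict(value)
    if isinstance(value, list):
        return [_wrap(item) for item in value]
    return value


resp = AttrDict({"choices": [{"message": {"content": "hi"}}]})
print(resp.choices[0].message.content)
print(resp["choices"][0]["message"]["content"])
print(json.dumps(resp))
```

One consequence of this design is that a key sharing its name with a dict method (for example `items`) is only reachable through item access.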

Typed result classes per resource (all are Model subclasses):

Class	Returned by	Shape
`ChatCompletion`	`chat.completions.create` (non-stream)	OpenAI chat completion
`ChatCompletionChunk`	`chat.completions.create(stream=True)`	OpenAI streaming chunk
`ImagesResponse`	`images.generate` / `images.edit`	`{"created": int, "data": [{"url": str} | {"b64_json": str}]}`
`VideoResult`	`video.generate` / `Job.wait` / `Job.result`	`{"url": str | None, "urls": [str], "duration": ..., "resolution": ..., "task_id": str, "raw": {...}}` plus model-specific fields (e.g. `completion_tokens` for ByteDance Seedance)

The /gw management types are listed under the management reference.

Errors

from sociaro_ai import (
    Sociaro, SociaroError, APIError,
    AuthError, RateLimitError, BadRequestError, ProviderUnavailableError,
    AsyncJobFailed, AsyncJobTimeout, AllProvidersFailed,
)

Exception hierarchy:

SociaroError
├── APIError                    HTTP error from the gateway
│   ├── AuthError               401 / 403
│   ├── RateLimitError          429
│   ├── BadRequestError         400 / 404 / 422
│   └── ProviderUnavailableError  502 / 503 / 504 (after transport retries)
├── AsyncJobFailed              async media job reached terminal failure
├── AsyncJobTimeout             async media job did not finish within timeout
└── AllProvidersFailed          every entry in the fallback chain failed

Useful fields:

APIError.status_code (int; 0 for an escaped connect/timeout), APIError.body (the raw response body, truncated to 2000 chars), APIError.provider (str | None).
AsyncJobFailed.job_id, AsyncJobFailed.reason.
AsyncJobTimeout.job_id, AsyncJobTimeout.timeout (the timeout in seconds).
AllProvidersFailed.errors — a list of per-entry SociaroErrors (route: error messages; the underlying error is on each .__cause__).

/gw error responses carry the gateway’s {"error": {"type", "message"}} envelope in APIError.body; extract it with parse_gw_error:

from sociaro_ai import APIError, parse_gw_error

try:
    client.gw.keys.create(scopes=["inference:use", "keys:manage"])
except APIError as e:
    print(parse_gw_error(e))   # {"type": ..., "message": ...} or None
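For readers who want to see what extracting the envelope involves, the sketch below parses a body string by hand. `extract_gw_error` is a hypothetical function written for this page, not the SDK's `parse_gw_error`; it assumes the body is the JSON text of the envelope.

```python
import json
from typing import Optional


def extract_gw_error(body: Optional[str]) -> Optional[dict]:
    """Return {"type", "message"} from a gateway error body, else None."""
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        # Not JSON at all, or cut off mid-document by body truncation.
        return None
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return None
    return {"type": error.get("type"), "message": error.get("message")}


print(extract_gw_error('{"error": {"type": "forbidden", "message": "scope exceeds ceiling"}}'))
print(extract_gw_error("<html>502 Bad Gateway</html>"))
```

Returning `None` for anything that is not a well-formed envelope keeps callers from having to distinguish gateway errors from upstream HTML or truncated bodies.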

Async client

AsyncSociaro mirrors Sociaro exactly — same constructor params, same resources, same method signatures — but all resource methods are async. Use it with FastAPI, aiohttp, asyncio, etc.:

import asyncio
from sociaro_ai import AsyncSociaro

async def main() -> None:
    async with AsyncSociaro(api_key="gw_live_...") as client:
        response = await client.chat.completions.create(
            model="openai/gpt-4o",
            messages=[{"role": "user", "content": "Hello"}],
        )
        print(response.choices[0].message.content)

        stream = await client.chat.completions.create(
            model="openai/gpt-4o",
            messages=[{"role": "user", "content": "Count to 5."}],
            stream=True,
        )
        async for chunk in stream:
            print(chunk.choices[0].delta.content or "", end="", flush=True)

asyncio.run(main())

The async ledger group is gw.async_ (trailing underscore — async is a Python keyword; the TypeScript SDK calls it gw.async). On AsyncSociaro, gw.async_.iterate(...) and gw.evidence(...) are async iterators.

Lifecycle and context managers

Both clients hold a pooled HTTP connection; close it when done. The context manager protocol does this for you:

# sync
with Sociaro(api_key="gw_live_...") as client:
    client.chat.completions.create(model="openai/gpt-4o", messages=[...])

# async
async with AsyncSociaro(api_key="gw_live_...") as client:
    await client.chat.completions.create(model="openai/gpt-4o", messages=[...])

Or close explicitly without a with block:

client = Sociaro(api_key="gw_live_...")
# ...
client.close()              # sync

aclient = AsyncSociaro(api_key="gw_live_...")
# ...
await aclient.aclose()      # async

Surface support matrix

Surface	Endpoint	Notes
LLM chat + streaming	`POST /v1/chat/completions` door	Any author (default OpenAI format), `Authorization: Bearer` auth
LLM chat + streaming (`anthropic/...`)	`POST /v1/messages` door	SDK translates OpenAI shape ⇄ Anthropic Messages; uses `x-api-key` auth (the door also accepts Bearer)
Image generation	`POST /v1/images/generations` door	Image-kind slugs by slug (some upstreams pass-through, some translated); a few slugs whose upstream has no OpenAI-compatible images path are not reachable via the door
Image edit	`/openai/v1/images/edits` native prefix	`openai/...` models only (the doors do not accept multipart uploads) — no openai image model is in the live catalog yet, so this is currently unusable in prod
Video generation	`POST /v1/videos` door	Video-kind slugs by catalog slug (e.g. `bytedance/seedance-1-5-pro`, `alibaba/wan2.7-t2v`); async `create → poll → result` driven by the SDK.

The model author only selects the request format client-side; the gateway door rewrites the body’s model field to the provider-native id server-side.
The catalog (slugs, authors, modalities) is served by GET /v1/models. New authors and modalities are added in the catalog without SDK changes — unknown authors use the default OpenAI format automatically.

`/gw` management client reference

client.gw is a typed client for the /gw self-service API — key issuance, sub-accounts, budgets, white-label branding, stats/usage, the async-media ledger and the evidence export — available on both Sociaro (sync) and AsyncSociaro (same methods, awaited). All calls authenticate with the client’s own gw key (Bearer), are tenant-scoped server-side, and do not use routing/fallback (a single management target); transport-level retries on transient 502/503/504/network errors still apply.

Absent optional params are omitted from the wire entirely — a missing query filter and an empty one are not the same to the gateway.

Identity

client.gw.me()        # → GWMeResponse {"scopes": [...], "entitlements": [...]}
client.gw.ceiling()   # → GWCeilingResponse {"max_scopes": [...], "entitlements": [...]}

ceiling() is the client-level maximum a child key may be granted; a keys.create request that exceeds it is rejected with 403.

Keys — `client.gw.keys`

Method	Wire	Params
`list()`	`GET /gw/keys`	— Returns `list[GWKeyInfo]`; never the secret.
`create(scopes, entitlements=None, sub_account_id=None)`	`POST /gw/keys`	See below. Returns `GWKeyCreated`.
`revoke(id)`	`DELETE /gw/keys/{id}`	`204`; `404` if unknown. Returns `None`.

create parameters:

scopes: list[str] (required) — the scopes granted to the child key, e.g. ["inference:use"]. Must fit the client ceiling.
entitlements: list[dict] | None — each entry is {"provider", "model_pattern", "effect"} with effect "allow"/"deny". The wire semantics matter:
- None (the default) → the entitlements field is omitted from the body and the child key inherits the client’s full ceiling.
- [] (an explicit empty list) → a deliberate deny-all on inference (the key can manage but not infer).
- A non-empty list → exactly those entitlement rules.
sub_account_id: str | None — bind the key to a sub-account (must exist, belong to the client, and be active).
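The difference between an omitted and an empty entitlements value is easiest to see in the request body. The sketch below is an assumption-laden illustration: `build_key_body` is a hypothetical helper, and the body field names are assumed to match the parameter names.

```python
import json


def build_key_body(scopes, entitlements=None, sub_account_id=None) -> dict:
    """Build a POST /gw/keys body, omitting absent optional fields."""
    body = {"scopes": list(scopes)}
    if entitlements is not None:
        # An empty list is kept on purpose: it means deny-all on inference.
        body["entitlements"] = list(entitlements)
    if sub_account_id is not None:
        body["sub_account_id"] = sub_account_id
    return body


print(json.dumps(build_key_body(["inference:use"])))                   # inherits the ceiling
print(json.dumps(build_key_body(["keys:manage"], entitlements=[])))    # explicit deny-all
```

Testing with `is not None` rather than truthiness is the important detail, since `[]` is falsy in Python but carries meaning on the wire.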

The returned created.key is the plaintext secret, available only in this response — the gateway stores a hash and cannot show it again. Store it now.

Sub-accounts — `client.gw.subaccounts`

Method	Wire	Params
`list()`	`GET /gw/subaccounts`	Returns `list[GWSubAccount]`.
`create(name, slug)`	`POST /gw/subaccounts`	`slug` is `[a-z0-9-]{1,64}`; `409` on duplicate. Returns `GWSubAccount` (with `markup_pct` starting at 0).
`update(id, status=None, name=None, markup_pct=None)`	`PATCH /gw/subaccounts/{id}`	`204`. At least one field required (`400` otherwise).

update parameter values:

status — "active" or "suspended". Suspending immediately blocks all keys bound to the sub-account.
markup_pct: float — reseller resale markup in percent, range 0..10000. The end customer’s billed price is cost * (1 + markup_pct / 100).
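A worked instance of the markup formula, using a hypothetical `billed_price` helper that also enforces the documented 0..10000 range:

```python
def billed_price(cost: float, markup_pct: float) -> float:
    """Apply cost * (1 + markup_pct / 100) within the allowed range."""
    if not 0 <= markup_pct <= 10000:
        raise ValueError("markup_pct must be within 0..10000")
    return cost * (1 + markup_pct / 100)


print(billed_price(2.0, 25))      # 2.5: a 25% markup on a 2.00 USD cost
print(billed_price(2.0, 0))       # 2.0: no markup, the starting state
```

The sketch uses floats for brevity; code that reconciles invoices would more likely use `decimal.Decimal` to avoid rounding drift.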

Budgets — `client.gw.budgets`

Method	Wire	Params
`list()`	`GET /gw/budgets`	Returns `list[GWBudget]` (level, target, limit, current spend).
`set(scope, period, limit_usd, id=None)`	`PUT /gw/budgets`	`204`.

set parameter values:

scope — "client", "sub_account" or "key".
period — "month" or "day".
limit_usd: float — the spending cap in USD.
id — the target id for sub_account/key scope (must belong to the client, 404 otherwise). For client scope id must be left None (the target is always the calling client; a non-empty id is rejected with 400).

Branding — `client.gw.branding`

Method	Wire	Notes
`get()`	`GET /gw/branding`	Returns `GWBrandingConfig`. `404` (→ `BadRequestError`) while branding has never been configured.
`set(portal_slug, product_name, logo_url=None, accent_color=None, allow_self_issue=None)`	`PUT /gw/branding`	Full upsert; returns the stored `GWBrandingConfig`.

set is a full upsert: every call replaces the stored configuration. When logo_url/accent_color are left as None (omitted), any stored values are cleared, and allow_self_issue left as None defaults to False, so pass every field you want to keep on each call. Constraints: logo_url must start with https://, accent_color is #RRGGBB. 409 if portal_slug is taken by another client.

Stats and usage

client.gw.stats(group_by=None, since=None, provider=None)
# → list[GWStatsRow] {"key", "requests", "total_tokens", "total_cost_usd"}

client.gw.usage(provider=None, parse_status=None, since=None, limit=None)
# → list[GWUsageRow], newest first

stats (GET /gw/stats): group_by is the aggregation key (gateway default "provider"); since is a look-back window like "7d", "24h", "90m". Costs are the client’s raw gateway costs — sub-account markup is not applied here.
usage (GET /gw/usage): since is a window like "7d"; limit is 1..1000 (gateway default 100); parse_status filters usage-parse status. Each GWUsageRow is {"id", "client_id", "provider", "model", "cost_usd", "total_tokens", "status_code", "latency_ms", "parse_status", "attribution", "created_at"}.

Async-media ledger — `client.gw.async_`

page = client.gw.async_.list(
    scope=None,            # "key" (default) | "client"
    status=None,           # task status filter
    kind=None,             # "video" | "image" | "3d" | "audio"
    created_after=None,    # RFC 3339 timestamp
    created_before=None,   # RFC 3339 timestamp
    limit=None,            # 1..500 (gateway default 100)
    cursor=None,           # opaque cursor from a previous page's next_cursor
)
# page → GWAsyncPage {"data": [GWAsyncTask], "next_cursor": str}

for task in client.gw.async_.iterate(kind="video"):   # auto-follows next_cursor
    print(task.task_id, task.status, task.final_cost_usd)

list (GET /gw/async) fetches one page; iterate auto-follows next_cursor until the gateway returns an empty cursor, applying the same filters to every page. Each GWAsyncTask is {"task_id", "provider", "model", "kind", "status", "provisional_cost_usd", "final_cost_usd", "expiration_reason", "attribution", "created_at", "finalized_at"} (final_cost_usd is null until the job is finalized). On AsyncSociaro, iterate is an async iterator.
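The cursor-following behaviour can be sketched independently of the SDK. Here `iterate_pages` is a hypothetical generator and `fake_fetch` an in-memory stand-in for a single-page call; only the `data` / `next_cursor` page shape is taken from the description above.

```python
from typing import Callable, Iterator, Optional


def iterate_pages(fetch_page: Callable[..., dict], **filters) -> Iterator[dict]:
    """Yield every task across pages, re-sending the same filters each time."""
    cursor: Optional[str] = None
    while True:
        # Drop absent filters so they are omitted rather than sent empty.
        params = {k: v for k, v in filters.items() if v is not None}
        if cursor:
            params["cursor"] = cursor
        page = fetch_page(**params)
        yield from page["data"]
        cursor = page.get("next_cursor")
        if not cursor:  # an empty cursor marks the last page
            return


PAGES = {
    None: {"data": [{"task_id": "t1"}, {"task_id": "t2"}], "next_cursor": "c2"},
    "c2": {"data": [{"task_id": "t3"}], "next_cursor": ""},
}


def fake_fetch(cursor=None, **filters):
    return PAGES[cursor]


print([task["task_id"] for task in iterate_pages(fake_fetch, kind="video")])
```

Because it is a generator, a caller that stops early never requests the remaining pages.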

Evidence export — `client.gw.evidence`

for line in client.gw.evidence(since="30d", until=None, limit=None):
    print(line.type, line.prev)

evidence (GET /gw/evidence) streams the tamper-evident evidence export as parsed NDJSON GWEvidenceLines — a single streaming request (no transport retry, fitting for large exports). Parameters: since is the look-back lower bound (e.g. "30d", gateway default 30 days), until the upper bound (omitted = now), limit a global record cap (the checksum line reports truncation). On AsyncSociaro it is an async iterator.

Each line is discriminated on type:

"header" — export metadata; the first line.
"usage" — one billed usage record.
"audit" — one operator audit event (provisioning, budget changes, key revocations, …).
"checksum" — the final line; reports the record count and any truncation.

Every line carries a prev field: the lowercase hex SHA-256 of the exact bytes of the previous line (without its trailing newline), forming a verifiable hash chain (empty on the header line). Request bodies are never included in the export.
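Verifying the chain requires the exact bytes of each line, so the sketch below works on raw NDJSON lines rather than parsed objects. It assumes you have kept those bytes (for example by saving the export to a file); `verify_chain` and `build_sample` are hypothetical helpers, and the sample records contain only the fields needed for the demonstration.

```python
import hashlib
import json


def verify_chain(lines: list[bytes]) -> bool:
    """Check that each line's prev equals the SHA-256 of the previous line."""
    expected_prev = ""  # the header line carries an empty prev
    for raw in lines:
        record = json.loads(raw)
        if (record.get("prev") or "") != expected_prev:
            return False
        # Hash the exact bytes of this line, without a trailing newline.
        expected_prev = hashlib.sha256(raw).hexdigest()
    return True


def build_sample() -> list[bytes]:
    """Build a tiny three-line export with a valid chain."""
    lines, prev = [], ""
    for record in ({"type": "header"}, {"type": "usage"}, {"type": "checksum"}):
        raw = json.dumps({**record, "prev": prev}).encode("utf-8")
        lines.append(raw)
        prev = hashlib.sha256(raw).hexdigest()
    return lines


sample = build_sample()
print(verify_chain(sample))

tampered = list(sample)
tampered[1] = tampered[1].replace(b"usage", b"audit")
print(verify_chain(tampered))
```

Editing any line changes its hash, so the mismatch surfaces on the line that follows it; re-serialising parsed JSON before hashing would break verification for the same reason.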

Reseller flow

Issue a scoped, sub-account-bound key and use it:

reseller = Sociaro(api_key="gw_live_...")   # the reseller's own gw key

ceiling = reseller.gw.ceiling()             # what a child key may be granted
sub = reseller.gw.subaccounts.create(name="Acme Corp", slug="acme-corp")

created = reseller.gw.keys.create(
    scopes=["inference:use"],
    entitlements=[{"provider": "openai", "model_pattern": "gpt-4o*", "effect": "allow"}],
    sub_account_id=sub.id,
)

# created.key is the PLAINTEXT secret — returned ONLY here. Store it now.
customer = Sociaro(api_key=created.key)
customer.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello from Acme!"}],
)

The /gw management types — GWMeResponse, GWCeilingResponse, GWKeyInfo, GWKeyCreated, GWSubAccount, GWBudget, GWBrandingConfig, GWStatsRow, GWUsageRow, GWAsyncTask, GWAsyncPage, GWEvidenceLine — are all Model subclasses with the wire shapes documented above (keys mirror the gateway’s JSON tags verbatim; timestamps are RFC 3339 strings).

See also the TypeScript SDK for the same surface in Node/edge runtimes. The package README ships with the source on GitHub, but this page is the authoritative reference.