
API endpoints

Base URL: https://api.cortexlayer.dev. All POST bodies are JSON; all responses are JSON or, for streaming endpoints, Server-Sent Events.

Authentication

Two credential types — pick by call site:

| Credential | Header | Where it's safe to use | Endpoints |
| --- | --- | --- | --- |
| API key (`ck_live_…`) | `Authorization: Bearer <key>` | Server-side only; never ship to a browser | All admin/CRUD endpoints; `/v1/widget/session` mint |
| Session token (`cs_…`) | `X-Cortex-Session: <token>` | Browser-side, scoped to one agent + one origin, 15 min TTL | `/v1/chat/completions` (widget path) |

API keys are HMAC-SHA256 hashed at rest with a server-side pepper; the key prefix is indexed for fast lookup, and the secret is compared in constant time. Session tokens are opaque: they live in Redis and are revocable by deleting the entry.
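A minimal sketch of that storage scheme (the pepper value, the prefix length, and the helper names are assumptions for illustration, not the service's actual code):

```python
import hmac
import hashlib

# Hypothetical pepper; in practice this is loaded from the server environment.
PEPPER = b"server-side-secret-pepper"

def hash_key(api_key: str) -> tuple[str, str]:
    """Split an API key into an indexable prefix and a peppered HMAC digest."""
    prefix = api_key[:11]  # e.g. "ck_live_abc" -- the length is an assumption
    digest = hmac.new(PEPPER, api_key.encode(), hashlib.sha256).hexdigest()
    return prefix, digest

def verify_key(candidate: str, stored_digest: str) -> bool:
    """Recompute the digest for the candidate and compare in constant time."""
    candidate_digest = hmac.new(PEPPER, candidate.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(candidate_digest, stored_digest)
```

`hmac.compare_digest` is what makes the comparison constant-time: a plain `==` would short-circuit on the first differing byte and leak timing information.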

Agents

POST /v1/agents

Create an agent. Auth: API key.

```json
{
  "name": "Support bot",
  "system_prompt": "You are a friendly support agent for ACME Inc.",
  "provider": "gemini",
  "model": "gemini-2.0-flash-exp",
  "fallback_provider": "openai",                // optional
  "temperature": 0.7,                           // optional
  "max_tokens": 2048,                           // optional
  "allowed_origins": ["https://your-site.com"],
  "allowed_tool_domains": [],                   // for fetch_url, opt-in only
  "tools": []                                   // built-ins: search_kb, fetch_url
}
```

Returns the created agent including agent_id (agt_…).
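As a sketch, a request to this endpoint can be assembled like so. The `build_create_agent_request` helper is hypothetical; only the URL, auth header, and body shape come from this page:

```python
import json

def build_create_agent_request(api_key: str, name: str, system_prompt: str,
                               provider: str, model: str, **optional) -> dict:
    """Assemble headers and body for POST /v1/agents (illustrative helper)."""
    body = {
        "name": name,
        "system_prompt": system_prompt,
        "provider": provider,
        "model": model,
        "allowed_origins": optional.pop("allowed_origins", []),
        "tools": optional.pop("tools", []),
    }
    body.update(optional)  # temperature, max_tokens, fallback_provider, ...
    return {
        "method": "POST",
        "url": "https://api.cortexlayer.dev/v1/agents",
        "headers": {
            "Authorization": f"Bearer {api_key}",  # server-side only
            "Content-Type": "application/json",
        },
        "body": json.dumps(body),
    }
```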

PATCH /v1/agents/:id

Partial update. Same body shape, all fields optional.

Widget sessions

POST /v1/widget/session

Mint a short-lived browser-safe token bound to one agent + the requesting origin. Auth: API key.

{ "agent_id": "agt_abc123..." }

The server reads the Origin header and validates it against the agent’s allowed_origins. Returns:

```json
{
  "session_token": "cs_...",
  "expires_at": "2026-04-21T15:30:00Z",
  "budget": { "messages_remaining": 50, "tokens_remaining": 100000 }
}
```
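A widget client will typically mint a fresh token shortly before `expires_at` rather than waiting for a rejected request. A small illustrative check (the 60-second margin is an assumption, not part of the API):

```python
from datetime import datetime, timedelta, timezone

def session_needs_refresh(expires_at_iso: str, margin_seconds: int = 60) -> bool:
    """True when the session token is expired or inside the refresh margin."""
    # The API returns RFC 3339 timestamps with a trailing "Z".
    expires_at = datetime.fromisoformat(expires_at_iso.replace("Z", "+00:00"))
    return datetime.now(timezone.utc) >= expires_at - timedelta(seconds=margin_seconds)
```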

Chat

POST /v1/chat/completions

Streaming chat. Auth: session token (widget path) or API key (server path).

```json
{
  "agent_id": "agt_abc123...",
  "messages": [{ "role": "user", "content": "Hi" }],
  "conversation_id": "conv_…",  // optional; server creates one if omitted
  "stream": true                // default true
}
```

Response is text/event-stream. Frame types:

| Type | Payload | Notes |
| --- | --- | --- |
| `start` | `requestId`, `runId`, `provider`, `model`, `conversationId` | First frame. |
| `delta` | `text` | Append to the current assistant bubble. |
| `tool_call` | `name`, `args` | Tool runtime is about to execute. |
| `tool_result` | `name`, `output` | Wrapped in `<tool_result>…</tool_result>`. |
| `error` | `code`, `message` | Recoverable; the run is over. |
| `done` | `usage`, `finish_reason` | Last frame. |

Errors that prevent the stream from starting are returned as JSON with the standard envelope.
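A minimal client-side reader for this stream might look like the following. It assumes each frame arrives as a single `data: {...}` line whose JSON carries the `type` field from the table above; a production client would use a real SSE parser:

```python
import json

def parse_sse_frames(raw: str):
    """Yield (type, payload) pairs from a text/event-stream body."""
    for line in raw.splitlines():
        if not line.startswith("data:"):
            continue  # skip blank keep-alives and comment lines
        frame = json.loads(line[len("data:"):].strip())
        yield frame["type"], {k: v for k, v in frame.items() if k != "type"}

def render(raw: str) -> str:
    """Accumulate delta frames into the assistant's message text."""
    parts = []
    for ftype, payload in parse_sse_frames(raw):
        if ftype == "delta":
            parts.append(payload["text"])
        elif ftype == "error":
            break  # the run is over; surface payload["code"] to the user
    return "".join(parts)
```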

Rate limits

Limits stack — the strictest one wins.

| Scope | Limit |
| --- | --- |
| Per-IP (CDN) | 100 req/min |
| Per-API-key | 60 req/min sliding window |
| Per-tenant | 10 simultaneous streams |
| Per-tenant/day | $2 soft (warn header) / $5 hard (429) |
| Per-IP (widget) | 20 msg/min |
| Per-session | 5 msg/min, 50 messages, 100K tokens |

A 429 response carries Retry-After (seconds) and an envelope:

{ "error": { "code": "rate_limit_exceeded", "message": "...", "scope": "per_session" } }
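A client can honour `Retry-After` with a simple loop. `send` here is a hypothetical callable returning `(status, headers, body)`; the attempt cap is an assumption:

```python
import time

def with_retry(send, max_attempts: int = 3):
    """Invoke `send()` and sleep for Retry-After seconds on each 429."""
    for _ in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        time.sleep(int(headers.get("Retry-After", "1")))
    return status, body  # still rate-limited after all attempts
```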

Errors

All error responses share one envelope:

```json
{
  "error": {
    "code": "agent_not_found",  // stable machine-readable code
    "message": "...",           // human-readable; do not parse
    "request_id": "req_..."     // include in support tickets
  }
}
```

Codes are stable across versions; messages are not.
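Client code should therefore branch on `code` and keep `request_id` for support, never match on `message`. An illustrative dispatcher (the action names are made up for this sketch):

```python
def classify_error(envelope: dict) -> str:
    """Map a stable error code to a client-side action."""
    error = envelope["error"]
    code = error["code"]
    if code == "rate_limit_exceeded":
        return "backoff"          # honour Retry-After, then resend
    if code == "agent_not_found":
        return "fix_config"       # the agent_id is wrong or deleted
    # Unknown code: surface the request_id so it can go in a support ticket.
    return "report:" + error.get("request_id", "unknown")
```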