
API endpoints

Base URL: https://api.cortexlayer.dev. All POST bodies are JSON; all responses are JSON or, for streaming endpoints, Server-Sent Events.

Authentication

Two credential types — pick by call site:

| Credential | Header | Where it's safe to use | Endpoints |
| --- | --- | --- | --- |
| API key (`ck_live_…`) | `Authorization: Bearer <key>` | Server-side only; never ship to a browser | All admin/CRUD endpoints; `/v1/widget/session` mint |
| Session token (`cs_…`) | `X-Cortex-Session: <token>` | Browser-side, scoped to one agent + one origin, 15 min TTL | `/v1/chat/completions` (widget path) |

API keys are HMAC-SHA256 hashed at rest with a server-side pepper; the key prefix is indexed for fast lookup, and the secret is compared in constant time. Session tokens are opaque: they live in Redis and are revocable by deleting the entry.
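A minimal sketch of that storage scheme (the pepper value, the prefix length, and the helper names are assumptions for illustration, not the service's actual code):

```python
import hmac
import hashlib

# Hypothetical pepper; in practice this is loaded from the server environment.
PEPPER = b"server-side-secret-pepper"

def hash_key(api_key: str) -> tuple[str, str]:
    """Split an API key into an indexable prefix and a peppered HMAC digest."""
    prefix = api_key[:11]  # e.g. "ck_live_abc" -- the length is an assumption
    digest = hmac.new(PEPPER, api_key.encode(), hashlib.sha256).hexdigest()
    return prefix, digest

def verify_key(candidate: str, stored_digest: str) -> bool:
    """Recompute the digest for the candidate and compare in constant time."""
    candidate_digest = hmac.new(PEPPER, candidate.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(candidate_digest, stored_digest)
```

`hmac.compare_digest` is what makes the comparison constant-time: a plain `==` would short-circuit on the first differing byte and leak timing information.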

Agents

POST /v1/agents

Create an agent. Auth: API key.

```json
{
  "name": "Support bot",
  "system_prompt": "You are a friendly support agent for ACME Inc.",
  "provider": "gemini",
  "model": "gemini-2.0-flash-exp",
  "fallback_provider": "openai",                // optional
  "temperature": 0.7,                           // optional
  "max_tokens": 2048,                           // optional
  "allowed_origins": ["https://your-site.com"],
  "allowed_tool_domains": [],                   // for fetch_url, opt-in only
  "tools": []                                   // built-ins: search_kb, fetch_url
}
```

Returns the created agent including agent_id (agt_…).
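As a sketch, a request to this endpoint can be assembled like so. The `build_create_agent_request` helper is hypothetical; only the URL, auth header, and body shape come from this page:

```python
import json

def build_create_agent_request(api_key: str, name: str, system_prompt: str,
                               provider: str, model: str, **optional) -> dict:
    """Assemble headers and body for POST /v1/agents (illustrative helper)."""
    body = {
        "name": name,
        "system_prompt": system_prompt,
        "provider": provider,
        "model": model,
        "allowed_origins": optional.pop("allowed_origins", []),
        "tools": optional.pop("tools", []),
    }
    body.update(optional)  # temperature, max_tokens, fallback_provider, ...
    return {
        "method": "POST",
        "url": "https://api.cortexlayer.dev/v1/agents",
        "headers": {
            "Authorization": f"Bearer {api_key}",  # server-side only
            "Content-Type": "application/json",
        },
        "body": json.dumps(body),
    }
```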

PATCH /v1/agents/:id

Partial update. Same body shape, all fields optional.

Widget sessions

POST /v1/widget/session

Mint a short-lived browser-safe token bound to one agent + the requesting origin. Auth: API key.

{ "agent_id": "agt_abc123..." }

The server reads the Origin header and validates it against the agent’s allowed_origins. Returns:

```json
{
  "session_token": "cs_...",
  "expires_at": "2026-04-21T15:30:00Z",
  "budget": { "messages_remaining": 50, "tokens_remaining": 100000 }
}
```
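A widget client will typically mint a fresh token shortly before `expires_at` rather than waiting for a rejected request. A small illustrative check (the 60-second margin is an assumption, not part of the API):

```python
from datetime import datetime, timedelta, timezone

def session_needs_refresh(expires_at_iso: str, margin_seconds: int = 60) -> bool:
    """True when the session token is expired or inside the refresh margin."""
    # The API returns RFC 3339 timestamps with a trailing "Z".
    expires_at = datetime.fromisoformat(expires_at_iso.replace("Z", "+00:00"))
    return datetime.now(timezone.utc) >= expires_at - timedelta(seconds=margin_seconds)
```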

Chat

POST /v1/chat/completions

Streaming chat. Auth: session token (widget path) or API key (server path).

```json
{
  "agent_id": "agt_abc123...",
  "messages": [{ "role": "user", "content": "Hi" }],
  "conversation_id": "conv_…",  // optional; server creates one if omitted
  "stream": true                // default true
}
```

Response is text/event-stream. Frame types:

| Type | Payload | Notes |
| --- | --- | --- |
| `start` | `requestId`, `runId`, `provider`, `model`, `conversationId` | First frame. |
| `delta` | `text` | Append to the current assistant bubble. |
| `tool_call` | `name`, `args` | Tool runtime is about to execute. |
| `tool_result` | `name`, `output` | Wrapped in `<tool_result>…</tool_result>`. |
| `error` | `code`, `message` | Recoverable; the run is over. |
| `done` | `usage`, `finish_reason` | Last frame. |

Errors that prevent the stream from starting are returned as JSON with the standard envelope.
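A minimal client-side reader for this stream might look like the following. It assumes each frame arrives as a single `data: {...}` line whose JSON carries the `type` field from the table above; a production client would use a real SSE parser:

```python
import json

def parse_sse_frames(raw: str):
    """Yield (type, payload) pairs from a text/event-stream body."""
    for line in raw.splitlines():
        if not line.startswith("data:"):
            continue  # skip blank keep-alives and comment lines
        frame = json.loads(line[len("data:"):].strip())
        yield frame["type"], {k: v for k, v in frame.items() if k != "type"}

def render(raw: str) -> str:
    """Accumulate delta frames into the assistant's message text."""
    parts = []
    for ftype, payload in parse_sse_frames(raw):
        if ftype == "delta":
            parts.append(payload["text"])
        elif ftype == "error":
            break  # the run is over; surface payload["code"] to the user
    return "".join(parts)
```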

Rate limits

Limits stack — the strictest one wins.

| Scope | Limit |
| --- | --- |
| Per-IP (CDN) | 100 req/min |
| Per-API-key | 60 req/min sliding window |
| Per-tenant | 10 simultaneous streams |
| Per-tenant/day | $2 soft (warn header) / $5 hard (429) |
| Per-IP (widget) | 20 msg/min |
| Per-session | 5 msg/min, 50 messages, 100K tokens |

A 429 response carries Retry-After (seconds) and an envelope:

{ "error": { "code": "rate_limit_exceeded", "message": "...", "scope": "per_session" } }
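A client can honour `Retry-After` with a simple loop. `send` here is a hypothetical callable returning `(status, headers, body)`; the attempt cap is an assumption:

```python
import time

def with_retry(send, max_attempts: int = 3):
    """Invoke `send()` and sleep for Retry-After seconds on each 429."""
    for _ in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        time.sleep(int(headers.get("Retry-After", "1")))
    return status, body  # still rate-limited after all attempts
```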

Errors

All error responses share one envelope:

```json
{
  "error": {
    "code": "agent_not_found",  // stable machine-readable code
    "message": "...",           // human-readable; do not parse
    "request_id": "req_..."     // include in support tickets
  }
}
```

Codes are stable across versions; messages are not.
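Client code should therefore branch on `code` and keep `request_id` for support, never match on `message`. An illustrative dispatcher (the action names are made up for this sketch):

```python
def classify_error(envelope: dict) -> str:
    """Map a stable error code to a client-side action."""
    error = envelope["error"]
    code = error["code"]
    if code == "rate_limit_exceeded":
        return "backoff"          # honour Retry-After, then resend
    if code == "agent_not_found":
        return "fix_config"       # the agent_id is wrong or deleted
    # Unknown code: surface the request_id so it can go in a support ticket.
    return "report:" + error.get("request_id", "unknown")
```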