API endpoints
Base URL: https://api.cortexlayer.dev. All POST bodies are JSON; all responses are JSON or, for streaming endpoints, Server-Sent Events.
Authentication
Two credential types — pick by call site:
| Credential | Header | Where it’s safe to use | Endpoints |
|---|---|---|---|
| API key (ck_live_…) | Authorization: Bearer <key> | Server-side only — never ship to a browser | All admin/CRUD endpoints; /v1/widget/session mint |
| Session token (cs_…) | X-Cortex-Session: <token> | Browser-side; scoped to one agent + one origin; 15 min TTL | /v1/chat/completions (widget path) |
API keys are HMAC-SHA256 hashed at rest with a server-side pepper; the prefix is indexed for fast lookup, and the secret is compared in constant time. Session tokens are opaque: they live in Redis and are revoked by deleting the entry.
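A minimal sketch of the verification scheme described above, assuming the pepper is the HMAC key (the actual key-derivation details are not specified here, and the pepper value is hypothetical):

```python
import hashlib
import hmac

PEPPER = b"server-side-pepper"  # hypothetical; loaded from secret config in practice

def hash_key(secret: str) -> str:
    """HMAC-SHA256 the API key secret with the server-side pepper."""
    return hmac.new(PEPPER, secret.encode(), hashlib.sha256).hexdigest()

def verify_key(presented: str, stored_hash: str) -> bool:
    """Compare the presented key's hash in constant time."""
    return hmac.compare_digest(hash_key(presented), stored_hash)
```

The prefix (ck_live_…) stays in plaintext for the indexed lookup; only the hash of the full secret is stored.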
Agents
POST /v1/agents
Create an agent. Auth: API key.
```json
{
  "name": "Support bot",
  "system_prompt": "You are a friendly support agent for ACME Inc.",
  "provider": "gemini",
  "model": "gemini-2.0-flash-exp",
  "fallback_provider": "openai",                  // optional
  "temperature": 0.7,                             // optional
  "max_tokens": 2048,                             // optional
  "allowed_origins": ["https://your-site.com"],
  "allowed_tool_domains": [],                     // for fetch_url, opt-in only
  "tools": []                                     // built-ins: search_kb, fetch_url
}
```
Returns the created agent, including agent_id (agt_…).
PATCH /v1/agents/:id
Partial update. Same body shape, all fields optional.
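A sketch of building one of these server-side calls, here the PATCH, without sending it (the helper name is ours; any HTTP client works, the point is the Authorization header and JSON body):

```python
import json
import urllib.request

BASE_URL = "https://api.cortexlayer.dev"

def patch_agent_request(agent_id: str, api_key: str, fields: dict) -> urllib.request.Request:
    """Build (without sending) a PATCH /v1/agents/:id request.

    Server-side only: the ck_live_… key must never reach a browser.
    """
    return urllib.request.Request(
        f"{BASE_URL}/v1/agents/{agent_id}",
        data=json.dumps(fields).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )
```

All fields are optional, so a body like {"temperature": 0.5} updates just that one setting.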
Widget sessions
POST /v1/widget/session
Mint a short-lived browser-safe token bound to one agent + the requesting origin. Auth: API key.
```json
{ "agent_id": "agt_abc123..." }
```
The server reads the Origin header and validates it against the agent’s allowed_origins. Returns:
```json
{
  "session_token": "cs_...",
  "expires_at": "2026-04-21T15:30:00Z",
  "budget": { "messages_remaining": 50, "tokens_remaining": 100000 }
}
```
Chat
POST /v1/chat/completions
Streaming chat. Auth: session token (widget path) or API key (server path).
```json
{
  "agent_id": "agt_abc123...",
  "messages": [{ "role": "user", "content": "Hi" }],
  "conversation_id": "conv_…",   // optional; server creates one if omitted
  "stream": true                 // default true
}
```
Response is text/event-stream. Frame types:
| type | Payload | Notes |
|---|---|---|
| start | requestId, runId, provider, model, conversationId | First frame. |
| delta | text | Append to the current assistant bubble. |
| tool_call | name, args | The tool runtime is about to execute. |
| tool_result | name, output | Wrapped in <tool_result>…</tool_result>. |
| error | code, message | Recoverable; the run is over. |
| done | usage, finish_reason | Last frame. |
Errors that prevent the stream from starting are returned as JSON with the standard envelope.
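A client-side sketch of consuming these frames. The exact wire framing (event:/data: line pairs separated by blank lines) is standard SSE but is an assumption here, since this document only lists the frame types:

```python
import json

def parse_sse_frames(raw: str) -> list[tuple[str, dict]]:
    """Parse a text/event-stream body into (type, payload) tuples.

    Assumes each frame is an `event: <type>` line followed by
    `data: <json>` lines, terminated by a blank line.
    """
    frames = []
    event, data = None, []
    for line in raw.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "" and event is not None:
            frames.append((event, json.loads("\n".join(data))))
            event, data = None, []
    return frames
```

A renderer would then switch on the frame type: accumulate delta text into the current bubble, stop on done, and surface error frames to the user.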
Rate limits
Limits stack — the strictest one wins.
| Scope | Limit |
|---|---|
| Per-IP (CDN) | 100 req/min |
| Per-API-key | 60 req/min sliding window |
| Per-tenant | 10 simultaneous streams |
| Per-tenant/day | $2 soft (warn header) / $5 hard (429) |
| Per-IP (widget) | 20 msg/min |
| Per-session | 5 msg/min, 50 messages, 100K tokens |
A 429 response carries Retry-After (seconds) and an envelope:
```json
{ "error": { "code": "rate_limit_exceeded", "message": "...", "scope": "per_session" } }
```
Errors
All error responses share one envelope:
```json
{
  "error": {
    "code": "agent_not_found",   // stable machine-readable code
    "message": "...",            // human-readable; do not parse
    "request_id": "req_..."      // include in support tickets
  }
}
```
Codes are stable across versions; messages are not.
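Since only the code is stable, clients should branch on error.code and treat the message as display-only. A minimal sketch of unpacking the envelope (the helper name is ours):

```python
import json

def classify_error(body: str) -> tuple[str, str]:
    """Extract the stable code and the request_id from an error envelope.

    Never branch on error.message: messages may change between versions.
    """
    envelope = json.loads(body)["error"]
    return envelope["code"], envelope.get("request_id", "")
```

Logging the returned request_id alongside your own context is what makes support tickets actionable.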