AI Call Agent

API docs for AI agents (Phase D)

Outbound voice-call API designed for AI agents (Claude, GPT, Cursor, MCP, Zapier, custom). Two entrypoints: a structured Skill API (manifest + idempotent run + SSE events) and the legacy free-form POST /calls. Every endpoint below is live in production. No login required to read these docs.

Hand this off to another AI

The button above copies a single self-contained Markdown spec (every endpoint, auth, request/response shape, idempotency contract, SSE event stream, ask_user with request_id, error codes, agent recipes) designed to be pasted into Claude.ai / ChatGPT / Cursor as context. After pasting, your AI can call the API correctly on the first try.

On this page:
Authentication | Skill API (recommended) | Event stream (SSE) | ask_user + request_id | Legacy /calls | Account & quota | Webhook channel | Polling fallback | Outcome enums | Errors & status codes | Operational specs | Tips for agents

Authentication

Every authenticated request takes the API key in the X-API-Key header. Get a free key at the home page (100 free credits). One key works for everything; no per-endpoint scopes today.
Public (no auth): GET /skills, GET /skills/{skill_id}/manifest, GET /docs, GET /openapi.json.
Single exception: GET /calls/{id}/recording also accepts ?api_key= query param so HTML5 <audio> can fetch it.

curl https://eks.vox-bot.live/users/me \
  -H "X-API-Key: pk_live_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

Skill API (recommended)

Strongly-typed wrapper around the voice pipeline. The agent fetches the manifest (input schema + capabilities), POSTs structured data to /skills/{skill_id}/run, and the server builds the brief and dispatches the call. Idempotency, SSE events, and durable ask_user are all part of this path.

GET /skills: list available skills

curl https://eks.vox-bot.live/skills

Response (200):

[
  {
    "skill_id": "restaurant-reservation",
    "title": "Restaurant Reservation",
    "version": "1.0.0",
    "manifest_url": "/skills/restaurant-reservation/manifest"
  }
]

GET /skills/{skill_id}/manifest: input schema + capabilities

Public. Phase-level: it does NOT inspect any specific customer's webhook config. Per-customer validation happens at /run time.

{
  "skill_id": "restaurant-reservation",
  "title": "Restaurant Reservation",
  "version": "1.0.0",
  "phase": "D",
  "input_schema": { /* JSON Schema, generated live from Pydantic */ },
  "ask_user_modes_supported": ["webhook", "any", "stream"],
  "event_types": ["status_change", "ask_user", "answered", "outcome", "recording_ready"],
  "webhook_requirements": {
    "required": false,
    "metadata_field": "ask_user_webhook_url",
    "auth": "Standard Webhooks v1 HMAC-SHA256",
    "phase_note": "Stream mode is the recommended channel for terminal-only AI agents (Phase D)."
  },
  "agent_docs_url": "/docs",
  "openapi_url": "/openapi.json",
  "run_url_template": "/skills/{skill_id}/run",
  "status_url_template": "/calls/{call_id}",
  "answer_url_template": "/calls/{call_id}/answer",
  "events_url_template": "/calls/{call_id}/events",
  "limits": {
    "party_size": {"min": 1, "max": 50},
    "name_max_chars": 100,
    "special_requests_max_chars": 1000,
    "notes_max_chars": 1000,
    "phone_max_chars": 32
  }
}
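The limits object in the manifest can be used for a cheap client-side pre-check before calling /run. The sketch below is only an illustration: check_limits is a hypothetical helper, and it assumes that a limit named <field>_max_chars applies to a payload field with exactly that name (so phone_max_chars is not matched against restaurant_phone or phone_to_dictate here). The server's own validation stays authoritative.

```python
def check_limits(payload: dict, limits: dict) -> list:
    """Return human-readable limit violations; an empty list means none were found."""
    problems = []

    # Numeric range check for party_size, using the manifest's min/max.
    size_rule = limits.get("party_size", {})
    size = payload.get("party_size")
    if size is not None:
        low = size_rule.get("min", size)
        high = size_rule.get("max", size)
        if not low <= size <= high:
            problems.append(f"party_size {size} outside [{low}, {high}]")

    # Assumed mapping: "<field>_max_chars" limits the payload field named "<field>".
    suffix = "_max_chars"
    for key, max_chars in limits.items():
        if key.endswith(suffix):
            field = key[: -len(suffix)]
            value = payload.get(field)
            if isinstance(value, str) and len(value) > max_chars:
                problems.append(f"{field} longer than {max_chars} chars")
    return problems
```

An empty result only means the payload passed these local checks; a 422 from /run is still possible.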

POST /skills/{skill_id}/run: start a call from structured input

curl -X POST https://eks.vox-bot.live/skills/restaurant-reservation/run \
  -H "X-API-Key: pk_live_..." \
  -H "Idempotency-Key: $(uuidgen)" \
  -H "Content-Type: application/json" \
  -d '{
    "restaurant_phone": "+37441467478",
    "party_size": 2,
    "date": "2026-05-12",
    "time": "20:00",
    "name": "Frunze",
    "phone_to_dictate": "094467478",
    "language": "auto",
    "ask_user_mode": "stream",
    "notes": "card if asked"
  }'

Headers: X-API-Key (required), Idempotency-Key (optional, 1-255 chars; strongly recommended).

ask_user_mode:

| Value | Behavior |
| --- | --- |
| "webhook" | Customer MUST have metadata.ask_user_webhook_url configured, else 422 ask_user_channel_required. |
| "any" | Use webhook if configured, else SSE-stream. Recommended default. |
| "stream" | No webhook preflight. Agent keeps SSE open and answers via POST /answer. Phase D. |

Defense-in-depth: If a webhook is configured on the customer's API key, it fires for ask_user events even when ask_user_mode="stream". Both channels deliver the same event with the same request_id. Agents listening to both must dedupe by request_id. To suppress webhook delivery entirely, remove the webhook via DELETE /users/me/webhook.
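Deduplicating by request_id can be as small as a set of seen ids. A minimal sketch, assuming events arrive as dicts carrying a request_id key (the class name AskUserDeduper is hypothetical, not part of the API):

```python
class AskUserDeduper:
    """Remembers request_ids so the same ask_user is handled once across channels."""

    def __init__(self):
        self._seen = set()

    def first_delivery(self, event: dict) -> bool:
        """Return True the first time a request_id is seen, False for repeats."""
        request_id = event["request_id"]
        if request_id in self._seen:
            return False
        self._seen.add(request_id)
        return True
```

Both the webhook handler and the SSE consumer would call first_delivery and act only when it returns True. If the two run in different threads, guard the set with a lock.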

Response (201):

{
  "skill_run_id": "srun_",
  "call_id": "",
  "call_sid": "CA...",
  "owner_pod": "ai-call-agent-...",
  "status": "dialing",
  "credits_reserved": 200,
  "status_url": "/calls/",
  "answer_url": "/calls//answer",
  "recording_url": "/calls//recording",
  "replayed": false,
  "expected_next_steps": [
    "Open SSE at /calls//events to watch live status + ask_user.",
    "POST to answer_url with {\"answer\":\"...\",\"request_id\":\"...\"} when ask_user fires.",
    "Poll status_url or watch SSE for terminal status."
  ]
}

Call queue / 202 Accepted: when the call queue is enabled server-side (CALL_QUEUE_ENABLED), this endpoint queues the call instead of dialing immediately and returns 202 with the same SkillRunResponse shape as the 201, just status:"queued" and call_sid:null. The call is dialed shortly after at the account-wide CPS limit. Treat any 2xx as success, capture call_id, then follow the SSE stream or poll /calls/{call_id}. The status moves queued → dialing → in_progress → ….

SSE URL is not in the run response. Derive it from manifest.events_url_template (today: /calls/{call_id}/events).
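Deriving the stream URL is a plain template substitution. A small sketch, assuming the manifest has already been fetched and parsed into a dict (events_url and BASE_URL are names chosen for this example):

```python
from urllib.parse import quote

BASE_URL = "https://eks.vox-bot.live"


def events_url(manifest: dict, call_id: str, base_url: str = BASE_URL) -> str:
    """Build the absolute SSE URL for a call from the manifest's template."""
    # Fall back to today's documented template if the key is missing.
    template = manifest.get("events_url_template", "/calls/{call_id}/events")
    path = template.replace("{call_id}", quote(call_id, safe=""))
    return base_url.rstrip("/") + path
```

Reading the template from the manifest rather than hard-coding the path keeps the client working if the template changes.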

Idempotency contract

Same Idempotency-Key + same body from same customer → server replays original response, no duplicate dial. Use a unique key per logical agent retry attempt (e.g. outer-loop UUID).
Storage: table skill_run_idempotency keyed on (customer_id, idempotency_key). Body fingerprinted with SHA-256 of canonical JSON. TTL 24 h; lock window 120 s.
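To see why "same body" is judged on content rather than raw bytes, here is one plausible canonicalization. This is an assumption for illustration only: the server's exact canonical form is not documented here, and body_fingerprint is a hypothetical helper, so do not expect these digests to match the server's.

```python
import hashlib
import json


def body_fingerprint(body: dict) -> str:
    """SHA-256 hex digest of a canonical JSON rendering (sorted keys, compact separators)."""
    # Assumed canonical form; the real server-side rules may differ.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Under this scheme key order does not change the fingerprint, but any changed value does, which is the situation that leads to idempotency_conflict when a key is reused.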

| Server state | HTTP | detail.error | Notes |
| --- | --- | --- | --- |
| First time, success | 201 | - | replayed: false |
| Same key + same body, prior call completed | 201 | - | replayed: true, identical payload |
| Same key, different body | 409 | idempotency_conflict | Don't reuse keys across distinct intents |
| Prior request still in-flight (lock fresh) | 409 | idempotency_in_progress | Has retry_after_seconds + Retry-After header. Wait, retry SAME body. |
| Prior worker crashed (lock expired ≥120 s) | 409 | idempotency_state_unknown | We don't auto-rerun (would risk 2nd Twilio dial). Use GET /calls?since=... to verify. |
| Idempotency-Key length not in [1, 255] | 400 | idempotency_key_invalid | Header validation |

No Idempotency-Key header → endpoint runs without replay guarantee. Network retries will produce duplicate calls. Don't ship that.
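The contract above suggests a retry loop that generates one key per logical attempt and reuses it across network retries. The sketch below assumes a hypothetical transport callable send(idempotency_key, body) returning (http_status, json_body), and assumes retry_after_seconds sits inside detail; neither is confirmed by this page.

```python
import time
import uuid


def run_idempotent(send, body, max_attempts=4, sleep=time.sleep):
    """Call send() with one Idempotency-Key for the whole logical attempt."""
    key = str(uuid.uuid4())  # generated once, reused on every network retry
    for _ in range(max_attempts):
        try:
            status, payload = send(key, body)
        except ConnectionError:
            continue  # network blip: retry with the SAME key and SAME body
        if 200 <= status < 300:
            return payload
        detail = payload.get("detail") if isinstance(payload, dict) else None
        error = detail.get("error") if isinstance(detail, dict) else None
        if status == 409 and error == "idempotency_in_progress":
            # Assumed location of retry_after_seconds; the Retry-After header also works.
            sleep(float(detail.get("retry_after_seconds", 1)))
            continue
        # Conflict, state_unknown, 4xx and so on need a decision, not a blind retry.
        raise RuntimeError(f"not retried automatically: HTTP {status} {error}")
    raise TimeoutError("gave up after max_attempts")
```

idempotency_state_unknown is deliberately not retried here; per the table, verify with GET /calls?since=... first.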

Event stream: SSE /calls/{call_id}/events

Content-Type: text/event-stream. Auth: X-API-Key. Use curl -N or any standard SSE client.

Connection lifecycle

Event types

Every event is framed as event: <type>\ndata: <json>\n\n. Each event includes id: <event_id> so SSE clients populate Last-Event-ID automatically.

| Event | Payload | Source |
| --- | --- | --- |
| status_change | {"status":"<status>","at":"<ISO>"} | call status transitions; <status> is one of queued, scheduled, dialing, in_progress, completed, failed, cancelled |
| ask_user | {"request_id":"req_<hex>","message":"<question>","at":"<ISO>","answer_url":"/calls/.../answer"} | bot needs info |
| answered | {"request_id":"req_<hex>","at":"<ISO>"} | echo of accepted POST /answer |
| outcome | {"outcome_type":"<...>","billable":true,"summary":"<text>","at":"<ISO>"} | call analyzer result |
| recording_ready | {"url":"/calls/.../recording","at":"<ISO>"} | Twilio recording posted |
| heartbeat | {} | keepalive (not persisted) |
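For agents without an SSE library, the framing is simple enough to parse by hand. This is a minimal sketch of a parser for the subset of Server-Sent Events used here (event, data, id fields and blank-line terminators); it takes any iterable of text lines, so it can be fed from an HTTP response or a test fixture. parse_sse is a name chosen for the example.

```python
import json


def parse_sse(lines):
    """Yield (last_event_id, event_type, data) tuples from an iterable of text lines."""
    event_id, event_type, data_lines = None, "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the event; dispatch only if data was seen.
            if data_lines:
                yield event_id, event_type, json.loads("\n".join(data_lines))
            event_type, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # SSE comment line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)
        elif field == "id":
            event_id = value  # persists across events, like Last-Event-ID
```

The first tuple element is the most recent id seen, which is the value to send back as Last-Event-ID when reconnecting.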

Quick stream consumer

curl -N -H "X-API-Key: $KEY" \
  https://eks.vox-bot.live/calls/<call_id>/events

ask_user: answering bot questions

When the bot needs info during the call it emits ask_user to both the configured webhook and the SSE stream, with the same request_id in both. Dedupe by request_id.

POST /calls/{call_id}/answer: submit answer

curl -X POST https://eks.vox-bot.live/calls/<call_id>/answer \
  -H "X-API-Key: pk_live_..." \
  -H "Content-Type: application/json" \
  -d '{"answer":"yes, 2 people, 8pm","request_id":"req_abc1234567890def"}'

Body:

| Field | Type | Notes |
| --- | --- | --- |
| answer | string | required, 1-4000 chars |
| request_id | string? | optional, ≤64 chars. Recommended: atomic UPDATE on durable row by PK; correctly disambiguates multiple in-flight ask_user. |

Without request_id: FIFO fallback, meaning the oldest pending row for this call_id (FOR UPDATE SKIP LOCKED). Safe when only one ask_user is open at a time. Compatible with legacy webhook clients.

Response: {"delivered": <bool>}. delivered=false means no pending row matched (already answered, expired, call ended, unknown id). Idempotent: same request_id twice → second returns delivered=false.
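The body constraints are easy to enforce before sending. A small sketch of a payload builder that applies the documented length limits (build_answer is a hypothetical helper):

```python
from typing import Optional


def build_answer(answer: str, request_id: Optional[str] = None) -> dict:
    """Build the JSON body for POST /calls/{call_id}/answer, enforcing documented limits."""
    if not 1 <= len(answer) <= 4000:
        raise ValueError("answer must be 1-4000 chars")
    payload = {"answer": answer}
    if request_id is not None:
        if len(request_id) > 64:
            raise ValueError("request_id must be at most 64 chars")
        payload["request_id"] = request_id
    return payload
```

Because a repeated request_id returns delivered=false, a client that retries after a network error should treat delivered=false as "possibly already accepted" rather than as a failure, and confirm through the answered event on the stream.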

Cross-pod guarantee

The pending_skill_ask_user row is written before the SSE/webhook event fires. POST /answer can land on any pod (Istio path-hash routes to call's owner pod when possible, but works without it): the broker UPDATEs the row and pg_notify('ask_user_answer', request_id) wakes the owner pod.
Timeout: server-side asyncio.wait_for per ask_user. If timeout hits before answer, row goes pending → timeout. Late answers return delivered=false.

Legacy /calls API (free-form brief)

Original entrypoint, unchanged. Use only when the structured Skill API doesn't fit your case (e.g. one-off cancellation, unusual ad-hoc task). For new agent integrations, prefer Skill API.

POST /calls: start a call from free-form brief

curl -X POST https://eks.vox-bot.live/calls \
  -H "X-API-Key: pk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "target_phone": "+37491234567",
    "brief": "Book a table for 2 tonight at 8pm, name Alex, callback +37491234567",
    "language": "ru"
  }'

Body: target_phone (E.164), brief (free text), language (ru/en/auto), kind (default OTHER), parsed_slots (optional).

Response (201): {call: {call_id, call_sid, status, ...}, task_id, credits_reserved, owner_pod}. Same call lifecycle, same SSE/answer endpoints work.

Call queue / 202 Accepted: when the call queue is enabled server-side (CALL_QUEUE_ENABLED), POST /calls queues the call instead of dialing immediately and returns 202 with the slim body {call_id, queue_id, position, status:"queued"}. There is no status_url and no call_sid; build the status URL yourself as /calls/{call_id}. The call is dialed shortly after at the account-wide CPS limit. Treat any 2xx as success, capture call_id, then follow the SSE stream or poll /calls/{call_id}. The status moves queued → dialing → in_progress → ….
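Since the 201 and 202 bodies differ in shape, a client can normalize them in one place. A sketch, assuming the response JSON is already parsed (accepted_call is a hypothetical helper; field names follow the shapes described on this page):

```python
def accepted_call(status: int, body: dict) -> dict:
    """Normalize any 2xx start-call response to call_id, status and status_url."""
    if not 200 <= status < 300:
        raise RuntimeError(f"call was not accepted: HTTP {status}")
    # POST /calls 201 nests the call under "call"; 202 and Skill API bodies are flat.
    nested = body.get("call")
    call = nested if isinstance(nested, dict) else body
    call_id = call["call_id"]
    return {
        "call_id": call_id,
        "status": call.get("status"),
        # The slim 202 body has no status_url, so build it from call_id.
        "status_url": body.get("status_url") or f"/calls/{call_id}",
    }
```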

GET /calls?limit=N: list recent calls

Returns array of CallDTO. transcript_full and supervisor_decisions are null in list responses (kept on single-call GET only).

GET /calls/{call_id}: single-call details

Full CallDTO:

{
  "call_id": "",
  "customer_id": "...",
  "task_id": "...",
  "target_phone": "+...",
  "language": "ru",
  "status": "completed",
  "call_sid": "CA...",
  "started_at": "",
  "ended_at": "",
  "duration_sec": 124,
  "outcome_type": "success_booked",
  "outcome_summary": "Booked at 20:00 for 2 people, no deposit.",
  "outcome_charge_cents": 111,
  "created_at": "",
  "has_recording": true,
  "reservation_signals": {},
  "transcript_full": [],
  "supervisor_decisions": []
}

Status enum: queued / scheduled / dialing / in_progress / completed / failed / cancelled. (queued appears only when the call queue is enabled; the call is accepted but not yet dialed.)

POST /calls/{call_id}/cancel: cancel a call

Updates DB status to cancelled and releases credit reservation. Does NOT terminate an already-active Twilio leg (Phase 3 limitation). Idempotent: returns {cancelled: false} on already-final calls, never errors.

GET /calls/{call_id}/recording: stream mp3

audio/mpeg. Both X-API-Key header AND ?api_key=KEY query param accepted (latter for HTML5 <audio>). 404 until Twilio posts the recording-status callback (typically a few seconds after outcome).

Account & quota

GET /users/me: your profile + quota

{
  "customer_id": "...",
  "customer_name": "...",
  "quota_limit": 1000,
  "current_usage": 222,
  "max_concurrent_calls": 1,
  "ask_user_webhook_configured": true,
  "ask_user_webhook_url": "https://your-service/...",
  "maintenance": {"enabled": false, "message": null, "resume_at": null},
  "is_admin": false
}

PUT /users/me/limits: change concurrency cap

curl -X PUT https://eks.vox-bot.live/users/me/limits \
  -H "X-API-Key: pk_live_..." \
  -H "Content-Type: application/json" \
  -d '{"max_concurrent_calls": 5}'

Default is 1 in-flight call per account. Raise as needed.

PUT /users/me/webhook: register ask_user webhook

curl -X PUT https://eks.vox-bot.live/users/me/webhook \
  -H "X-API-Key: pk_live_..." \
  -H "Content-Type: application/json" \
  -d '{"url":"https://your.app/voice-hooks","regenerate_secret":true}'

Returns the HMAC secret once; store it. Standard Webhooks v1, SHA-256.

DELETE /users/me/webhook: remove webhook

After removal, mid-call ask_user falls back to SSE-only delivery. If neither channel is available, ask_user times out and the outcome is failed_technical.

ask_user webhook flow (alternative to SSE)

If your agent runs an HTTP server, configure a webhook to receive ask_user synchronously. Both webhook and SSE deliver the same event for the same request_id, so your agent must dedupe by request_id if it listens to both.

Webhook payload:

{
  "type": "ask_user",
  "call_id": "",
  "request_id": "req_",
  "question": "How many people?",
  "answer_url": "/calls//answer",
  "deadline_at": ""
}

Headers: webhook-signature: ... (Standard Webhooks v1, HMAC-SHA256). Verify before trusting body.

Sync reply (≤30 s): 200 with body {"answer": "..."}.

Async reply: 202 with empty body, then POST /calls/{id}/answer with {answer, request_id}.
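Signature verification for the webhook can be sketched as follows. This follows the general Standard Webhooks conventions as an assumption: a webhook-id and webhook-timestamp header alongside webhook-signature, signed content of the form id.timestamp.body, a base64 secret with an optional whsec_ prefix, and signatures listed as v1,<base64>. This page only confirms the webhook-signature header and HMAC-SHA256, so check the secret format and header names against what PUT /users/me/webhook actually returns. verify_webhook is a hypothetical helper.

```python
import base64
import hashlib
import hmac
import time


def verify_webhook(secret, headers, body, tolerance_s=300, now=None):
    """Return True if any v1 signature in the header matches the raw request body."""
    msg_id = headers["webhook-id"]                # assumed header name
    timestamp = headers["webhook-timestamp"]      # assumed header name
    now = time.time() if now is None else now
    if abs(now - int(timestamp)) > tolerance_s:
        return False  # stale or future-dated: reject to limit replay
    key = base64.b64decode(secret.removeprefix("whsec_"))  # assumed secret encoding
    signed = f"{msg_id}.{timestamp}.".encode() + body
    expected = base64.b64encode(hmac.new(key, signed, hashlib.sha256).digest()).decode()
    for candidate in headers["webhook-signature"].split():
        version, _, signature = candidate.partition(",")
        if version == "v1" and hmac.compare_digest(signature, expected):
            return True
    return False
```

Verify against the raw request bytes, before any JSON parsing or re-serialization, and only then decide between the sync 200 reply and the async 202 path.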

Polling-only fallback (no webhook, no SSE)

If your AI agent has neither a webhook server nor an SSE client, you can still operate (without ask_user mid-call):

  1. POST /skills/{id}/run with Idempotency-Key.
  2. Poll GET /calls/{call_id} every 2-3 s.
  3. Stop when status is completed, failed, or cancelled.
  4. Read outcome_type, outcome_summary, transcript_full; optionally fetch recording_url.

ask_user mid-call without webhook OR SSE is not supported. Either configure one or include all needed info upfront in the input.
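The polling steps above can be sketched as a small loop. The fetch function is injected so the example stays transport-agnostic: get_call(call_id) is a hypothetical callable that performs GET /calls/{call_id} and returns the parsed CallDTO. The 30-minute overall timeout is this sketch's own assumption, not a documented limit.

```python
import time

TERMINAL = {"completed", "failed", "cancelled"}


def poll_until_done(get_call, call_id, interval_s=2.5, timeout_s=1800,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll get_call(call_id) until the call reaches a terminal status."""
    deadline = clock() + timeout_s
    while True:
        call = get_call(call_id)
        if call.get("status") in TERMINAL:
            return call
        if clock() >= deadline:
            raise TimeoutError(f"call {call_id} still {call.get('status')!r}")
        sleep(interval_s)  # stay within the documented 2-3 s polling cadence
```

Once it returns, outcome_type, outcome_summary and transcript_full can be read from the returned dict.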

Outcome enums

When a call reaches status=completed, outcome_type is one of:

Errors & status codes

Structured errors use detail = {"error": "<code>", ...}. Codes are stable across versions.

| HTTP | detail.error | When | Retry? |
| --- | --- | --- | --- |
| 400 | idempotency_key_invalid | Header length not in [1, 255] | Fix key |
| 400 | invalid_input | Skill-side derivation failed | Fix body |
| 401 | - | Missing or invalid API key | Fix key |
| 402 | quota_exceeded | Out of credits | Top up /checkout |
| 404 | skill_not_found | Unknown skill_id; payload includes available_skills | Fix id |
| 404 | Call not found | call_id not yours or doesn't exist | Fix id |
| 404 | Recording not available | Recording not posted yet | Wait + retry |
| 409 | concurrent_call_not_allowed | Per-customer cap; payload has active_call_ids + max_concurrent | Wait or raise cap |
| 409 | idempotency_conflict | Same key, different body | Fix body or new key |
| 409 | idempotency_in_progress | Prior request running, lock fresh; payload has retry_after_seconds | Wait Retry-After, retry SAME body |
| 409 | idempotency_state_unknown | Prior worker crashed | GET /calls?since=... to verify; do NOT retry blindly |
| 422 | Pydantic errors list | Body fails input_schema | Fix body |
| 422 | ask_user_channel_required | ask_user_mode="webhook" but no webhook URL on key | Configure webhook |
| 500 | - | Server config issue | Backoff + retry |
| 502 | - | Twilio dial failed; reservation released | Safe to retry with NEW key |
| 503 | maintenance | Admin-set maintenance window; payload has resume_at | Wait until resume |
| 503 | too_many_sse_streams | Per-pod SSE cap reached | Wait Retry-After |
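One way for an agent to act on this table is to collapse it into a handful of actions. The action labels below (fix_request, needs_human and so on) are invented for this sketch and are not values the API returns; classify is a hypothetical helper that takes the HTTP status and the detail.error string when present.

```python
from typing import Optional

# Specific error codes win over the generic per-status fallback.
BY_ERROR = {
    "quota_exceeded": "needs_human",
    "maintenance": "needs_human",
    "concurrent_call_not_allowed": "wait_and_retry",
    "idempotency_in_progress": "wait_and_retry",
    "too_many_sse_streams": "wait_and_retry",
    "Recording not available": "wait_and_retry",
    "idempotency_state_unknown": "verify_before_retry",
}
BY_STATUS = {
    400: "fix_request", 401: "fix_request", 402: "needs_human",
    404: "fix_request", 409: "fix_request", 422: "fix_request",
    500: "backoff_and_retry", 502: "retry_with_new_key", 503: "wait_and_retry",
}


def classify(status: int, error: Optional[str] = None) -> str:
    """Map an HTTP status plus optional detail.error to a coarse next action."""
    if 200 <= status < 300:
        return "ok"
    if error in BY_ERROR:
        return BY_ERROR[error]
    return BY_STATUS.get(status, "unknown")
```

Mapping maintenance to needs_human is a design choice: an agent could equally wait until resume_at on its own.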

Operational specs

| Spec | Value |
| --- | --- |
| Idempotency replay TTL | 24 h |
| Idempotency lock window | 120 s |
| Idempotency-Key length | 1-255 chars |
| SSE heartbeat interval | 15 s |
| SSE max stream duration | 30 min |
| SSE post-outcome grace | 5 s |
| SSE per-pod stream cap | 200 |
| SSE per-subscriber queue size | 100 |
| request_id format | req_<16-hex> |
| ask_user expiration | timeout + 5 s |
| Credits reserved at start | 200 |
| Default per-customer concurrency | 1 in-flight call |
| Webhook signature scheme | Standard Webhooks v1, SHA-256 |
| Sync webhook deadline | ~30 s |

Best practices for AI agents

  1. Prefer the Skill API over POST /calls for new integrations: schema validation, idempotency, structured errors, dynamic manifest.
  2. Always send Idempotency-Key on POST /skills/{id}/run. One key per logical retry, NOT per network attempt. Without it, a connection blip will dial twice.
  3. Use SSE for terminal-mode agents. No HTTP server needed; just curl -N. Resume with Last-Event-ID after disconnect.
  4. Always pass request_id in POST /answer. Disambiguates multiple in-flight ask_user. The bot supports parallel ask_user since Phase C.
  5. Dedupe ask_user by request_id if you listen to BOTH webhook and SSE, since the same event is delivered twice.
  6. Pre-spell phone numbers if dictation accuracy matters. The skill formatter does this automatically for phone_to_dictate.
  7. Surface 402 / 503 to the user. These need human action (top up, wait for maintenance).
  8. Don't poll faster than every 2 s. Be polite; SSE is cheaper than polling anyway.
  9. Trust /openapi.json over this page if anything disagrees. It's the byte-exact contract.