API Reference
The agent runtime exposes three HTTP API surfaces: the A2A protocol
for agent-to-agent communication, an MCP server for tool/prompt discovery
by Claude Desktop and other MCP hosts, and a lightweight REST management API.
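All surfaces are served by a single Go binary. The A2A endpoints and the REST management API share a configurable port (default :8080); the MCP server, when enabled, listens on a separate port (default :8081).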
Authentication
When agent.security.require_auth = true, all A2A endpoints require a
Bearer token in the Authorization header:
Authorization: Bearer <token>
Task ownership is enforced — clients can only read or cancel tasks they created.
Requests for another client's task return 404 (not 403) to avoid
leaking task existence. The REST management API and MCP server are not covered by bearer auth;
they should be placed behind a network perimeter in production.
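As a minimal sketch of attaching the token on the client side, the Python helper below builds a request that carries the Authorization header without sending it. The function name authorized_request, the example host, and the token value are illustrative assumptions, not part of the runtime.

```python
import urllib.request


def authorized_request(url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request carrying the Bearer token."""
    req = urllib.request.Request(url)
    req.add_header("Authorization", f"Bearer {token}")
    return req


# Hypothetical host and token; substitute your own values.
req = authorized_request("https://my-agent.example.com/a2a", "s3cr3t")
print(req.get_header("Authorization"))
```

Because a request for another client's task returns 404, a client may want to treat 404 as "not found or not owned by this token" rather than retrying.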
A2A — Agent Discovery
Google A2A Protocol
The Agent Card describes the agent's capabilities, supported A2A version, skills, and authentication requirements. Any A2A-compatible client fetches this before initiating tasks.
GET /.well-known/agent.json returns the AgentCard for capability discovery.
Response
{
"name": "companies-researcher",
"description": "Research companies using the knowledge graph",
"version": "1.0.0",
"url": "https://my-agent.example.com",
"capabilities": {
"streaming": true,
"pushNotifications": false,
"stateTransitionHistory": true
},
"defaultInputModes": ["text/plain"],
"defaultOutputModes": ["text/plain"],
"skills": [
{ "id": "graph-search", "name": "Graph Search", "description": "..." }
],
"authentication": { "schemes": ["bearer"] }
}
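A client typically inspects only a few fields of the card before deciding how to talk to the agent. The sketch below shows one way to do that in Python; the helper names supports_streaming and requires_bearer are hypothetical, and the card is a trimmed copy of the response above.

```python
import json


def supports_streaming(card: dict) -> bool:
    """True if the Agent Card advertises streaming support."""
    return bool(card.get("capabilities", {}).get("streaming", False))


def requires_bearer(card: dict) -> bool:
    """True if "bearer" is among the advertised authentication schemes."""
    schemes = card.get("authentication", {}).get("schemes", [])
    return "bearer" in schemes


# Trimmed copy of the sample Agent Card.
card = json.loads("""
{
  "name": "companies-researcher",
  "capabilities": {"streaming": true, "pushNotifications": false},
  "authentication": {"schemes": ["bearer"]}
}
""")

print(supports_streaming(card), requires_bearer(card))
```

Missing keys are treated as "not supported" here, which is a conservative assumption rather than documented behavior.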
A2A — Tasks
Submit tasks and poll their lifecycle state. Tasks progress through:
submitted → working → completed | failed | canceled.
Submit a task via POST /a2a and get a synchronous response (a Task or a Message). For streaming, use /a2a/stream.
Request body (JSON-RPC)
{
"jsonrpc": "2.0",
"id": "1",
"method": "message/send",
"params": {
"message": {
"role": "user",
"parts": [{ "type": "text", "text": "Top 5 companies by revenue?" }]
}
}
}
Response
{
"jsonrpc": "2.0",
"id": "1",
"result": {
"id": "task_01j...",
"status": { "state": "completed" },
"artifacts": [
{ "parts": [{ "type": "text", "text": "1. Apple — $394B ..." }] }
]
}
}
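The request and response shapes above can be built and unpacked with a few lines of Python. This is a sketch only: the helper names build_message_send and artifact_texts are hypothetical, and the HTTP call itself is left out.

```python
def build_message_send(text: str, request_id: str = "1") -> dict:
    """Build the JSON-RPC envelope for the message/send method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "message/send",
        "params": {
            "message": {
                "role": "user",
                "parts": [{"type": "text", "text": text}],
            }
        },
    }


def artifact_texts(response: dict) -> list:
    """Collect the text parts from every artifact in a JSON-RPC result."""
    texts = []
    for artifact in response.get("result", {}).get("artifacts", []):
        for part in artifact.get("parts", []):
            if part.get("type") == "text":
                texts.append(part["text"])
    return texts


body = build_message_send("Top 5 companies by revenue?")
```

If the result is a Message rather than a Task, it carries no artifacts field in this sketch's reading, so artifact_texts returns an empty list.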
Submit a task to /a2a/stream and receive a Server-Sent Events stream of TaskStatusUpdateEvent and TaskArtifactUpdateEvent messages. The connection closes when the task reaches a terminal state.
Request body
Same JSON-RPC envelope as POST /a2a with method "message/stream"
SSE event stream
data: {"type":"TaskStatusUpdateEvent","taskId":"task_01j...","status":{"state":"working"}}
data: {"type":"TaskArtifactUpdateEvent","taskId":"task_01j...","artifact":{"parts":[{"type":"text","text":"1. Apple"}]}}
data: {"type":"TaskStatusUpdateEvent","taskId":"task_01j...","status":{"state":"completed"},"final":true}
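A consumer of this stream only needs to decode each data: line and stop once an event is marked final. The Python sketch below works on any iterable of text lines (for example, lines read from an HTTP response); the function name iter_task_events and the shortened task ID are assumptions made for illustration.

```python
import json
from typing import Iterable, Iterator


def iter_task_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded events from data: lines, stopping after the final one."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and other SSE fields
        event = json.loads(line[len("data:"):].strip())
        yield event
        if event.get("final"):
            return


sample = [
    'data: {"type":"TaskStatusUpdateEvent","taskId":"t1","status":{"state":"working"}}',
    "",
    'data: {"type":"TaskArtifactUpdateEvent","taskId":"t1","artifact":{"parts":[{"type":"text","text":"1. Apple"}]}}',
    "",
    'data: {"type":"TaskStatusUpdateEvent","taskId":"t1","status":{"state":"completed"},"final":true}',
]

for event in iter_task_events(sample):
    print(event["type"])
```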
Poll the current state of a task by ID. Returns the full Task object including all artifacts.
Path parameters
id string Task ID returned from POST /a2a
Response
{ "id": "task_01j...", "status": { "state": "completed" }, "artifacts": [...] }
List tasks for the authenticated client. Cursor-based pagination.
Query parameters
cursor string Pagination cursor from previous response
limit int Max results per page (default 20, max 100)
Response
{ "tasks": [...], "nextCursor": "eyJ..." }
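Cursor-based pagination is usually consumed with a loop that feeds each nextCursor back into the next request. The sketch below keeps the HTTP call abstract: fetch_page is a hypothetical callable that takes (cursor, limit) and returns the parsed response. It also assumes the last page omits nextCursor or returns it empty, which the text above does not state.

```python
from typing import Callable, Iterator, Optional


def iter_tasks(fetch_page: Callable[[Optional[str], int], dict],
               limit: int = 20) -> Iterator[dict]:
    """Yield every task by following nextCursor until it is absent."""
    limit = max(1, min(limit, 100))  # documented maximum is 100
    cursor = None
    while True:
        page = fetch_page(cursor, limit)
        yield from page.get("tasks", [])
        cursor = page.get("nextCursor")
        if not cursor:
            break


# Fake two-page backend standing in for the HTTP call.
_pages = {
    None: {"tasks": [{"id": "task_a"}, {"id": "task_b"}], "nextCursor": "c1"},
    "c1": {"tasks": [{"id": "task_c"}]},
}
ids = [t["id"] for t in iter_tasks(lambda cursor, limit: _pages[cursor])]
print(ids)
```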
Cancel a running task with DELETE /a2a/tasks/{id}. The task transitions to the canceled state; any in-flight model call is interrupted via context cancellation.
Response
{ "id": "task_01j...", "status": { "state": "canceled" } }
Agent-to-Agent (A2A) Orchestration
Multi-agent patterns
Agents can delegate tasks to other agents over HTTP using the same A2A protocol. An
orchestrator agent submits tasks to a specialist agent, streams
progress events, and optionally gates irreversible actions behind human approval —
all without any out-of-band coordination.
Specialist agents are discovered via
/.well-known/agent.json.
Multi-agent topology
   User / ai run
         │
         │  POST /a2a
         ▼
┌──────────────────┐     POST /a2a     ┌──────────────────┐
│  Analyst Agent   │──────────────────►│ Researcher Agent │
│      :8080       │◄──────────────────│      :8082       │
│  (orchestrator)  │  GET /a2a/{id}/   │   (specialist)   │
│                  │   stream (SSE)    │                  │
└──────────────────┘                   └──────────────────┘
         │                                      │
         │  POST /a2a/{id}/approve              │  mcp-toolbox :15000
         │  (human approval gate)               │
         ▼                                      ▼
  Human Operator                         Neo4j Graph DB
Task lifecycle
POST /a2a
    │
    ▼
submitted                    Task accepted, queued for execution
    │
    ▼
working                      Agent is executing; streaming progress events
    │
    ├──► awaiting-approval   Human approval required before continuing
    │         │
    │    POST /a2a/{id}/approve (approved or rejected)
    │         │
    │  ◄──────┘
    │
    ├──► completed           Final answer available in artifacts
    ├──► failed              Unrecoverable error; see error field
    └──► canceled            Canceled via DELETE /a2a/tasks/{id}
The awaiting-approval state pauses execution until the operator
posts an approval decision. The agent resumes from the same point — no work is lost.
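The lifecycle can be modeled on the client as a small transition table, which is handy for validating state changes seen in polling or streaming responses. The table below is read directly off the diagram and is a sketch, not the runtime's implementation; in particular, it assumes no transitions exist beyond those drawn (for example, it does not allow canceling from submitted or awaiting-approval).

```python
TERMINAL_STATES = {"completed", "failed", "canceled"}

# Allowed transitions, as drawn in the task lifecycle diagram.
TRANSITIONS = {
    "submitted": {"working"},
    "working": {"awaiting-approval", "completed", "failed", "canceled"},
    "awaiting-approval": {"working"},  # resumes after an approval decision
}


def can_transition(current: str, new: str) -> bool:
    """True if the diagram shows an edge from current to new."""
    return new in TRANSITIONS.get(current, set())


def is_terminal(state: str) -> bool:
    """True once the task can no longer change state."""
    return state in TERMINAL_STATES


print(can_transition("working", "awaiting-approval"))
print(can_transition("completed", "working"))
```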
GET /a2a/{id}/stream subscribes to the real-time event stream for a specific task. It returns a Server-Sent
Events stream that remains open until the task reaches a terminal state
(completed, failed, or canceled).
Used by ai run --stream and the a2a_delegate built-in tool.
SSE event types
event: progress
data: {"type":"progress","message":"Querying companies graph for subsidiaries..."}
event: approval_required
data: {"type":"approval_required","message":"About to publish investment memo to Confluence. Approve?"}
event: completed
data: {"type":"completed","result":"## Investment Memo\n\nAlphabet Inc. analysis..."}
event: failed
data: {"type":"failed","error":"Tool execution error: connection refused"}
Event type reference
progress Intermediate status update; print and continue streaming
approval_required Execution paused; operator must POST /a2a/{id}/approve
completed Final result in result field; stream closes
failed Terminal error in error field; stream closes
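Unlike the /a2a/stream output, these events carry an explicit event: line, so a client has to pair each event name with the data: line that follows it. The Python sketch below does that and maps each event type to a client action; the function names and the action labels ("continue", "await-operator", "stop", "ignore") are assumptions chosen for illustration.

```python
import json
from typing import Iterable, Iterator, Tuple


def iter_named_events(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Pair each event: line with the data: line that follows it."""
    event_name = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:") and event_name is not None:
            yield event_name, json.loads(line[len("data:"):].strip())
            event_name = None


def next_action(event_name: str) -> str:
    """Map an event type to what a client does next."""
    if event_name == "progress":
        return "continue"
    if event_name == "approval_required":
        return "await-operator"
    if event_name in ("completed", "failed"):
        return "stop"
    return "ignore"  # unknown event types are skipped in this sketch


sample = [
    "event: progress",
    'data: {"type":"progress","message":"Querying companies graph for subsidiaries..."}',
    "",
    "event: failed",
    'data: {"type":"failed","error":"Tool execution error: connection refused"}',
]
for name, payload in iter_named_events(sample):
    print(name, next_action(name))
```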
Example — stream with curl
curl -N -H "Authorization: Bearer $TOKEN" \
https://my-agent.fly.dev/a2a/task_01j.../stream
Resume a task that is in the awaiting-approval state by posting a decision to /a2a/{id}/approve. Sending
"approved": true continues execution; "approved": false cancels the
pending action and lets the agent decide how to proceed (it may retry, skip, or fail).
Request body
{
"approved": true,
"message": "Looks good, proceed"
}
Response (task resumes)
{ "id": "task_01j...", "status": { "state": "working" } }
Response (rejection)
{ "id": "task_01j...", "status": { "state": "working" }, "note": "Approval rejected — agent notified" }
Error — task not awaiting approval
HTTP 409
{ "error": "task is not in awaiting-approval state" }
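An operator tool can wrap the approval call in two small helpers: one to build the request body and one to summarize the three responses shown above. This is a sketch with hypothetical function names; it assumes the note field appears only on rejections, which the samples suggest but do not guarantee.

```python
import json


def approval_body(approved: bool, message: str = "") -> str:
    """Serialize the approval decision as the JSON request body."""
    payload = {"approved": approved}
    if message:
        payload["message"] = message
    return json.dumps(payload)


def describe_approval_response(status_code: int, body: dict) -> str:
    """Turn an approve response into a short human-readable summary."""
    if status_code == 409:
        return "not awaiting approval: " + body.get("error", "")
    state = body.get("status", {}).get("state", "unknown")
    if "note" in body:  # assumption: note is present only on rejection
        return f"rejected, task is {state}"
    return f"approved, task is {state}"


print(approval_body(True, "Looks good, proceed"))
```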
Agent discovery for orchestrators
Before delegating, an orchestrator fetches GET /.well-known/agent.json from the target agent to verify its capabilities and authentication scheme. The a2a_delegate built-in tool does this automatically.
Example — check if a specialist supports streaming before delegating:
GET https://researcher.example.com/.well-known/agent.json
→ { "capabilities": { "streaming": true }, "authentication": { "schemes": ["bearer"] } }
Set RESEARCHER_ENDPOINT and RESEARCHER_TOKEN env vars in the orchestrator's agent.toml so a2a_delegate can target the specialist at runtime.
See Demo 04 — Multi-Agent Pipeline for a full walkthrough.
MCP Server — Tools
MCP 2025-11-25 spec
When agent.mcp_server.enabled = true, the agent exposes itself as an MCP server
on port 8081 by default (HTTP/SSE) or via stdio. MCP hosts (Claude Desktop, Cursor, etc.)
discover tools and invoke them through the standard MCP JSON-RPC protocol.
Return all tools available to this agent (from toolbox + registered tools).
Response
{
"tools": [
{
"name": "company_lookup",
"description": "Look up a company by name or ID",
"inputSchema": { "type": "object", "properties": { "name": { "type": "string" } } }
},
...
]
}
Invoke a tool by name. The agent executes the tool and returns the result.
Request
{ "name": "company_lookup", "arguments": { "name": "Apple" } }
Response
{ "content": [{ "type": "text", "text": "{\"name\":\"Apple Inc.\",\"revenue\":\"$394B\"}" }] }
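In the sample response the tool output arrives as a JSON document encoded inside a text content item, so a caller has to decode it a second time. The sketch below shows one way to do that in Python; the helper name tool_result_json is hypothetical, and falling back to the raw string for non-JSON text is a design choice of this example.

```python
import json


def tool_result_json(response: dict) -> list:
    """Decode every text content item, keeping non-JSON text as a string."""
    decoded = []
    for item in response.get("content", []):
        if item.get("type") != "text":
            continue
        try:
            decoded.append(json.loads(item["text"]))
        except json.JSONDecodeError:
            decoded.append(item["text"])  # plain text, not a JSON document
    return decoded


response = {
    "content": [
        {"type": "text", "text": "{\"name\":\"Apple Inc.\",\"revenue\":\"$394B\"}"}
    ]
}
print(tool_result_json(response)[0]["name"])
```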
MCP Server — Prompts (Skills)
Skills are exposed as MCP Prompts, enabling MCP hosts to discover and inject skill context into their own conversations.
List all skills registered with this agent as MCP prompts.
Response
{
"prompts": [
{ "name": "graph-search", "description": "Search the knowledge graph...", "arguments": [] },
{ "name": "memory-recall", "description": "Recall entities from prior sessions", "arguments": [] }
]
}
Retrieve the system prompt fragment for a skill. The MCP host can inject this into its own context.
Request
{ "name": "graph-search" }
Response
{
"messages": [
{
"role": "user",
"content": "When searching for companies:\n1. Use company_lookup for direct lookups..."
}
]
}
Claude Desktop Integration
MCP host config
Claude Desktop (and any MCP-compatible host) can connect to a running ai serve
instance as an MCP server. Tools defined in your toolbox.yaml are exposed directly
inside Claude conversations — no CLI required for end users.
Config file locations
macOS ~/Library/Application Support/Claude/claude_desktop_config.json
Windows %APPDATA%\Claude\claude_desktop_config.json
Local agent (no auth)
{
"mcpServers": {
"my-agent": {
"url": "http://localhost:8081/mcp/sse"
}
}
}
Remote / deployed agent (with Bearer token)
{
"mcpServers": {
"my-agent": {
"url": "https://my-agent.fly.dev/mcp/sse",
"headers": {
"Authorization": "Bearer <your-bearer-token>"
}
}
}
}
Multiple servers can be listed under mcpServers — each key becomes
a named tool source in Claude Desktop.
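When several agents are involved, the config can be generated rather than hand-edited. The Python sketch below produces the same structure as the two examples above; the helper name mcp_server_entry and the server keys local-agent and remote-agent are illustrative assumptions, and the token is left as a placeholder.

```python
import json
from typing import Optional


def mcp_server_entry(url: str, token: Optional[str] = None) -> dict:
    """Build one mcpServers entry, adding the header only when a token is given."""
    entry = {"url": url}
    if token:
        entry["headers"] = {"Authorization": f"Bearer {token}"}
    return entry


config = {
    "mcpServers": {
        "local-agent": mcp_server_entry("http://localhost:8081/mcp/sse"),
        "remote-agent": mcp_server_entry(
            "https://my-agent.fly.dev/mcp/sse", "<your-bearer-token>"
        ),
    }
}
print(json.dumps(config, indent=2))
```

The printed JSON can be written to the config file location for your platform.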
Verify the connection
1. Save the config file and restart Claude Desktop (Quit + relaunch).
2. In any conversation, click the 🔧 tools icon in the input bar.
3. Your server name (e.g. my-agent) should appear with a green indicator.
4. Start a conversation — Claude will call your tools automatically when relevant.
Notes
• The MCP server runs on port 8081 by default (separate from the A2A port 8080). Set agent.mcp_server.port in agent.toml to change it.
• Bearer token is set via agent.security.bearer_token in agent.toml.
• For deployed agents, ai deploy fly prints the /mcp/sse URL after deployment.
• See Demo 05 — Deploy & Claude Desktop for a full walkthrough.
REST Management API
Internal / local use
Lightweight management endpoints for tooling and health checks. Served on the same port as A2A.
The OpenAPI spec for this surface is available at /openapi.json.
List all configured agents and their current status.
Response
{ "agents": [{ "id": "companies-researcher", "status": "running", "activeSessions": 3 }] }
Hot-reload agent config without restarting the server. Skills and tools are re-resolved; active sessions drain gracefully.
Request body
{ "configPath": "./agent.toml" }
Response
{ "id": "companies-researcher", "status": "reloaded" }
Liveness and readiness probe. Returns 200 when the server is up and all sidecars are healthy. Returns 503 if any required sidecar is down.
Response (healthy)
{
"status": "ok",
"sidecars": {
"genai-toolbox": "running",
"cypher-mcp": "running",
"graphrag": "running"
}
}
Response (degraded)
HTTP 503
{ "status": "degraded", "sidecars": { "graphrag": "stopped" } }
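A monitoring script can combine the status code with the sidecars map to report which dependency is down. The sketch below is one possible reading of the two responses above; the function names are hypothetical, and it assumes any sidecar state other than "running" counts as unhealthy.

```python
def is_ready(status_code: int, health: dict) -> bool:
    """True when the probe returned 200 and reports status "ok"."""
    return status_code == 200 and health.get("status") == "ok"


def unhealthy_sidecars(health: dict) -> list:
    """Names of sidecars whose state is anything other than "running"."""
    sidecars = health.get("sidecars", {})
    return sorted(name for name, state in sidecars.items() if state != "running")


healthy = {
    "status": "ok",
    "sidecars": {"genai-toolbox": "running", "cypher-mcp": "running", "graphrag": "running"},
}
degraded = {"status": "degraded", "sidecars": {"graphrag": "stopped"}}

print(is_ready(200, healthy), unhealthy_sidecars(degraded))
```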
OpenAPI 3.0 specification for the REST management API. Suitable for import into Salesforce External Services or any OAS-compatible tool.
Response
OpenAPI 3.0 JSON document describing all /api/* endpoints