Architecture
agent-intelligence is a Go-first agentic platform: a single-binary CLI + runtime server, with optional Python sidecar services for graph-heavy workloads. Agents communicate over A2A, MCP, and REST, and connect to graph backends locally (Kuzu) or in the cloud (Neo4j Aura).
System Overview
```
┌──────────────────────────────────────────────────────────────┐
│                        EXTERNAL LAYER                        │
│  ┌─────────────────┐  ┌──────────────────┐  ┌─────────────┐  │
│  │ ai (CLI)        │  │ IDE / Desktop    │  │ REST / curl │  │
│  │ run·serve·graph │  │ MCP stdio client │  │ A2A peers   │  │
│  └─────────────────┘  └──────────────────┘  └─────────────┘  │
└──────────────────────────────────────────────────────────────┘
                │ HTTP · MCP stdio · A2A JSON-RPC
                ▼
┌──────────────────────────────────────────────────────────────┐
│ AGENT RUNTIME                    Go · net/http :8080 / :8081 │
│  ┌────────────────────────────────────────────────────────┐  │
│  │ A2A :8080 · MCP :8081 (Streamable HTTP + SSE)          │  │
│  │ REST /api/* · Web UI :8888                             │  │
│  └───────────────────────────┬────────────────────────────┘  │
│                              │ [result]                      │
│                              ▼                               │
│  ┌────────────────────────────────────────────────────────┐  │
│  │ Agent Loop                                             │  │
│  │ context window · token budget · multi-turn             │  │
│  └──────────┬────────────────────────────────┬────────────┘  │
│             │ MCP tool calls                 │ model calls   │
│             ▼                                ▼               │
│  ┌──────────────────────────┐  ┌──────────────────────────┐  │
│  │ MCP Client               │  │ Model Router             │  │
│  │ modelcontextprotocol/    │  │ Anthropic SDK ·          │  │
│  │ go-sdk                   │  │ OpenAI-compat            │  │
│  └──────────┬───────────────┘  └──────────────┬───────────┘  │
└─────────────┼─────────────────────────────────┼──────────────┘
              │ MCP stdio · HTTP                │ HTTPS streaming
              ▼                                 ▼
┌─────────────────────────────┐  ┌──────────────────────────┐
│ mcp-toolbox :15000 (Go)     │  │ Anthropic · OpenAI       │
│ mcp-toolbox :15001 (Go)     │  │ claude-opus-4-6 · gpt-5  │
└──────────────┬──────────────┘  └──────────────────────────┘
               │ Neo4j Bolt
               ▼
┌──────────────────────────────────────────────────┐
│ GRAPH BACKEND                                    │
│  ┌─────────────────────┐  ┌───────────────────┐  │
│  │ Kuzu                │  │ Neo4j Aura        │  │
│  │ local · embedded ·  │  │ cloud · Bolt ·    │  │
│  │ CGO                 │  │ managed           │  │
│  └─────────────────────┘  └───────────────────┘  │
└──────────────────────────────────────────────────┘
```
Agent Runtime
The runtime is a single Go binary. One goroutine per session — 500+ concurrent sessions per instance. The agent loop manages context window, token budget, and multi-turn tool use.
A2A Server :8080
JSON-RPC 2.0 over HTTP + SSE streaming. Accepts tasks from CLI, other agents, and REST clients.
Exposes /.well-known/agent.json for agent discovery.
MCP Server :8081
Exposes agent_run, agent_list as MCP tools. Skills registered as MCP Prompts.
Dual-transport: Streamable HTTP (POST /mcp, MCP spec 2025-03-26) and legacy SSE (GET /sse).
Also supports stdio. Used by Claude Desktop, Cursor, etc.
Agent Loop
Core reasoning loop: receives task → assembles context → calls model → dispatches tool calls → injects results → repeats until done. Manages token budget with warn / compact / abort thresholds.
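The warn / compact / abort thresholds can be sketched as a small classifier over token usage; the specific ratios below are illustrative assumptions, not the runtime's actual configuration:

```go
package main

import "fmt"

// budgetAction mirrors the warn / compact / abort thresholds described
// above. The ratios (0.75, 0.90, 1.0) are illustrative placeholders.
func budgetAction(usedTokens, budget int) string {
	ratio := float64(usedTokens) / float64(budget)
	switch {
	case ratio >= 1.0:
		return "abort" // hard stop: budget exhausted
	case ratio >= 0.90:
		return "compact" // summarize older turns to reclaim context
	case ratio >= 0.75:
		return "warn" // log a warning, keep going
	default:
		return "ok"
	}
}

func main() {
	for _, used := range []int{50_000, 80_000, 95_000, 110_000} {
		fmt.Printf("%d / 100000 tokens -> %s\n", used, budgetAction(used, 100_000))
	}
}
```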
Model Router
Routes to Anthropic SDK (direct, preferred — preserves stop_reason) or OpenAI-compat
endpoint. Supports fallback chains: primary model → cheaper fallback → local model.
MCP Client
Connects outbound to MCP tool servers. Uses modelcontextprotocol/go-sdk v1.0.0.
Manages tool server subprocesses (mcp-toolbox instances). Per-session tool filtering via middleware.
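Per-session filtering can be sketched as a predicate applied to the tool list before it reaches the model. The toolFilterFunc type below is a hypothetical simplification of the middleware:

```go
package main

import "fmt"

// toolFilterFunc decides whether a session may see a tool. Per ADR-006,
// the real ToolFilterFunc is static (no request context), so the decision
// can only depend on session configuration known up front.
type toolFilterFunc func(sessionID, toolName string) bool

// filterTools applies the predicate to the full tool list, returning the
// subset visible to this session.
func filterTools(sessionID string, tools []string, allow toolFilterFunc) []string {
	var out []string
	for _, t := range tools {
		if allow(sessionID, t) {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	all := []string{"graphrag_search", "memory_store", "memory_recall"}
	// Hypothetical policy: this session only gets read-style tools.
	readOnly := func(_, tool string) bool { return tool != "memory_store" }
	fmt.Println(filterTools("sess-1", all, readOnly))
}
```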
Code Sandboxes
Tier 1: QuickJS→WASM via wazero (<5 ms cold start, all platforms). Tier 2: CPython→WASM for stdlib. Tier 3: Firecracker microVM for packages + shell (Linux/KVM only).
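Tier selection follows from a task's requirements. A hedged sketch; the decision logic below is illustrative, not the runtime's actual policy:

```go
package main

import "fmt"

// pickTier sketches how a runtime might map a task's needs onto the three
// sandbox tiers above. The branching order is an illustrative assumption.
func pickTier(needsPackages, needsStdlib, onLinuxKVM bool) (string, error) {
	switch {
	case needsPackages:
		// Tier 3 needs a Firecracker microVM, which is Linux/KVM only.
		if !onLinuxKVM {
			return "", fmt.Errorf("tier 3 (Firecracker) requires Linux/KVM")
		}
		return "tier3-firecracker", nil
	case needsStdlib:
		return "tier2-cpython-wasm", nil // CPython compiled to WASM
	default:
		return "tier1-quickjs-wasm", nil // <5 ms cold start, all platforms
	}
}

func main() {
	tier, err := pickTier(false, true, false)
	fmt.Println(tier, err)
}
```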
Local Mode
ai run agent.toml — single binary, zero infra. Kuzu embedded graph, sidecars launched on demand.
```
┌──────────────────────────────────────────────────────────────┐
│ YOUR MACHINE                                                 │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ ai binary    <25 MB · <200 ms cold start · CGO_ENABLED=0 │ │
│ │  ┌───────────────────────────┐                           │ │
│ │  │ Agent Runtime             │                           │ │
│ │  │ MCP Client + Model Router │                           │ │
│ │  └─────┬───────────────┬─────┘                           │ │
│ │        │ HTTPS         │ subprocess spawn                │ │
│ │        │               ▼                                 │ │
│ │        │  ┌──────────────────────────────────────────┐   │ │
│ │        │  │ mcp-toolbox :15000 · mcp-toolbox :15001  │   │ │
│ │        │  │ Graph Build :8090 · GraphRAG :8091 ·     │   │ │
│ │        │  │ Memory :8092 · Eval Bridge :8093         │   │ │
│ │        │  └────────────────────┬─────────────────────┘   │ │
│ │        │                       │ Bolt                    │ │
│ │        │                       ▼                         │ │
│ │        │  ┌──────────────────────────────────────────┐   │ │
│ │        │  │ Kuzu                                     │   │ │
│ │        │  │ embedded · CGO · local .kuzu/ directory  │   │ │
│ │        │  └──────────────────────────────────────────┘   │ │
│ └────────┼─────────────────────────────────────────────────┘ │
└──────────┼───────────────────────────────────────────────────┘
           ▼
 ┌─────────────────────────┐
 │ Anthropic / OpenAI APIs │
 │ HTTPS outbound          │
 └─────────────────────────┘
```
Cloud Mode
ai deploy — packages the runtime into a Docker image (<50 MB) and deploys to Fly.io.
Neo4j Aura replaces Kuzu. Python sidecars run as companion containers.
```
┌───────────────────────────────┐
│ CLOUDFLARE                    │
│ DNS · CDN · TLS termination   │
└───────────────┬───────────────┘
                │ HTTPS
                ▼
┌──────────────────────────────────────────────────────────────────┐
│ FLY.IO MACHINE           per-region · shared-cpu-1x · 256 MB RAM │
│ ┌────────────────────────────┐  ┌──────────────────────────────┐ │
│ │ ai runtime container       │  │ Python sidecar (optional)    │ │
│ │ A2A :8080 · MCP :8081 ·    │  │ GraphRAG :8091 · Memory      │ │
│ │ Web :8888 · REST /api/*    │  │ :8092 · Eval :8093           │ │
│ └──────┬──────────────┬──────┘  └───────────────┬──────────────┘ │
└────────┼──────────────┼─────────────────────────┼────────────────┘
         │ HTTPS        │ Neo4j Bolt+s            │ Neo4j Bolt+s
         │ streaming    │                         │
         ▼              ▼                         ▼
┌───────────────────────────┐  ┌─────────────────────────────────┐
│ Anthropic / OpenAI APIs   │  │ Neo4j Aura                      │
│ claude-opus-4-6           │  │ managed · multi-region          │
└───────────────────────────┘  └─────────────────────────────────┘
```
CLI Command Flows
How each command routes through the system.
Protocols
A2A — Agent-to-Agent
JSON-RPC 2.0 over HTTP + SSE streaming. Agents discover each other via
/.well-known/agent.json. Task lifecycle: submit → working → done / failed.
Supports multi-turn delegation between agents.
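The lifecycle above can be encoded as a small transition table; the state names below are illustrative:

```go
package main

import "fmt"

// validTransitions encodes the task lifecycle described above:
// submit → working → done / failed. Terminal states have no exits.
var validTransitions = map[string][]string{
	"submitted": {"working"},
	"working":   {"done", "failed"},
}

// canTransition reports whether a task may move from one state to another.
func canTransition(from, to string) bool {
	for _, s := range validTransitions[from] {
		if s == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition("submitted", "working")) // true
	fmt.Println(canTransition("done", "working"))      // false: done is terminal
}
```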
MCP — Model Context Protocol
Bidirectional: the runtime is both an MCP client (calls tool servers)
and an MCP server (exposes agent capabilities). Dual-transport server:
Streamable HTTP (POST /mcp) and legacy SSE (GET /sse) on :8081.
Also supports stdio for local IDE integrations.
REST Management API
GET /api/agents, PUT /api/agents/:id, GET /api/health,
GET /openapi.json. Used by the Web UI and external orchestrators.
OTel spans emitted for every request.
Neo4j Bolt
Cypher over Bolt protocol to Kuzu (local) or Neo4j Aura (neo4j+s://).
Managed by mcp-toolbox (kind: neo4j, native Bolt support) and Python sidecar services.
Python Sidecars
Optional heavyweight services managed by the Go CLI via subprocess lifecycle.
Each sidecar exposes a local HTTP API; the Go runtime polls /health
every 250 ms and sends SIGTERM on shutdown (5 s deadline, then SIGKILL).
```
┌──────────────────────────────────┐
│ Go Runtime                       │
│ spawn · health-check · SIGTERM   │
└───┬─────────┬─────────┬──────┬───┘
    │ HTTP    │ HTTP    │ HTTP │ HTTP
    ▼         ▼         ▼      ▼
┌─────────────┐ ┌─────────────┐ ┌──────────────┐ ┌──────────────┐
│ Graph Build │ │ GraphRAG    │ │ Memory       │ │ Eval Bridge  │
│ :8090       │ │ :8091       │ │ :8092        │ │ :8093        │
│ llm-graph-  │ │ neo4j-      │ │ agent-memory │ │ Opik / Arize │
│ builder     │ │ graphrag    │ │              │ │              │
└──────┬──────┘ └──────┬──────┘ └──────┬───────┘ └──────────────┘
       │ Neo4j Bolt    │ Neo4j Bolt    │ Neo4j Bolt
       ▼               ▼               ▼
            ┌─────────────────────┐
            │ Kuzu / Neo4j Aura   │
            └─────────────────────┘
```
Sidecar Architecture
Port assignments for all managed services, the MCP tool registration flow, and
the ai sidecar status output format.
See Python Sidecars above for the topology diagram.
Port assignments
| Service | Port | Runtime | MCP tools registered | Start command |
|---|---|---|---|---|
| mcp-toolbox | :15000 | Go binary | Cypher query tools (defined in toolbox.yaml) | ai serve (auto) |
| mcp-toolbox (Neo4j) | :15001 | Go binary | Neo4j-native tools (kind: neo4j) | ai sidecar mcp-toolbox --config toolbox.yaml |
| graphrag | :8091 | Python / uvicorn | graphrag_search | ai sidecar start graphrag |
| agent-memory | :8092 | Python / uvicorn | memory_store, memory_recall | ai sidecar start memory |
| graph-construction | :8090 | Python / uvicorn | ingest_documents, get_job_status | ai sidecar start graph-construction |
| eval-bridge | :8093 | Python / uvicorn | (REST only — no MCP tools) | ai sidecar start eval |
Tool registration flow
When ai serve starts, each sidecar follows this lifecycle:
- Spawn subprocess — exec.CommandContext starts the sidecar binary (toolbox serve for mcp-toolbox; uvicorn for Python sidecars).
- Health-check loop — the agent polls GET /health every 250 ms until 200 OK (5 s timeout before SIGKILL on failure).
- MCP connect — the MCP client connects to http://localhost:{port}/mcp/sse (legacy SSE) or POST http://localhost:{port}/mcp (Streamable HTTP, MCP spec 2025-03-26).
- ListTools — the agent calls ListTools on the MCP session; it receives all tool definitions in one JSON-RPC response.
- Tool set refresh — tools are merged into the active tool set; the model sees them on the next /messages call.
- Live refresh on reconnect — if a sidecar restarts, the MCP client reconnects and calls ListTools again, updating the tool set mid-run.
- Graceful shutdown — on ai serve exit, SIGTERM is sent; the process has 5 s to flush state before SIGKILL.
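The tool set refresh step amounts to merging freshly listed tools into the active set, keyed by name. A minimal sketch, reducing a tool to its name and owning sidecar:

```go
package main

import (
	"fmt"
	"sort"
)

// mergeToolSet merges a sidecar's freshly listed tools into the active
// tool set, replacing any stale definitions by name. Mapping name to
// owning sidecar is a simplification of a full tool definition.
func mergeToolSet(active map[string]string, sidecar string, tools []string) {
	for _, t := range tools {
		active[t] = sidecar // remember which sidecar serves this tool
	}
}

func main() {
	active := map[string]string{"memory_store": "agent-memory"}
	// graphrag restarted and re-listed its tools; merge them in.
	mergeToolSet(active, "graphrag", []string{"graphrag_search"})

	names := make([]string, 0, len(active))
	for n := range active {
		names = append(names, n)
	}
	sort.Strings(names)
	fmt.Println(names)
}
```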
ai sidecar status
Run ai sidecar status at any time to inspect all managed services:
```
Sidecar status (YYYY-MM-DD HH:MM:SS)

SERVICE             PORT   STATUS  UPTIME   LAST_CHECK
mcp-toolbox         15000  ✓ up    43m 12s  200 OK
mcp-toolbox-neo4j   15001  ✓ up    43m 11s  200 OK
graphrag             8091  ✓ up    43m 08s  200 OK
agent-memory         8092  ✓ up    43m 05s  200 OK
graph-construction   8090  – down  –        not started
eval-bridge          8093  – down  –        not started

Tools available: 14 (5 Cypher + 4 generated + 3 graphrag + memory_store + memory_recall)
```
✓ up — sidecar healthy; MCP connection active and tools registered.
– down — sidecar not running; tools from this sidecar are unavailable.
not started — process was never launched in this ai serve session.
Key ADRs
ADR-001 — Go framework
Custom assembly chosen over google/adk-go. Rationale: adk-go is immature;
custom MCP + A2A gives full control with no hidden abstractions.
ADR-003 — Anthropic SDK direct
Use anthropic-sdk-go directly, not OpenAI-compat shim.
Shim loses stop_reason precision needed for reliable tool-use detection.
ADR-005 — Split MCP libraries
modelcontextprotocol/go-sdk v1.0.0 for the MCP client (stable API).
mark3labs/mcp-go v0.45 for the MCP server (more mature server API).
ADR-006 — Dual-role MCP
Runtime is simultaneously MCP client + server in one process. Per-session tool authorization via middleware (ToolFilterFunc is static — no context).
ADR-004 — mcp-toolbox for Neo4j (Superseded)
mcp-toolbox supports Neo4j natively (kind: neo4j, 48 source kinds total).
Original CypherMCP plan superseded — no custom server needed. Runs on :15000 / :15001.
Language strategy
Go for all runtime, CLI, and tool servers. Python for graph-heavy workloads (GraphRAG, graph construction, memory) — run as managed HTTP sidecar processes.