Architecture

agent-intelligence is a Go-first agentic platform: a single binary that serves as both CLI and runtime server, with optional Python sidecar services for graph-heavy workloads. Agents communicate over A2A, MCP, and REST, and connect to graph backends locally (Kuzu) or in the cloud (Neo4j Aura).

System Overview

Particle streams show live message flow — green = requests, blue = responses, amber = model API calls.

                                                                        
[Diagram: system overview, top to bottom]

EXTERNAL LAYER
  ai (CLI): run · serve · graph
  IDE / Desktop: MCP stdio client
  REST / curl: A2A peers
    ↓ HTTP · MCP stdio · A2A JSON-RPC          ↑ [result]
AGENT RUNTIME (Go · net/http · :8080 / :8081)
  A2A :8080 · MCP :8081 (Streamable HTTP + SSE) · REST /api/* · Web UI :8888
  Agent Loop: context window · token budget · multi-turn
    ↓ MCP tool calls                            ↓ model calls
  MCP Client (modelcontextprotocol/go-sdk)      Model Router (Anthropic SDK · OpenAI-compat)
    ↓ MCP stdio · HTTP                          ↓ HTTPS streaming
  mcp-toolbox :15000 (Go)                       Anthropic · OpenAI
  mcp-toolbox :15001 (Go)                       claude-opus-4-6 · gpt-5
    ↓ Neo4j Bolt
GRAPH BACKEND
  Kuzu: local · embedded · CGO
  Neo4j Aura: cloud · Bolt · managed

Legend: request flow (down) · response flow (up) · model API calls · graph queries

Agent Runtime

The runtime is a single Go binary. It runs one goroutine per session and handles 500+ concurrent sessions per instance. The agent loop manages the context window, the token budget, and multi-turn tool use.

A2A Server :8080

JSON-RPC 2.0 over HTTP + SSE streaming. Accepts tasks from CLI, other agents, and REST clients. Exposes /.well-known/agent.json for agent discovery.

MCP Server :8081

Exposes agent_run, agent_list as MCP tools. Skills registered as MCP Prompts. Dual-transport: Streamable HTTP (POST /mcp, MCP spec 2025-03-26) and legacy SSE (GET /sse). Also supports stdio. Used by Claude Desktop, Cursor, etc.

Agent Loop

Core reasoning loop: receives task → assembles context → calls model → dispatches tool calls → injects results → repeats until done. Manages token budget with warn / compact / abort thresholds.
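
The warn / compact / abort thresholds can be pictured as a simple mapping from token usage to an action. The sketch below is illustrative only: the type names and the threshold fractions (0.70, 0.85, 0.98) are assumptions, not values taken from the runtime.

```go
package main

import "fmt"

// BudgetAction is what the loop does after checking token usage.
type BudgetAction int

const (
	Continue BudgetAction = iota
	Warn
	Compact
	Abort
)

func (a BudgetAction) String() string {
	return [...]string{"continue", "warn", "compact", "abort"}[a]
}

// Budget holds hypothetical thresholds, expressed as fractions of Limit.
type Budget struct {
	Limit     int
	WarnAt    float64
	CompactAt float64
	AbortAt   float64
}

// Check maps the tokens used so far to the most severe matching action.
func (b Budget) Check(used int) BudgetAction {
	ratio := float64(used) / float64(b.Limit)
	switch {
	case ratio >= b.AbortAt:
		return Abort
	case ratio >= b.CompactAt:
		return Compact
	case ratio >= b.WarnAt:
		return Warn
	default:
		return Continue
	}
}

func main() {
	// Threshold values here are assumptions chosen for the example.
	b := Budget{Limit: 200_000, WarnAt: 0.70, CompactAt: 0.85, AbortAt: 0.98}
	for _, used := range []int{50_000, 150_000, 180_000, 199_000} {
		fmt.Printf("%d tokens -> %s\n", used, b.Check(used))
	}
}
```

Checking the most severe threshold first keeps the mapping unambiguous when usage crosses several thresholds in one turn.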

Model Router

Routes to Anthropic SDK (direct, preferred — preserves stop_reason) or OpenAI-compat endpoint. Supports fallback chains: primary model → cheaper fallback → local model.

MCP Client

Connects outbound to MCP tool servers. Uses modelcontextprotocol/go-sdk v1.0.0. Manages tool server subprocesses (mcp-toolbox instances). Per-session tool filtering via middleware.
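
Per-session tool filtering via middleware can be sketched as a wrapper around the function that dispatches tool calls. CallFunc and WithToolFilter are hypothetical names, and the allow-list shape is an assumption; the go-sdk's own middleware types are not used here.

```go
package main

import "fmt"

// CallFunc dispatches a tool call for a session (hypothetical signature).
type CallFunc func(sessionID, tool string) (string, error)

// WithToolFilter wraps next so that a session can only call tools on its
// allow-list. allowed maps session ID -> tool name -> permitted.
func WithToolFilter(allowed map[string]map[string]bool, next CallFunc) CallFunc {
	return func(sessionID, tool string) (string, error) {
		// Lookups on a missing session yield a nil map, which reads as "false".
		if !allowed[sessionID][tool] {
			return "", fmt.Errorf("tool %q not allowed for session %q", tool, sessionID)
		}
		return next(sessionID, tool)
	}
}

func main() {
	dispatch := func(sessionID, tool string) (string, error) {
		return "ran " + tool, nil // stand-in for a real MCP tool call
	}
	call := WithToolFilter(map[string]map[string]bool{
		"session-1": {"memory_recall": true, "graphrag_search": true},
	}, dispatch)

	fmt.Println(call("session-1", "memory_recall"))
	fmt.Println(call("session-1", "ingest_documents"))
}
```

Because the check happens per call rather than when the tool list is built, the same tool server connection can serve sessions with different permissions.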

Code Sandboxes

Tier 1: QuickJS→WASM via wazero (<5 ms cold start, all platforms). Tier 2: CPython→WASM for stdlib. Tier 3: Firecracker microVM for packages + shell (Linux/KVM only).

Local Mode

ai run agent.toml — single binary, zero infra. Kuzu embedded graph, sidecars launched on demand.

         
[Diagram: local mode]

YOUR MACHINE
  ai binary (<25 MB · <200 ms cold start · CGO_ENABLED=0)
    Agent Runtime
    MCP Client + Model Router
      ↓ subprocess spawn
  mcp-toolbox :15000 · mcp-toolbox :15001
  Graph Build :8090 · GraphRAG :8091 · Memory :8092 · Eval Bridge :8093
      ↓ Bolt
  Kuzu (embedded · CGO · local .kuzu/ directory)

Outside the machine, over HTTPS outbound: Anthropic / OpenAI APIs

Cloud Mode

ai deploy — packages the runtime into a Docker image (<50 MB) and deploys to Fly.io. Neo4j Aura replaces Kuzu. Python sidecars run as companion containers.

                                                                                
[Diagram: cloud mode]

CLOUDFLARE (DNS · CDN · TLS termination)
    ↓ HTTPS
FLY.IO MACHINE (per-region · shared-cpu-1x · 256 MB RAM)
  ai runtime container: A2A :8080 · MCP :8081 · Web :8888 · REST /api/*
  Python sidecar (optional): GraphRAG :8091 · Memory :8092 · Eval :8093
    ↓ HTTPS streaming                           ↓ Neo4j Bolt+s
Anthropic / OpenAI APIs (claude-opus-4-6)       Neo4j Aura (managed · multi-region)

CLI Command Flows

How each command routes through the system.

ai init my-agent ──▶ scaffold agent.toml ──▶ validate config ──▶ write .agent/ dir

ai run agent.toml "find top 5 companies by revenue" ──▶ load + expand config ──▶ start runtime (in-process) ──▶ spawn sidecars ──▶ create session ──▶ agent loop ──▶ stream output to stdout

ai serve --port 8080 ──▶ load config ──▶ start A2A :8080 + MCP :8081 + REST + Web :8888 ──▶ spawn MCP tool servers ──▶ health-check loop ──▶ ready

ai graph build --source ./docs ──▶ start Graph Construction sidecar :8090 ──▶ poll /health ──▶ POST /build ──▶ stream ingestion progress ──▶ Kuzu / Aura

ai eval --suite evals/ ──▶ start Eval Bridge sidecar :8093 ──▶ run eval harness ──▶ POST results to Opik / Arize ──▶ print score table

ai deploy ──▶ make cross-compile linux/amd64 ──▶ docker build ──▶ fly deploy ──▶ tail logs ──▶ health-check https://<app>.fly.dev/health

ai web ──▶ start Web UI on :8888 ──▶ chat + trace + graph explorer + eval tab + cost counter

ai skill list | run <name> ──▶ load skill registry ──▶ execute skill prompt template ──▶ stream output

ai sidecar start <name> | status | mcp-toolbox --config toolbox.yaml ──▶ spawn subprocess ──▶ health-check loop ──▶ register MCP tools

ai show config | diagram | tools ──▶ resolve agent.toml ──▶ pretty-print to stdout

ai status ──▶ query running sidecars + active sessions ──▶ print status table

ai completion bash | zsh | fish ──▶ emit shell completion script to stdout

Protocols

A2A — Agent-to-Agent

JSON-RPC 2.0 over HTTP + SSE streaming. Agents discover each other via /.well-known/agent.json. Task lifecycle: submit → working → done / failed. Supports multi-turn delegation between agents.

MCP — Model Context Protocol

Bidirectional: the runtime is both an MCP client (calls tool servers) and an MCP server (exposes agent capabilities). Dual-transport server: Streamable HTTP (POST /mcp) and legacy SSE (GET /sse) on :8081. Also supports stdio for local IDE integrations.

REST Management API

GET /api/agents, PUT /api/agents/:id, GET /api/health, GET /openapi.json. Used by the Web UI and external orchestrators. OTel spans emitted for every request.

Neo4j Bolt

Cypher over Bolt protocol to Kuzu (local) or Neo4j Aura (neo4j+s://). Managed by mcp-toolbox (kind: neo4j, native Bolt support) and Python sidecar services.

Python Sidecars

Optional heavyweight services managed by the Go CLI via subprocess lifecycle. Each sidecar exposes a local HTTP API; the Go runtime polls /health every 250 ms and sends SIGTERM on shutdown (5 s deadline, then SIGKILL).

                                                                                                                          
[Diagram: sidecar topology]

Go Runtime (spawn · health-check · SIGTERM)
    ↓ HTTP (one connection per sidecar)
  Graph Build :8090 (llm-graph-builder)
  GraphRAG :8091 (neo4j-graphrag)
  Memory :8092 (agent-memory)
  Eval Bridge :8093 (Opik / Arize)
    ↓ Neo4j Bolt (three connections shown)
Kuzu / Neo4j Aura

Sidecar Architecture

Port assignments for all managed services, the MCP tool registration flow, and the ai sidecar status output format. See Python Sidecars above for the topology diagram.

Port assignments

Service              Port    Runtime           MCP tools registered                          Start command
mcp-toolbox          :15000  Go binary         Cypher query tools (defined in toolbox.yaml)  ai serve (auto)
mcp-toolbox (Neo4j)  :15001  Go binary         Neo4j-native tools (kind: neo4j)              ai sidecar mcp-toolbox --config toolbox.yaml
graphrag             :8091   Python / uvicorn  graphrag_search                               ai sidecar start graphrag
agent-memory         :8092   Python / uvicorn  memory_store, memory_recall                   ai sidecar start memory
graph-construction   :8090   Python / uvicorn  ingest_documents, get_job_status              ai sidecar start graph-construction
eval-bridge          :8093   Python / uvicorn  (REST only, no MCP tools)                     ai sidecar start eval

Tool registration flow

When ai serve starts, each sidecar follows this lifecycle:

  1. Spawn subprocess: exec.CommandContext starts the sidecar binary (toolbox serve for mcp-toolbox; uvicorn for Python sidecars).
  2. Health-check loop: the runtime polls GET /health every 250 ms until it returns 200 OK. A sidecar that is not healthy within 5 s is killed with SIGKILL.
  3. MCP connect: the MCP client connects to http://localhost:{port}/mcp/sse (legacy SSE) or POST http://localhost:{port}/mcp (Streamable HTTP, MCP spec 2025-03-26).
  4. ListTools: the runtime calls ListTools on the MCP session and receives all tool definitions in one JSON-RPC response.
  5. Tool set refresh: tools are merged into the active tool set; the model sees them on the next /messages call.
  6. Live refresh on reconnect: if a sidecar restarts, the MCP client reconnects and calls ListTools again, updating the tool set mid-run.
  7. Graceful shutdown: on ai serve exit, SIGTERM is sent; the process has 5 s to flush state before SIGKILL.

ai sidecar status

Run ai sidecar status at any time to inspect all managed services:

Sidecar status (YYYY-MM-DD HH:MM:SS)

SERVICE            PORT   STATUS   UPTIME    LAST_CHECK
mcp-toolbox        15000  ✓ up     43m 12s   200 OK
mcp-toolbox-neo4j  15001  ✓ up     43m 11s   200 OK
graphrag           8091   ✓ up     43m 08s   200 OK
agent-memory       8092   ✓ up     43m 05s   200 OK
graph-construction 8090   – down   –         not started
eval-bridge        8093   – down   –         not started

Tools available: 14 (5 Cypher + 4 generated + 3 graphrag + memory_store + memory_recall)

✓ up — sidecar healthy; MCP connection active and tools registered.
– down — sidecar not running; tools from this sidecar are unavailable.
not started — process was never launched in this ai serve session.

Key ADRs

ADR-001 — Go framework

Custom assembly chosen over google/adk-go. Rationale: adk-go is immature; custom MCP + A2A gives full control with no hidden abstractions.

ADR-003 — Anthropic SDK direct

Use anthropic-sdk-go directly, not OpenAI-compat shim. Shim loses stop_reason precision needed for reliable tool-use detection.

ADR-005 — Split MCP libraries

modelcontextprotocol/go-sdk v1.0.0 for the MCP client (stable API). mark3labs/mcp-go v0.45 for the MCP server (more mature server API).

ADR-006 — Dual-role MCP

Runtime is simultaneously MCP client + server in one process. Per-session tool authorization via middleware (ToolFilterFunc is static — no context).

ADR-004 — mcp-toolbox for Neo4j (Superseded)

mcp-toolbox supports Neo4j natively (kind: neo4j, 48 source kinds total). Original CypherMCP plan superseded — no custom server needed. Runs on :15000 / :15001.

Language strategy

Go for all runtime, CLI, and tool servers. Python for graph-heavy workloads (GraphRAG, graph construction, memory) — run as managed HTTP sidecar processes.