Roadmap
All planned work across the project, organized by area. The end-to-end agent loop is the
highest priority — everything else is sequenced around making that work reliably first.
Status key: completed · in progress · next · planned · backlog
Core System
Agent execution loop, model routing, config schema, context engineering, memory, cost tracking, and evaluation infrastructure.
| ID | Task | Status | |
|---|---|---|---|
| task-010 | Go module and project skeleton F03F02 | completed | |
| task-011 | Agent config schema (TOML) and loader F05 | completed | |
| task-040 | Model router — Anthropic + OpenAI-compat + fallback chain F10F06 | completed | |
| task-041 | Core agent execution loop with NextStep and Hooks F07F10 | completed | |
| task-042 | Context engineering layer — ContextEngine F06F05 | completed | |
| task-043 | Tool executor — parallel dispatch, partial failure, timeouts F10F11 | completed | |
| task-044 | Security layer — prompt injection detection + A2A input validation F05F06 | completed | |
| task-045 | Model fallback chain and cost circuit breaker F03F06 | completed | |
| task-022 | 4-layer memory architecture F07 | completed | |
| task-030 | AI-assisted config generation prompt and engine F02F05 | completed | |
| task-031 | Onboarding interview agent — ai init F02F04 | completed | |
| task-046 | Prompt caching optimization F06F03 | completed | |
| task-047 | Tool context management for large tool lists F06F04 | completed | |
| task-083 | Durable session store — FileSessionStore + crash recovery F07 | completed | |
| task-085 | Cost observability and budget reporting F11F06 | completed | |
| task-156 | Human-in-the-loop: built-in human_approval MCP tool F07 | medium | completed |
| task-157 | Explicit control flow: intent router and tool-call guards F05F06 | low | completed |
Validation spikes (completed)
| ID | Task | Status | |
|---|---|---|---|
| task-001 | Go agent framework evaluation F03F09 | completed | |
| task-003 | Anthropic Go SDK — streaming + tool-use validation F06 | completed | |
| task-005 | MCP Go library selection F09 | completed | |
| task-006 | MCP dual-role PoC — agent as client AND server F09F10 | completed | |
| task-007 | Context engineering validation harness F06 | completed | |
| task-002 | Kuzu Go bindings validation F08 | completed | |
| task-004 | mcp-toolbox Neo4j source evaluation F09 | completed | |
| task-160 | Prompt injection hardening in agent loop F05F06 | high | completed |
| task-161 | Reversibility taxonomy + default-approve-required for irreversible tool calls F07 | high | completed |
| task-162 | Trust hierarchy: principal tiers + trust-level enforcement in agent loop F05 | high | completed |
| task-163 | Platform safety floor: non-overridable hardcoded behaviours F11 | medium | completed |
| task-164 | Honesty guidelines: uncertainty language, capability statements, AI identity F05 | medium | completed |
| task-170 | Wire agent.Agent.Run as real TaskRunner in ai serve F07F09 | critical | completed |
| task-172 | Config-to-agent wiring verification: context thresholds and tool call Index sort F06F05 | high | completed |
| task-173 | cmd_init test coverage F02 | medium | completed |
| task-270 | GenerateToolboxConfig — read_cypher + write_cypher opt-in tools F10F05 | high | planned |
| task-272 | write_cypher — wire as R_DESTROY in generated agent.toml and built-in patterns F07F05 | high | planned |
| task-274 | Integration test — read_cypher executes live Neo4j query via mcp-toolbox F11 | medium | planned |
| task-281 | Dynamic tool descriptions — inject agent name + purpose into agent_run/agent_stream F02F01 | high | planned |
| task-290 | TokenValidator interface + auth middleware for MCP and A2A servers | critical | planned |
| task-291 | RFC 9728 Protected Resource Metadata endpoint | high | planned |
| task-292 | JWT validation with JWKS auto-discovery and key rotation | high | planned |
| task-293 | A2A AgentCard OAuth2 SecurityScheme + scope enforcement | medium | planned |
| task-294 | Auth configuration in agent.toml + deploy.auth schema F02 | high | planned |
| task-298 | E2E auth integration test: MCP client → OAuth → tool call | high | planned |
CLI
The ai command and its subcommands — the primary interface for developers
running, configuring, and serving agents locally.
| ID | Task | Status | |
|---|---|---|---|
| task-012 | CLI framework and top-level command structure F01F03F12 | completed | |
| task-031 | ai init — onboarding interview agent F02F04 | completed | |
| task-051 | ai serve and ai run — serve and single-run modes F01F02 | completed | |
| task-054 | Curl-based install script and binary distribution F02F03 | completed | |
| task-135 | Version in help output and ai with no arguments F01F12 | completed | |
| task-052 | ai show — display agent config with syntax highlighting F05F04 | completed | |
| task-053 | ai web — local developer console (chat + trace + cost) F11F04 | low | completed |
| task-136 | Global --quiet / -q flag for silent mode F01F12 | completed | |
| task-159 | ai init — progressive elicitation with agent type templates F02F04 | medium | completed |
| task-153 | ai show --diagram — ASCII architecture diagram via D2 F11F04 | low | completed |
| task-155 | --plain flag — screen reader and pipeline-safe output F12 | low | completed |
| task-165 | ai init validation: warn when generated config violates our own principles F04F10 | low | completed |
| task-180 | ai init: readline support for cursor/arrow key navigation F11F01F02 | medium | completed |
| task-181 | ai init: fix model name suggestions and add normalization F04F01 | high | completed |
| task-182 | ai init: investigate and fix crash after model name entry F04 | medium | completed |
| task-183 | ai init: fix --agent → --config flag name in 'Next step' output F04 | low | completed |
| task-184 | ai init: --credentials flag for Aura/Sandbox credential file loading F01 | low | completed |
| task-185 | NDJSON done event: include cost_usd, input_tokens, output_tokens F04F11 | medium | completed |
| task-186 | ai init: demo database catalog — gather and add known demo databases F02F03 | medium | completed |
| task-188 | ai serve: auto-detect and start mcp-toolbox sidecar without explicit flag F02F11 | high | completed |
| task-193 | ai graph seed — seed local Kuzu DB from named dataset or Cypher file F08F02F01 | medium | planned |
| task-191 | ai init — local-first Kuzu path (option 5 in URI preset menu) F02F08F04 | high | planned |
| task-194 | ai init — AI-generated domain seed data for local Kuzu F02F08F04 | medium | planned |
| task-208 | Extend --credentials to parse Supabase, PlanetScale, Neon, MongoDB, Redis URLs F02F03 | medium | planned |
| task-240 | ai init — database type selector menu (replaces URI preset menu) F02F04F01 | high | planned |
| task-248 | Extend --credentials to parse Supabase, PlanetScale, Neon, MongoDB, Redis URLs F02F03 | medium | planned |
| task-175 | Shell completion: ai completion bash|zsh|fish — static completions F04F01 | medium | completed |
| task-176 | Shell completion: dynamic completions — config files, skills, task IDs F04F01 | low | completed |
| task-177 | Coding agent UX: --output flag + structured errors + exit codes on all commands F11F01F04 | high | completed |
| task-178 | Coding agent UX: --non-interactive, stdin piping, AI_ env var overrides F02F01F04 | high | completed |
| task-179 | Coding agent UX: ai status, --dry-run, --timeout, and completion self-doc F11F07F02 | medium | completed |
| task-250 | ai show — opinionated default view (identity + truncated prompt + non-defaults) F01F02 | high | completed |
| task-251 | ai show — --full flag (preserve current full TOML dump behavior) F01 | high | completed |
| task-252 | ai show — --tools flag (parse toolbox YAML, show sources + tools + env var status) F02F03 | high | completed |
| task-253 | ai show — --section flag (show single named config section) F01 | medium | completed |
| task-254 | ai show — --filter flag (interactive fzf / built-in line filter) F01F08 | medium | completed |
| task-255 | ai show — --grep flag (non-interactive regex line filter) F01 | low | completed |
| task-271 | ai init — offer generic Cypher fallback tools in onboarding interview F02F04F05 | high | planned |
| task-273 | ai show --tools — annotate fallback tools with visual separator and warning F11F01 | medium | planned |
| task-280 | ai mcp — MCP stdio server command F02F01 | high | planned |
| task-282 | ai show --mcp-config [target] — print ready-to-paste coding agent config snippets F05F11 | high | planned |
| task-283 | ai init — write MCP config files for Claude Code and Cursor at end of onboarding F02F01F11 | medium | planned |
| task-284 | ai init — write MCP config files for Claude Code and Cursor at end of onboarding F02F04 | high | planned |
| task-285 | ai show — MCP tool identity preview section F11F05 | low | planned |
Integrations
MCP client and server, mcp-toolbox sidecar, graph backends (Neo4j, Kuzu), and Python sidecars for GraphRAG, graph construction, memory, and eval.
| ID | Task | Status | |
|---|---|---|---|
| task-034 | MCP client — agent connects to external MCP servers F09 | completed | |
| task-060 | Skills system F10F05 | completed | |
| task-070 | Python sidecar service manager (Go) F10F08 | completed | |
| task-020 | Neo4j connection and schema introspection F08F09 | completed | |
| task-032 | mcp-toolbox config generation and sidecar manager (ai sidecar mcp-toolbox) F09F08 | completed | |
| task-033 | MCP server — agent exposes its tools over MCP F09 | completed | |
| task-071 | GraphRAG retrieval sidecar (Python, port 8091) F08F10 | completed | |
| task-072 | Graph construction sidecar (Python, port 8090) F08F10 | completed | |
| task-073 | Agent memory sidecar (Python, port 8092) F07F08 | completed | |
| task-021 | Kuzu local graph backend F08 | completed | |
| task-158 | Schema-first tool definitions: Go structs as tool contracts F09 | low | completed |
| task-171 | MCP client reconnect backoff implementation F03F11 | high | completed |
| task-174 | mcpserver agent_stream: true SSE relay from agent execution F09F11 | medium | completed |
| task-190 | Add [agent.graph] config section (backend, kuzu_path, seed_dataset) F08F02F04 | high | planned |
| task-192 | Embed seed datasets: recommendations, northwind, countries F08F02F03 | medium | planned |
| task-195 | Wire Kuzu backend into ai serve as runtime graph query target F08F02F11 | high | planned |
| task-206 | Config generator: Redis / Valkey F03F02 | low | planned |
| task-207 | SQL schema introspection for ai init (postgres, mysql, sqlite) F02F04F11 | medium | planned |
| task-209 | SQL seed data for DuckDB and SQLite (recommendations, northwind, countries) F08F02F03 | medium | planned |
| task-241 | Config generator: PostgreSQL (Supabase, Neon, plain postgres) F02F03F08 | high | planned |
| task-242 | Config generator: MySQL (PlanetScale, plain mysql) F02F03 | high | planned |
| task-243 | Config generator: SQLite (local embedded relational) F08F02F03 | medium | planned |
| task-244 | Config generator: DuckDB (local) and MotherDuck (cloud) F08F02F03 | medium | planned |
| task-245 | Config generator: MongoDB Atlas (document store) F02F04 | medium | planned |
| task-246 | Config generator: Redis / Valkey F03F02 | low | planned |
| task-247 | SQL schema introspection for ai init (postgres, mysql, sqlite) F02F04F11 | medium | planned |
| task-249 | SQL seed data for DuckDB and SQLite (recommendations, northwind, countries) F08F02F03 | medium | planned |
Web UI
Local developer console — chat, trace, cost counter, eval integration, graph explorer. See web.html for design mockups. Depends on the agent execution loop (task-041) and REST API (task-091).
| ID | Task | Status | |
|---|---|---|---|
| task-091 | REST management API — /api/* endpoints F09F04 | completed | |
| task-090 | Web UI shell — routing, theme, agent list, config editor F04F11 | low | completed |
| task-053 | ai web v1 — chat + ASCII trace + cost counter + eval F11F04 | low | completed |
| task-142 | Graph explorer tier 1 — schema view (read-only) F11F08 | low | completed |
| task-141 | Full eval tab — run evals, dataset management, diffs F11F04 | low | completed |
| task-143 | Graph explorer tier 2 — Cypher playground F11F08 | low | completed |
| task-147 | Session persistence — JSON/YAML in .ai/ folder F07 | low | completed |
| task-152 | Live streaming trace — spans appear in real time F11 | low | completed |
| task-148 | Session tab switcher and history browser F07F04 | low | completed |
| task-145 | Web accessibility — WCAG 2.1 AA F12 | low | completed |
| task-154 | D2-powered trace diagram — optional visual mode F11 | low | completed |
| task-146 | TUI mode — ai web --tui / ai tui via bubbletea F01F12 | low | completed |
| task-150 | System prompt diff on agent.toml reload F05 | low | completed |
| task-151 | Span replay — re-run any LLM or tool call in isolation F11F07 | low | completed |
| task-144 | Graph explorer tier 3 — visual canvas (Neo4j NVL) F11 | low | completed |
| task-149 | Team sharing — auth, per-user session isolation F09 | low | completed |
| task-265 | ai web — command palette keyboard navigation + fuzzy search F04F01 | medium | completed |
| task-266 | ai web — auto-start ai serve when backend is not running F02F04 | high | completed |
| task-260 | ai web — graceful port conflict handling + reload signal F02F04 | medium | completed |
| task-261 | ai web — unified Data tab with source switcher and generic schema view F04F11F08 | medium | completed |
| task-262 | ai web — SQL query playground (postgres, mysql, sqlite, duckdb) F11F04 | medium | completed |
| task-263 | ai web — MongoDB document browser F11F04 | low | planned |
| task-264 | ai web — Redis / Valkey key browser F11F04 | low | planned |
Deployment & Production
A2A protocol server, OpenTelemetry instrumentation, evaluation framework, Fly.io and Cloud Run deploy targets, distribution, and integration test suites.
| ID | Task | Status | |
|---|---|---|---|
| task-050 | A2A protocol server with auth and task ownership F09F10 | completed | |
| task-055 | A2A auth middleware and rate limiting F09 | completed | |
| task-080 | OpenTelemetry instrumentation F11 | completed | |
| task-081 | Agent execution trace graph writer F11 | completed | |
| task-082 | Evaluation sidecar (Python, port 8093) F11F10 | completed | |
| task-084 | Structural evaluation — no-LLM, CI-runnable F11 | completed | |
| task-110 | Distribution — go install, brew tap, GitHub releases F02F03 | completed | |
| task-120 | Integration tests — end-to-end agent bootstrap F02F11 | completed | |
| task-121 | Integration tests — graph construction and GraphRAG retrieval F08F11 | completed | |
| task-122 | Acceptance tests — mcp-toolbox + Neo4j article and codelab flows F09F11 | completed | |
| task-123 | Integration tests — LLM provider and model router (Anthropic + OpenAI) F10F11 | completed | |
| task-124 | Integration tests — Neo4j and Aura database connectivity F08F11 | completed | |
| task-100 | ai deploy — Fly.io container deployment F02F08 | completed | |
| task-101 | Cloudflare domain management and edge routing F02 | completed | |
| task-102 | ai deploy — Cloud Run for graph construction jobs F08F10 | completed | |
| task-103 | Spike — WASM/WASI feasibility (Cloudflare Workers + agent binary) F03F08 | completed | |
| task-187 | CLI: add make install target and ai --version flag F02F01 | medium | completed |
| task-295 | Cloudflare Worker OAuth 2.1 auth proxy with DCR + CIMD F02 | medium | planned |
| task-296 | ai deploy --target cloudflare command F02 | medium | planned |
| task-297 | ai deploy --target cloudrun with IAP auth F02 | low | planned |
Demo Series
Five end-to-end walkthroughs — movie recommendations, company intelligence,
clinical knowledge graph, multi-agent A2A pipeline, and cloud deployment.
Includes fixture configs, validation scripts, and an interactive demo page at
agent-intelligence.ai/demos.
| ID | Task | Status | |
|---|---|---|---|
| task-200 | Demos: Write demo-01 movie recommendations script F02F04 | high | completed |
| task-201 | Demos: Write demo-02 company intelligence + GraphRAG script F09F11 | high | completed |
| task-202 | Demos: Write demo-03 clinical knowledge graph script F10F11 | high | completed |
| task-203 | Demos: Write demo-04 multi-agent A2A pipeline script F07F09 | high | completed |
| task-204 | Demos: Write demo-05 deploy + Claude Desktop script F02F09 | high | completed |
| task-205 | Demos: Build docs/demos.html interactive demo selector page F01F04 | high | completed |
| task-210 | Demos: Create fixture agent.toml + toolbox.yaml for each demo F02 | high | completed |
| task-220 | Demos: Implement A2A client call from agent (agent → agent delegation) F09F10 | high | completed |
| task-221 | Demos: Add --stream flag to ai run for SSE output F01F11 | high | completed |
| task-222 | Demos: Add --endpoint and --token flags to ai run F01F09 | high | completed |
| task-223 | Demos: Package and host fixture tarballs for quick-start F02 | medium | completed |
| task-225 | Demos: Document Claude Desktop MCP integration in docs/api.html F01F09 | medium | completed |
| task-226 | Demos: Document ai run --endpoint, --token, --stream flags in CLI reference F01F04 | medium | completed |
| task-227 | Demos: Add multi-sidecar architecture section to docs/architecture.html F10F11 | medium | completed |
| task-228 | Demos: Add multi-agent A2A topology section to docs/api.html F07F09 | medium | completed |
| task-230 | Demos: Verify all demo Cypher queries against live Neo4j demo databases F11 | high | completed |
| task-231 | Demos: Create end-to-end validation script for demo 01 F11 | medium | completed |
| task-232 | Demos: Add Demo Series section and link to docs/roadmap.html F02 | low | completed |
Docs & Landing Page
Public site at agent-intelligence.ai — landing page, docs pages, content
compliance, analytics, and deployment to Cloudflare Pages.
| Task | Status | |
|---|---|---|
| Landing page — above-the-fold layout, hero, feature strip | completed | |
| Terminal animation engine and screenplay | completed | |
| CLI reference page — docs/cli.html | completed | |
| Config reference page — docs/config.html | completed | |
| API reference page — docs/api.html | completed | |
| Architecture page with ASCII diagrams (D2 pipeline) | completed | |
| Web UI design page — docs/web.html with TUI mockups | completed | |
| Roadmap page — docs/roadmap.html | completed | |
| Demos page — docs/demos.html interactive demo selector | completed | |
| docs/llms.txt for LLM navigation | completed | |
| Copy buttons and curl install block on landing page | completed | |
| GitHub star badge / widget on landing page | planned | |
| Inline email capture form with backend endpoint | planned | |
| Cloudflare Web Analytics script tag | planned | |
| Configure Cloudflare Pages deployment | planned | |
| Full accessibility pass — WCAG 2.1 AA | planned | |
| Performance audit and asset weight optimization | planned | |
| Content compliance — remove internal tech framing from copy | planned | |
| Cross-browser validation and smoke test | planned | |
Sandbox & Code Execution
Secure multi-tier code execution: JavaScript via QJS + wazero (tier 1), Python via CPython WASM (tier 2a) or Firecracker (tier 2b), mini-Linux via Firecracker + nsjail fallback (tier 3).
| ID | Task | Status | |
|---|---|---|---|
| task-130 | Design secure code execution sandbox subsystem F10 | completed | |
| task-131 | JavaScript sandbox — QJS + wazero, tier 1 F10F08 | completed | |
| task-132 | Python sandbox — CPython WASM (tier 2a) and Firecracker (tier 2b) F10F08 | completed | |
| task-133 | Mini-Linux sandbox — Firecracker + nsjail fallback, tier 3 F10F08 | completed | |
| task-134 | Sandbox MCP tools and code-execution skill F09F10 | completed | |
| task-132b | Python Firecracker sandbox — Tier 2b (follow-on to task-132) F10F08 | planned | |
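The tier layout described at the top of this section can be read as a small selection function. The sketch below is an assumption about how such routing might look, not the project's code: `Request`, `Host`, `SelectRuntime`, and the `NeedsNativeExt` criterion for choosing tier 2b over tier 2a are all hypothetical. The only external fact relied on is that Firecracker needs KVM on the host.

```go
package main

import (
	"errors"
	"fmt"
)

// Runtime names mirror the tiers listed in this section.
type Runtime string

const (
	QJSWazero   Runtime = "qjs+wazero"   // tier 1: JavaScript
	CPythonWASM Runtime = "cpython-wasm" // tier 2a: Python in WASM
	Firecracker Runtime = "firecracker"  // tier 2b and tier 3: microVM
	NSJail      Runtime = "nsjail"       // tier 3 fallback
)

// Host holds hypothetical capability flags for the machine.
type Host struct {
	HasKVM bool // Firecracker requires KVM
}

// Request is a hypothetical code-execution request.
type Request struct {
	Language       string // "javascript", "python", or "shell"
	NeedsNativeExt bool   // assumed trigger for tier 2b over tier 2a
}

// SelectRuntime picks a runtime for the request on the given host.
func SelectRuntime(req Request, h Host) (Runtime, error) {
	switch req.Language {
	case "javascript":
		return QJSWazero, nil
	case "python":
		if !req.NeedsNativeExt {
			return CPythonWASM, nil
		}
		if !h.HasKVM {
			return "", errors.New("python tier 2b needs Firecracker, but KVM is unavailable")
		}
		return Firecracker, nil
	case "shell":
		if h.HasKVM {
			return Firecracker, nil
		}
		return NSJail, nil // fallback when no microVM can be started
	}
	return "", fmt.Errorf("unsupported language %q", req.Language)
}

func main() {
	host := Host{HasKVM: false}
	for _, req := range []Request{
		{Language: "javascript"},
		{Language: "python"},
		{Language: "shell"},
	} {
		rt, err := SelectRuntime(req, host)
		fmt.Println(req.Language, rt, err)
	}
}
```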