Roadmap

All planned work across the project, organized by area. The end-to-end agent loop is the highest priority — everything else is sequenced around making that work reliably first.

Statuses: completed, in progress, next, planned, backlog

135 Completed
0 In Progress
39 Planned
174 Total
78% complete

Core System

Agent execution loop, model routing, config schema, context engineering, memory, cost tracking, and evaluation infrastructure.

ID | Task | Labels | Status
task-010 | Go module and project skeleton | F03, F02 | completed
task-011 | Agent config schema (TOML) and loader | F05 | completed
task-040 | Model router — Anthropic + OpenAI-compat + fallback chain | F10, F06 | completed
task-041 | Core agent execution loop with NextStep and Hooks | F07, F10 | completed
task-042 | Context engineering layer — ContextEngine | F06, F05 | completed
task-043 | Tool executor — parallel dispatch, partial failure, timeouts | F10, F11 | completed
task-044 | Security layer — prompt injection detection + A2A input validation | F05, F06 | completed
task-045 | Model fallback chain and cost circuit breaker | F03, F06 | completed
task-022 | 4-layer memory architecture | F07 | completed
task-030 | AI-assisted config generation prompt and engine | F02, F05 | completed
task-031 | Onboarding interview agent — ai init | F02, F04 | completed
task-046 | Prompt caching optimization | F06, F03 | completed
task-047 | Tool context management for large tool lists | F06, F04 | completed
task-083 | Durable session store — FileSessionStore + crash recovery | F07 | completed
task-085 | Cost observability and budget reporting | F11, F06 | completed
task-156 | Human-in-the-loop: built-in human_approval MCP tool | F07, medium | completed
task-157 | Explicit control flow: intent router and tool-call guards | F05, F06, low | completed

Validation spikes (completed)

task-001 | Go agent framework evaluation | F03, F09 | completed
task-003 | Anthropic Go SDK — streaming + tool-use validation | F06 | completed
task-005 | MCP Go library selection | F09 | completed
task-006 | MCP dual-role PoC — agent as client AND server | F09, F10 | completed
task-007 | Context engineering validation harness | F06 | completed
task-002 | Kuzu Go bindings validation | F08 | completed
task-004 | mcp-toolbox Neo4j source evaluation | F09 | completed
task-160 | Prompt injection hardening in agent loop | F05, F06, high | completed
task-161 | Reversibility taxonomy + default-approve-required for irreversible tool calls | F07, high | completed
task-162 | Trust hierarchy: principal tiers + trust-level enforcement in agent loop | F05, high | completed
task-163 | Platform safety floor: non-overridable hardcoded behaviours | F11, medium | completed
task-164 | Honesty guidelines: uncertainty language, capability statements, AI identity | F05, medium | completed
task-170 | Wire agent.Agent.Run as real TaskRunner in ai serve | F07, F09, critical | completed
task-172 | Config-to-agent wiring verification: context thresholds and tool call Index sort | F06, F05, high | completed
task-173 | cmd_init test coverage | F02, medium | completed
task-270 | GenerateToolboxConfig — read_cypher + write_cypher opt-in tools | F10, F05, high | planned
task-272 | write_cypher — wire as R_DESTROY in generated agent.toml and built-in patterns | F07, F05, high | planned
task-274 | Integration test — read_cypher executes live Neo4j query via mcp-toolbox | F11, medium | planned
task-281 | Dynamic tool descriptions — inject agent name + purpose into agent_run/agent_stream | F02, F01, high | planned
task-290 | TokenValidator interface + auth middleware for MCP and A2A servers | critical | planned
task-291 | RFC 9728 Protected Resource Metadata endpoint | high | planned
task-292 | JWT validation with JWKS auto-discovery and key rotation | high | planned
task-293 | A2A AgentCard OAuth2 SecurityScheme + scope enforcement | medium | planned
task-294 | Auth configuration in agent.toml + deploy.auth schema | F02, high | planned
task-298 | E2E auth integration test: MCP client → OAuth → tool call | high | planned

CLI

The ai command and its subcommands — the primary interface for developers running, configuring, and serving agents locally.

ID | Task | Labels | Status
task-012 | CLI framework and top-level command structure | F01, F03, F12 | completed
task-031 | ai init — onboarding interview agent | F02, F04 | completed
task-051 | ai serve and ai run — serve and single-run modes | F01, F02 | completed
task-054 | Curl-based install script and binary distribution | F02, F03 | completed
task-135 | Version in help output and ai with no arguments | F01, F12 | completed
task-052 | ai show — display agent config with syntax highlighting | F05, F04 | completed
task-053 | ai web — local developer console (chat + trace + cost) | F11, F04, low | completed
task-136 | Global --quiet / -q flag for silent mode | F01, F12 | completed
task-159 | ai init — progressive elicitation with agent type templates | F02, F04, medium | completed
task-153 | ai show --diagram — ASCII architecture diagram via D2 | F11, F04, low | completed
task-155 | --plain flag — screen reader and pipeline-safe output | F12, low | completed
task-165 | ai init validation: warn when generated config violates our own principles | F04, F10, low | completed
task-180 | ai init: readline support for cursor/arrow key navigation | F11, F01, F02, medium | completed
task-181 | ai init: fix model name suggestions and add normalization | F04, F01, high | completed
task-182 | ai init: investigate and fix crash after model name entry | F04, medium | completed
task-183 | ai init: fix --agent → --config flag name in 'Next step' output | F04, low | completed
task-184 | ai init: --credentials flag for Aura/Sandbox credential file loading | F01, low | completed
task-185 | NDJSON done event: include cost_usd, input_tokens, output_tokens | F04, F11, medium | completed
task-186 | ai init: demo database catalog — gather and add known demo databases | F02, F03, medium | completed
task-188 | ai serve: auto-detect and start mcp-toolbox sidecar without explicit flag | F02, F11, high | completed
task-193 | ai graph seed — seed local Kuzu DB from named dataset or Cypher file | F08, F02, F01, medium | planned
task-191 | ai init — local-first Kuzu path (option 5 in URI preset menu) | F02, F08, F04, high | planned
task-194 | ai init — AI-generated domain seed data for local Kuzu | F02, F08, F04, medium | planned
task-208 | Extend --credentials to parse Supabase, PlanetScale, Neon, MongoDB, Redis URLs | F02, F03, medium | planned
task-240 | ai init — database type selector menu (replaces URI preset menu) | F02, F04, F01, high | planned
task-248 | Extend --credentials to parse Supabase, PlanetScale, Neon, MongoDB, Redis URLs | F02, F03, medium | planned
task-175 | Shell completion: ai completion bash|zsh|fish — static completions | F04, F01, medium | completed
task-176 | Shell completion: dynamic completions — config files, skills, task IDs | F04, F01, low | completed
task-177 | Coding agent UX: --output flag + structured errors + exit codes on all commands | F11, F01, F04, high | completed
task-178 | Coding agent UX: --non-interactive, stdin piping, AI_ env var overrides | F02, F01, F04, high | completed
task-179 | Coding agent UX: ai status, --dry-run, --timeout, and completion self-doc | F11, F07, F02, medium | completed
task-250 | ai show — opinionated default view (identity + truncated prompt + non-defaults) | F01, F02, high | completed
task-251 | ai show — --full flag (preserve current full TOML dump behavior) | F01, high | completed
task-252 | ai show — --tools flag (parse toolbox YAML, show sources + tools + env var status) | F02, F03, high | completed
task-253 | ai show — --section flag (show single named config section) | F01, medium | completed
task-254 | ai show — --filter flag (interactive fzf / built-in line filter) | F01, F08, medium | completed
task-255 | ai show — --grep flag (non-interactive regex line filter) | F01, low | completed
task-271 | ai init — offer generic Cypher fallback tools in onboarding interview | F02, F04, F05, high | planned
task-273 | ai show --tools — annotate fallback tools with visual separator and warning | F11, F01, medium | planned
task-280 | ai mcp — MCP stdio server command | F02, F01, high | planned
task-282 | ai show --mcp-config [target] — print ready-to-paste coding agent config snippets | F05, F11, high | planned
task-283 | ai init — write MCP config files for Claude Code and Cursor at end of onboarding | F02, F01, F11, medium | planned
task-284 | ai init — write MCP config files for Claude Code and Cursor at end of onboarding | F02, F04, high | planned
task-285 | ai show — MCP tool identity preview section | F11, F05, low | planned

Integrations

MCP client and server, mcp-toolbox sidecar, graph backends (Neo4j, Kuzu), and Python sidecars for GraphRAG, graph construction, memory, and eval.

ID | Task | Labels | Status
task-034 | MCP client — agent connects to external MCP servers | F09 | completed
task-060 | Skills system | F10, F05 | completed
task-070 | Python sidecar service manager (Go) | F10, F08 | completed
task-020 | Neo4j connection and schema introspection | F08, F09 | completed
task-032 | mcp-toolbox config generation and sidecar manager (ai sidecar mcp-toolbox) | F09, F08 | completed
task-033 | MCP server — agent exposes its tools over MCP | F09 | completed
task-071 | GraphRAG retrieval sidecar (Python, port 8091) | F08, F10 | completed
task-072 | Graph construction sidecar (Python, port 8090) | F08, F10 | completed
task-073 | Agent memory sidecar (Python, port 8092) | F07, F08 | completed
task-021 | Kuzu local graph backend | F08 | completed
task-158 | Schema-first tool definitions: Go structs as tool contracts | F09, low | completed
task-171 | MCP client reconnect backoff implementation | F03, F11, high | completed
task-174 | mcpserver agent_stream: true SSE relay from agent execution | F09, F11, medium | completed
task-190 | Add [agent.graph] config section (backend, kuzu_path, seed_dataset) | F08, F02, F04, high | planned
task-192 | Embed seed datasets: recommendations, northwind, countries | F08, F02, F03, medium | planned
task-195 | Wire Kuzu backend into ai serve as runtime graph query target | F08, F02, F11, high | planned
task-206 | Config generator: Redis / Valkey | F03, F02, low | planned
task-207 | SQL schema introspection for ai init (postgres, mysql, sqlite) | F02, F04, F11, medium | planned
task-209 | SQL seed data for DuckDB and SQLite (recommendations, northwind, countries) | F08, F02, F03, medium | planned
task-241 | Config generator: PostgreSQL (Supabase, Neon, plain postgres) | F02, F03, F08, high | planned
task-242 | Config generator: MySQL (PlanetScale, plain mysql) | F02, F03, high | planned
task-243 | Config generator: SQLite (local embedded relational) | F08, F02, F03, medium | planned
task-244 | Config generator: DuckDB (local) and MotherDuck (cloud) | F08, F02, F03, medium | planned
task-245 | Config generator: MongoDB Atlas (document store) | F02, F04, medium | planned
task-246 | Config generator: Redis / Valkey | F03, F02, low | planned
task-247 | SQL schema introspection for ai init (postgres, mysql, sqlite) | F02, F04, F11, medium | planned
task-249 | SQL seed data for DuckDB and SQLite (recommendations, northwind, countries) | F08, F02, F03, medium | planned

Web UI

Local developer console — chat, trace, cost counter, eval integration, graph explorer. See web.html for design mockups. Depends on the agent execution loop (task-041) and REST API (task-091).

ID | Task | Labels | Status
task-091 | REST management API — /api/* endpoints | F09, F04 | completed
task-090 | Web UI shell — routing, theme, agent list, config editor | F04, F11, low | completed
task-053 | ai web v1 — chat + ASCII trace + cost counter + eval | F11, F04, low | completed
task-142 | Graph explorer tier 1 — schema view (read-only) | F11, F08, low | completed
task-141 | Full eval tab — run evals, dataset management, diffs | F11, F04, low | completed
task-143 | Graph explorer tier 2 — Cypher playground | F11, F08, low | completed
task-147 | Session persistence — JSON/YAML in .ai/ folder | F07, low | completed
task-152 | Live streaming trace — spans appear in real time | F11, low | completed
task-148 | Session tab switcher and history browser | F07, F04, low | completed
task-145 | Web accessibility — WCAG 2.1 AA | F12, low | completed
task-154 | D2-powered trace diagram — optional visual mode | F11, low | completed
task-146 | TUI mode — ai web --tui / ai tui via bubbletea | F01, F12, low | completed
task-150 | System prompt diff on agent.toml reload | F05, low | completed
task-151 | Span replay — re-run any LLM or tool call in isolation | F11, F07, low | completed
task-144 | Graph explorer tier 3 — visual canvas (Neo4j NVL) | F11, low | completed
task-149 | Team sharing — auth, per-user session isolation | F09, low | completed
task-265 | ai web — command palette keyboard navigation + fuzzy search | F04, F01, medium | completed
task-266 | ai web — auto-start ai serve when backend is not running | F02, F04, high | completed
task-260 | ai web — graceful port conflict handling + reload signal | F02, F04, medium | completed
task-261 | ai web — unified Data tab with source switcher and generic schema view | F04, F11, F08, medium | completed
task-262 | ai web — SQL query playground (postgres, mysql, sqlite, duckdb) | F11, F04, medium | completed
task-263 | ai web — MongoDB document browser | F11, F04, low | planned
task-264 | ai web — Redis / Valkey key browser | F11, F04, low | planned

Deployment & Production

A2A protocol server, OpenTelemetry instrumentation, evaluation framework, Fly.io and Cloud Run deploy targets, distribution, and integration test suites.

ID | Task | Labels | Status
task-050 | A2A protocol server with auth and task ownership | F09, F10 | completed
task-055 | A2A auth middleware and rate limiting | F09 | completed
task-080 | OpenTelemetry instrumentation | F11 | completed
task-081 | Agent execution trace graph writer | F11 | completed
task-082 | Evaluation sidecar (Python, port 8093) | F11, F10 | completed
task-084 | Structural evaluation — no-LLM, CI-runnable | F11 | completed
task-110 | Distribution — go install, brew tap, GitHub releases | F02, F03 | completed
task-120 | Integration tests — end-to-end agent bootstrap | F02, F11 | completed
task-121 | Integration tests — graph construction and GraphRAG retrieval | F08, F11 | completed
task-122 | Acceptance tests — mcp-toolbox + Neo4j article and codelab flows | F09, F11 | completed
task-123 | Integration tests — LLM provider and model router (Anthropic + OpenAI) | F10, F11 | completed
task-124 | Integration tests — Neo4j and Aura database connectivity | F08, F11 | completed
task-100 | ai deploy — Fly.io container deployment | F02, F08 | completed
task-101 | Cloudflare domain management and edge routing | F02 | completed
task-102 | ai deploy — Cloud Run for graph construction jobs | F08, F10 | completed
task-103 | Spike — WASM/WASI feasibility (Cloudflare Workers + agent binary) | F03, F08 | completed
task-187 | CLI: add make install target and ai --version flag | F02, F01, medium | completed
task-295 | Cloudflare Worker OAuth 2.1 auth proxy with DCR + CIMD | F02, medium | planned
task-296 | ai deploy --target cloudflare command | F02, medium | planned
task-297 | ai deploy --target cloudrun with IAP auth | F02, low | planned

Demo Series

Five end-to-end walkthroughs — movie recommendations, company intelligence, clinical knowledge graph, multi-agent A2A pipeline, and cloud deployment. Includes fixture configs, validation scripts, and an interactive demo page at agent-intelligence.ai/demos.

ID | Task | Labels | Status
task-200 | Demos: Write demo-01 movie recommendations script | F02, F04, high | completed
task-201 | Demos: Write demo-02 company intelligence + GraphRAG script | F09, F11, high | completed
task-202 | Demos: Write demo-03 clinical knowledge graph script | F10, F11, high | completed
task-203 | Demos: Write demo-04 multi-agent A2A pipeline script | F07, F09, high | completed
task-204 | Demos: Write demo-05 deploy + Claude Desktop script | F02, F09, high | completed
task-205 | Demos: Build docs/demos.html interactive demo selector page | F01, F04, high | completed
task-210 | Demos: Create fixture agent.toml + toolbox.yaml for each demo | F02, high | completed
task-220 | Demos: Implement A2A client call from agent (agent → agent delegation) | F09, F10, high | completed
task-221 | Demos: Add --stream flag to ai run for SSE output | F01, F11, high | completed
task-222 | Demos: Add --endpoint and --token flags to ai run | F01, F09, high | completed
task-223 | Demos: Package and host fixture tarballs for quick-start | F02, medium | completed
task-225 | Demos: Document Claude Desktop MCP integration in docs/api.html | F01, F09, medium | completed
task-226 | Demos: Document ai run --endpoint, --token, --stream flags in CLI reference | F01, F04, medium | completed
task-227 | Demos: Add multi-sidecar architecture section to docs/architecture.html | F10, F11, medium | completed
task-228 | Demos: Add multi-agent A2A topology section to docs/api.html | F07, F09, medium | completed
task-230 | Demos: Verify all demo Cypher queries against live Neo4j demo databases | F11, high | completed
task-231 | Demos: Create end-to-end validation script for demo 01 | F11, medium | completed
task-232 | Demos: Add Demo Series section and link to docs/roadmap.html | F02, low | completed

Docs & Landing Page

Public site at agent-intelligence.ai — landing page, docs pages, content compliance, analytics, and deployment to Cloudflare Pages.

Task | Status
Landing page — above-the-fold layout, hero, feature strip | completed
Terminal animation engine and screenplay | completed
CLI reference page — docs/cli.html | completed
Config reference page — docs/config.html | completed
API reference page — docs/api.html | completed
Architecture page with ASCII diagrams (D2 pipeline) | completed
Web UI design page — docs/web.html with TUI mockups | completed
Roadmap page — docs/roadmap.html | completed
Demos page — docs/demos.html interactive demo selector | completed
docs/llms.txt for LLM navigation | completed
Copy buttons and curl install block on landing page | completed
GitHub star badge / widget on landing page | planned
Inline email capture form with backend endpoint | planned
Cloudflare Web Analytics script tag | planned
Configure Cloudflare Pages deployment | planned
Full accessibility pass — WCAG 2.1 AA | planned
Performance audit and asset weight optimization | planned
Content compliance — remove internal tech framing from copy | planned
Cross-browser validation and smoke test | planned

Sandbox & Code Execution

Secure multi-tier code execution: JavaScript via QJS + wazero (tier 1), Python via CPython WASM (tier 2a) or Firecracker (tier 2b), mini-Linux via Firecracker + nsjail fallback (tier 3).

ID | Task | Labels | Status
task-130 | Design secure code execution sandbox subsystem | F10 | completed
task-131 | JavaScript sandbox — QJS + wazero, tier 1 | F10, F08 | completed
task-132 | Python sandbox — CPython WASM (tier 2a) and Firecracker (tier 2b) | F10, F08 | completed
task-133 | Mini-Linux sandbox — Firecracker + nsjail fallback, tier 3 | F10, F08 | completed
task-134 | Sandbox MCP tools and code-execution skill | F09, F10 | completed
task-132b | Python Firecracker sandbox — Tier 2b (follow-on to task-132) | F10, F08 | planned
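
The tier layout in the section description (JavaScript on QJS + wazero, Python on CPython WASM or Firecracker, mini-Linux on Firecracker with an nsjail fallback) can be read as a routing decision. The Go sketch below illustrates one way such a decision might look. The roadmap does not say how tier 2a versus 2b is chosen or when the nsjail fallback applies, so the wantMicroVM and firecrackerOK inputs are assumptions, and chooseSandbox and Choice are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// Workload kinds named in the section description.
const (
	JavaScript = "javascript"
	Python     = "python"
	MiniLinux  = "mini-linux"
)

// Choice records which tier and runtime a request is routed to.
type Choice struct {
	Tier    string
	Runtime string
}

// chooseSandbox maps a workload kind to a tier.
// Assumptions (not stated in the roadmap):
//   - Python uses tier 2b only when the caller asks for a microVM
//     and Firecracker is usable on this host; otherwise tier 2a.
//   - Tier 3 falls back to nsjail when Firecracker is not usable.
func chooseSandbox(kind string, wantMicroVM, firecrackerOK bool) (Choice, error) {
	switch kind {
	case JavaScript:
		return Choice{Tier: "1", Runtime: "qjs+wazero"}, nil
	case Python:
		if wantMicroVM && firecrackerOK {
			return Choice{Tier: "2b", Runtime: "firecracker"}, nil
		}
		return Choice{Tier: "2a", Runtime: "cpython-wasm"}, nil
	case MiniLinux:
		if firecrackerOK {
			return Choice{Tier: "3", Runtime: "firecracker"}, nil
		}
		return Choice{Tier: "3", Runtime: "nsjail"}, nil
	default:
		return Choice{}, errors.New("unsupported workload kind: " + kind)
	}
}

func main() {
	for _, kind := range []string{JavaScript, Python, MiniLinux} {
		c, err := chooseSandbox(kind, false, false)
		fmt.Printf("%s -> tier %s (%s), err=%v\n", kind, c.Tier, c.Runtime, err)
	}
}
```

Keeping the decision in one pure function makes the routing policy easy to unit-test without starting any sandbox.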