
Architecture Overview

Nova AI's multi-agent orchestration system design and component overview.

System Architecture

User → Slash Command → Claude SDK Executor → Agent Layer → MCP Tools → Services
         (/novaai)             ↓
                      Session Management
                         Cost Tracking
                   Approval Gates (Security)

Architecture Layers

1. Interface Layer
   - Claude Code CLI with /novaai and /novaai-execute commands
   - Python SDK for programmatic access
   - RESTful API (future)

2. Orchestration Layer
   - ClaudeSDKExecutor - Main orchestration engine
   - SessionManager - State persistence and context compression
   - CostTracker - Token usage and cost monitoring
   - ApprovalGate - Security approval workflow

3. Agent Layer (6 specialized agents)
   - orchestrator - Task decomposition and coordination
   - architect - System design and technical decisions
   - implementer - Code implementation
   - code-reviewer - Security and quality validation
   - tester - Test generation and execution
   - debugger - Error analysis and fixing

4. Tool Layer
   - SDK MCP - In-process tools (10-100x faster)
     - Knowledge Base (FAISS/HNSW)
     - GitHub operations
   - External MCP - Stdio servers (not currently configured; see MCP Integration below)
     - Browser automation
     - Voice transcription/TTS
     - AWS/Docker operations

5. Service Layer
   - Anthropic API (Claude Messages)
   - GitHub API
   - LangFuse (observability)
   - Vector databases


Core Components

ClaudeSDKExecutor

Main orchestration engine wrapping Claude Agent SDK.

Responsibilities:
- Task execution with retry logic
- Agent lifecycle management
- Session persistence
- Tool coordination
- Cost tracking

Key Features:
- API retry with exponential backoff
- Circuit breaker pattern (5 failures → open)
- Session compression (88-95% overhead reduction)
- Prompt caching (90% cost savings)

executor = ClaudeSDKExecutor(
    project_root=Path.cwd(),
    agent_name="orchestrator",
    use_sdk_mcp=True,  # Enable SDK MCP
    model="sonnet"     # Sonnet 4.5
)

result = await executor.run_task("implement feature")

File: src/orchestrator/claude_sdk_executor.py
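
A sketch of the retry and circuit-breaker behavior described above, using the stated 5-failure threshold. Function and class names here are illustrative, not the executor's actual internals.

import asyncio

FAILURE_THRESHOLD = 5          # 5 consecutive failures → circuit opens
_consecutive_failures = 0      # shared circuit state across calls

class CircuitOpenError(RuntimeError):
    pass

async def call_with_retry(call, attempts: int = 3, base_delay: float = 1.0):
    """Retry an async API call with exponential backoff; trip the breaker after repeated failures."""
    global _consecutive_failures
    if _consecutive_failures >= FAILURE_THRESHOLD:
        raise CircuitOpenError("circuit open: too many recent API failures")
    for attempt in range(attempts):
        try:
            result = await call()
            _consecutive_failures = 0                        # success closes the circuit again
            return result
        except Exception:
            _consecutive_failures += 1
            await asyncio.sleep(base_delay * 2 ** attempt)   # 1s, 2s, 4s, ...
    raise CircuitOpenError("retries exhausted")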

SessionManager

Manages conversation state and context compression.

Features:
- Lightweight state (max 150K tokens)
- Resume conversations across executions
- Compress decision history (10 decisions → summary)
- Session cleanup (auto-expire after 24h)

File: src/orchestrator/session_manager.py
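
A usage sketch under assumed method names (create, resume, add_decision, and save are illustrative; only the module path above comes from this page):

from pathlib import Path
from src.orchestrator.session_manager import SessionManager  # path from the File reference above

manager = SessionManager(project_root=Path.cwd())             # constructor args are assumed
session = manager.resume("feature-x") or manager.create("feature-x")

# Record a decision; history is compressed into a summary once 10 decisions accumulate
session.add_decision("Chose the HNSW index for KB search")

# Persist lightweight state (decision summaries, not the full 150K-token transcript)
manager.save(session)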

CostTracker

Monitors token usage and API costs.

Metrics Tracked:
- Input/output tokens
- Cache hits/misses
- Total cost (USD)
- Duration (ms)

File: src/orchestrator/cost_tracker.py
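
A usage sketch covering the tracked metrics, with assumed record/summary method names (only the module path above comes from this page):

from src.orchestrator.cost_tracker import CostTracker  # path from the File reference above

tracker = CostTracker()
tracker.record(
    input_tokens=48_000,
    output_tokens=2_100,
    cache_read_tokens=45_000,   # cache hits
    cost_usd=0.021,
    duration_ms=8_400,
)
print(tracker.summary())        # per-session totals: tokens, cache hit rate, cost (USD), duration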


Data Flow

Task Execution Flow

1. User Request → /novaai "implement feature X"
2. Command Parser → Extract task description
3. ClaudeSDKExecutor → Initialize session (or resume)
4. Orchestrator Agent → Decompose task into subtasks
5. Specialist Agents → Execute subtasks in parallel
6. Code-Reviewer Agent → Validate output (mandatory)
7. Results Aggregation → Combine agent outputs
8. Session Save → Persist state for resumption
9. Response → Return to user
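
The same flow as a condensed asyncio sketch. The prompts, the subtask routing, and the assumption that run_task returns text are illustrative; the executor and agent names come from the sections above.

import asyncio
from pathlib import Path
from src.orchestrator.claude_sdk_executor import ClaudeSDKExecutor  # path from the File reference above

async def run_agent(agent: str, prompt: str) -> str:
    executor = ClaudeSDKExecutor(project_root=Path.cwd(), agent_name=agent)
    return await executor.run_task(prompt)

async def handle_request(task: str) -> str:
    # Steps 3-4: initialize (or resume) a session and decompose the task
    plan = await run_agent("orchestrator", f"Decompose into subtasks: {task}")

    # Step 5: specialists work in parallel (routing is simplified to two fixed agents here)
    impl, tests = await asyncio.gather(
        run_agent("implementer", plan),
        run_agent("tester", f"Write tests for: {task}"),
    )

    # Steps 6-9: mandatory review, aggregate, return (session persistence is handled by the executor)
    review = await run_agent("code-reviewer", f"Review this work:\n{impl}\n{tests}")
    return "\n\n".join([impl, tests, review])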

Knowledge Base Query Flow

User Query → KB Server (SDK MCP)
                  ↓
          Vector Search (FAISS/HNSW)
                  ↓
          Top-K Results (similarity ranked)
                  ↓
          Context Enhancement
                  ↓
          Return to Agent

Performance: 69.6x faster than external MCP (4ms vs 278ms)
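
The vector-search step in isolation, sketched with plain FAISS. The index type, dimensionality, and data are illustrative; the real wiring lives in the KB server listed under MCP Integration.

import faiss
import numpy as np

def search_kb(index: faiss.Index, query_vec: np.ndarray, k: int = 5):
    """Return the top-k nearest KB chunks (ids and distances) for a query embedding."""
    distances, ids = index.search(query_vec.reshape(1, -1).astype("float32"), k)
    return list(zip(ids[0].tolist(), distances[0].tolist()))

dim = 384                                                   # embedding size is illustrative
index = faiss.IndexHNSWFlat(dim, 32)                        # 32 = HNSW graph connectivity
index.add(np.random.rand(1_000, dim).astype("float32"))     # stand-in for real KB embeddings
hits = search_kb(index, np.random.rand(dim))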


Agent System

Agent Specification

Each agent is defined in .claude/agents/<agent-name>.md:

---
name: code-reviewer
description: Security and quality code review
tools:
  - Read
  - Grep
  - Glob
  - Bash
model: sonnet
---

# Agent Instructions

[Markdown instructions for agent behavior]
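
A minimal loader for such a spec, assuming standard YAML frontmatter delimited by --- and PyYAML (Nova AI's actual loader is not shown on this page):

from pathlib import Path
import yaml  # PyYAML is an assumption; any YAML parser works

def load_agent_spec(name: str, root: Path = Path(".claude/agents")):
    """Split an agent file into its YAML frontmatter and markdown instructions."""
    text = (root / f"{name}.md").read_text()
    _, frontmatter, instructions = text.split("---", 2)
    return yaml.safe_load(frontmatter), instructions.strip()

spec, instructions = load_agent_spec("code-reviewer")
print(spec["tools"])  # ['Read', 'Grep', 'Glob', 'Bash']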

Agent Coordination

Orchestrator Agent coordinates specialist agents:

  1. Task Analysis - Understand user request
  2. Decomposition - Break into subtasks
  3. Assignment - Route subtasks to specialists
  4. Aggregation - Combine results
  5. Validation - Code review before returning

Dev→Review→Refine Loop:

Implementer → Code-Reviewer → Pass? → Return
                  ↓ Fail
              Implementer (retry with fixes)
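
A minimal sketch of that loop. It assumes run_task returns the agent's text output and that the reviewer is asked to lead with a PASS or FAIL verdict; both conventions are illustrative, not part of the agent spec above.

from pathlib import Path
from src.orchestrator.claude_sdk_executor import ClaudeSDKExecutor  # path from the File reference above

async def dev_review_refine(task: str, max_rounds: int = 3) -> str:
    implementer = ClaudeSDKExecutor(project_root=Path.cwd(), agent_name="implementer")
    reviewer = ClaudeSDKExecutor(project_root=Path.cwd(), agent_name="code-reviewer")

    feedback = ""
    for _ in range(max_rounds):
        code = await implementer.run_task(f"{task}\n\nReviewer feedback to address:\n{feedback}")
        review = await reviewer.run_task(f"Reply PASS or FAIL, then your findings:\n{code}")
        if review.strip().startswith("PASS"):
            return code
        feedback = review                     # feed findings into the next attempt
    raise RuntimeError("Review loop did not converge within the retry budget")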

Agent Tools

Standard Tools:
- Read - File reading
- Write - File writing
- Edit - In-place editing
- Grep - Content search
- Glob - File pattern matching
- Bash - Command execution

MCP Tools (via SDK):
- search_knowledge_base - Vector search
- github_search_code - GitHub search
- github_get_pr - PR retrieval


MCP Integration

SDK MCP (In-Process)

Advantages:
- 10-100x faster (no stdio overhead)
- Direct Python function calls
- Shared memory space
- No serialization overhead

Implementation:

# Register SDK MCP server
kb_sdk_server = build_kb_sdk_server()

mcp_servers = [kb_sdk_server]

client = ClaudeAgent(
    mcp_servers=mcp_servers,  # In-process
    system_prompt="..."
)

Files:
- src/orchestrator/tools/sdk_mcp_kb_server.py
- src/orchestrator/tools/sdk_mcp_github_server.py
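
One way a server like sdk_mcp_kb_server.py could be structured, assuming the Claude Agent SDK's tool decorator and create_sdk_mcp_server helper; exact names and signatures should be checked against the installed SDK version, and kb_index stands in for the real knowledge-base index.

from claude_agent_sdk import tool, create_sdk_mcp_server  # assumed import

@tool("search_knowledge_base", "Vector search over the project knowledge base", {"query": str})
async def search_knowledge_base(args):
    hits = kb_index.search(args["query"])   # kb_index: the in-process FAISS/HNSW index (hypothetical)
    return {"content": [{"type": "text", "text": "\n".join(hits)}]}

def build_kb_sdk_server():
    # Runs inside the agent process: a direct function call, no stdio round-trip or serialization
    return create_sdk_mcp_server(name="knowledge-base", version="1.0.0", tools=[search_knowledge_base])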

External MCP (Stdio)

Status: Not used. Nova AI relies exclusively on SDK MCP for maximum performance (10-100x speedup).

All MCP servers are in-process (SDK-based). No stdio/external MCP servers are configured.


Performance Optimizations

1. Prompt Caching (90% Cost Savings)

system_prompt = {
    "type": "preset",
    "preset": "claude_code",
    "cache_control": {"type": "ephemeral"}  # Cache system prompt
}

Results:
- First request: 50K tokens @ $3/M = $0.15
- Cached request: 5K tokens @ $0.30/M = $0.002
- 98.7% cost reduction on cache hits
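
The arithmetic behind those figures, worked through in Python (rates as quoted above; the 98.7% follows from the rounded $0.002):

first_request  = 50_000 * 3.00 / 1_000_000   # $0.150 at the uncached input rate
cached_request =  5_000 * 0.30 / 1_000_000   # $0.0015, rounded to $0.002 above
reduction      = 1 - 0.002 / 0.150           # ≈ 0.987, i.e. the 98.7% quoted for cache hits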

2. SDK MCP (10-100x Faster)

Operation     External MCP   SDK MCP   Speedup
KB Search     278ms          4ms       69.6x
GitHub API    150ms          15ms      10x

3. Session Compression (88-95% Overhead Reduction)

Before:
- Full conversation history (150K tokens)
- Every message added to history
- Context window fills quickly

After:
- Lightweight state (decision summaries)
- Compress every 10 decisions
- Resume with minimal context

Savings:
- Overhead: 50K → 5K tokens (90% reduction)
- Cost per resume: $0.15 → $0.015
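
A sketch of the compression step, assuming decisions are kept as short strings and summarized once 10 accumulate; the summarize callable stands in for whatever summarization the SessionManager actually applies.

def compress_decisions(decisions: list[str], summarize) -> list[str]:
    """Collapse the full decision history into a single summary entry every 10 decisions."""
    if len(decisions) < 10:
        return decisions
    summary = summarize(decisions)   # e.g. a cheap model call producing a short paragraph
    return [f"Summary of {len(decisions)} earlier decisions: {summary}"]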

4. Parallel Agent Execution

# Execute agents in parallel
tasks = [
    ParallelTask("code-reviewer", "review src/"),
    ParallelTask("tester", "run tests"),
    ParallelTask("architect", "assess design")
]

results = await parallel_executor.execute(tasks)

Performance:
- Sequential: 3 × 60s = 180s
- Parallel: max(60s) = 60s
- 3x faster



Last Updated: November 2025
Version: 2.3.0