ADR-005: Skills System Architecture

Date: 2025-10-26

Status: Accepted

Context

Nova AI's agents need specialized knowledge across diverse domains:

  • Python best practices (PEP 8, type hints, async patterns)
  • Testing strategies (pytest, mocking, coverage)
  • Security patterns (input validation, authentication, encryption)
  • GitHub automation (PR workflows, commit conventions, CI/CD)
  • Documentation standards (Sphinx, docstrings, ADRs)

Initial Approach Problems

Monolithic Agent Prompts:

<!-- code-reviewer.md -->
You are a code reviewer. You must check:
- Security: SQL injection, XSS, CSRF...
- Style: PEP 8, naming, imports...
- Testing: Coverage, edge cases...
- Documentation: Docstrings, comments...
(... 5000 lines of detailed instructions)

Issues:

  1. Prompt Length: Agent files ballooned to 5K+ lines (>80% of token limit)
  2. Context Waste: Most knowledge irrelevant for specific tasks
  3. Duplication: Same security patterns in 5+ agent files
  4. Maintenance: Updates required editing multiple files
  5. No Reusability: Knowledge locked in specific agents

Requirements

  1. Modularity: Knowledge separated into reusable skills
  2. Progressive Disclosure: Load only relevant knowledge per task
  3. Maintainability: Update knowledge in one place
  4. Composability: Mix and match skills per agent
  5. Efficiency: Minimize prompt tokens (leverage caching)

Decision

We implemented a hierarchical skills system with progressive disclosure via @import directives.

Architecture

.claude/skills/
├── README.md                          # Skills system documentation
├── development/                       # Coding best practices
│   ├── SKILL.md                      # Domain overview
│   ├── python-best-practices.md      # PEP 8, type hints, async
│   ├── testing-strategies.md         # pytest, mocking, coverage
│   ├── error-handling.md             # Exception patterns
│   └── performance-optimization.md   # Profiling, caching
├── github-automation/                # GitHub workflows
│   ├── SKILL.md                      # Domain overview
│   ├── pr-workflows.md               # Creating, reviewing PRs
│   ├── commit-conventions.md         # Conventional commits
│   ├── ci-cd-patterns.md             # GitHub Actions best practices
│   └── release-management.md         # Versioning, changelogs
├── operations/                       # DevOps and deployment
│   ├── SKILL.md                      # Domain overview
│   ├── deployment-strategies.md      # Blue-green, canary, rollback
│   ├── monitoring.md                 # Logging, metrics, alerts
│   └── security-hardening.md         # Secrets, permissions, auditing
└── meta/                             # Meta-skills
    ├── SKILL.md                      # Domain overview
    ├── agent-communication.md        # Inter-agent protocols
    ├── cost-optimization.md          # Token usage, caching
    └── debugging-agents.md           # Agent troubleshooting

Progressive Disclosure via @import

Agent Base Prompt (concise):

<!-- .claude/agents/code-reviewer.md -->
---
name: code-reviewer
mode: auto
tools:
  - allow: Read, Grep, Glob
  - ask: Bash, Write
---

# Code Reviewer Agent

You perform security, correctness, and maintainability reviews.

Core checklist:
- Security vulnerabilities
- Logic errors and edge cases
- Code style and maintainability
- Test coverage

@import .claude/skills/development/python-best-practices.md
@import .claude/skills/development/testing-strategies.md
@import .claude/skills/operations/security-hardening.md

Skill File (detailed, cached):

<!-- .claude/skills/development/python-best-practices.md -->

# Python Best Practices

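## Type Hints
Always use type hints for function signatures:
```python
from typing import Any, Dict, List

import pandas as pd

def process_data(items: List[Dict[str, Any]]) -> pd.DataFrame:
    """Process items into DataFrame."""
    ...
```

## Async Patterns
Use async/await for I/O-bound operations:
```python
from typing import Any, Dict

import httpx

async def fetch_data(url: str) -> Dict[str, Any]:
    # One client per call keeps the example short; reuse a client in real code
    async with httpx.AsyncClient() as client:
        response = await client.get(url)
        return response.json()
```

(... detailed patterns with examples)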

How It Works:

  1. The agent base prompt (about 500 tokens) is loaded on every call
  2. @import directives trigger skill loading (about 2K tokens each)
  3. Skills are marked with cache_control, so repeated reads of cached skill content cost about 90% less
  4. Only imported skills are loaded, not the entire skills library

Skill Composition

Agents can import multiple skills:

<!-- .claude/agents/architect.md -->
@import .claude/skills/development/python-best-practices.md
@import .claude/skills/development/performance-optimization.md
@import .claude/skills/operations/deployment-strategies.md
@import .claude/skills/meta/cost-optimization.md

Benefits:

  • Mix domain expertise per agent
  • Share common knowledge (DRY principle)
  • Cache skill content across agents (90% savings)

Skill Metadata (YAML Frontmatter)

Each skill file includes metadata:

---
skill_name: python-best-practices
domain: development
version: 1.2.0
updated: 2025-10-26
cache: true  # Enable prompt caching
prerequisites: []  # Other skills to import first
applicable_agents:
  - code-reviewer
  - architect
  - debugger
---
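
As a rough illustration of how this metadata might be read, the sketch below splits a skill file into frontmatter and body. The parse_skill_metadata helper is hypothetical and handles only the flat subset shown above (scalars, inline empty lists, and "- item" lists); a real loader would more likely rely on a YAML library.

```python
from typing import Any, Dict, Optional, Tuple


def parse_skill_metadata(text: str) -> Tuple[Dict[str, Any], str]:
    """Split a skill file into (metadata, body).

    Hypothetical helper: supports only the flat subset of YAML used in
    this ADR, not YAML in general.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter present

    meta: Dict[str, Any] = {}
    current_list: Optional[str] = None  # key of the list being filled, if any
    for index, raw in enumerate(lines[1:], start=1):
        if raw.strip() == "---":
            return meta, "\n".join(lines[index + 1:])
        line = raw.split(" #", 1)[0].rstrip()  # drop trailing comments
        if not line.strip():
            continue
        if line.lstrip().startswith("- ") and current_list is not None:
            meta[current_list].append(line.lstrip()[2:].strip())
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if value in ("", "[]"):
            meta[key] = []  # block list follows, or the list is empty
            current_list = key
        elif value in ("true", "false"):
            meta[key] = value == "true"
            current_list = None
        else:
            meta[key] = value  # versions and dates stay as strings
            current_list = None
    raise ValueError("frontmatter is not terminated by '---'")


SAMPLE = """---
skill_name: python-best-practices
version: 1.2.0
cache: true  # Enable prompt caching
prerequisites: []  # Other skills to import first
applicable_agents:
  - code-reviewer
  - architect
---
# Python Best Practices
"""

meta, body = parse_skill_metadata(SAMPLE)
```

Keeping version and updated as strings avoids surprises such as a version being read as a number.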

Implementation

Agent Loading with Skills (src/orchestrator/claude_sdk_executor.py):

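import re
from typing import Any, Dict, List, Tuple

# One directive per line, e.g. "@import .claude/skills/development/x.md"
IMPORT_RE = re.compile(r'^@import\s+(\S+\.md)\s*$', re.MULTILINE)

def load_agent_with_skills(self, agent_name: str) -> Tuple[str, List[Dict[str, Any]]]:
    """Load an agent prompt and resolve its @import directives.

    Returns the agent prompt text and a list of skill content blocks.
    """
    agent_path = self.agents_dir / f"{agent_name}.md"
    agent_content = agent_path.read_text(encoding="utf-8")

    # Find all @import directives
    imports = IMPORT_RE.findall(agent_content)

    # Load skill files
    skill_content = []
    for skill_path in imports:
        full_path = self.project_root / skill_path
        skill = full_path.read_text(encoding="utf-8")

        # Mark skill for prompt caching
        skill_content.append({
            "type": "text",
            "text": skill,
            "cache_control": {"type": "ephemeral"}  # cached reads cost ~90% less
        })

    # Return the agent prompt and its skill blocks separately
    return agent_content, skill_content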
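
One way the returned pair might be turned into a request payload is sketched below. The build_system_blocks helper is hypothetical and not part of the executor described here; it assumes the Anthropic Messages API convention of passing the system prompt as a list of text blocks. That API caches the prompt prefix up to each cache_control marker, and Anthropic's documentation describes a limit of four markers per request at the time of writing. For that reason the sketch keeps a single marker on the last skill block instead of one per skill.

```python
from typing import Any, Dict, List


def build_system_blocks(
    agent_content: str,
    skill_blocks: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Combine skills and an agent prompt into one list of system blocks.

    Hypothetical helper. Skills come first so that the cacheable prefix
    consists of shared skill text; the agent prompt follows uncached.
    """
    blocks: List[Dict[str, Any]] = []
    for block in skill_blocks:
        # Copy each block without its marker; one marker is re-added below.
        blocks.append({k: v for k, v in block.items() if k != "cache_control"})
    if blocks:
        # A single breakpoint on the last skill covers the whole skill prefix.
        blocks[-1]["cache_control"] = {"type": "ephemeral"}
    blocks.append({"type": "text", "text": agent_content})
    return blocks


skills = [
    {"type": "text", "text": "# Python Best Practices",
     "cache_control": {"type": "ephemeral"}},
    {"type": "text", "text": "# Testing Strategies",
     "cache_control": {"type": "ephemeral"}},
]
system = build_system_blocks("# Code Reviewer Agent", skills)
```

Placing skills ahead of the agent prompt is a design choice of this sketch: since caching is prefix-based, two agents can only share a cached entry when their leading blocks are identical and in the same order.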

Skill Validation (ensures no broken imports):

def validate_skills(self) -> List[str]:
    """Validate all @import directives resolve correctly."""
    errors = []

    for agent_file in self.agents_dir.glob("*.md"):
        content = agent_file.read_text()
        # Anchor to whole lines so prose that mentions @import is not matched
        imports = re.findall(r'^@import\s+(\S+\.md)\s*$', content, re.MULTILINE)

        for skill_path in imports:
            full_path = self.project_root / skill_path
            if not full_path.exists():
                errors.append(f"{agent_file.name}: Missing skill {skill_path}")

    return errors
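
The same check can be exercised outside the executor. The sketch below is a standalone variant; the find_broken_imports name and the temporary directory layout are illustrative assumptions. It builds a throwaway project containing one valid and one broken import, then reports the broken one.

```python
import re
import tempfile
from pathlib import Path
from typing import List

IMPORT_RE = re.compile(r"^@import\s+(\S+\.md)\s*$", re.MULTILINE)


def find_broken_imports(project_root: Path) -> List[str]:
    """Return one message per @import that does not resolve to a file."""
    errors: List[str] = []
    agents_dir = project_root / ".claude" / "agents"
    for agent_file in sorted(agents_dir.glob("*.md")):
        content = agent_file.read_text(encoding="utf-8")
        for skill_path in IMPORT_RE.findall(content):
            if not (project_root / skill_path).is_file():
                errors.append(f"{agent_file.name}: Missing skill {skill_path}")
    return errors


def demo() -> List[str]:
    """Create a temporary project with one broken import and validate it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        agents = root / ".claude" / "agents"
        skills = root / ".claude" / "skills" / "development"
        agents.mkdir(parents=True)
        skills.mkdir(parents=True)
        (skills / "testing-strategies.md").write_text("# Testing\n", encoding="utf-8")
        (agents / "code-reviewer.md").write_text(
            "# Code Reviewer Agent\n"
            "@import .claude/skills/development/testing-strategies.md\n"
            "@import .claude/skills/development/missing-skill.md\n",
            encoding="utf-8",
        )
        return find_broken_imports(root)
```

A check like this is cheap enough to run in CI or as a pre-commit hook, so a renamed skill file is caught before an agent loads it.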

Consequences

Positive

  1. 80% Smaller Agent Files: 5K lines → 1K lines (skills separated)
  2. 90% Cost Reduction on Skills: Prompt caching eliminates repeated skill loading
  3. DRY Principle: Update knowledge in one place (e.g., security patterns)
  4. Composability: Mix skills per agent (e.g., architect = dev + ops + meta)
  5. Progressive Disclosure: Load only relevant knowledge per task
  6. Versioning: Track skill versions independently (v1.2.0)
  7. Reusability: Same skill used by 5+ agents

Negative

  1. Indirection: Must follow @import to see full agent context
  2. Cache Dependency: Requires Claude SDK v0.1.4+ with prompt caching
  3. Validation Overhead: Must check all imports resolve correctly
  4. Learning Curve: Developers must understand skills system

Trade-offs

Considered Alternatives:

  1. Monolithic Agent Prompts (Original)

    • ❌ 5K+ line files
    • ❌ 95% duplication across agents
    • ❌ No caching benefits
    • ✅ Simple (everything in one file)

  2. External Knowledge Base

    • ✅ Centralized knowledge
    • ❌ Requires KB search per query (50-200ms)
    • ❌ Retrieval may miss relevant context
    • ❌ More complex architecture

  3. Skills System with @import (Chosen)

    • ✅ Modular, reusable
    • ✅ 90% cost savings via caching
    • ✅ Progressive disclosure
    • ⚠️ Requires import validation

  4. Python Package Imports

    • ✅ Familiar to developers
    • ❌ Skills are not code (Markdown)
    • ❌ Breaks prompt caching
    • ❌ Runtime overhead

  5. LangChain Tools

    • ✅ Composable tools
    • ❌ Vendor lock-in (LangChain)
    • ❌ Not designed for knowledge (more for actions)
    • ❌ No caching benefits

Why We Chose Skills with @import:

  • Maximizes prompt caching benefits (90% savings)
  • Simple, declarative syntax (Markdown)
  • Works with Claude SDK native features
  • Clear separation of concerns (agent vs skills)

Skill Organization Principles

1. Domain-Based Hierarchy

Skills are organized by domain:

  • development/ - Coding best practices
  • github-automation/ - GitHub workflows
  • operations/ - DevOps and deployment
  • meta/ - Meta-skills (agent communication, cost optimization)

2. SKILL.md Convention

Each domain has a SKILL.md overview:

<!-- .claude/skills/development/SKILL.md -->

# Development Skills

This domain contains coding best practices and patterns.

## Available Skills

- [python-best-practices.md](./python-best-practices.md) - PEP 8, type hints, async
- [testing-strategies.md](./testing-strategies.md) - pytest, mocking, coverage
- [error-handling.md](./error-handling.md) - Exception patterns
- [performance-optimization.md](./performance-optimization.md) - Profiling, caching

## Usage

Import specific skills in the agent file body, after the frontmatter:
@import .claude/skills/development/python-best-practices.md

3. Skill Size Guidelines

  • Target: 1-3K tokens per skill (fits in single cache block)
  • Maximum: 5K tokens (split if larger)
  • Minimum: 200 tokens (merge if smaller)
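
These thresholds can be turned into a simple lint. The sketch below is illustrative only: estimate_tokens uses a rough four-characters-per-token heuristic, which is an assumption and not the model's tokenizer, so results near a boundary should be treated as approximate.

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate (assumption: about 4 characters per token)."""
    return len(text) // 4


def size_advice(text: str, minimum: int = 200, maximum: int = 5000) -> str:
    """Classify a skill as 'merge', 'ok', or 'split' per the size guidelines."""
    tokens = estimate_tokens(text)
    if tokens < minimum:
        return "merge"  # too small to justify its own file
    if tokens > maximum:
        return "split"  # too large for a single skill
    return "ok"
```

For example, a 400-character stub would be flagged for merging, while a skill of roughly 8,000 characters falls inside the target range.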

4. Versioning

Skills follow semantic versioning:

  • v1.0.0 - Initial version
  • v1.1.0 - Add new patterns (backward compatible)
  • v2.0.0 - Breaking changes (rename, restructure)
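
A sketch of how a version bump could be classified under this scheme. The classify_bump helper is hypothetical and assumes plain MAJOR.MINOR.PATCH strings with an optional leading v; pre-release and build suffixes are not handled.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse 'v1.2.0' or '1.2.0' into a (major, minor, patch) tuple."""
    major, minor, patch = version.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def classify_bump(old: str, new: str) -> str:
    """Return 'breaking', 'feature', 'patch', or 'none' for a version change."""
    before, after = parse_version(old), parse_version(new)
    if after <= before:
        return "none"
    if after[0] > before[0]:
        return "breaking"  # e.g. a rename or restructure
    if after[1] > before[1]:
        return "feature"   # new, backward-compatible patterns
    return "patch"
```

A check like this could warn agent maintainers when an imported skill moves to a new major version.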

Cost Impact

Before Skills System:

Code review (10 calls/day):
  Agent prompt: 5K tokens × 10 = 50K tokens
  No caching (different content each time)
  Cost: 50K × $3.00/MTok = $0.15/day

After Skills System:

Code review (10 calls/day):
  Agent prompt: 1K tokens × 10 = 10K tokens (80% reduction)
  Skills: 4K tokens × 1 (cached, loaded once)
  Cache reads: 4K tokens × 9 × 0.1 (90% discount)
  Cost: (10K + 4K + 3.6K) × $3.00/MTok = $0.053/day

Savings: $0.097/day = $35.40/year per agent
With 10 agents: $354/year
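
The arithmetic above can be reproduced with a short script. The function and parameter names below are illustrative; the token counts, call volume, and price come from the figures in this section. The default cache_write_multiplier of 1.0 mirrors the calculation above, which prices the first skill load at the normal input rate.

```python
PRICE_PER_MTOK = 3.00  # USD per million input tokens, from the figures above


def daily_cost_before(calls: int = 10, prompt_tokens: int = 5_000) -> float:
    """Daily cost with a monolithic, uncached agent prompt."""
    return calls * prompt_tokens * PRICE_PER_MTOK / 1_000_000


def daily_cost_after(
    calls: int = 10,
    prompt_tokens: int = 1_000,
    skill_tokens: int = 4_000,
    cache_read_multiplier: float = 0.1,
    cache_write_multiplier: float = 1.0,
) -> float:
    """Daily cost with a small agent prompt plus cached skills."""
    uncached = calls * prompt_tokens
    first_load = skill_tokens * cache_write_multiplier
    cached_reads = skill_tokens * (calls - 1) * cache_read_multiplier
    return (uncached + first_load + cached_reads) * PRICE_PER_MTOK / 1_000_000


before = daily_cost_before()   # 0.15
after = daily_cost_after()     # about 0.0528
savings_per_year = (before - after) * 365
```

Without intermediate rounding the yearly saving comes to roughly $35.48 per agent, in line with the $35.40 above. Two caveats apply. Anthropic's published pricing charges a premium on cache writes, and setting cache_write_multiplier=1.25 raises the daily figure to about $0.056. Cached entries also expire after a short idle period, so the cache-read discount only applies when calls are spaced closely enough to hit a live entry.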

Implementation Timeline

  • October 8: Analyzed agent prompt duplication (95% overlap)
  • October 10: Designed skills hierarchy (4 domains)
  • October 12: Implemented @import directive parser
  • October 14: Added prompt caching for skills (90% savings)
  • October 16: Migrated 5 agents to skills system
  • October 18: Validated cost savings ($354/year)
  • October 20: Completed migration (10 agents)

Validation

Tested skills system with:

  • ✅ 10 agents using skills (80% smaller prompts)
  • ✅ 20+ skill files across 4 domains
  • ✅ Prompt caching (90% cost reduction)
  • ✅ Import validation (no broken links)
  • ✅ Skill versioning (semantic versions tracked)
  • ✅ Production use (2+ weeks stable)

Migration Path

Converting Monolithic Agent to Skills:

  1. Extract Common Patterns:

    # Identify duplicated content
    grep -l "PEP 8" .claude/agents/*.md
    # Found in 5 agents → Extract to skill
    

  2. Create Skill File:

    <!-- .claude/skills/development/python-best-practices.md -->
    ---
    skill_name: python-best-practices
    domain: development
    version: 1.0.0
    cache: true
    ---
    
    # Python Best Practices
    (... content extracted from agents)
    

  3. Update Agent Files:

    <!-- Before -->
    # Code Reviewer
    You must follow PEP 8...
    (... 500 lines of Python rules)
    
    <!-- After -->
    # Code Reviewer
    @import .claude/skills/development/python-best-practices.md
    

  4. Validate:

    python scripts/validate_skills.py
    # ✅ All imports resolve correctly
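
Step 1 can also be approached programmatically. The sketch below is an assumption about how extraction candidates might be found, not the procedure used during this migration. It records which agent files contain each non-trivial line and reports the lines shared by several agents.

```python
from collections import defaultdict
from typing import Dict, List, Set


def shared_lines(
    agent_texts: Dict[str, str],
    min_agents: int = 2,
    min_length: int = 20,
) -> Dict[str, List[str]]:
    """Map each line found in at least `min_agents` agents to those agents."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for name, text in agent_texts.items():
        for line in text.splitlines():
            line = line.strip()
            if len(line) >= min_length:  # skip headings and short fragments
                seen[line].add(name)
    return {
        line: sorted(names)
        for line, names in seen.items()
        if len(names) >= min_agents
    }


agents = {
    "code-reviewer.md": "You must follow PEP 8 naming rules.\nCheck test coverage.",
    "architect.md": "You must follow PEP 8 naming rules.\nPrefer simple designs.",
}
duplicates = shared_lines(agents)
```

Exact line matching misses reworded duplicates, so the output is best treated as a starting list for manual review.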
    

References

  • Implementation: src/orchestrator/claude_sdk_executor.py (import parser)
  • Skills Directory: .claude/skills/
  • Validation Script: scripts/validate_skills.py
  • Documentation: .claude/skills/README.md
  • Agent Examples: .claude/agents/code-reviewer.md