
Agent Guardrails

Security layers and safety controls for the Nova AI agent system.

Overview

Nova AI implements defense-in-depth security with multiple layers of protection to prevent malicious or accidental misuse of agent capabilities.

Security Philosophy

  1. Fail-Safe Defaults: Deny by default, explicit allow
  2. Least Privilege: Agents get minimum permissions needed
  3. Human Oversight: Critical operations require approval
  4. Defense in Depth: Multiple independent security layers
  5. Audit Everything: Comprehensive logging for accountability

Defense Layers

Layer 1: Input Validation
   ↓ (Malformed input rejected)
Layer 2: Permission System
   ↓ (Unauthorized tools blocked)
Layer 3: Approval Gates
   ↓ (Destructive ops require approval)
Layer 4: Execution Sandbox
   ↓ (Isolated execution environment)
Layer 5: Output Sanitization
   ↓ (XSS/injection prevention)
Layer 6: Audit Logging
   ↓ (All actions logged)
Layer 7: Rate Limiting
   ↓ (Prevents abuse)

Input Validation

1. Path Validation

Location: src/utils/path_security.py

Protections:

- Path traversal prevention (../, /etc/, etc.)
- Symlink validation (no links to forbidden directories)
- Ownership verification (UID must match process)
- Length limits (max 4096 bytes, prevents DoS)
- Forbidden directory blocking (/etc, /sys, /proc, etc.)

Example:

from src.utils.path_security import validate_safe_path

# Safe path validation
def read_file(path: str):
    # Validates against:
    # - Path traversal (../)
    # - Symlink attacks
    # - System directories
    # - Ownership mismatch
    safe_path = validate_safe_path(
        path,
        allowed_base=project_root,
        must_exist=True
    )

    with open(safe_path) as f:
        return f.read()

Threat Prevented: Arbitrary file read/write

2. Command Injection Prevention

Location: src/orchestrator/security/validators.py

Protections:

- Shell metacharacter escaping (;, |, &&, etc.)
- Command whitelist validation
- Argument length limits
- No shell=True in subprocess calls

Example:

import shlex
import subprocess

def safe_command(command: str):
    # Validate command is in whitelist
    if not is_allowed_command(command):
        raise SecurityError(f"Command not allowed: {command}")

    # Split into an argument list (no shell interpretation)
    safe_args = shlex.split(command)

    # Execute without a shell so metacharacters are never interpreted
    subprocess.run(safe_args, shell=False, check=True)

Threat Prevented: Command injection, arbitrary code execution

3. SQL Injection Prevention

Best Practice: Always use parameterized queries

# ❌ VULNERABLE: String concatenation
cursor.execute(f"SELECT * FROM users WHERE id = '{user_id}'")

# ✅ SAFE: Parameterized query
cursor.execute("SELECT * FROM users WHERE id = ?", (user_id,))

Threat Prevented: SQL injection, data breach

4. XSS Prevention

Location: src/orchestrator/security/xss_protection.py

Protections:

- HTML entity encoding (<, >, &, ", ')
- JavaScript escape sequence handling
- Recursive sanitization of nested structures
- Depth limits (max 10 levels, prevents infinite recursion)

Example:

from src.orchestrator.security.xss_protection import sanitize_for_display

# Sanitize user input before display
user_input = "<script>alert('XSS')</script>"
safe_output = sanitize_for_display(user_input)
# Result: "&lt;script&gt;alert(&#x27;XSS&#x27;)&lt;/script&gt;"

Threat Prevented: Cross-site scripting (XSS)
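The recursive sanitization with a depth limit can be sketched with the stdlib html module. This is a hypothetical minimal version; the real sanitize_for_display in xss_protection.py may handle more cases:

```python
import html

def sanitize_for_display_sketch(value, _depth=0):
    # Depth limit prevents unbounded recursion on nested structures
    if _depth > 10:
        raise ValueError("Structure too deeply nested")
    if isinstance(value, str):
        # quote=True also encodes " and ' (as &quot; and &#x27;)
        return html.escape(value, quote=True)
    if isinstance(value, dict):
        return {sanitize_for_display_sketch(k, _depth + 1):
                sanitize_for_display_sketch(v, _depth + 1)
                for k, v in value.items()}
    if isinstance(value, list):
        return [sanitize_for_display_sketch(v, _depth + 1) for v in value]
    # Non-string scalars (int, bool, None) pass through unchanged
    return value
```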

5. SSRF Prevention

Location: Claude Code built-in WebFetch tool

Nova AI uses Claude Code's built-in WebFetch/WebSearch tools, which include SSRF protection:

- Private IP range blocking
- Cloud metadata endpoint blocking
- Localhost blocking
- DNS resolution validation
- Scheme validation (http/https only)

Example (illustrative equivalent of the built-in checks):

import socket
from urllib.parse import urlparse

import requests

def fetch_url(url: str) -> str:
    # Parse URL
    parsed = urlparse(url)

    # Validate scheme
    if parsed.scheme not in ("http", "https"):
        raise SecurityError(f"Scheme not allowed: {parsed.scheme}")

    # Resolve hostname
    ip = socket.gethostbyname(parsed.hostname)

    # Check if private IP
    if is_private_ip(ip):
        raise SecurityError(f"Access to private IP forbidden: {ip}")

    # Safe to fetch (note: resolving and then fetching separately can
    # race with DNS rebinding; production code should pin the resolved IP)
    return requests.get(url, timeout=30).text

Threat Prevented: Server-side request forgery (SSRF), cloud metadata access


Approval Gates

Architecture

Location: src/orchestrator/security/approval_gate.py

Flow:

Agent requests tool execution
Does the tool require approval?
   ├─ Yes → Send approval request
   │         ↓
   │      User approves/denies
   │         ↓
   │      Execute or block
   └─ No → Execute immediately

Approval Categories

ALWAYS REQUIRE APPROVAL:

- File deletion
- Database modification
- Network requests to external services
- Bash commands (except whitelisted)
- PR creation/merge
- Deployment operations

NEVER REQUIRE APPROVAL:

- File reads
- Code analysis
- Test execution (read-only)
- Documentation generation

CONDITIONAL APPROVAL:

- File writes (if overwriting existing)
- Bash commands (if on whitelist)
- API calls (if under rate limit)

Implementation

from src.orchestrator.security.approval_gate import ApprovalGate

gate = ApprovalGate()

# Check if approval needed
if gate.requires_approval(tool="Write", args={"path": "src/auth.py"}):
    # Send approval request
    approved = await gate.request_approval(
        tool="Write",
        args={"path": "src/auth.py", "content": "..."},
        reason="Implementing authentication module"
    )

    if not approved:
        raise PermissionError("User denied approval")

Approval UI

┌─────────────────────────────────────────────┐
│ Approval Required                            │
├─────────────────────────────────────────────┤
│                                              │
│ Tool: Write                                  │
│ File: src/auth.py                            │
│ Action: Create new file                      │
│                                              │
│ Reason: Implementing authentication module   │
│                                              │
│ Preview:                                     │
│ ┌──────────────────────────────────────┐   │
│ │ def authenticate_user(...):          │   │
│ │     """Authenticate user."""         │   │
│ │     ...                              │   │
│ └──────────────────────────────────────┘   │
│                                              │
│ [Approve]  [Deny]  [View Full]              │
└─────────────────────────────────────────────┘

Access Control

Permission System

Agent Permissions (defined in .claude/agents/*.md):

---
name: implementer
tools:
  - Read      # Can read files
  - Write     # Can write files
  - Edit      # Can edit files
  - Grep      # Can search content
  - Glob      # Can find files
  - Bash      # Can run commands
---

Permission Enforcement:

def execute_tool(agent: str, tool: str, args: dict):
    # Load agent configuration
    agent_config = load_agent_config(agent)

    # Check if tool allowed
    if tool not in agent_config["tools"]:
        raise PermissionError(
            f"Agent '{agent}' not permitted to use tool '{tool}'"
        )

    # Execute tool
    return tool_registry[tool](**args)

File System Restrictions

Allowed Directories:

- Project root and subdirectories
- Temp directory (/tmp/nova-ai-*)
- User-specified output directories

Forbidden Directories:

- /etc - System configuration
- /sys - System information
- /proc - Process information
- /dev - Device files
- /boot - Boot files
- User home outside project (prevents ~/.ssh access)

Implementation:

FORBIDDEN_DIRS = [
    "/etc", "/sys", "/proc", "/dev", "/boot",
    "/root", "/var/log", "/var/spool"
]

def is_allowed_path(path: Path) -> bool:
    resolved = path.resolve()

    for forbidden in FORBIDDEN_DIRS:
        if resolved.is_relative_to(forbidden):
            return False

    return resolved.is_relative_to(PROJECT_ROOT)

Network Restrictions

Allowed:

- Anthropic API (api.anthropic.com)
- GitHub API (api.github.com)
- LangFuse (langfuse.com)
- Public documentation sites

Forbidden:

- Private IP ranges (10.0.0.0/8, etc.)
- Localhost (127.0.0.1)
- Cloud metadata (169.254.169.254)
- Internal network ranges
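The is_private_ip helper used in the SSRF example earlier can be sketched with the stdlib ipaddress module, which already classifies all of the forbidden ranges above:

```python
import ipaddress

def is_private_ip(ip: str) -> bool:
    """Return True for addresses an agent must never reach."""
    addr = ipaddress.ip_address(ip)
    # is_private covers RFC 1918 ranges; is_loopback covers 127.0.0.0/8;
    # is_link_local covers 169.254.0.0/16, which includes the cloud
    # metadata endpoint 169.254.169.254
    return (addr.is_private or addr.is_loopback
            or addr.is_link_local or addr.is_reserved)
```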


Audit Logging

What is Logged

Session Events:

- Session creation/destruction
- Agent switches
- Session forks
- Session compression

Tool Executions:

- Tool name and arguments
- Execution timestamp
- User who approved (if required)
- Execution result (success/failure)
- Execution duration

Security Events:

- Permission denials
- Approval requests/responses
- Validation failures
- Rate limit hits

Cost Events:

- Token usage per request
- Cost per request
- Cache hit rates
- Model used

Log Format

{
  "timestamp": "2025-11-07T18:30:00Z",
  "event_type": "tool_execution",
  "session_id": "sess_abc123",
  "agent": "implementer",
  "tool": "Write",
  "args": {
    "path": "src/auth.py",
    "size_bytes": 1024
  },
  "approved_by": "user@example.com",
  "result": "success",
  "duration_ms": 150,
  "security_checks": {
    "path_validation": "passed",
    "ownership_check": "passed",
    "approval_required": true,
    "approval_granted": true
  }
}

Log Storage

Locations:

- Console: Real-time streaming (stderr)
- File: logs/audit.jsonl (JSON Lines format)
- LangFuse: Cloud tracing (optional)
- OpenTelemetry: Distributed tracing (optional)

Retention:

- Local logs: 30 days
- LangFuse: Per plan (typically 90 days)
- Long-term: Export to S3/GCS for compliance


Threat Model

Threats Considered

| Threat | Mitigation | Layer |
| --- | --- | --- |
| Path Traversal | validate_safe_path() | Input Validation |
| Command Injection | shlex.quote(), no shell=True | Input Validation |
| SQL Injection | Parameterized queries | Input Validation |
| XSS | HTML entity encoding | Output Sanitization |
| SSRF | Private IP blocking | Network Control |
| Unauthorized File Access | Ownership checks, forbidden dirs | Access Control |
| Privilege Escalation | Tool permissions per agent | Access Control |
| Data Exfiltration | Approval gates, audit logging | Approval + Audit |
| Resource Exhaustion | Rate limiting, timeouts | Rate Limiting |
| Malicious Prompts | Approval for destructive ops | Approval Gates |

Attack Scenarios

Scenario 1: Malicious Prompt

Attack:

User: "Delete all files in /etc"

Defense:

1. Agent proposes Bash tool with rm -rf /etc
2. Path validation blocks /etc access (forbidden directory)
3. Approval gate triggered (if path validation is bypassed)
4. User must explicitly approve deletion
5. Audit log records the attempt

Result: Attack blocked by multiple layers

Scenario 2: Path Traversal

Attack:

User: "Read ../../etc/passwd"

Defense:

1. Path validation detects the ../ sequence
2. Resolved path checked against forbidden directories
3. Access denied before any file operation
4. Security event logged

Result: Attack blocked at input validation

Scenario 3: SSRF

Attack:

User: "Fetch http://169.254.169.254/latest/meta-data/"

Defense:

1. URL parsed
2. IP address resolved (169.254.169.254)
3. Detected as cloud metadata endpoint
4. Request blocked before connection
5. Security event logged

Result: Attack blocked at network control


Security Best Practices

For Developers

1. Validate All Inputs

# Always validate paths
safe_path = validate_safe_path(user_path, allowed_base=project_root)

# Always split commands into argument lists (never shell=True)
safe_args = shlex.split(user_command)

# Always parameterize SQL
cursor.execute("SELECT * FROM users WHERE id = ?", (user_id,))

2. Use Approval Gates

# For destructive operations
if is_destructive(operation):
    if not await request_approval(operation):
        raise PermissionError("Operation denied")

3. Log Security Events

# Log all security-relevant events
logger.warning(
    "Path validation failed",
    extra={
        "path": user_path,
        "reason": "Path traversal attempt",
        "user": current_user
    }
)

4. Principle of Least Privilege

# Give agents minimum permissions needed
---
name: code-reviewer
tools:
  - Read      # Only read access
  - Grep      # Only search
  - Glob      # Only find files
  # No Write, Edit, or Bash
---

For Users

1. Review Approval Requests

- Read the full operation before approving
- Verify file paths are correct
- Check for unexpected arguments
- Deny if uncertain

2. Monitor Audit Logs

# Review recent security events
grep "security_event" logs/audit.jsonl | tail -20

3. Use Read-Only Agents

# For exploration, use code-reviewer (read-only)
/novaai-code-reviewer analyze security vulnerabilities

4. Set Up Alerts

# Alert on suspicious activity
# - Multiple permission denials
# - Path traversal attempts
# - Unusual file access patterns


Rate Limiting

Cost-Based Rate Limiting (src/orchestrator/hooks/cost_checker.py):

# Sliding window hourly limits
RATE_LIMITS = {
    "aws_ops": {"calls": 10, "window": 3600},  # 10/hour
    "docker_ops": {"calls": 20, "window": 3600},  # 20/hour
    "web_search": {"calls": 50, "window": 3600},  # 50/hour
}

def check_rate_limit(tool: str):
    if is_rate_limited(tool):
        raise RateLimitError(f"Rate limit exceeded for {tool}")

Benefits:

- Prevents runaway costs
- Protects against abuse
- Enforces usage quotas
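The sliding hourly window described above can be sketched with a per-tool deque of call timestamps. This is an illustrative implementation; cost_checker.py may implement the window differently, and RateLimitError here stands in for whatever exception class the project actually defines:

```python
import time
from collections import deque

RATE_LIMITS = {"aws_ops": {"calls": 10, "window": 3600}}  # 10/hour
_history = {}  # tool name -> deque of call timestamps

class RateLimitError(Exception):
    pass

def check_rate_limit(tool, now=None):
    limit = RATE_LIMITS.get(tool)
    if limit is None:
        return  # tool has no configured limit
    now = time.monotonic() if now is None else now
    calls = _history.setdefault(tool, deque())
    # Slide the window: drop timestamps older than the window
    while calls and now - calls[0] >= limit["window"]:
        calls.popleft()
    if len(calls) >= limit["calls"]:
        raise RateLimitError(f"Rate limit exceeded for {tool}")
    calls.append(now)
```

Because the window slides rather than resetting on the hour, a burst of calls cannot be doubled by straddling a boundary.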


Incident Response

Detection

Automated Detection:

- Failed validation attempts (> 5/minute)
- Permission denials (> 10/hour)
- Rate limit hits (sustained)
- Unusual file access patterns

Manual Review:

- Daily audit log review
- Weekly security event analysis
- Monthly threat assessment

Response Procedure

1. Detection: Alert triggered or manual discovery
2. Assessment: Review audit logs, identify the attack vector, assess impact
3. Containment: Block the malicious agent/user, disable affected tools, isolate compromised sessions
4. Eradication: Fix the vulnerability, update validation rules, deploy patches
5. Recovery: Restore from backup if needed, re-enable tools, resume normal operation
6. Lessons Learned: Document the incident, update the threat model, improve defenses


Security Testing

Unit Tests

# Test path validation
pytest tests/security/test_path_security.py

# Test command injection prevention
pytest tests/security/test_command_injection.py

# Test SSRF prevention
pytest tests/security/test_ssrf_prevention.py

# Test XSS prevention
pytest tests/security/test_xss_protection.py

Integration Tests

# Test approval gates
pytest tests/security/test_approval_gates.py

# Test permission system
pytest tests/security/test_permissions.py

# Test audit logging
pytest tests/security/test_audit_logging.py

Penetration Testing

Recommended Tests:

1. Path traversal attempts
2. Command injection attempts
3. SQL injection attempts
4. XSS injection attempts
5. SSRF attempts
6. Privilege escalation attempts
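A penetration-style unit test for the path traversal case might look like the following. The payload list and helper are hypothetical; a real test would exercise the project's validate_safe_path directly:

```python
import tempfile
from pathlib import Path

TRAVERSAL_PAYLOADS = [
    "../../etc/passwd",
    "../../../root/.ssh/id_rsa",
    "/etc/shadow",  # absolute path into a forbidden directory
]

def escapes_base(candidate: str, base: Path) -> bool:
    """True if the candidate, joined to base and resolved, leaves base."""
    resolved = Path(base, candidate).resolve()
    return not resolved.is_relative_to(base.resolve())

def test_traversal_payloads_are_rejected():
    base = Path(tempfile.mkdtemp())
    for payload in TRAVERSAL_PAYLOADS:
        assert escapes_base(payload, base), payload

def test_legitimate_paths_are_allowed():
    base = Path(tempfile.mkdtemp())
    assert not escapes_base("src/auth.py", base)
```

Note that absolute-path payloads must be covered too: Path(base, "/etc/shadow") discards the base entirely, so a validator that only scans for ../ would miss them.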


Compliance

OWASP Top 10

| Vulnerability | Status | Mitigation |
| --- | --- | --- |
| A01: Broken Access Control | ✅ Protected | Permission system, approval gates |
| A02: Cryptographic Failures | ⚠️ Partial | Use environment variables for secrets |
| A03: Injection | ✅ Protected | Input validation, parameterized queries |
| A04: Insecure Design | ✅ Protected | Defense in depth, fail-safe defaults |
| A05: Security Misconfiguration | ✅ Protected | Secure defaults, validation |
| A06: Vulnerable Components | ✅ Protected | Dependency scanning (Dependabot) |
| A07: Auth Failures | N/A | No authentication (local tool) |
| A08: Data Integrity Failures | ✅ Protected | Audit logging, input validation |
| A09: Logging Failures | ✅ Protected | Comprehensive audit logging |
| A10: SSRF | ✅ Protected | Private IP blocking, DNS validation |

Future Enhancements

Planned Security Features

  1. Sandboxed Execution
     - Docker container per agent
     - Resource limits (CPU, memory, network)
     - Read-only filesystem (except workspace)

  2. AI Safety Guardrails
     - Prompt injection detection
     - Output toxicity filtering
     - Jailbreak attempt detection

  3. Advanced Audit
     - Real-time anomaly detection
     - ML-based threat detection
     - Automated incident response

  4. Compliance
     - SOC 2 compliance
     - GDPR compliance
     - HIPAA compliance (if needed)


Last Updated: November 7, 2025 Version: 2.3.0