
Agent Guardrails

Security layers and safety controls for the Nova AI agent system.

Overview

Nova AI implements defense-in-depth security with multiple layers of protection to prevent malicious or accidental misuse of agent capabilities.

Security Philosophy

  1. Fail-Safe Defaults: Deny by default, explicit allow
  2. Least Privilege: Agents get minimum permissions needed
  3. Human Oversight: Critical operations require approval
  4. Defense in Depth: Multiple independent security layers
  5. Audit Everything: Comprehensive logging for accountability

Defense Layers

Layer 1: Input Validation
   ↓ (Malformed input rejected)
Layer 2: Permission System
   ↓ (Unauthorized tools blocked)
Layer 3: Approval Gates
   ↓ (Destructive ops require approval)
Layer 4: Execution Sandbox
   ↓ (Isolated execution environment)
Layer 5: Output Sanitization
   ↓ (XSS/injection prevention)
Layer 6: Audit Logging
   ↓ (All actions logged)
Layer 7: Rate Limiting
   ↓ (Prevents abuse)

Input Validation

1. Path Validation

Location: src/utils/path_security.py

Protections:

- Path traversal prevention (../, /etc/, etc.)
- Symlink validation (no links to forbidden directories)
- Ownership verification (UID must match process)
- Length limits (max 4096 bytes, prevents DoS)
- Forbidden directory blocking (/etc, /sys, /proc, etc.)

Example:

from src.utils.path_security import validate_safe_path

# Safe path validation
def read_file(path: str):
    # Validates against:
    # - Path traversal (../)
    # - Symlink attacks
    # - System directories
    # - Ownership mismatch
    safe_path = validate_safe_path(
        path,
        allowed_base=project_root,
        must_exist=True
    )

    with open(safe_path) as f:
        return f.read()

Threat Prevented: Arbitrary file read/write

2. Command Injection Prevention

Location: src/orchestrator/security/validators.py

Protections:

- Shell metacharacter escaping (;, |, &&, etc.)
- Command whitelist validation
- Argument length limits
- No shell=True in subprocess calls

Example:

import shlex
import subprocess

def safe_command(command: str):
    # Validate command is in whitelist
    if not is_allowed_command(command):
        raise SecurityError(f"Command not allowed: {command}")

    # Split into an argument list (no shell interpretation)
    safe_args = shlex.split(command)

    # Execute without a shell so metacharacters are never interpreted
    subprocess.run(safe_args, shell=False, check=True)

Threat Prevented: Command injection, arbitrary code execution

3. SQL Injection Prevention

Best Practice: Always use parameterized queries

# ❌ VULNERABLE: String concatenation
cursor.execute(f"SELECT * FROM users WHERE id = '{user_id}'")

# ✅ SAFE: Parameterized query
cursor.execute("SELECT * FROM users WHERE id = ?", (user_id,))

Threat Prevented: SQL injection, data breach

4. XSS Prevention

Location: src/orchestrator/security/xss_protection.py

Protections:

- HTML entity encoding (<, >, &, ", ')
- JavaScript escape sequence handling
- Recursive sanitization of nested structures
- Depth limits (max 10 levels, prevents infinite recursion)

Example:

from src.orchestrator.security.xss_protection import sanitize_for_display

# Sanitize user input before display
user_input = "<script>alert('XSS')</script>"
safe_output = sanitize_for_display(user_input)
# Result: "&lt;script&gt;alert(&#x27;XSS&#x27;)&lt;/script&gt;"

Threat Prevented: Cross-site scripting (XSS)
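The recursive sanitization with a depth limit can be sketched with the stdlib html module. This is a hypothetical minimal version; the real sanitize_for_display in xss_protection.py may handle more cases:

```python
import html

def sanitize_for_display_sketch(value, _depth=0):
    # Depth limit prevents unbounded recursion on nested structures
    if _depth > 10:
        raise ValueError("Structure too deeply nested")
    if isinstance(value, str):
        # quote=True also encodes " and ' (as &quot; and &#x27;)
        return html.escape(value, quote=True)
    if isinstance(value, dict):
        return {sanitize_for_display_sketch(k, _depth + 1):
                sanitize_for_display_sketch(v, _depth + 1)
                for k, v in value.items()}
    if isinstance(value, list):
        return [sanitize_for_display_sketch(v, _depth + 1) for v in value]
    # Non-string scalars (int, bool, None) pass through unchanged
    return value
```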

5. SSRF Prevention

Location: Claude Code built-in WebFetch tool

Nova AI uses Claude Code's built-in WebFetch/WebSearch tools, which include SSRF protection:

- Private IP range blocking
- Cloud metadata endpoint blocking
- Localhost blocking
- DNS resolution validation
- Scheme validation (http/https only)

Example (illustrative equivalent of the built-in checks):

import socket
from urllib.parse import urlparse

import requests

def fetch_url(url: str) -> str:
    # Parse URL
    parsed = urlparse(url)

    # Validate scheme
    if parsed.scheme not in ("http", "https"):
        raise SecurityError(f"Scheme not allowed: {parsed.scheme}")

    # Resolve hostname
    ip = socket.gethostbyname(parsed.hostname)

    # Check if private IP
    if is_private_ip(ip):
        raise SecurityError(f"Access to private IP forbidden: {ip}")

    # Safe to fetch (note: resolving and then fetching separately can
    # race with DNS rebinding; production code should pin the resolved IP)
    return requests.get(url, timeout=30).text

Threat Prevented: Server-side request forgery (SSRF), cloud metadata access


Approval Gates

Architecture

Location: src/orchestrator/security/approval_gate.py

Flow:

Agent requests tool execution
Does the tool require approval?
   ├─ Yes → Send approval request
   │         ↓
   │      User approves/denies
   │         ↓
   │      Execute or block
   └─ No → Execute immediately

Approval Categories

ALWAYS REQUIRE APPROVAL:

- File deletion
- Database modification
- Network requests to external services
- Bash commands (except whitelisted)
- PR creation/merge
- Deployment operations

NEVER REQUIRE APPROVAL:

- File reads
- Code analysis
- Test execution (read-only)
- Documentation generation

CONDITIONAL APPROVAL:

- File writes (if overwriting existing)
- Bash commands (if on whitelist)
- API calls (if under rate limit)

Implementation

from src.orchestrator.security.approval_gate import ApprovalGate

gate = ApprovalGate()

# Check if approval needed
if gate.requires_approval(tool="Write", args={"path": "src/auth.py"}):
    # Send approval request
    approved = await gate.request_approval(
        tool="Write",
        args={"path": "src/auth.py", "content": "..."},
        reason="Implementing authentication module"
    )

    if not approved:
        raise PermissionError("User denied approval")

Approval UI

┌─────────────────────────────────────────────┐
│ Approval Required                            │
├─────────────────────────────────────────────┤
│                                              │
│ Tool: Write                                  │
│ File: src/auth.py                            │
│ Action: Create new file                      │
│                                              │
│ Reason: Implementing authentication module   │
│                                              │
│ Preview:                                     │
│ ┌──────────────────────────────────────┐   │
│ │ def authenticate_user(...):          │   │
│ │     """Authenticate user."""         │   │
│ │     ...                              │   │
│ └──────────────────────────────────────┘   │
│                                              │
│ [Approve]  [Deny]  [View Full]              │
└─────────────────────────────────────────────┘

Access Control

Permission System

Agent Permissions (defined in .claude/agents/*.md):

---
name: implementer
tools:
  - Read      # Can read files
  - Write     # Can write files
  - Edit      # Can edit files
  - Grep      # Can search content
  - Glob      # Can find files
  - Bash      # Can run commands
---

Permission Enforcement:

def execute_tool(agent: str, tool: str, args: dict):
    # Load agent configuration
    agent_config = load_agent_config(agent)

    # Check if tool allowed
    if tool not in agent_config["tools"]:
        raise PermissionError(
            f"Agent '{agent}' not permitted to use tool '{tool}'"
        )

    # Execute tool
    return tool_registry[tool](**args)

File System Restrictions

Allowed Directories:

- Project root and subdirectories
- Temp directory (/tmp/nova-ai-*)
- User-specified output directories

Forbidden Directories:

- /etc - System configuration
- /sys - System information
- /proc - Process information
- /dev - Device files
- /boot - Boot files
- User home outside project (prevents ~/.ssh access)

Implementation:

FORBIDDEN_DIRS = [
    "/etc", "/sys", "/proc", "/dev", "/boot",
    "/root", "/var/log", "/var/spool"
]

def is_allowed_path(path: Path) -> bool:
    resolved = path.resolve()

    for forbidden in FORBIDDEN_DIRS:
        if resolved.is_relative_to(forbidden):
            return False

    return resolved.is_relative_to(PROJECT_ROOT)

Network Restrictions

Allowed:

- Anthropic API (api.anthropic.com)
- GitHub API (api.github.com)
- LangFuse (langfuse.com)
- Public documentation sites

Forbidden:

- Private IP ranges (10.0.0.0/8, etc.)
- Localhost (127.0.0.1)
- Cloud metadata (169.254.169.254)
- Internal network ranges
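The is_private_ip helper used in the SSRF example earlier can be sketched with the stdlib ipaddress module, which already classifies all of the forbidden ranges above:

```python
import ipaddress

def is_private_ip(ip: str) -> bool:
    """Return True for addresses an agent must never reach."""
    addr = ipaddress.ip_address(ip)
    # is_private covers RFC 1918 ranges; is_loopback covers 127.0.0.0/8;
    # is_link_local covers 169.254.0.0/16, which includes the cloud
    # metadata endpoint 169.254.169.254
    return (addr.is_private or addr.is_loopback
            or addr.is_link_local or addr.is_reserved)
```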


Audit Logging

What is Logged

Session Events:

- Session creation/destruction
- Agent switches
- Session forks
- Session compression

Tool Executions:

- Tool name and arguments
- Execution timestamp
- User who approved (if required)
- Execution result (success/failure)
- Execution duration

Security Events:

- Permission denials
- Approval requests/responses
- Validation failures
- Rate limit hits

Cost Events:

- Token usage per request
- Cost per request
- Cache hit rates
- Model used

Log Format

{
  "timestamp": "2025-11-07T18:30:00Z",
  "event_type": "tool_execution",
  "session_id": "sess_abc123",
  "agent": "implementer",
  "tool": "Write",
  "args": {
    "path": "src/auth.py",
    "size_bytes": 1024
  },
  "approved_by": "user@example.com",
  "result": "success",
  "duration_ms": 150,
  "security_checks": {
    "path_validation": "passed",
    "ownership_check": "passed",
    "approval_required": true,
    "approval_granted": true
  }
}

Log Storage

Locations:

- Console: Real-time streaming (stderr)
- File: logs/audit.jsonl (JSON Lines format)
- LangFuse: Cloud tracing (optional)
- OpenTelemetry: Distributed tracing (optional)

Retention:

- Local logs: 30 days
- LangFuse: Per plan (typically 90 days)
- Long-term: Export to S3/GCS for compliance


Threat Model

Threats Considered

| Threat | Mitigation | Layer |
| --- | --- | --- |
| Path Traversal | validate_safe_path() | Input Validation |
| Command Injection | shlex.quote(), no shell=True | Input Validation |
| SQL Injection | Parameterized queries | Input Validation |
| XSS | HTML entity encoding | Output Sanitization |
| SSRF | Private IP blocking | Network Control |
| Unauthorized File Access | Ownership checks, forbidden dirs | Access Control |
| Privilege Escalation | Tool permissions per agent | Access Control |
| Data Exfiltration | Approval gates, audit logging | Approval + Audit |
| Resource Exhaustion | Rate limiting, timeouts | Rate Limiting |
| Malicious Prompts | Approval for destructive ops | Approval Gates |

Attack Scenarios

Scenario 1: Malicious Prompt

Attack:

User: "Delete all files in /etc"

Defense:

1. Agent proposes Bash tool with rm -rf /etc
2. Path validation blocks /etc access (forbidden directory)
3. Approval gate triggered (if path validation is bypassed)
4. User must explicitly approve deletion
5. Audit log records the attempt

Result: Attack blocked by multiple layers

Scenario 2: Path Traversal

Attack:

User: "Read ../../etc/passwd"

Defense:

1. Path validation detects the ../ sequence
2. Resolved path checked against forbidden directories
3. Access denied before any file operation
4. Security event logged

Result: Attack blocked at input validation

Scenario 3: SSRF

Attack:

User: "Fetch http://169.254.169.254/latest/meta-data/"

Defense:

1. URL parsed
2. IP address resolved (169.254.169.254)
3. Detected as cloud metadata endpoint
4. Request blocked before connection
5. Security event logged

Result: Attack blocked at network control


Security Best Practices

For Developers

1. Validate All Inputs

# Always validate paths
safe_path = validate_safe_path(user_path, allowed_base=project_root)

# Always split commands into argument lists (never shell=True)
safe_args = shlex.split(user_command)

# Always parameterize SQL
cursor.execute("SELECT * FROM users WHERE id = ?", (user_id,))

2. Use Approval Gates

# For destructive operations
if is_destructive(operation):
    if not await request_approval(operation):
        raise PermissionError("Operation denied")

3. Log Security Events

# Log all security-relevant events
logger.warning(
    "Path validation failed",
    extra={
        "path": user_path,
        "reason": "Path traversal attempt",
        "user": current_user
    }
)

4. Principle of Least Privilege

# Give agents minimum permissions needed
---
name: code-reviewer
tools:
  - Read      # Only read access
  - Grep      # Only search
  - Glob      # Only find files
  # No Write, Edit, or Bash
---

For Users

1. Review Approval Requests

- Read the full operation before approving
- Verify file paths are correct
- Check for unexpected arguments
- Deny if uncertain

2. Monitor Audit Logs

# Review recent security events
grep "security_event" logs/audit.jsonl | tail -20

3. Use Read-Only Agents

# For exploration, use code-reviewer (read-only)
/novaai-code-reviewer analyze security vulnerabilities

4. Set Up Alerts

# Alert on suspicious activity
# - Multiple permission denials
# - Path traversal attempts
# - Unusual file access patterns


Rate Limiting

Cost-Based Rate Limiting (src/orchestrator/hooks/cost_checker.py):

# Sliding window hourly limits
RATE_LIMITS = {
    "aws_ops": {"calls": 10, "window": 3600},  # 10/hour
    "docker_ops": {"calls": 20, "window": 3600},  # 20/hour
    "web_search": {"calls": 50, "window": 3600},  # 50/hour
}

def check_rate_limit(tool: str):
    if is_rate_limited(tool):
        raise RateLimitError(f"Rate limit exceeded for {tool}")

Benefits:

- Prevents runaway costs
- Protects against abuse
- Enforces usage quotas
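The sliding hourly window described above can be sketched with a per-tool deque of call timestamps. This is an illustrative implementation; cost_checker.py may implement the window differently, and RateLimitError here stands in for whatever exception class the project actually defines:

```python
import time
from collections import deque

RATE_LIMITS = {"aws_ops": {"calls": 10, "window": 3600}}  # 10/hour
_history = {}  # tool name -> deque of call timestamps

class RateLimitError(Exception):
    pass

def check_rate_limit(tool, now=None):
    limit = RATE_LIMITS.get(tool)
    if limit is None:
        return  # tool has no configured limit
    now = time.monotonic() if now is None else now
    calls = _history.setdefault(tool, deque())
    # Slide the window: drop timestamps older than the window
    while calls and now - calls[0] >= limit["window"]:
        calls.popleft()
    if len(calls) >= limit["calls"]:
        raise RateLimitError(f"Rate limit exceeded for {tool}")
    calls.append(now)
```

Because the window slides rather than resetting on the hour, a burst of calls cannot be doubled by straddling a boundary.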


Incident Response

Detection

Automated Detection:

- Failed validation attempts (> 5/minute)
- Permission denials (> 10/hour)
- Rate limit hits (sustained)
- Unusual file access patterns

Manual Review:

- Daily audit log review
- Weekly security event analysis
- Monthly threat assessment

Response Procedure

1. Detection: Alert triggered or manual discovery
2. Assessment: Review audit logs, identify the attack vector, assess impact
3. Containment: Block the malicious agent/user, disable affected tools, isolate compromised sessions
4. Eradication: Fix the vulnerability, update validation rules, deploy patches
5. Recovery: Restore from backup if needed, re-enable tools, resume normal operation
6. Lessons Learned: Document the incident, update the threat model, improve defenses


Security Testing

Unit Tests

# Test path validation
pytest tests/security/test_path_security.py

# Test command injection prevention
pytest tests/security/test_command_injection.py

# Test SSRF prevention
pytest tests/security/test_ssrf_prevention.py

# Test XSS prevention
pytest tests/security/test_xss_protection.py

Integration Tests

# Test approval gates
pytest tests/security/test_approval_gates.py

# Test permission system
pytest tests/security/test_permissions.py

# Test audit logging
pytest tests/security/test_audit_logging.py

Penetration Testing

Recommended Tests:

1. Path traversal attempts
2. Command injection attempts
3. SQL injection attempts
4. XSS injection attempts
5. SSRF attempts
6. Privilege escalation attempts
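A penetration-style unit test for the path traversal case might look like the following. The payload list and helper are hypothetical; a real test would exercise the project's validate_safe_path directly:

```python
import tempfile
from pathlib import Path

TRAVERSAL_PAYLOADS = [
    "../../etc/passwd",
    "../../../root/.ssh/id_rsa",
    "/etc/shadow",  # absolute path into a forbidden directory
]

def escapes_base(candidate: str, base: Path) -> bool:
    """True if the candidate, joined to base and resolved, leaves base."""
    resolved = Path(base, candidate).resolve()
    return not resolved.is_relative_to(base.resolve())

def test_traversal_payloads_are_rejected():
    base = Path(tempfile.mkdtemp())
    for payload in TRAVERSAL_PAYLOADS:
        assert escapes_base(payload, base), payload

def test_legitimate_paths_are_allowed():
    base = Path(tempfile.mkdtemp())
    assert not escapes_base("src/auth.py", base)
```

Note that absolute-path payloads must be covered too: Path(base, "/etc/shadow") discards the base entirely, so a validator that only scans for ../ would miss them.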


Compliance

OWASP Top 10

| Vulnerability | Status | Mitigation |
| --- | --- | --- |
| A01: Broken Access Control | ✅ Protected | Permission system, approval gates |
| A02: Cryptographic Failures | ⚠️ Partial | Use environment variables for secrets |
| A03: Injection | ✅ Protected | Input validation, parameterized queries |
| A04: Insecure Design | ✅ Protected | Defense in depth, fail-safe defaults |
| A05: Security Misconfiguration | ✅ Protected | Secure defaults, validation |
| A06: Vulnerable Components | ✅ Protected | Dependency scanning (Dependabot) |
| A07: Auth Failures | N/A | No authentication (local tool) |
| A08: Data Integrity Failures | ✅ Protected | Audit logging, input validation |
| A09: Logging Failures | ✅ Protected | Comprehensive audit logging |
| A10: SSRF | ✅ Protected | Private IP blocking, DNS validation |

Future Enhancements

Planned Security Features

  1. Sandboxed Execution
     - Docker container per agent
     - Resource limits (CPU, memory, network)
     - Read-only filesystem (except workspace)

  2. AI Safety Guardrails
     - Prompt injection detection
     - Output toxicity filtering
     - Jailbreak attempt detection

  3. Advanced Audit
     - Real-time anomaly detection
     - ML-based threat detection
     - Automated incident response

  4. Compliance
     - SOC 2 compliance
     - GDPR compliance
     - HIPAA compliance (if needed)


Last Updated: November 7, 2025 Version: 2.3.0