Agent Guardrails¶
Security layers and safety controls for the Nova AI agent system.
Table of Contents¶
- Overview
- Defense Layers
- Input Validation
- Approval Gates
- Access Control
- Audit Logging
- Threat Model
- Security Best Practices
- Rate Limiting
- Incident Response
- Security Testing
- Compliance
- Future Enhancements
Overview¶
Nova AI implements defense-in-depth security with multiple layers of protection to prevent malicious or accidental misuse of agent capabilities.
Security Philosophy¶
- Fail-Safe Defaults: Deny by default, explicit allow
- Least Privilege: Agents get minimum permissions needed
- Human Oversight: Critical operations require approval
- Defense in Depth: Multiple independent security layers
- Audit Everything: Comprehensive logging for accountability
Defense Layers¶
Layer 1: Input Validation
↓ (Malformed input rejected)
Layer 2: Permission System
↓ (Unauthorized tools blocked)
Layer 3: Approval Gates
↓ (Destructive ops require approval)
Layer 4: Execution Sandbox
↓ (Isolated execution environment)
Layer 5: Output Sanitization
↓ (XSS/injection prevention)
Layer 6: Audit Logging
↓ (All actions logged)
Layer 7: Rate Limiting
↓ (Prevents abuse)
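Conceptually, each layer is an independent check that can veto an operation before it reaches the next. A minimal sketch of that pattern, shown here with just the first two layers (illustrative only; the function and field names are assumptions, not Nova AI's actual API):

from typing import Callable

def validate_input(ctx: dict) -> None:
    # Layer 1: reject malformed input outright.
    if not isinstance(ctx.get("args"), dict):
        raise ValueError("malformed input")

def check_permissions(ctx: dict) -> None:
    # Layer 2: deny by default; only explicitly allowed tools pass.
    if ctx["tool"] not in ctx.get("allowed_tools", ()):
        raise PermissionError(f"tool not allowed: {ctx['tool']}")

LAYERS: list[Callable[[dict], None]] = [validate_input, check_permissions]

def run_guarded(ctx: dict) -> None:
    for layer in LAYERS:
        layer(ctx)  # any layer can abort before execution reaches the tool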
Input Validation¶
1. Path Validation¶
Location: src/utils/path_security.py
Protections:
- Path traversal prevention (../, /etc/, etc.)
- Symlink validation (no links to forbidden dirs)
- Ownership verification (UID must match process)
- Length limits (max 4096 bytes - prevents DoS)
- Forbidden directory blocking (/etc, /sys, /proc, etc.)
Example:
from src.utils.path_security import validate_safe_path
# Safe path validation
def read_file(path: str):
    # Validates against:
    # - Path traversal (../)
    # - Symlink attacks
    # - System directories
    # - Ownership mismatch
    safe_path = validate_safe_path(
        path,
        allowed_base=project_root,
        must_exist=True
    )
    with open(safe_path) as f:
        return f.read()
Threat Prevented: Arbitrary file read/write
2. Command Injection Prevention¶
Location: src/orchestrator/security/validators.py
Protections:
- Shell metacharacter escaping (;, |, &&, etc.)
- Command whitelist validation
- Argument length limits
- No shell=True in subprocess calls
Example:
import shlex
import subprocess
def safe_command(command: str):
    # Validate command is in whitelist
    if not is_allowed_command(command):
        raise SecurityError(f"Command not allowed: {command}")
    # Tokenize into argv without shell interpretation
    safe_args = shlex.split(command)
    # Execute without a shell
    subprocess.run(safe_args, shell=False)
Threat Prevented: Command injection, arbitrary code execution
3. SQL Injection Prevention¶
Best Practice: Always use parameterized queries
# ❌ VULNERABLE: String concatenation
cursor.execute(f"SELECT * FROM users WHERE id = '{user_id}'")
# ✅ SAFE: Parameterized query
cursor.execute("SELECT * FROM users WHERE id = ?", (user_id,))
Threat Prevented: SQL injection, data breach
4. XSS Prevention¶
Location: src/orchestrator/security/xss_protection.py
Protections:
- HTML entity encoding (<, >, &, ", ')
- JavaScript escape sequence handling
- Recursive sanitization of nested structures
- Depth limits (max 10 levels - prevents infinite recursion)
Example:
from src.orchestrator.security.xss_protection import sanitize_for_display
# Sanitize user input before display
user_input = "<script>alert('XSS')</script>"
safe_output = sanitize_for_display(user_input)
# Result: "&lt;script&gt;alert(&#x27;XSS&#x27;)&lt;/script&gt;"
Threat Prevented: Cross-site scripting (XSS)
5. SSRF Prevention¶
Location: Claude Code built-in WebFetch tool
Nova AI uses Claude Code's built-in WebFetch/WebSearch tools, which include SSRF protection:
- Private IP range blocking
- Cloud metadata endpoint blocking
- Localhost blocking
- DNS resolution validation
- Scheme validation (http/https only)
Example:
import socket
from urllib.parse import urlparse

import requests

def fetch_url(url: str) -> str:
    # Parse URL
    parsed = urlparse(url)
    # Validate scheme
    if parsed.scheme not in ("http", "https"):
        raise SecurityError(f"Scheme not allowed: {parsed.scheme}")
    # Resolve hostname
    ip = socket.gethostbyname(parsed.hostname)
    # Check if private IP
    if is_private_ip(ip):
        raise SecurityError(f"Access to private IP forbidden: {ip}")
    # Safe to fetch
    return requests.get(url).text
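The is_private_ip helper used above is not shown; a minimal sketch of how it could be implemented with the standard ipaddress module (the actual checks live inside Claude Code's built-in tools):

import ipaddress

def is_private_ip(ip: str) -> bool:
    # Covers RFC 1918 ranges, loopback (127.0.0.1), link-local
    # (including the 169.254.169.254 cloud metadata endpoint),
    # multicast, and reserved addresses.
    addr = ipaddress.ip_address(ip)
    return (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_multicast
        or addr.is_reserved
    )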
Threat Prevented: Server-side request forgery (SSRF), cloud metadata access
Approval Gates¶
Architecture¶
Location: src/orchestrator/security/approval_gate.py
Flow:
Agent requests tool execution
        ↓
Does the tool require approval?
├─ Yes → Send approval request
│            ↓
│        User approves/denies
│            ↓
│        Execute or block
└─ No  → Execute immediately
Approval Categories¶
ALWAYS REQUIRE APPROVAL:
- File deletion
- Database modification
- Network requests to external services
- Bash commands (except whitelisted)
- PR creation/merge
- Deployment operations

NEVER REQUIRE APPROVAL:
- File reads
- Code analysis
- Test execution (read-only)
- Documentation generation

CONDITIONAL APPROVAL:
- File writes (if overwriting existing)
- Bash commands (if on whitelist)
- API calls (if under rate limit)
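A minimal sketch of how these categories could map onto a decision function (illustrative: the tool names and whitelist entries are assumptions, and the real logic lives in src/orchestrator/security/approval_gate.py, used in the Implementation section below):

from pathlib import Path

ALWAYS_APPROVE = {"Delete", "Deploy", "MergePR"}   # hypothetical tool names
NEVER_APPROVE = {"Read", "Grep", "Glob"}
BASH_WHITELIST = {"ls", "git status", "pytest"}    # hypothetical whitelist

def requires_approval(tool: str, args: dict) -> bool:
    if tool in ALWAYS_APPROVE:
        return True
    if tool in NEVER_APPROVE:
        return False
    if tool == "Bash":
        # Conditional: whitelisted commands run without approval.
        return args.get("command") not in BASH_WHITELIST
    if tool == "Write":
        # Conditional: overwriting an existing file needs approval.
        return Path(args["path"]).exists()
    return True  # fail-safe default: unknown tools require approval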
Implementation¶
from src.orchestrator.security.approval_gate import ApprovalGate
gate = ApprovalGate()
# Check if approval needed
if gate.requires_approval(tool="Write", args={"path": "src/auth.py"}):
    # Send approval request
    approved = await gate.request_approval(
        tool="Write",
        args={"path": "src/auth.py", "content": "..."},
        reason="Implementing authentication module"
    )
    if not approved:
        raise PermissionError("User denied approval")
Approval UI¶
┌─────────────────────────────────────────────┐
│ Approval Required │
├─────────────────────────────────────────────┤
│ │
│ Tool: Write │
│ File: src/auth.py │
│ Action: Create new file │
│ │
│ Reason: Implementing authentication module │
│ │
│ Preview: │
│ ┌──────────────────────────────────────┐ │
│ │ def authenticate_user(...): │ │
│ │ """Authenticate user.""" │ │
│ │ ... │ │
│ └──────────────────────────────────────┘ │
│ │
│ [Approve] [Deny] [View Full] │
└─────────────────────────────────────────────┘
Access Control¶
Permission System¶
Agent Permissions (defined in .claude/agents/*.md):
---
name: implementer
tools:
- Read # Can read files
- Write # Can write files
- Edit # Can edit files
- Grep # Can search content
- Glob # Can find files
- Bash # Can run commands
---
Permission Enforcement:
def execute_tool(agent: str, tool: str, args: dict):
    # Load agent configuration
    agent_config = load_agent_config(agent)

    # Check if tool allowed
    if tool not in agent_config["tools"]:
        raise PermissionError(
            f"Agent '{agent}' not permitted to use tool '{tool}'"
        )

    # Execute tool
    return tool_registry[tool](**args)
File System Restrictions¶
Allowed Directories:
- Project root and subdirectories
- Temp directory (/tmp/nova-ai-*)
- User-specified output directories
Forbidden Directories:
- /etc - System configuration
- /sys - System information
- /proc - Process information
- /dev - Device files
- /boot - Boot files
- User home outside project (prevents ~/.ssh access)
Implementation:
FORBIDDEN_DIRS = [
    "/etc", "/sys", "/proc", "/dev", "/boot",
    "/root", "/var/log", "/var/spool"
]

def is_allowed_path(path: Path) -> bool:
    resolved = path.resolve()
    for forbidden in FORBIDDEN_DIRS:
        if resolved.is_relative_to(forbidden):
            return False
    return resolved.is_relative_to(PROJECT_ROOT)
Network Restrictions¶
Allowed:
- Anthropic API (api.anthropic.com)
- GitHub API (api.github.com)
- LangFuse (langfuse.com)
- Public documentation sites

Forbidden:
- Private IP ranges (10.0.0.0/8, etc.)
- Localhost (127.0.0.1)
- Cloud metadata (169.254.169.254)
- Internal network ranges
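Combining the two lists, a sketch of a pre-request check (the hostnames mirror the allowlist above but exact values are illustrative; is_private_ip is the helper sketched in the SSRF section):

import socket
from urllib.parse import urlparse

ALLOWED_HOSTS = {"api.anthropic.com", "api.github.com", "langfuse.com"}

def check_network_access(url: str) -> None:
    host = urlparse(url).hostname or ""
    if host not in ALLOWED_HOSTS:
        raise PermissionError(f"Host not on allowlist: {host}")
    ip = socket.gethostbyname(host)  # validate what DNS actually resolves to
    if is_private_ip(ip):
        raise PermissionError(f"Resolved to private IP: {ip}")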
Audit Logging¶
What is Logged¶
Session Events:
- Session creation/destruction
- Agent switches
- Session forks
- Session compression

Tool Executions:
- Tool name and arguments
- Execution timestamp
- User who approved (if required)
- Execution result (success/failure)
- Execution duration

Security Events:
- Permission denials
- Approval requests/responses
- Validation failures
- Rate limit hits

Cost Events:
- Token usage per request
- Cost per request
- Cache hit rates
- Model used
Log Format¶
{
  "timestamp": "2025-11-07T18:30:00Z",
  "event_type": "tool_execution",
  "session_id": "sess_abc123",
  "agent": "implementer",
  "tool": "Write",
  "args": {
    "path": "src/auth.py",
    "size_bytes": 1024
  },
  "approved_by": "user@example.com",
  "result": "success",
  "duration_ms": 150,
  "security_checks": {
    "path_validation": "passed",
    "ownership_check": "passed",
    "approval_required": true,
    "approval_granted": true
  }
}
Log Storage¶
Locations:
- Console: Real-time streaming (stderr)
- File: logs/audit.jsonl (JSON Lines format)
- LangFuse: Cloud tracing (optional)
- OpenTelemetry: Distributed tracing (optional)
Retention:
- Local logs: 30 days
- LangFuse: per plan (typically 90 days)
- Long-term: export to S3/GCS for compliance
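A minimal sketch of appending events in this format (the writer function name is an assumption):

import json
from datetime import datetime, timezone
from pathlib import Path

AUDIT_LOG = Path("logs/audit.jsonl")

def write_audit_event(event: dict) -> None:
    # One JSON object per line (JSON Lines), matching the format above.
    event["timestamp"] = datetime.now(timezone.utc).isoformat()
    AUDIT_LOG.parent.mkdir(parents=True, exist_ok=True)
    with AUDIT_LOG.open("a") as f:
        f.write(json.dumps(event) + "\n")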
Threat Model¶
Threats Considered¶
| Threat | Mitigation | Layer |
|---|---|---|
| Path Traversal | validate_safe_path() | Input Validation |
| Command Injection | shlex.quote(), no shell=True | Input Validation |
| SQL Injection | Parameterized queries | Input Validation |
| XSS | HTML entity encoding | Output Sanitization |
| SSRF | Private IP blocking | Network Control |
| Unauthorized File Access | Ownership checks, forbidden dirs | Access Control |
| Privilege Escalation | Tool permissions per agent | Access Control |
| Data Exfiltration | Approval gates, audit logging | Approval + Audit |
| Resource Exhaustion | Rate limiting, timeout | Rate Limiting |
| Malicious Prompts | Approval for destructive ops | Approval Gates |
Attack Scenarios¶
Scenario 1: Malicious Prompt¶
Attack: A prompt instructs the agent to delete system files, e.g. to run rm -rf /etc.
Defense:
1. Agent proposes Bash tool with rm -rf /etc
2. Path validation blocks /etc access (forbidden directory)
3. Approval gate triggered (if path validation bypassed)
4. User must explicitly approve deletion
5. Audit log records attempt
Result: Attack blocked by multiple layers
Scenario 2: Path Traversal¶
Attack: A file path containing traversal sequences (e.g. ../../etc/passwd) is passed to a file tool.
Defense:
1. Path validation detects ../ sequence
2. Resolved path checked against forbidden directories
3. Access denied before file operation
4. Security event logged
Result: Attack blocked at input validation
Scenario 3: SSRF¶
Attack: A request to fetch a URL that resolves to the cloud metadata endpoint (http://169.254.169.254/).
Defense:
1. URL parsed
2. IP address resolved (169.254.169.254)
3. Detected as cloud metadata endpoint
4. Request blocked before connection
5. Security event logged
Result: Attack blocked at network control
Security Best Practices¶
For Developers¶
1. Validate All Inputs
# Always validate paths
safe_path = validate_safe_path(user_path, allowed_base=project_root)
# Always escape commands
safe_args = shlex.split(user_command)
# Always parameterize SQL
cursor.execute("SELECT * FROM users WHERE id = ?", (user_id,))
2. Use Approval Gates
# For destructive operations
if is_destructive(operation):
    if not await request_approval(operation):
        raise PermissionError("Operation denied")
3. Log Security Events
# Log all security-relevant events
logger.warning(
    "Path validation failed",
    extra={
        "path": user_path,
        "reason": "Path traversal attempt",
        "user": current_user
    }
)
4. Principle of Least Privilege
# Give agents minimum permissions needed
---
name: code-reviewer
tools:
- Read # Only read access
- Grep # Only search
- Glob # Only find files
# No Write, Edit, or Bash
---
For Users¶
1. Review Approval Requests
- Read the full operation before approving
- Verify file paths are correct
- Check for unexpected arguments
- Deny if uncertain
2. Monitor Audit Logs
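For example, a quick scan of the local audit log for denials (illustrative; the permission_denied event type is an assumption about the log schema):

import json
from pathlib import Path

for line in Path("logs/audit.jsonl").read_text().splitlines():
    event = json.loads(line)
    if event.get("event_type") == "permission_denied":
        print(event["timestamp"], event.get("agent"), event.get("tool"))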
3. Use Read-Only Agents
# For exploration, use code-reviewer (read-only)
/novaai-code-reviewer analyze security vulnerabilities
4. Set Up Alerts
# Alert on suspicious activity
# - Multiple permission denials
# - Path traversal attempts
# - Unusual file access patterns
Rate Limiting¶
Cost-Based Rate Limiting (src/orchestrator/hooks/cost_checker.py):
# Sliding window hourly limits
RATE_LIMITS = {
    "aws_ops": {"calls": 10, "window": 3600},     # 10/hour
    "docker_ops": {"calls": 20, "window": 3600},  # 20/hour
    "web_search": {"calls": 50, "window": 3600},  # 50/hour
}

def check_rate_limit(tool: str):
    if is_rate_limited(tool):
        raise RateLimitError(f"Rate limit exceeded for {tool}")
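The is_rate_limited check could be a simple in-memory sliding window; a minimal sketch under that assumption, reusing the RATE_LIMITS table above (the real implementation lives in cost_checker.py):

import time
from collections import defaultdict, deque

_history: dict[str, deque] = defaultdict(deque)

def is_rate_limited(tool: str) -> bool:
    limit = RATE_LIMITS.get(tool)
    if limit is None:
        return False  # no limit configured for this tool
    now = time.monotonic()
    calls = _history[tool]
    # Evict timestamps that slid out of the window.
    while calls and now - calls[0] > limit["window"]:
        calls.popleft()
    if len(calls) >= limit["calls"]:
        return True
    calls.append(now)
    return False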
Benefits:
- Prevents runaway costs
- Protects against abuse
- Enforces usage quotas
Incident Response¶
Detection¶
Automated Detection (see the sketch after this list):
- Failed validation attempts (> 5/minute)
- Permission denials (> 10/hour)
- Rate limit hits (sustained)
- Unusual file access patterns

Manual Review:
- Daily audit log review
- Weekly security event analysis
- Monthly threat assessment
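A sketch of one automated check, counting recent validation failures in the audit log (the validation_failure event type is an assumption about the log schema):

import json
from datetime import datetime, timedelta, timezone
from pathlib import Path

def recent_validation_failures(minutes: int = 1) -> int:
    cutoff = datetime.now(timezone.utc) - timedelta(minutes=minutes)
    count = 0
    for line in Path("logs/audit.jsonl").read_text().splitlines():
        event = json.loads(line)
        if event.get("event_type") != "validation_failure":
            continue
        ts = datetime.fromisoformat(event["timestamp"].replace("Z", "+00:00"))
        if ts >= cutoff:
            count += 1
    return count  # alert when this exceeds 5, per the threshold above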
Response Procedure¶
1. Detection
   - Alert triggered or manual discovery
2. Assessment
   - Review audit logs
   - Identify attack vector
   - Assess impact
3. Containment
   - Block malicious agent/user
   - Disable affected tools
   - Isolate compromised sessions
4. Eradication
   - Fix vulnerability
   - Update validation rules
   - Deploy patches
5. Recovery
   - Restore from backup if needed
   - Re-enable tools
   - Resume normal operation
6. Lessons Learned
   - Document incident
   - Update threat model
   - Improve defenses
Security Testing¶
Unit Tests¶
# Test path validation
pytest tests/security/test_path_security.py
# Test command injection prevention
pytest tests/security/test_command_injection.py
# Test SSRF prevention
pytest tests/security/test_ssrf_prevention.py
# Test XSS prevention
pytest tests/security/test_xss_protection.py
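A hypothetical test in the style of these suites (the exact error type raised by validate_safe_path is an assumption):

import pytest
from src.utils.path_security import validate_safe_path

def test_rejects_path_traversal(tmp_path):
    # A traversal path must never resolve outside allowed_base.
    with pytest.raises(Exception):
        validate_safe_path("../../etc/passwd", allowed_base=tmp_path)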
Integration Tests¶
# Test approval gates
pytest tests/security/test_approval_gates.py
# Test permission system
pytest tests/security/test_permissions.py
# Test audit logging
pytest tests/security/test_audit_logging.py
Penetration Testing¶
Recommended Tests:
1. Path traversal attempts
2. Command injection attempts
3. SQL injection attempts
4. XSS injection attempts
5. SSRF attempts
6. Privilege escalation attempts
Compliance¶
OWASP Top 10¶
| Vulnerability | Status | Mitigation |
|---|---|---|
| A01: Broken Access Control | ✅ Protected | Permission system, approval gates |
| A02: Cryptographic Failures | ⚠️ Partial | Use environment variables for secrets |
| A03: Injection | ✅ Protected | Input validation, parameterized queries |
| A04: Insecure Design | ✅ Protected | Defense in depth, fail-safe defaults |
| A05: Security Misconfiguration | ✅ Protected | Secure defaults, validation |
| A06: Vulnerable Components | ✅ Protected | Dependency scanning (Dependabot) |
| A07: Auth Failures | N/A | No authentication (local tool) |
| A08: Data Integrity Failures | ✅ Protected | Audit logging, input validation |
| A09: Logging Failures | ✅ Protected | Comprehensive audit logging |
| A10: SSRF | ✅ Protected | Private IP blocking, DNS validation |
Future Enhancements¶
Planned Security Features¶
- Sandboxed Execution
  - Docker container per agent
  - Resource limits (CPU, memory, network)
  - Read-only filesystem (except workspace)
- AI Safety Guardrails
  - Prompt injection detection
  - Output toxicity filtering
  - Jailbreak attempt detection
- Advanced Audit
  - Real-time anomaly detection
  - ML-based threat detection
  - Automated incident response
- Compliance
  - SOC 2 compliance
  - GDPR compliance
  - HIPAA compliance (if needed)
See Also¶
- Architecture Overview - System architecture
- API Overview - API reference
- GitHub Security - Vulnerability reporting
Last Updated: November 7, 2025
Version: 2.3.0