Configuration Reference
Shield can be configured in three contexts: as part of OpenParallax (config.yaml), as a standalone binary (shield.yaml), or through environment variables. This page covers all configuration options, startup validation, provider setup, and operational guidance.
OpenParallax Integration (config.yaml)
When Shield runs inside OpenParallax, it is configured in the shield section of the workspace config.yaml:
shield:
  # Path to the YAML policy file (relative to workspace root)
  policy_file: security/shield/default.yaml
  # Tier 2 LLM evaluator configuration
  evaluator:
    provider: anthropic
    model: claude-sonnet-4-6
    api_key_env: ANTHROPIC_API_KEY
  # ONNX classifier threshold (0.0 - 1.0)
  # Higher = fewer blocks, more escalations
  # Lower = more blocks, fewer escalations
  onnx_threshold: 0.85
  # Enable heuristic pattern matching
  heuristic_enabled: true
  # Block on errors (true) vs. allow with reduced confidence (false)
  fail_closed: true
  # Maximum evaluations per minute (all tiers)
  rate_limit: 60
  # Maximum Tier 2 evaluations per day
  daily_budget: 100
  # Verdict cache TTL in seconds
  verdict_ttl: 300
The canary token and evaluator prompt path are managed automatically by OpenParallax. The canary is generated at workspace initialization and the prompt is loaded from the embedded prompts/ directory.
Standalone Configuration (shield.yaml)
When Shield runs as a standalone binary (openparallax-shield serve), all configuration lives in shield.yaml:
# ── Server ──
listen: localhost:9090 # REST API listen address
grpc_listen: localhost:9091 # gRPC API listen address
# ── Policy ──
policy:
  file: security/shield/default.yaml # Path to YAML policy file
# ── Classifier ──
classifier:
  model_dir: ~/.openparallax/models/prompt-injection/ # ONNX model directory
  threshold: 0.85 # INJECTION threshold
# ── Heuristic ──
heuristic:
  enabled: true # Enable heuristic pattern matching
# ── Tier 2 Evaluator ──
evaluator:
  provider: anthropic # LLM provider
  model: claude-sonnet-4-6 # LLM model
  api_key_env: ANTHROPIC_API_KEY # Env var for API key
  base_url: # Custom base URL (Ollama, proxies)
# ── Security ──
canary_token: # Canary token (auto-generated if blank)
fail_closed: true # Block on errors
# ── Rate Limiting ──
rate_limit: 60 # Evaluations per minute
daily_budget: 100 # Tier 2 evaluations per day
verdict_ttl: 300 # Verdict cache TTL (seconds)
# ── MCP Proxy (optional) ──
mcp:
  servers:
    - name: filesystem
      transport: stdio
      command: npx
      args: ["@modelcontextprotocol/server-filesystem", "/home/user"]
    - name: remote
      transport: streamable-http
      url: https://mcp-server.example.com
  tool_mapping: # MCP tool name → Shield action type
    custom_tool: execute_command
# ── Audit ──
audit:
  enabled: true # Enable audit logging
  file: shield-audit.jsonl # Audit log file path
# ── Logging ──
log_level: info # debug, info, warn, error
log_file: shield.log # Log file (stdout if omitted)
Environment Variable Overrides
Environment variables override configuration file values. They use the OP_SHIELD_ prefix:
| Variable | Description | Overrides |
|---|---|---|
| OP_SHIELD_POLICY | Path to the YAML policy file | policy.file / shield.policy_file |
| OP_SHIELD_CLASSIFIER_DIR | ONNX model directory | classifier.model_dir |
| OP_SHIELD_THRESHOLD | ONNX INJECTION confidence threshold | classifier.threshold / shield.onnx_threshold |
| OP_SHIELD_FAIL_CLOSED | true or false | fail_closed / shield.fail_closed |
| OP_SHIELD_DAILY_BUDGET | Daily Tier 2 evaluation budget | daily_budget / shield.daily_budget |
| OP_SHIELD_RATE_LIMIT | Evaluations per minute | rate_limit / shield.rate_limit |
| OP_SHIELD_LOG_LEVEL | Log level | log_level |
The API key environment variable names (e.g., ANTHROPIC_API_KEY, OPENAI_API_KEY) are configured in the evaluator.api_key_env field -- they are not Shield-specific variables.
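Because environment variables are always strings, each override has to be converted to the type of the field it replaces. The following is a minimal sketch of that step, not Shield's actual loader: the `ENV_OVERRIDES` table, the `apply_env_overrides` function, and the flat dotted-key config dictionary are all hypothetical names chosen for illustration. Only the variable names and target fields come from the table above.

```python
import os


def _parse_bool(raw: str) -> bool:
    """Accept only the documented values: true or false."""
    value = raw.strip().lower()
    if value not in ("true", "false"):
        raise ValueError(f"expected true or false, got {raw!r}")
    return value == "true"


# Assumption: a flat dict with dotted keys stands in for the parsed shield.yaml.
ENV_OVERRIDES = {
    "OP_SHIELD_POLICY": ("policy.file", str),
    "OP_SHIELD_CLASSIFIER_DIR": ("classifier.model_dir", str),
    "OP_SHIELD_THRESHOLD": ("classifier.threshold", float),
    "OP_SHIELD_FAIL_CLOSED": ("fail_closed", _parse_bool),
    "OP_SHIELD_DAILY_BUDGET": ("daily_budget", int),
    "OP_SHIELD_RATE_LIMIT": ("rate_limit", int),
    "OP_SHIELD_LOG_LEVEL": ("log_level", str),
}


def apply_env_overrides(config: dict, environ=os.environ) -> dict:
    """Return a copy of config with any OP_SHIELD_* values applied on top."""
    merged = dict(config)
    for var, (key, parse) in ENV_OVERRIDES.items():
        if var in environ:
            merged[key] = parse(environ[var])
    return merged
```

Passing `environ` as a parameter keeps the sketch easy to exercise without touching the real process environment.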
Example
# Override the policy file and threshold via environment
export OP_SHIELD_POLICY=/opt/shield/security/shield/strict.yaml
export OP_SHIELD_THRESHOLD=0.90
export OP_SHIELD_DAILY_BUDGET=200
export ANTHROPIC_API_KEY=sk-ant-...
openparallax-shield serve --config shield.yaml
Configuration Precedence
Configuration values are resolved in this order (highest priority first):
- Environment variables (OP_SHIELD_*)
- CLI flags (--port, --config)
- Configuration file (shield.yaml or config.yaml)
- Defaults
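The precedence order above amounts to a first-match lookup across four layers. The sketch below illustrates that idea only; the `resolve` function and the use of plain dictionaries for each layer are assumptions made for the example, not Shield's internal API.

```python
_MISSING = object()


def resolve(key, env, cli, file_config, defaults):
    """Return the value from the highest-priority source that defines key.

    Sources are checked in the documented order: environment variables,
    CLI flags, configuration file, then built-in defaults.
    """
    for source in (env, cli, file_config, defaults):
        value = source.get(key, _MISSING)
        if value is not _MISSING:
            return value
    raise KeyError(key)
```

For example, with `rate_limit` set to 50 in the file and 30 in the environment, `resolve` returns 30. With neither set, the default of 60 applies.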
Separate Providers for Chat and Shield
It is recommended to use different LLM providers or models for the chat conversation and the Shield Tier 2 evaluator. This provides security diversity -- if an attack is crafted to exploit a specific model's weaknesses, a different model in the evaluator may catch it.
models:
  - name: chat
    provider: anthropic
    model: claude-sonnet-4-6
    api_key_env: ANTHROPIC_API_KEY
  - name: shield
    provider: openai
    model: gpt-5.4
    api_key_env: OPENAI_API_KEY
roles:
  chat: chat
  shield: shield
Both providers must have valid API keys set. The Shield evaluator only runs for actions that escalate to Tier 2, so the cost is proportional to the number of escalations, not the number of total actions.
You can also use the same provider with a different model:
models:
  - name: chat
    provider: anthropic
    model: claude-sonnet-4-6
    api_key_env: ANTHROPIC_API_KEY
  - name: shield
    provider: anthropic
    model: claude-opus-4-20250514 # More capable model for security decisions
    api_key_env: ANTHROPIC_API_KEY
roles:
  chat: chat
  shield: shield
Classifier Setup
The ONNX DeBERTa classifier runs locally for Tier 1 evaluation. It detects prompt injection patterns in action payloads.
Installing the Classifier Model
# Removed -- see roadmap for sidecar
This downloads:
- model.onnx -- the DeBERTa v3 model fine-tuned for prompt injection detection
- tokenizer.json -- the tokenizer configuration
- The ONNX Runtime library for your platform
Files are placed in ~/.openparallax/models/prompt-injection/.
Verifying the Classifier
After installation, verify the model loads correctly:
openparallax doctor
The doctor check reports:
- Whether the model files exist at the expected path
- Whether the ONNX Runtime library loads correctly
- Whether inference produces valid output on a test input
Classifier Configuration
| Field | Type | Default | Description |
|---|---|---|---|
| onnx_threshold / classifier.threshold | float64 | 0.85 | Confidence threshold for the INJECTION label. Scores >= threshold trigger BLOCK; scores below trigger ESCALATE to Tier 2. |
| classifier.model_dir | string | ~/.openparallax/models/prompt-injection/ | Directory containing model.onnx, tokenizer.json, and the ONNX Runtime library. |
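The threshold rule in the table can be written as a single comparison. This is an illustrative sketch under the stated rule only; `tier1_decision` is a hypothetical name, and the real classifier may apply additional logic (for example, around non-INJECTION labels) that this page does not describe.

```python
def tier1_decision(injection_score: float, threshold: float = 0.85) -> str:
    """Map an INJECTION confidence score to the documented Tier 1 outcome."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be within 0.0-1.0")
    # Scores at or above the threshold block; everything else escalates.
    return "BLOCK" if injection_score >= threshold else "ESCALATE"
```

A score of 0.88 is blocked at the default threshold of 0.85 but escalates to Tier 2 when the threshold is raised to 0.90, which is why a higher threshold means fewer blocks and more escalations.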
Heuristic-Only Mode
If the ONNX model is not installed, Tier 1 operates in heuristic-only mode. The heuristic engine (regex pattern matching for known attack signatures) runs alone, without the DeBERTa classifier. A one-time warning is logged at startup:
WARN: ONNX classifier not found, running heuristic-only mode
This is a valid configuration for environments where:
- The ONNX model cannot be downloaded (air-gapped networks)
- The ONNX Runtime is not available for the platform
- You want to minimize memory usage (the DeBERTa model uses ~300MB RAM)
Heuristic-only mode provides less coverage than the full DualClassifier but still catches known attack patterns.
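The fallback decision can be pictured as a file-presence check on the model directory. The sketch below is an assumption about how such a check might look, not Shield's implementation: `tier1_mode` and the returned mode strings are hypothetical, and the check covers only `model.onnx` and `tokenizer.json` because this page does not name the ONNX Runtime library file.

```python
from pathlib import Path

# File names taken from the classifier setup section above.
REQUIRED_FILES = ("model.onnx", "tokenizer.json")


def tier1_mode(model_dir: str) -> str:
    """Return which Tier 1 mode would apply for the given model directory."""
    directory = Path(model_dir).expanduser()
    missing = [name for name in REQUIRED_FILES if not (directory / name).is_file()]
    # Any missing file means the classifier cannot load, so fall back.
    return "heuristic-only" if missing else "dual-classifier"
```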
Evaluator Setup
The Tier 2 LLM evaluator uses a separate LLM to reason about whether an action is safe in context.
Configuration Fields
| Field | Type | Default | Description |
|---|---|---|---|
| evaluator.provider | string | -- | LLM provider: anthropic, openai, google, ollama. Omit to disable Tier 2. |
| evaluator.model | string | -- | Model name (e.g., claude-sonnet-4-6, gpt-5.4, gemini-3.1-pro). |
| evaluator.api_key_env | string | -- | Name of the environment variable containing the API key. |
| evaluator.base_url | string | -- | Custom base URL for the provider (e.g., http://localhost:11434 for Ollama). |
Canary Token
| Field | Type | Default | Description |
|---|---|---|---|
| canary_token | string | auto-generated | Random token embedded in the evaluator prompt to detect injection. Auto-generated at workspace init (OpenParallax) or first run (standalone). Must be random and unpredictable. |
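If you supply your own canary_token instead of relying on auto-generation, it should come from a cryptographically secure source. The sketch below shows one way to produce such a value and a simple presence check. Both functions are hypothetical, and the check assumes the evaluator is expected to echo the canary in its response (this page lists a missing canary as a pipeline error but does not specify the exact verification step).

```python
import secrets


def generate_canary(num_bytes: int = 16) -> str:
    """Return an unpredictable hex token from the OS secure random source."""
    return secrets.token_hex(num_bytes)


def canary_present(response_text: str, canary: str) -> bool:
    """Assumed check: a valid evaluator response contains the canary verbatim."""
    # An empty canary would match any text, so treat it as a failure.
    return bool(canary) and canary in response_text
```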
Evaluator Prompt
The Tier 2 evaluator prompt is compiled into the binary and does not exist on disk. No configuration field is needed.
Daily Budget and Rate Limiting
| Field | Type | Default | Description |
|---|---|---|---|
| rate_limit | int | 60 | Maximum Shield evaluations per minute. Uses token bucket algorithm. Applies to all tiers. |
| daily_budget | int | 100 | Maximum Tier 2 (LLM evaluator) evaluations per day. Resets at midnight server local time. When exhausted, actions that would escalate to Tier 2 are blocked (fail-closed) or allowed with reduced confidence (fail-open). |
| verdict_ttl | int | 300 | How long a verdict is valid in seconds. The same action hash returns the cached verdict within the TTL, avoiding duplicate evaluations for identical actions. |
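To make the rate_limit behavior concrete, here is a generic token bucket sketch. It illustrates the algorithm named in the table rather than Shield's own code: the `TokenBucket` class, the choice of a bucket capacity equal to the per-minute limit, and the injectable clock are all assumptions for the example.

```python
import time


class TokenBucket:
    """Allow up to per_minute evaluations per minute, refilled continuously."""

    def __init__(self, per_minute: int, clock=time.monotonic):
        self.capacity = float(per_minute)       # assumed burst size
        self.refill_per_second = per_minute / 60.0
        self.tokens = self.capacity             # start with a full bucket
        self.clock = clock
        self.updated = clock()

    def allow(self) -> bool:
        """Consume one token if available; return False when rate limited."""
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Add tokens for the time that has passed, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With rate_limit: 60, a full bucket admits a burst of 60 evaluations, after which one more evaluation becomes available each second.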
Disabling Tier 2
To run without the LLM evaluator (Tier 0 + Tier 1 only), omit the evaluator section entirely:
shield:
  policy_file: security/shield/default.yaml
  heuristic_enabled: true
  onnx_threshold: 0.85
  fail_closed: true
Actions that would escalate to Tier 2 will be blocked (fail-closed) or allowed with reduced confidence (fail-open), depending on the fail_closed setting.
Security Configuration
| Field | Type | Default | Description |
|---|---|---|---|
| fail_closed | bool | true | When true, any error in the pipeline (classifier failure, evaluator timeout, parse error, missing canary) results in BLOCK. When false, errors result in ALLOW with reduced confidence (0.5). |
DANGER
Setting fail_closed: false weakens security. In fail-open mode, a crashed classifier or unreachable evaluator silently allows actions through. Only use fail-open in development environments where blocking would disrupt testing workflows.
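The difference between the two modes comes down to what happens in the error path. The following sketch shows that branch in isolation. The `evaluate_with_fallback` wrapper and the dictionary verdict shape are hypothetical; only the BLOCK outcome and the ALLOW outcome with confidence 0.5 are taken from the table above.

```python
def evaluate_with_fallback(evaluate, action, fail_closed: bool = True) -> dict:
    """Run evaluate(action) and translate any error into a verdict."""
    try:
        return evaluate(action)
    except Exception as exc:  # classifier failure, timeout, parse error, ...
        if fail_closed:
            # Fail-closed: the error itself is the reason to block.
            return {"decision": "BLOCK", "confidence": None, "reason": str(exc)}
        # Fail-open: the action passes, flagged with reduced confidence.
        return {"decision": "ALLOW", "confidence": 0.5, "reason": str(exc)}
```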
Startup Validation
Shield validates its configuration at startup. Missing or invalid configuration produces specific errors:
| Condition | Behavior |
|---|---|
| Missing policy file | Fatal error. Shield cannot start without a policy. |
| Invalid YAML in policy file | Fatal error. The policy cannot be parsed. |
| Invalid glob pattern in a policy rule | Warning log. The pattern is skipped, other patterns still work. |
| Missing evaluator prompt (when evaluator is configured) | Fatal error. Tier 2 cannot function without its prompt. |
| Missing evaluator API key (when evaluator is configured) | Fatal error. The evaluator cannot authenticate. |
| Missing ONNX model | Warning. Tier 1 runs in heuristic-only mode. |
| Missing canary token (when evaluator is configured) | Auto-generated. A new canary token is created and saved. |
| Invalid onnx_threshold (outside 0.0-1.0) | Fatal error. |
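A pre-flight script can catch several of these conditions before Shield starts. The sketch below covers only a subset of the table (policy file presence, threshold range, evaluator API key, ONNX model presence) and separates fatal errors from warnings. The `validate_startup` function and the flat dotted-key config dictionary are assumptions for illustration, not part of Shield.

```python
import os
from pathlib import Path

DEFAULT_MODEL_DIR = "~/.openparallax/models/prompt-injection/"


def validate_startup(config: dict, environ=os.environ):
    """Return (fatal_errors, warnings) for a flat, dotted-key config dict."""
    fatal, warnings = [], []

    policy = config.get("policy.file")
    if not policy or not Path(policy).is_file():
        fatal.append("policy file is missing")

    threshold = config.get("classifier.threshold", 0.85)
    if not 0.0 <= threshold <= 1.0:
        fatal.append("onnx_threshold is outside 0.0-1.0")

    # The API key is only required when a Tier 2 evaluator is configured.
    if config.get("evaluator.provider"):
        key_var = config.get("evaluator.api_key_env")
        if not key_var or not environ.get(key_var):
            fatal.append("evaluator API key is missing")

    model_dir = Path(config.get("classifier.model_dir", DEFAULT_MODEL_DIR)).expanduser()
    if not (model_dir / "model.onnx").is_file():
        warnings.append("ONNX model missing; Tier 1 runs in heuristic-only mode")

    return fatal, warnings
```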
Complete Configuration Fields Reference
Policy Configuration
| Field | Type | Default | Description |
|---|---|---|---|
| policy_file / policy.file | string | required | Path to the YAML policy file. Relative paths are resolved from the working directory (OpenParallax: workspace root). |
Classifier Configuration
| Field | Type | Default | Description |
|---|---|---|---|
| onnx_threshold / classifier.threshold | float64 | 0.85 | Confidence threshold for the INJECTION label. |
| classifier.model_dir | string | ~/.openparallax/models/prompt-injection/ | Directory containing the ONNX model and tokenizer. |
| heuristic_enabled | bool | true | Enable the heuristic pattern matching engine in Tier 1. |
Evaluator Configuration
| Field | Type | Default | Description |
|---|---|---|---|
| evaluator.provider | string | -- | LLM provider for Tier 2. Omit to disable Tier 2. |
| evaluator.model | string | -- | Model name. |
| evaluator.api_key_env | string | -- | Environment variable containing the API key. |
| evaluator.base_url | string | -- | Custom base URL (Ollama, proxies, Azure). |
| canary_token | string | auto-generated | Token for evaluator response verification. |
Security Configuration
| Field | Type | Default | Description |
|---|---|---|---|
| fail_closed | bool | true | Block on any pipeline error. |
Rate Limiting Configuration
| Field | Type | Default | Description |
|---|---|---|---|
| rate_limit | int | 60 | Maximum evaluations per minute. |
| daily_budget | int | 100 | Maximum Tier 2 evaluations per day. |
| verdict_ttl | int | 300 | Verdict cache TTL in seconds. |
Server Configuration (Standalone Only)
| Field | Type | Default | Description |
|---|---|---|---|
| listen | string | localhost:9090 | REST API listen address. |
| grpc_listen | string | localhost:9091 | gRPC API listen address. |
| log_level | string | info | Logging level: debug, info, warn, error. |
| log_file | string | -- | Log file path. Stdout if omitted. |
MCP Proxy Configuration (Standalone Only)
| Field | Type | Default | Description |
|---|---|---|---|
| mcp.servers | list | -- | Upstream MCP servers. |
| mcp.servers[].name | string | required | Unique server name. |
| mcp.servers[].transport | string | required | stdio or streamable-http. |
| mcp.servers[].command | string | -- | Command to run (stdio). |
| mcp.servers[].args | list | -- | Command arguments (stdio). |
| mcp.servers[].env | map | -- | Environment variables (stdio). Supports ${ENV_VAR} substitution. |
| mcp.servers[].url | string | -- | Server URL (streamable-http). |
| mcp.servers[].headers | map | -- | HTTP headers (streamable-http). Supports ${ENV_VAR} substitution. |
| mcp.tool_mapping | map | -- | Custom MCP tool name to Shield action type mapping. |
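The ${ENV_VAR} substitution supported by the env and headers maps can be illustrated with a small regular-expression replacement. This is a sketch of the general technique, not Shield's parser: `substitute_env` is a hypothetical helper, and raising an error for an unset variable is an assumption, since this page does not say how unset variables are handled.

```python
import os
import re

# Matches ${NAME} where NAME is a conventional environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_env(values: dict, environ=os.environ) -> dict:
    """Return a copy of values with ${NAME} placeholders expanded."""

    def lookup(match):
        name = match.group(1)
        if name not in environ:
            # Assumed behavior: fail loudly rather than send an empty secret.
            raise KeyError(f"environment variable {name} is not set")
        return environ[name]

    return {key: _PLACEHOLDER.sub(lookup, value) for key, value in values.items()}
```

For example, a headers entry of `Authorization: Bearer ${MCP_TOKEN}` (a hypothetical variable name) expands to the token held in the environment, which keeps secrets out of shield.yaml.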
Audit Configuration (Standalone Only)
| Field | Type | Default | Description |
|---|---|---|---|
| audit.enabled | bool | false | Enable audit logging. |
| audit.file | string | shield-audit.jsonl | Audit log file path. |
Minimal Configurations
Tier 0 Only (Policy Matching)
The smallest useful configuration. Policy matching only, no ML, no LLM:
shield:
  policy_file: security/shield/default.yaml
  heuristic_enabled: false
  fail_closed: true
Tier 0 + Tier 1 (Policy + Classifier)
Adds heuristic pattern matching. Optionally install the ONNX model for the DeBERTa classifier:
shield:
  policy_file: security/shield/default.yaml
  heuristic_enabled: true
  onnx_threshold: 0.85
  fail_closed: true
Full Pipeline (All Three Tiers)
Complete security pipeline with policy, classifier, and LLM evaluator:
shield:
  policy_file: security/shield/default.yaml
  heuristic_enabled: true
  onnx_threshold: 0.85
  fail_closed: true
  rate_limit: 60
  daily_budget: 100
  evaluator:
    provider: anthropic
    model: claude-sonnet-4-6
    api_key_env: ANTHROPIC_API_KEY
Tuning Guide
False Positives Too High
If legitimate actions are being blocked or escalated too often:
- Increase onnx_threshold from 0.85 to 0.90 or 0.95 (fewer classifier blocks)
- Switch to permissive.yaml policy (fewer deny rules)
- Add explicit allow rules for the action patterns being falsely blocked
- Review the audit log to identify which rules or classifier patterns cause false positives
Security Too Loose
If too many actions are passing without evaluation:
- Switch to strict.yaml policy (more deny rules, higher tier overrides)
- Lower onnx_threshold to 0.80 (more aggressive classifier)
- Set fail_closed: true (block on errors instead of allowing)
- Add verify rules with tier_override: 2 for sensitive operations
- Add deny rules for paths and action types that should never be allowed
Tier 2 Costs Too High
If the LLM evaluator costs are a concern:
- Lower onnx_threshold (more actions are blocked at Tier 1, so fewer escalate to Tier 2; expect more false-positive blocks)
- Lower daily_budget to cap daily costs
- Use a cheaper model for the evaluator (e.g., claude-haiku-4-5-20251001 instead of claude-sonnet-4-6)
- Add more allow and deny rules to Tier 0 so fewer actions reach Tier 2
- Increase verdict_ttl to cache verdicts longer (reduces repeat evaluations)
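To see why a longer verdict_ttl reduces repeat evaluations, consider a cache keyed by a hash of the action. The sketch below is illustrative only: the `VerdictCache` class, the SHA-256 hash over canonical JSON, and the injectable clock are assumptions, since this page does not document how Shield computes the action hash.

```python
import hashlib
import json
import time


class VerdictCache:
    """Cache verdicts by action hash and expire them after ttl_seconds."""

    def __init__(self, ttl_seconds: int = 300, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # action hash -> (verdict, stored_at)

    @staticmethod
    def action_hash(action: dict) -> str:
        # Assumed scheme: sorted keys so identical actions hash identically.
        canonical = json.dumps(action, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get(self, action: dict):
        """Return the cached verdict, or None if absent or expired."""
        key = self.action_hash(action)
        entry = self._entries.get(key)
        if entry is None:
            return None
        verdict, stored_at = entry
        if self.clock() - stored_at >= self.ttl:
            del self._entries[key]
            return None
        return verdict

    def put(self, action: dict, verdict: str) -> None:
        self._entries[self.action_hash(action)] = (verdict, self.clock())
```

Every cache hit is an evaluation that never reaches Tier 2. The trade-off is that a cached verdict stays in force for the whole TTL even if the policy or surrounding context changes in the meantime.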
Latency Too High
If Shield evaluation is slowing down the agent:
- Add allow rules for common safe operations to keep them at Tier 0 (microseconds)
- Increase verdict_ttl to cache more verdicts
- Disable Tier 2 for non-critical deployments (Tier 0 + Tier 1 only)
- If using the classifier sidecar, ensure it is co-located or on a low-latency network path
Next Steps
- Policy Syntax -- full policy file reference
- Tier 0 -- Policy -- how policy matching works
- Tier 1 -- Classifier -- classifier model details
- Tier 2 -- Evaluator -- LLM evaluator details
- ONNX Classifier -- classifier model deep dive
- Standalone binary -- running Shield as a service