Action Validation

These mechanisms evaluate every tool call proposed by the agent before it reaches execution. They form the gateway between "the LLM wants to do X" and "X actually happens."

Shield Pipeline (4 Tiers)

Threat: The LLM proposing harmful tool calls — reading sensitive files, executing destructive commands, exfiltrating data via HTTP, modifying system configuration.

Defense: Every tool call passes through a 4-tier evaluation pipeline. Tiers are structurally heterogeneous — defeating one does not defeat the others.

Tier	Name	Mechanism	Speed	Configurable
0	Policy Engine	YAML pattern matching (deny/verify/allow rules)	<1ms	Yes — `security/shield/*.yaml` presets
1	Heuristic Classifier	79-rule regex engine + optional ONNX DeBERTa model	<5ms	Partially — ONNX opt-in, heuristics always on
2	LLM Evaluator	Independent LLM with no agent state; sees action as JSON data	1–3s	Yes — model, budget, provider
3	Human Approval	Broadcasts to all connected channels; first response wins	Minutes	Yes — hourly cap, timeout

A financial analyst's agent proposing to read ~/tax_returns/2024.pdf passes Tier 0 (no deny rule matches) and Tier 1 (the heuristics find no destructive pattern), and the read proceeds. If the same agent proposes to email that file, the action escalates to a Tier 2 evaluation and potentially to Tier 3 human approval.
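The escalation in this example can be pictured as a cascade in which each tier either decides or defers to the next. This is a minimal sketch, not the orchestrator in shield/gateway.go. The `Verdict` values, the `Tier` function type, the tool names `read_file` and `send_email`, and the toy tier rules are all hypothetical, chosen only to mirror the scenario above.

```go
package main

import (
	"fmt"
	"strings"
)

// Verdict is what a tier returns. Escalate defers to the next tier.
type Verdict int

const (
	Allow Verdict = iota
	Deny
	Escalate
)

func (v Verdict) String() string {
	return [...]string{"ALLOW", "DENY", "ESCALATE"}[v]
}

// Action is a hypothetical proposed tool call.
type Action struct {
	Tool string
	Args map[string]string
}

// Tier is one evaluation stage (hypothetical signature).
type Tier func(Action) Verdict

// evaluate runs tiers in order and returns the first non-Escalate verdict
// plus the index of the deciding tier. If every tier defers, it denies.
func evaluate(a Action, tiers []Tier) (Verdict, int) {
	for i, t := range tiers {
		if v := t(a); v != Escalate {
			return v, i
		}
	}
	return Deny, len(tiers)
}

// Toy tiers mirroring the scenario above (all hypothetical).
func policyTier(a Action) Verdict { // deny rule on SSH keys only
	if strings.Contains(a.Args["path"], "/.ssh/") {
		return Deny
	}
	return Escalate
}

func heuristicTier(a Action) Verdict { // plain reads show no destructive pattern
	if a.Tool == "read_file" {
		return Allow
	}
	return Escalate
}

func llmTier(Action) Verdict   { return Escalate } // stub: defers to a human
func humanTier(Action) Verdict { return Deny }     // stub: the human declines

var demoTiers = []Tier{policyTier, heuristicTier, llmTier, humanTier}

func main() {
	read := Action{Tool: "read_file", Args: map[string]string{"path": "~/tax_returns/2024.pdf"}}
	email := Action{Tool: "send_email", Args: map[string]string{"attachment": "~/tax_returns/2024.pdf"}}
	for _, a := range []Action{read, email} {
		v, tier := evaluate(a, demoTiers)
		fmt.Printf("%s -> %v (decided at tier %d)\n", a.Tool, v, tier)
	}
}
```

Because the tiers share only the `Action` value and not each other's internal state, a bypass crafted for one tier's logic gives no leverage against the next.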

Relevant threats:

OWASP LLM01: Prompt Injection (attacker tricks agent into proposing harmful actions)
OWASP LLM06: Excessive Agency
MITRE ATLAS T0051: LLM Prompt Injection
CWE-77: Command Injection

Code: shield/ (full pipeline), shield/gateway.go (orchestrator), shield/tier0_policy.go, shield/tier1_*.go, shield/tier2_*.go

The pipeline itself is non-negotiable: it always exists and always runs. Policy presets, the evaluator model, and the budget are tunable. See the Shield documentation for the full reference.

Safe-Command Fast Path

Threat: Unnecessary latency and wasted Tier 2 LLM tokens when evaluating known-safe developer commands (git status, npm install, make, go build).

Defense: A curated, compiled-in allowlist of command prefixes bypasses all four Shield tiers and returns ALLOW with confidence 1.0. Single-statement commands only — anything with shell metacharacters (;, &, |, >, <, `, $(...)) falls through to normal evaluation. The allowlist excludes commands that take arbitrary path arguments (cat, ls, rm, cp, mv).
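A minimal sketch of the fast-path check. `safePrefixes` is a hypothetical, illustrative subset (the shipped list in shield/safe_commands.go is not reproduced here), and `isSafeCommand` is an illustrative name. The sketch rejects metacharacters first, then requires the command to equal an allowlisted prefix or extend it at a word boundary, so `git status --short` matches but `git statusx` does not. It is deliberately a little stricter than the rule above: it rejects any `$` and any embedded newline.

```go
package main

import (
	"fmt"
	"strings"
)

// safePrefixes is an illustrative subset, not the shipped allowlist.
var safePrefixes = []string{"git status", "git diff", "go build", "make", "npm install"}

// metaChars are rejected outright. Any '$' and a newline are treated as
// unsafe too, which is stricter than the documented "$(...)" rule.
const metaChars = ";&|<>`$\n"

// isSafeCommand reports whether cmd may skip Shield evaluation.
func isSafeCommand(cmd string) bool {
	cmd = strings.TrimSpace(cmd)
	if strings.ContainsAny(cmd, metaChars) {
		return false // chained, piped, or redirected: use the full pipeline
	}
	for _, p := range safePrefixes {
		// Exact match, or the prefix followed by a space and arguments.
		if cmd == p || strings.HasPrefix(cmd, p+" ") {
			return true
		}
	}
	return false
}

func main() {
	for _, c := range []string{"git status", "go build ./...", "git status; rm -rf ~", "make > /etc/hosts", "cat ~/.ssh/id_rsa"} {
		fmt.Printf("%-25q fast path: %v\n", c, isSafeCommand(c))
	}
}
```

A `false` here never blocks anything; it only routes the command through the normal four tiers.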

Relevant threats: Performance, not security — the allowlist only permits, never blocks.

Code: shield/safe_commands.go, platform/safe_commands_{unix,windows}.go

Non-negotiable. The allowlist is compiled into the binary and is not user-extensible.

Hardcoded Protection Layer

Threat: Modification of the agent's own identity, configuration, or security-critical files — either through direct LLM action or through a prompt injection that targets internal files.

Defense: A pre-Shield gate that enforces workspace boundaries and file-level protection. Runs before Tier 0 and cannot be bypassed:

File	Protection	Effect
`soul.md`, `identity.md`	Read-only	Agent can read for context; writes are blocked
`config.yaml`, `canary.token`, `audit.jsonl`, `openparallax.db`	Full block	No read, no write
`.openparallax/` directory	Full block	Entire internal directory sealed
`security/` directory	Full block	Shield and IFC policies sealed from agent writes
`skills/` directory	Read-only	Agent loads skills; cannot modify them
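The table can be read as a lookup from a workspace-relative path to a protection level. This sketch is hypothetical: `Protection`, `protectionFor`, and `allowed` are illustrative names, it assumes the listed files sit at the workspace root, and it omits the workspace-boundary enforcement that internal/engine/protection.go also performs. It cleans the path first, so `skills/../config.yaml` is treated as `config.yaml`.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

type Protection int

const (
	None Protection = iota
	ReadOnly
	FullBlock
)

// File rules transcribed from the table (assumed to apply at the workspace root).
var fileRules = map[string]Protection{
	"soul.md": ReadOnly, "identity.md": ReadOnly,
	"config.yaml": FullBlock, "canary.token": FullBlock,
	"audit.jsonl": FullBlock, "openparallax.db": FullBlock,
}

// Directory rules transcribed from the table.
var dirRules = map[string]Protection{
	".openparallax": FullBlock, "security": FullBlock, "skills": ReadOnly,
}

// protectionFor classifies a workspace-relative, slash-separated path.
func protectionFor(rel string) Protection {
	p := path.Clean(rel) // collapse "." and ".." segments before matching
	if lvl, ok := fileRules[p]; ok {
		return lvl
	}
	// The first path component decides directory-level protection.
	top, _, _ := strings.Cut(p, "/")
	return dirRules[top] // zero value None when unlisted
}

// allowed applies a protection level to a read or a write.
func allowed(rel string, write bool) bool {
	switch protectionFor(rel) {
	case FullBlock:
		return false
	case ReadOnly:
		return !write
	default:
		return true
	}
}

func main() {
	type check struct {
		path  string
		write bool
	}
	for _, c := range []check{
		{"soul.md", false}, {"soul.md", true}, {"skills/../config.yaml", false},
		{"security/shield/default.yaml", true}, {"notes/todo.md", true},
	} {
		fmt.Printf("%-30s write=%-5v allowed=%v\n", c.path, c.write, allowed(c.path, c.write))
	}
}
```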

Relevant threats:

OWASP A08: Software and Data Integrity Failures
MITRE ATLAS T0042: Verify Attack (modifying verification mechanisms)

Code: internal/engine/protection.go

Non-negotiable.

Cross-Platform Default Denylist

Threat: The agent accessing credential files, SSH keys, cloud provider configs, or system secrets anywhere on disk — not just within the workspace.

Defense: A curated denylist of restricted and protected paths applies to any path the agent touches. The denylist runs after symlink resolution, so a symlink at /tmp/safe.txt that points to ~/.ssh/id_rsa is blocked.

Categories:

Restricted (no read, no write): ~/.ssh, ~/.aws, ~/.gnupg, ~/.docker, ~/.kube, /etc/shadow, credential files by extension (.pem, .key, .p12, etc.)
Protected (read OK, write blocked): shell rc files (.bashrc, .zshrc), VCS and package-manager configs (.gitconfig, .npmrc), system reference files (/etc/hosts, /etc/passwd)

An operations engineer managing infrastructure can trust that even if the agent is asked to "read the AWS credentials to check the region," the denylist blocks ~/.aws/credentials before Shield even evaluates the request.

Relevant threats:

OWASP LLM02: Sensitive Information Disclosure
CWE-256: Plaintext Storage of a Password
CWE-200: Information Exposure

Code: platform/denylist_{linux,darwin,windows}.go, internal/engine/protection.go

Non-negotiable. The denylist ships in the binary and is not user-extensible.

Information Flow Control (IFC)

Threat: Data at one sensitivity level (e.g., credentials, financial records, patient data) flowing to an action that would expose it to a lower-trust destination (e.g., HTTP request, email, shell command) — either directly, through an intermediate file, or via agent memory.

Defense: A YAML-driven policy with three enforcement layers:

Per-action classification — file paths are classified by policy source rules and a persistent activity table. The decision matrix maps sensitivity levels to sink categories.
Session taint — the highest sensitivity seen in a session propagates to all subsequent actions. An agent that reads .env (critical) cannot send email for the rest of the session, even though email actions have no file path to classify.
Content sensitivity tags — tool results carry sensitivity metadata through the LLM turn. Tags are inherited by subsequent tool proposals, closing within-turn propagation gaps.
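A minimal sketch of the first two layers. Everything here is assumed for illustration: the five-level ordering of `Level`, the toy `maxForSink` matrix (which only allows or blocks, whereas the shipped presets can also escalate), and the `Session` type. The third layer, content sensitivity tags, is omitted. The sketch shows why session taint closes the email gap: the taint is folded into every decision, so an action with no file path is still blocked after a critical read.

```go
package main

import "fmt"

// Level is an assumed ordered sensitivity scale; the real names and order
// come from the IFC policy YAML.
type Level int

const (
	Public Level = iota
	Internal
	Confidential
	Restricted
	Critical
)

// maxForSink is a toy decision matrix: the highest level each sink may receive.
var maxForSink = map[string]Level{
	"file_write": Critical,
	"http":       Internal,
	"email":      Confidential,
	"shell":      Confidential,
}

// Session tracks the highest sensitivity observed so far (the taint).
type Session struct{ taint Level }

// Observe raises the taint; it never lowers it.
func (s *Session) Observe(l Level) {
	if l > s.taint {
		s.taint = l
	}
}

// Allowed checks an action against both its own classification and the
// session taint, so actions with no file path are still gated.
func (s *Session) Allowed(sink string, actionLevel Level) bool {
	limit, ok := maxForSink[sink]
	if !ok {
		return false // unknown sink: deny by default
	}
	effective := actionLevel
	if s.taint > effective {
		effective = s.taint
	}
	return effective <= limit
}

func main() {
	var s Session
	fmt.Println("email before .env read:", s.Allowed("email", Public))
	s.Observe(Critical) // the agent reads .env, classified critical
	fmt.Println("email after .env read:", s.Allowed("email", Public))
}
```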

Additionally, memory writes are gated by configurable memory_block_levels — when the session has seen data at a blocked sensitivity level, memory_write is rejected, preventing cross-session data laundering through the memory system.

The activity table provides cross-session persistence: when the agent writes classified data to a file, the destination inherits the classification. Future reads of that file, in any session, are classified from the table. Stale entries for deleted files are cleaned up with `openparallax ifc sweep`.

Three presets ship with OpenParallax:

Preset	Philosophy	Who it's for
`default`	Blocks credentials from external sinks; escalates restricted data; blocks memory writes for critical/restricted	Most users
`permissive`	Only blocks critical data; everything else flows freely	Trusted single-user workstations
`strict`	Blocks confidential+ data from writes, exec, and memory; default-deny	Regulated environments (healthcare, finance, legal)

See IFC reference for the full schema, worked examples, and CLI commands.

Relevant threats:

OWASP LLM02: Sensitive Information Disclosure
CWE-200: Information Exposure
CWE-668: Exposure of Resource to Wrong Sphere

Code: ifc/policy.go, internal/storage/ifc.go, security/ifc/*.yaml

The IFC subsystem is non-negotiable (it always runs). The policy is fully tunable via preset selection or custom YAML rules.

Hash Verifier (TOCTOU Prevention)

Threat: An action being modified between the time it's proposed (and evaluated by Shield) and the time it's executed — a time-of-check-to-time-of-use attack.

Defense: Every action is hashed (SHA-256 of tool name + arguments) at proposal time. Before execution, the hash is recomputed and compared. A mismatch blocks execution.
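A sketch of the hash-then-recheck flow, assuming arguments arrive as a JSON-like map. It uses Go's `encoding/json`, whose `Marshal` writes map keys in sorted order, as the canonical encoding. `actionHash` and `verify` are illustrative names, not the functions in internal/engine/verifier.go or crypto/hash.go. The design point is canonicalization: hashing a non-deterministic encoding would produce false mismatches on untouched actions.

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
)

// actionHash computes SHA-256 over the tool name, a NUL separator, and a
// canonical JSON encoding of the arguments (map keys are sorted by
// json.Marshal, including in nested maps). Assumes tool names never
// contain NUL.
func actionHash(tool string, args map[string]any) ([32]byte, error) {
	encoded, err := json.Marshal(args)
	if err != nil {
		return [32]byte{}, err
	}
	msg := append([]byte(tool), 0)
	msg = append(msg, encoded...)
	return sha256.Sum256(msg), nil
}

// verify recomputes the hash at execution time and compares it with the
// hash recorded when the proposal was evaluated.
func verify(approved [32]byte, tool string, args map[string]any) bool {
	current, err := actionHash(tool, args)
	if err != nil {
		return false // unencodable arguments never execute
	}
	return current == approved
}

func main() {
	args := map[string]any{"path": "report.txt", "content": "draft"}
	approved, err := actionHash("write_file", args)
	if err != nil {
		panic(err)
	}
	fmt.Println("unchanged:", verify(approved, "write_file", args))
	args["path"] = "/etc/hosts" // mutated after evaluation
	fmt.Println("tampered:", verify(approved, "write_file", args))
}
```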

Relevant threats:

OWASP A08: Software and Data Integrity Failures
CWE-367: Time-of-Check Time-of-Use Race Condition

Code: internal/engine/verifier.go, crypto/hash.go

Non-negotiable.
