Threat Model

This document maps OpenParallax's defenses to specific threats from industry-standard security frameworks. For each threat, we identify which mechanisms address it and honestly acknowledge gaps.

Referenced Frameworks

OWASP Top 10 (2021) — web application security risks
OWASP Top 10 for LLM Applications (2025) — risks specific to LLM-integrated systems
MITRE ATLAS — adversarial threat landscape for AI systems
CWE — common weakness enumeration (specific vulnerability types)

OWASP Top 10 for LLM Applications

LLM01: Prompt Injection

An attacker crafts input that causes the LLM to act as the attacker's proxy — executing unauthorized actions, leaking data, or overriding its instructions.

Defense	How it helps
Cognitive-executive separation	Even a fully jailbroken LLM can only propose actions; it cannot execute them
Shield 4-tier pipeline	Every proposed action is independently evaluated; the LLM cannot bypass evaluation
Tier 2 inverted context	The evaluator LLM has no agent identity, no conversation history — it cannot be injection-chained
Canary token verification	Prevents prompt injection from forging evaluator verdicts
Output sanitization	Explicit data boundaries reduce indirect injection via tool results
IFC	Even if injection succeeds in proposing an action, data flow controls prevent credential exfiltration

Gap: Social engineering of the human at Tier 3. If the attacker can craft a convincing approval prompt, the human may approve a harmful action. Mitigation: Tier 3 shows the raw action payload, not the LLM's summary.

LLM02: Sensitive Information Disclosure

The LLM reveals sensitive data — credentials, personal information, internal system details — through its responses or actions.

Defense	How it helps
IFC	Classifies data by sensitivity and blocks flows to external sinks
Kernel sandbox	Prevents the agent from reading files outside its workspace
Default denylist	Blocks access to credential directories (`~/.ssh`, `~/.aws`, etc.)
Secret redaction	Strips credential patterns from LLM output streams
OTR mode	Prevents sensitive conversations from persisting to disk

LLM06: Excessive Agency

The LLM takes actions beyond what the user intended — deleting files, modifying system config, sending unauthorized messages.

Defense	How it helps
Cognitive-executive separation	The agent cannot act; it can only propose
Shield pipeline	Every proposal is evaluated before execution
Protection layer	Hard-blocks modifications to identity, config, audit, and security files
Default denylist	Prevents access to system-critical paths
Chronicle snapshots	Destructive actions can be rolled back

LLM10: Unbounded Consumption

The system consumes excessive resources — token spend, compute, network, storage — due to LLM loops, recursive delegation, or misconfiguration.

Defense	How it helps
Rate limit	Caps actions per minute
Tier 2 daily budget	Caps expensive LLM evaluations per day
Sub-agent concurrency cap	Limits parallel delegation
Sub-agent timeout	Kills long-running sub-agents
Reasoning loop max rounds	Stops infinite tool-call loops
Crash restart budget	Prevents restart loops

OWASP Top 10 (2021) — Selected

A01: Broken Access Control

Defense	How it helps
Kernel sandbox	OS-level access control on the agent process
Default denylist	Path-level access control across the filesystem
Protection layer	File-level access control within the workspace
Agent auth token	Process-level authentication for gRPC connections

A03: Injection (CWE-77, CWE-78)

Defense	How it helps
Shield Tier 1 heuristics	Pattern-matches command injection signatures
Git flag injection prevention	`--` separator stops flag smuggling
SSRF protection	Blocks internal IP injection via HTTP requests

A08: Software and Data Integrity Failures

Defense	How it helps
Hash verifier	TOCTOU prevention via action hashing
Audit hash chain	Tamper-evident logging
Sealed security config	Digest-verified immutable config
Chronicle	Recoverable workspace state

A09: Security Logging and Monitoring Failures (CWE-778)

Defense	How it helps
Audit chain	28 event types covering the full pipeline lifecycle
Security dashboard	Real-time integrity canaries with alerting on non-zero values
Agent diagnostics	Exit code, signal, and stderr capture for crash investigation

A10: Server-Side Request Forgery (CWE-918)

Defense	How it helps
SafeHTTPClient	Blocks private/loopback/link-local IPs at dial time
Redirect re-validation	Re-checks IP on every redirect hop

MITRE ATLAS — Selected Techniques

T0051: LLM Prompt Injection

Covered by: cognitive-executive separation, Shield pipeline, Tier 2 inverted context, canary tokens, output sanitization. See LLM01 above.

T0054: LLM Jailbreak

Covered by: same defenses as T0051. The key insight is that jailbreaking the agent's LLM is insufficient — the agent still cannot execute actions, and Shield evaluates independently.

T0024: Exfiltration via ML Inference API

Covered by: kernel sandbox (network restrictions), IFC (blocks sensitive data → external sinks), SSRF protection (blocks internal IP access).

T0048: External Harms

Covered by: cognitive-executive separation (the agent cannot cause harm because it cannot act), Shield pipeline (the engine evaluates before acting), Tier 3 (human approval for high-risk actions).

Honest Gaps

Model-level data leakage. If the LLM provider retains prompts for training, sensitive data in prompts could leak through the model. Mitigation: use providers with data retention policies; OTR mode reduces persistent data.
Social engineering at Tier 3. A sophisticated attacker could craft an action that looks safe to a human reviewer but is harmful in context. Mitigation: Tier 3 shows raw payloads, not LLM summaries.
Side-channel timing attacks. An attacker could infer information about Shield's evaluation by measuring response times. Not currently mitigated.
Supply chain attacks on the LLM itself. If the LLM provider is compromised, the model could be instructed to generate harmful proposals at a rate that exhausts Shield's evaluation budget. Mitigation: budget limits, rate limits, and the structural impossibility of the agent executing directly.

Threat Model ​

Referenced Frameworks ​

OWASP Top 10 for LLM Applications ​

LLM01: Prompt Injection ​

LLM02: Sensitive Information Disclosure ​

LLM06: Excessive Agency ​

LLM10: Unbounded Consumption ​

OWASP Top 10 (2021) — Selected ​

A01: Broken Access Control ​

A03: Injection (CWE-77, CWE-78) ​

A08: Software and Data Integrity Failures ​

A09: Security Logging and Monitoring Failures (CWE-778) ​

A10: Server-Side Request Forgery (CWE-918) ​

MITRE ATLAS — Selected Techniques ​

T0051: LLM Prompt Injection ​

T0054: LLM Jailbreak ​

T0024: Exfiltration via ML Inference API ​

T0048: External Harms ​

Honest Gaps ​