PRAGATIX’s AI Firewall inspects prompts and responses in real time, detects jailbreak/prompt-injection attempts, prevents system-prompt leakage, applies DLP/redaction and routing policies, and provides centralized auditing and governance.
This page describes how PRAGATIX helps organizations reduce risk from prompt injection, jailbreak attempts, and sensitive prompt/response leakage when employees use public AI services (ChatGPT, Copilot, Gemini, Claude, etc.) and private/enterprise LLMs (including via the PRAGATIX Gateway).
What We Protect Against
Prompt injection and jailbreaks are attempts to manipulate an AI system into ignoring its intended behavior. Common goals include:
Extracting hidden instructions (system prompts, internal policies, tool instructions)
Bypassing safety boundaries to obtain restricted output
Coercing the model to reveal sensitive enterprise data
Inducing unsafe tool use or unsafe external actions
Exfiltrating data via encoding/obfuscation (e.g., base64-like payloads, delimiter tricks)
Where Protection Runs
PRAGATIX enforces these protections in the AI Firewall layer, which is deployed as an in‑path control point:
Forward proxy / PAC routing (network-level interception)
Proxy chaining (corporate proxy → PRAGATIX AI Firewall)
Browser extension mode (web UI capture)
API mode (applications submit prompts/responses to the firewall API)
This provides a single source of truth for policy and avoids “split-brain” enforcement across multiple components.
How Protection Works (High Level)
PRAGATIX applies real-time policy enforcement on both:
User prompts before they leave the organization
AI responses before they reach the user
The firewall evaluates both content and context signals (identity, application, destination AI service, device/network posture, and policy configuration).
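To make these signals concrete, here is a minimal sketch of a request envelope an in-path evaluation might carry. The field names (`direction`, `identity`, `device_posture`, etc.) are illustrative assumptions, not the actual PRAGATIX schema.

```python
def build_evaluation_request(direction, content, identity, application,
                             destination, device_posture):
    """Bundle content plus the context signals a policy engine evaluates.

    Hypothetical field names for illustration only.
    """
    return {
        "direction": direction,              # "prompt" (outbound) or "response" (inbound)
        "content": content,
        "context": {
            "identity": identity,            # who is sending
            "application": application,      # which app or browser surface
            "destination": destination,      # e.g. "chatgpt", "private-llm"
            "device_posture": device_posture,
        },
    }
```

Evaluating content and context together is what lets the same prompt be allowed for one group and routed or blocked for another.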
Input Guardrails (Prompt Inspection)
Before a prompt is forwarded to an AI service, the firewall can:
Detect injection/jailbreak patterns using configurable detection rules (e.g., regex/keyword/heuristics)
Classify and inspect for sensitive data (DLP/classification/risk scoring)
Enforce action based on policy:
Allow (log-only)
Warn (allow, but notify the user and provide guidance)
Redact/Mask sensitive segments (send a safer prompt externally)
Block with a friendly explanation
Route to a safer destination (e.g., private model) based on data sensitivity
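The flow above can be sketched as a simple inspection function. The patterns below are toy examples of configurable regex/keyword rules (the card-number rule is deliberately simplistic); a real deployment would use centrally managed, production-grade rule sets.

```python
import re

# Toy injection/jailbreak patterns (illustrative, not production rules).
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.I),
    re.compile(r"reveal (your|the) system prompt", re.I),
]
# Toy DLP rule: a 16-digit card-like number.
DLP_PATTERNS = {"card_number": re.compile(r"\b(?:\d[ -]?){16}\b")}

def inspect_prompt(prompt: str) -> dict:
    """Return a policy decision for an outbound prompt."""
    for pattern in INJECTION_PATTERNS:
        if pattern.search(prompt):
            return {"action": "block", "reason": "injection_pattern"}
    for category, pattern in DLP_PATTERNS.items():
        if pattern.search(prompt):
            # Send a safer prompt externally by masking the sensitive span.
            redacted = pattern.sub("[REDACTED]", prompt)
            return {"action": "redact", "reason": category, "prompt": redacted}
    return {"action": "allow", "reason": None}
```

Note that the redact path forwards a modified prompt rather than failing the request, which preserves user productivity while keeping the sensitive value inside the organization.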
Output Guardrails (Response Inspection)
Before a response is shown to the user, the firewall can:
Detect leakage indicators, including:
Potential disclosure of hidden instructions or internal prompt templates
“Jailbreak acknowledgement”-style responses (a signal that the model was coerced)
Sensitive content that violates policy or data classification rules
Enforce action:
Allow (log-only)
Warn
Block with a generic message
Mask or remove restricted content (policy-dependent)
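Response inspection mirrors prompt inspection. A minimal sketch, assuming hypothetical leakage indicators (the persona marker is only an example of a jailbreak-acknowledgement pattern):

```python
import re

# Illustrative leakage indicators; real rule sets are policy-driven.
LEAKAGE_PATTERNS = [
    re.compile(r"my (system|hidden) (prompt|instructions)", re.I),
    re.compile(r"\bas DAN\b", re.I),  # example jailbreak-acknowledgement marker
]

def inspect_response(response: str) -> dict:
    """Decide whether a model response may be shown to the user."""
    for pattern in LEAKAGE_PATTERNS:
        if pattern.search(response):
            # Block with a generic message rather than echoing the leak.
            return {"action": "block", "message": "Response withheld by policy."}
    return {"action": "allow", "message": response}
```

Returning a generic message on block avoids confirming to an attacker which indicator fired.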
System Prompt Leakage Protections
Beyond generic patterns, PRAGATIX can optionally harden against accidental disclosure of proprietary instructions by:
Generating detection signatures from protected prompts/templates (e.g., fingerprints) to identify likely prompt leakage in outputs without storing or exposing the protected prompt text.
Applying output leakage blocking when those signatures are detected.
This is designed to protect proprietary “how the assistant works” instructions and internal governance prompts.
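One common way to build such signatures is shingle hashing: hash overlapping word windows of the protected prompt so only digests are stored, then flag outputs whose shingles overlap the fingerprint. This is a sketch of the general technique, not PRAGATIX’s actual fingerprinting algorithm; the window size and threshold are arbitrary.

```python
import hashlib

def fingerprint(text: str, k: int = 5) -> set:
    """Hash overlapping k-word shingles; the protected text itself is never stored."""
    words = text.lower().split()
    return {
        hashlib.sha256(" ".join(words[i:i + k]).encode()).hexdigest()
        for i in range(max(1, len(words) - k + 1))
    }

def leaks_protected_prompt(output: str, protected_fp: set,
                           k: int = 5, threshold: int = 2) -> bool:
    """Flag an output whose shingle hashes overlap the protected fingerprint."""
    return len(fingerprint(output, k) & protected_fp) >= threshold
```

Because matching happens on hashes, the detection side never needs access to the protected prompt in cleartext.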
Governance, Auditing, and Visibility
Each decision can be recorded (based on customer settings) for governance and compliance:
User identity and context (who/where/how)
AI destination/service and model metadata (what)
Policy decision (allow/warn/mask/block/route)
Detection reason(s) and rule category
Optional prompt/response retention controls (full, masked, hashed, or metadata-only)
These records feed reporting and alerting in the PRAGATIX platform.
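The retention options above can be sketched as a record builder. Field names here are assumptions for illustration; the actual log schema is product-defined.

```python
import hashlib
from datetime import datetime, timezone

def audit_record(user, service, decision, reasons, prompt,
                 retention="metadata-only"):
    """Build one audit entry; `retention` mirrors full/masked/hashed/metadata-only."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,            # who
        "service": service,      # what destination
        "decision": decision,    # allow/warn/mask/block/route
        "reasons": reasons,      # detection rule categories
    }
    if retention == "full":
        record["prompt"] = prompt
    elif retention == "masked":
        record["prompt"] = "[MASKED]"
    elif retention == "hashed":
        record["prompt_sha256"] = hashlib.sha256(prompt.encode()).hexdigest()
    # "metadata-only": no prompt content is retained at all
    return record
```

Hashed retention is a useful middle ground: it supports exact-match correlation during investigations without storing prompt text.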
Performance and Reliability
Guardrails are designed for low latency and high availability:
Pattern and policy evaluation is optimized for real-time usage.
Protection runs in-path; no “human-in-the-loop” is required to block obvious attacks.
Policies are centrally managed and can be tailored per tenant, group, application, and AI service.
Customer Controls (What You Can Configure)
Typical customer configuration includes:
Which AI services are in scope (public AI, Copilot, private LLMs, etc.)
Which users/groups are governed by which policies
Whether to block, warn, mask, or log per category
What is retained in logs (privacy-by-design controls)
Which models/services are allowed, and when to route to private models
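Conceptually, these controls resolve to a per-group policy table. The groups, categories, and action names below are hypothetical placeholders, not PRAGATIX configuration keys:

```python
# Hypothetical per-group policy table; all names are illustrative only.
POLICY = {
    "finance": {
        "injection_pattern": "block",
        "pii": "mask",
        "default": "warn",
        "route_sensitive_to": "private-llm",
    },
    "engineering": {
        "injection_pattern": "block",
        "default": "allow",
    },
}

def resolve_action(group: str, category: str) -> str:
    """Look up the action for a detection category, falling back to the group default."""
    policy = POLICY.get(group, {})
    return policy.get(category, policy.get("default", "allow"))
```

Per-category actions with a group-level default keep policy readable while still allowing strict handling (block, mask, or private-model routing) where the data warrants it.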
Important Notes / Limitations
No security control can guarantee 100% prevention of all prompt injection attempts; PRAGATIX is designed to materially reduce risk through layered policy enforcement, inspection, redaction, and auditing.
Customers can tune policies to balance security and user experience (trading off false positives against false negatives).
For the strongest outcome, protections should be paired with least-privilege access, strong identity controls, and appropriate internal data governance.