TECHNIQUE
Guardrails & Safety
Input guards are implemented as concrete pre-processing controls at gateways, APIs, RAG query paths, and privacy/data-enforcement boundaries; the pool shows different guard targets rather than one shared pattern.
Filter or handle unsafe prompt/query inputs before downstream AI processing, including repeated-token prompts, prompt injection, jailbreaks, content safety, PII redaction, or RAG retrieval.
3 of 5 operators
Put guard and policy enforcement in mediation layers before requests reach downstream tools, services, model providers, or data systems.
3 of 5 operators
Use privacy/data-policy guards to constrain ingestion, processing, access, and training-data use.
1 of 5 operators
Require review and checklist gates before new AI Gateway use cases are onboarded.
1 of 5 operators
Operators differ on where input guards sit in the system path.
APPROACH 01
Gateway/API mediation before model, provider, tool, or service access.
APPROACH 02
RAG online query path before retrieval.
APPROACH 03
Privacy-aware infrastructure at data ingestion, processing, access, lineage, and training-data boundaries.
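The placements above share one shape: a check runs first, and the downstream call happens only if the check passes. The following is a minimal sketch of that mediation step, not any operator's implementation. The names `GuardResult`, `max_length_guard`, `blocklist_guard`, and `mediate` are hypothetical, and the phrase blocklist is a deliberately naive stand-in for a real prompt-injection check.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class GuardResult:
    allowed: bool
    reason: Optional[str] = None


# A guard is any callable that inspects the input text and returns a verdict.
Guard = Callable[[str], GuardResult]


def max_length_guard(limit: int) -> Guard:
    """Reject inputs longer than `limit` characters (hypothetical policy)."""
    def check(text: str) -> GuardResult:
        if len(text) > limit:
            return GuardResult(False, f"input exceeds {limit} characters")
        return GuardResult(True)
    return check


def blocklist_guard(phrases: List[str]) -> Guard:
    """Reject inputs containing any listed phrase, case-insensitively."""
    lowered = [p.lower() for p in phrases]

    def check(text: str) -> GuardResult:
        haystack = text.lower()
        for phrase in lowered:
            if phrase in haystack:
                return GuardResult(False, f"blocked phrase: {phrase!r}")
        return GuardResult(True)
    return check


def mediate(text: str, guards: List[Guard], downstream: Callable[[str], str]) -> str:
    """Run every guard in order; call `downstream` only if all of them pass."""
    for guard in guards:
        result = guard(text)
        if not result.allowed:
            # The downstream model, tool, or retriever is never invoked.
            return f"REJECTED: {result.reason}"
    return downstream(text)
```

In this sketch `downstream` can be a model-provider call at a gateway or a retriever on a RAG query path; the guard chain is the same and only the mediated target changes.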
Operators differ on what the guard is checking for.
APPROACH 01
Prompt-pattern filtering for repeated single-token prompts.
APPROACH 02
Prompt injection, jailbreaks, content safety, PII redaction, tool access checks, and sensitive-data redaction.
APPROACH 03
Access authorization, path-based provider/feature authorization, authentication, and rate limiting.
APPROACH 04
Privacy constraints and permitted-purpose checks on data assets before processing or training use.
APPROACH 05
Input guardrails before retrieval in a RAG system; the pool does not specify the exact checks.
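Of the guard targets listed above, the access-control variant (path-based provider authorization plus rate limiting) is the most mechanical, so it is the one sketched here. Everything in the block is an illustrative assumption: the caller names, the provider paths, the `ALLOWED_PATHS` table, and the sliding-window limits are hypothetical and are not taken from any operator.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Set

# Hypothetical policy table: which caller may reach which provider path.
ALLOWED_PATHS: Dict[str, Set[str]] = {
    "team-search": {"/v1/providers/alpha/chat"},
    "team-support": {"/v1/providers/alpha/chat", "/v1/providers/beta/embeddings"},
}


class SlidingWindowLimiter:
    """Allow at most `max_requests` per caller within `window_seconds`."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, caller: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[caller]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


def authorize(caller: str, path: str, limiter: SlidingWindowLimiter,
              now: Optional[float] = None) -> str:
    """Return 'allowed', 'forbidden', or 'rate_limited' for one request."""
    # Authorization runs first, so forbidden requests do not consume quota.
    if path not in ALLOWED_PATHS.get(caller, set()):
        return "forbidden"
    if not limiter.allow(caller, now):
        return "rate_limited"
    return "allowed"
```

The `now` parameter exists only to make the limiter deterministic in tests; a gateway would normally rely on the monotonic clock default.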
Single-token repetition filters did not cover all repeated-token divergence paths; multi-token repeats could still elicit model divergence and training-data extraction.
Repeated-token divergence could be used to bypass prompt guardrails and produce hallucinatory responses.
GPT-4 repeated-phrase behavior was non-deterministic in reported tests, and some repeat requests timed out after ten minutes.
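The gap described above, where a single-token filter misses multi-token repeats, can be illustrated with a check that looks for a repeating block of several tokens rather than one. This is a heuristic sketch only and should not be read as covering every divergence path. It assumes whitespace tokenization (a real filter would more likely use the model's own tokenizer), and the function name and the `max_period` and `min_repeats` thresholds are hypothetical.

```python
from typing import List, Optional


def repeated_period(tokens: List[str], max_period: int = 4,
                    min_repeats: int = 5) -> Optional[int]:
    """Return the smallest block length (period) that repeats consecutively
    at least `min_repeats` times somewhere in `tokens`, or None."""
    for period in range(1, max_period + 1):
        matched = 0  # consecutive positions where tokens[i] == tokens[i - period]
        for i in range(period, len(tokens)):
            if tokens[i] == tokens[i - period]:
                matched += 1
                # A block of `period` tokens repeated r times back to back
                # yields period * (r - 1) consecutive matches.
                if matched >= period * (min_repeats - 1):
                    return period
            else:
                matched = 0
    return None


def is_repeated_token_prompt(prompt: str, max_period: int = 4,
                             min_repeats: int = 5) -> bool:
    """Flag prompts containing a short token block repeated many times."""
    tokens = prompt.lower().split()
    return repeated_period(tokens, max_period, min_repeats) is not None
```

Setting `max_period=1` reduces this to a single-token repetition filter, which does not flag a prompt such as `"one two " * 5`; raising `max_period` catches that case at the cost of more scanning and a higher chance of false positives on legitimately repetitive text.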
| Name | Kind | When | Maturity |
|---|---|---|---|
| Prompt-injection classifier gate | pattern | untrusted input screened by a fast classifier before reaching the main model | established |
| Llama Guard | library | self-hosted safety classification with customizable policy taxonomy | established |
| Presidio | library | PII detection and redaction before content enters prompts or logs | established |
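To make the "PII detection and redaction before content enters prompts or logs" row concrete, here is a small standard-library sketch of the redaction step. It is not Presidio and does not use its API: the two regular expressions and the `<EMAIL>` / `<US_SSN>` placeholder labels are illustrative assumptions, and a dedicated library would cover far more entity types and formats.

```python
import re
from typing import Dict, Pattern

# Hypothetical patterns for illustration; real detectors are much broader.
PII_PATTERNS: Dict[str, Pattern[str]] = {
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str) -> str:
    """Replace each matched span with a placeholder naming its PII type."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text


def build_prompt(user_text: str) -> str:
    """Redact before the text is placed into a prompt or written to a log."""
    return f"User question: {redact(user_text)}"
```

Redacting at the point where the prompt is built means the same sanitized string is what reaches both the model and any request logging.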