TECHNIQUE

Input guards

Guardrails & Safety

1 APPLICATION

2 OBSERVED OPERATORS

State of Practice

CROSS-VALIDATED — 7 OPERATORS

Operators deploy input guards as explicit platform, API, retrieval, data-policy, and agent-script controls. The common pattern is to guard inputs before any downstream AI action, but operators differ sharply on what they guard and where they place the control.

Observed Practices

Filter or handle unsafe prompt/query content before the downstream AI path: OpenAI filters repeated single-token prompts, Uber detects and handles prompt injection/jailbreak/content-safety/PII issues, and TraceIQ applies input guardrails before retrieval.

3 of 7 operators

OpenAI, Uber, TraceIQ
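
The "check before the downstream path" pattern can be sketched as a chain of small checks that short-circuits before retrieval runs. This is a minimal illustration, not any operator's implementation: `max_length`, `deny_phrases`, `run_guards`, and `guarded_retrieve` are hypothetical names, and the phrase list is a toy stand-in for a real detector.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Verdict:
    allowed: bool
    reason: Optional[str] = None


# A check returns None when the input passes, or a short reason when it fails.
Check = Callable[[str], Optional[str]]


def max_length(limit: int) -> Check:
    """Hypothetical check: reject inputs longer than `limit` characters."""
    def check(text: str) -> Optional[str]:
        if len(text) > limit:
            return f"input longer than {limit} characters"
        return None
    return check


def deny_phrases(phrases: List[str]) -> Check:
    """Hypothetical check: case-insensitive substring match on a deny list."""
    lowered = [p.lower() for p in phrases]

    def check(text: str) -> Optional[str]:
        haystack = text.lower()
        for phrase in lowered:
            if phrase in haystack:
                return f"matched deny phrase: {phrase!r}"
        return None
    return check


def run_guards(text: str, checks: List[Check]) -> Verdict:
    """Run checks in order and stop at the first failure."""
    for check in checks:
        reason = check(text)
        if reason is not None:
            return Verdict(False, reason)
    return Verdict(True)


def guarded_retrieve(query: str, checks: List[Check],
                     retrieve: Callable[[str], list]) -> dict:
    """Call `retrieve` only when every guard passes."""
    verdict = run_guards(query, checks)
    if not verdict.allowed:
        return {"blocked": True, "reason": verdict.reason, "documents": []}
    return {"blocked": False, "reason": None, "documents": retrieve(query)}
```

The point of the structure is that a blocked query never reaches the retriever, so the guard's decision is observable and testable independently of the model.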

Scrub or enforce controls on sensitive data before AI processing or external AI API access: Thumbtack scrubs PII in an inference layer, Uber redacts sensitive data when needed, and Meta uses automated detection, tagging, and policy-enforcement APIs across data storage, processing, and access layers.

3 of 7 operators

Thumbtack, Uber, Meta
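
A scrub step in front of an external API call might look like the sketch below. It is an assumption-laden illustration: the three regular expressions are deliberately simple and US-centric, `scrub_pii` and `call_external_model` are hypothetical names, and a production system would more likely rely on a dedicated detector than on hand-written patterns.

```python
import re

# Illustrative patterns only; they will miss many real-world PII formats.
PII_PATTERNS = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("PHONE", re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b")),
]


def scrub_pii(text):
    """Replace each match with a <LABEL> placeholder and count replacements."""
    counts = {}
    for label, pattern in PII_PATTERNS:
        text, n = pattern.subn(f"<{label}>", text)
        if n:
            counts[label] = n
    return text, counts


def call_external_model(prompt, send):
    """Scrub the prompt, then hand only the scrubbed text to `send`.

    `send` stands in for whatever client function reaches the external API.
    """
    scrubbed, counts = scrub_pii(prompt)
    return send(scrubbed), counts
```

For example, `scrub_pii("Email jane.doe@example.com or call 415-555-0100, SSN 123-45-6789.")` returns the text `"Email <EMAIL> or call <PHONE>, SSN <SSN>."` together with a count of one per label.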

Put AI access behind central gateways or proxies that authenticate, authorize, rate-limit, or enforce tool/provider policies before requests reach models, tools, or downstream services.

2 of 7 operators

Grab, Uber
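
As a rough sketch of the gateway pattern, the class below authenticates a caller, checks a per-client provider allowlist, and applies a token-bucket rate limit before forwarding. Everything here is hypothetical: the `Gateway` and `GatewayError` names, the in-memory key and allowlist tables, and the injectable `clock` are illustration choices, not a description of Grab's or Uber's systems.

```python
import time


class GatewayError(Exception):
    """Carries an HTTP-style status code for the rejected request."""

    def __init__(self, status, message):
        super().__init__(message)
        self.status = status


class Gateway:
    def __init__(self, api_keys, allowed, capacity=5, refill_per_sec=1.0,
                 clock=time.monotonic):
        self.api_keys = api_keys        # api key -> client id
        self.allowed = allowed          # client id -> set of provider names
        self.capacity = capacity        # maximum burst size per client
        self.refill = refill_per_sec    # tokens restored per second
        self.clock = clock              # injectable so tests can control time
        self.buckets = {}               # client id -> (tokens, last timestamp)

    def _take_token(self, client):
        """Token bucket: refill by elapsed time, then spend one token if available."""
        now = self.clock()
        tokens, last = self.buckets.get(client, (float(self.capacity), now))
        tokens = min(self.capacity, tokens + (now - last) * self.refill)
        ok = tokens >= 1
        self.buckets[client] = (tokens - 1 if ok else tokens, now)
        return ok

    def handle(self, api_key, provider, prompt, providers):
        """Authenticate, authorize, rate-limit, then forward to the provider."""
        client = self.api_keys.get(api_key)
        if client is None:
            raise GatewayError(401, "unknown API key")
        if provider not in self.allowed.get(client, set()):
            raise GatewayError(403, f"{client} may not call {provider}")
        if not self._take_token(client):
            raise GatewayError(429, "rate limit exceeded")
        return providers[provider](prompt)
```

The order of checks is a design choice in this sketch: unauthenticated and unauthorized requests are rejected before they consume rate-limit budget.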

Use explicit task-flow logic to catch invalid user input instead of letting the model decide when a task is complete: Siemens uses Agent Script transitions and catches unrelated or unclear answers while guiding users through BANT qualification.

1 of 7 operators

Siemens
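
The idea of explicit transitions that catch invalid answers can be illustrated with a small state machine. This is written in Python rather than Agent Script, and it is only a sketch: the questions, the parser heuristics, and the `BantFlow` class are hypothetical, with BANT taken in its usual sense of budget, authority, need, and timeline.

```python
import re


def parse_budget(answer):
    """Accept any answer containing a number, e.g. 'around 50,000 dollars'."""
    m = re.search(r"\d[\d,]*", answer)
    return int(m.group().replace(",", "")) if m else None


def parse_yes_no(answer):
    a = answer.strip().lower()
    if a in {"yes", "y"}:
        return True
    if a in {"no", "n"}:
        return False
    return None


def parse_need(answer):
    """Toy heuristic: require at least three words."""
    text = answer.strip()
    return text if len(text.split()) >= 3 else None


def parse_timeline(answer):
    m = re.search(r"\d+\s*(?:week|month|quarter|year)s?", answer.lower())
    return m.group(0) if m else None


STEPS = [
    ("budget", "What budget have you set aside?", parse_budget),
    ("authority", "Are you the decision maker? (yes/no)", parse_yes_no),
    ("need", "What problem are you trying to solve?", parse_need),
    ("timeline", "When do you need a solution in place?", parse_timeline),
]


class BantFlow:
    """The flow, not the model, decides when each step is complete."""

    def __init__(self):
        self.index = 0
        self.answers = {}

    @property
    def done(self):
        return self.index >= len(STEPS)

    def prompt(self):
        return None if self.done else STEPS[self.index][1]

    def submit(self, answer):
        """Advance only when the answer parses; otherwise repeat the question."""
        if self.done:
            raise RuntimeError("qualification already complete")
        name, question, parser = STEPS[self.index]
        value = parser(answer)
        if value is None:
            return f"Sorry, I didn't catch that. {question}"
        self.answers[name] = value
        self.index += 1
        return self.prompt() or "Thanks, qualification complete."
```

An unrelated reply such as "What's the weather like?" leaves the flow on the budget step and re-asks the question, which is the behaviour the practice describes.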

Require approval or policy review before AI use cases are allowed onto shared AI infrastructure: Grab requires a mini-RFC and checklist for every new use case, and Thumbtack updated AI/ML usage policies to track and approve usage across use cases.

2 of 7 operators

Grab, Thumbtack

Where Operators Converge

Every observed operator implements input guarding as an explicit system control outside the base model path: gateway controls, agent-script logic, inference wrappers, AI Guard, data-policy infrastructure, pre-retrieval guardrails, or prompt-input filtering.

Where Operators Diverge

Operators guard different input risk classes.

APPROACH 01

Prompt or query content risks: repeated-token divergence, prompt injection, jailbreaks, content safety, PII, or pre-retrieval query guarding.

OpenAI, Uber, TraceIQ

APPROACH 02

Sensitive-data and privacy-policy risks: PII scrubbing, automated data detection/tagging, policy enforcement, and permitted-purpose checks.

Thumbtack, Meta, Uber

APPROACH 03

Access and action-authorization risks: provider/feature authorization, rate limiting, tool access checks, and downstream proxying.

Grab, Uber

APPROACH 04

Business-process input validity: catching unrelated or unclear answers and guiding users through a fixed qualification framework.

Siemens

Guard placement differs by architecture.

APPROACH 01

Central AI gateway or proxy layer before provider/model/tool calls.

Grab, Uber

APPROACH 02

Application or inference-framework wrapper before secure external GenAI API access.

Thumbtack

APPROACH 03

Data-infrastructure enforcement using lineage, policy zones, automated detection, and policy APIs.

Watch Items

Narrow prompt filters can miss adjacent attack variants: OpenAI’s filtering focused on repeated single tokens, while Dropbox reported that multi-token repeats still induced divergence and training-data extraction.
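
To make the single-token versus multi-token gap concrete, the sketch below flags inputs dominated by a short repeating unit of any length up to `max_period`. It is an assumption-based illustration, not the filter either company uses: tokens are approximated by whitespace splitting, and the function names and thresholds are hypothetical.

```python
def longest_periodic_run(tokens, period):
    """Length in tokens of the longest stretch where every token equals
    the token `period` positions before it (0 if no such stretch exists)."""
    best = run = 0
    for i in range(period, len(tokens)):
        run = run + 1 if tokens[i] == tokens[i - period] else 0
        best = max(best, run)
    # A run of matches of length `best` spans best + period tokens in total.
    return best + period if best else 0


def is_repetition_attack(text, max_period=4, min_repeats=20):
    """Flag text containing a unit of 1..max_period tokens repeated
    at least `min_repeats` times back to back."""
    tokens = text.split()  # assumption: whitespace stands in for a real tokenizer
    for period in range(1, max_period + 1):
        if longest_periodic_run(tokens, period) >= period * min_repeats:
            return True
    return False
```

With `max_period=1` the check behaves like a single-token filter and lets `"book club " * 30` through, while the default `max_period=4` flags it.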

Adversarial inputs and sensitive data remain active guard targets: Uber names prompt injection, jailbreaks, content safety, and PII redaction; Thumbtack built PII scrubbing before external GenAI APIs; Meta frames the problem as safeguarding GenAI data with privacy infrastructure.

False positives, hallucinations, and guardrail bypass are operational concerns: Thumbtack kept humans in the loop for policy-violation review because of false positives and hallucinations, and OpenAI’s reported repeated-token vulnerability could make models disregard prompt guardrails and produce hallucinatory responses.

Implementation Menu

CURATED DEFAULTS

Name	Kind	When	Maturity
Prompt-injection classifier gate	pattern	untrusted input screened by a fast classifier before reaching the main model	established
Llama Guard	library	self-hosted safety classification with customizable policy taxonomy	established
Presidio	library	PII detection and redaction before content enters prompts or logs	established
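
The classifier-gate pattern in the first row can be sketched as a wrapper that scores untrusted input and calls the main model only below a threshold. The sketch assumes a classifier that returns a risk score between 0 and 1; `make_gate`, `REFUSAL`, and `toy_classifier` are hypothetical names, and the keyword scorer is a placeholder for a real model such as a hosted safety classifier.

```python
from typing import Callable

REFUSAL = "Request blocked by input guard."


def make_gate(classifier: Callable[[str], float],
              threshold: float = 0.5,
              fail_closed: bool = True):
    """Build a gate that screens input with `classifier` before the main model.

    If the classifier raises, the gate blocks when `fail_closed` is True
    and lets the request through otherwise.
    """
    def gate(user_input: str, main_model: Callable[[str], str]) -> str:
        try:
            score = classifier(user_input)
        except Exception:
            if fail_closed:
                return REFUSAL
            score = 0.0
        if score >= threshold:
            return REFUSAL
        return main_model(user_input)
    return gate


def toy_classifier(text: str) -> float:
    """Placeholder scorer: a real deployment would call a trained classifier."""
    return 0.9 if "ignore all previous" in text.lower() else 0.1
```

Whether to fail closed is a judgment call in this sketch: blocking on classifier errors favors safety, while failing open favors availability.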

Observed in Production

1 APP

Technology (GROUNDED)

LLM Application Quality Assurance

Meta, Uber (2 OP)