TECHNIQUE
Guardrails & Safety
Observed deployments use deterministic rule guards as hard validation, filtering, policy, and escalation layers around LLM output rather than relying on prompts alone.
Put generated outputs or AI-driven actions through deterministic validation gates before downstream use: GraphQL/schema validation, JSON/structure checks, rule-based eval checks, compiler/test/static-analysis feedback, privacy policy enforcement, or domain rule validators. (6 of 6 observed operators in this teardown pool)
Encode guards in domain artifacts that already define correctness: GraphQL schemas and client documents, TypeScript AST/import rules/linters/tests, human evaluation rubrics and safety checks, privacy policy zones and lineage, SonarQube/Tree-sitter/build/test checks, and relevance-score JSON contracts. (6 of 6 observed operators in this teardown pool)
Use guard failures to trigger a concrete control action: feed validation errors back to the LLM for repair, return steering messages to the agent, surface failures to human review, halt or pause runs at thresholds, or require developer approval before merge. (5 of 6 observed operators in this teardown pool)
Combine deterministic guards with monitoring or trace capture so failures, regressions, drift, and audit evidence remain visible after deployment. (4 of 6 observed operators in this teardown pool)
Use deterministic filtering to narrow where the LLM is allowed to operate, such as excluding files/functions, stripping irrelevant schema, selecting imports/hints from AST inspection, or applying policy zones before processing. (4 of 6 observed operators in this teardown pool)
Every observed operator uses deterministic or rule-governed checks as an execution boundary around AI output, not as a standalone policy document.
The shared reason for these guards is operational reliability: operators report false positives, unsupported claims, invalid structure, policy violations, hallucinations, and unreliable results from naive AI use as the failure modes the guard layers are meant to catch or contain.
Operators differ on the main artifact that defines the rule boundary.
APPROACH 01
Schema, format, or output-contract guards: validate GraphQL JSON, JSON parseability, structure, formatting, length, or schema.
APPROACH 02
Codebase and static-analysis guards: use ASTs, import-rule DSLs, custom linters, SonarQube, Tree-sitter, build checks, tests, and domain rule validators.
APPROACH 03
Privacy and data-policy guards: use lineage, policy zones, policy-enforcement APIs, and verifiers over data-processing edges.
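As an illustration of the schema, format, or output-contract approach, the sketch below checks a relevance-score JSON payload after parsing. The contract itself (the `relevance` and `reason` keys, the 0 to 1 range, and the `MAX_REASON_CHARS` limit) is a hypothetical example chosen for illustration, not a contract reported by any observed operator.

```python
import json

# Hypothetical contract: {"relevance": <number in [0, 1]>, "reason": <non-empty string>}
MAX_REASON_CHARS = 500  # assumed length limit, not taken from the source


def validate_relevance_output(raw: str) -> list[str]:
    """Return a list of contract violations; an empty list means the output passes."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]

    errors = []
    unexpected = set(data) - {"relevance", "reason"}
    if unexpected:
        errors.append(f"unexpected keys: {sorted(unexpected)}")

    score = data.get("relevance")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        errors.append("relevance must be a number")
    elif not 0.0 <= score <= 1.0:
        errors.append("relevance must be between 0 and 1")

    reason = data.get("reason")
    if not isinstance(reason, str) or not reason.strip():
        errors.append("reason must be a non-empty string")
    elif len(reason) > MAX_REASON_CHARS:
        errors.append(f"reason exceeds {MAX_REASON_CHARS} characters")
    return errors
```

Returning a list of violations rather than raising on the first one keeps the diagnostics reusable, whether they are fed back to the model or shown to a reviewer.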
Operators differ on what happens after a guard catches a problem.
APPROACH 01
Automatic repair or steering loop: feed validation errors or diagnostics back to the LLM/agent so it can revise its output.
APPROACH 02
Human review escalation: route failures or critical outputs to humans, crowdsourced reviewers, subject-matter experts, or developer code reviewers.
APPROACH 03
Blocking, pausing, or remediation: stop runs at thresholds or remediate disallowed data use when policy checks fail.
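The first two responses above can be combined in one control loop: retry with the validator's diagnostics, then escalate when retries run out. The following is a minimal sketch under stated assumptions. `run_with_guard`, `GuardResult`, the status strings, and the retry limit are all hypothetical names and values, and `generate` stands in for whatever model call an operator actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class GuardResult:
    status: str            # "accepted" or "needs_human_review"
    output: str
    attempts: int
    errors: list[str] = field(default_factory=list)


def run_with_guard(
    generate: Callable[[str], str],
    validate: Callable[[str], list[str]],
    prompt: str,
    max_attempts: int = 3,
) -> GuardResult:
    """Generate, validate, and retry with diagnostics; escalate when retries run out."""
    current_prompt = prompt
    output, errors = "", []
    for attempt in range(1, max_attempts + 1):
        output = generate(current_prompt)
        errors = validate(output)
        if not errors:
            return GuardResult("accepted", output, attempt)
        # Repair step: append the deterministic diagnostics to the next prompt.
        diagnostics = "\n".join(f"- {e}" for e in errors)
        current_prompt = f"{prompt}\n\nYour previous answer failed validation:\n{diagnostics}"
    # Escalation step: the guard never passes unvalidated output downstream.
    return GuardResult("needs_human_review", output, max_attempts, errors)
```

The caller decides what "needs_human_review" means in practice, for example queuing the item for a reviewer or blocking a merge. The loop only guarantees that an output marked "accepted" has passed the validator.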
Operators differ on where guards are placed in the workflow.
APPROACH 01
Pre-generation scoping: narrow the inputs, files, functions, schema, or data zones before the model acts.
APPROACH 02
Post-generation validation: inspect the completed output or proposed action before writing files, posting comments, merging code, processing invoices, or feeding downstream systems.
APPROACH 03
Continuous runtime or pipeline enforcement: monitor lineage, traces, metrics, scores, or drift over time after deployment.
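Pre-generation scoping can be as simple as deciding deterministically which functions the model is allowed to see. The sketch below uses Python's standard `ast` module as a stand-in for the TypeScript AST tooling mentioned earlier. The function name `select_functions_in_scope` and the rule of skipping underscore-prefixed functions are assumptions made for illustration.

```python
import ast


def select_functions_in_scope(source: str, excluded: set[str]) -> dict[str, str]:
    """Map each top-level function name to its source, skipping excluded or private ones."""
    tree = ast.parse(source)
    in_scope = {}
    for node in tree.body:
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        if node.name in excluded or node.name.startswith("_"):
            continue
        # get_source_segment returns None if location info is missing.
        segment = ast.get_source_segment(source, node)
        if segment is not None:
            in_scope[node.name] = segment
    return in_scope
```

Only the returned snippets would then be placed in the model's context, so excluded code cannot be rewritten because the model never receives it.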
False positives, hallucinations, and unsupported claims remain the recurring failure class that pushed operators to add deterministic guard layers.
Machine-readable structure is itself a guardrail risk: operators call out broken JSON, expected-structure failures, and invalid generated data as conditions that must be caught or repaired.
Operators report that prompt edits, model swaps, drift, and new or changed processing jobs can create regressions, so guardrails need monitoring rather than one-time setup.
Naive AI use is described as unreliable; observed operators compensate by turning repeated failures into explicit policy, diagnostics, validation, or human-review gates.
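One way to keep regressions visible after a prompt edit or model swap is to track the guard failure rate over a rolling window and halt when it crosses a threshold. This is a minimal sketch. The class name `GuardFailureMonitor`, the window size, the threshold, and the minimum sample count are all assumed values, not figures reported by the observed operators.

```python
from collections import deque


class GuardFailureMonitor:
    """Track recent guard outcomes and signal a halt when the failure rate is too high."""

    def __init__(self, window: int = 50, max_failure_rate: float = 0.2, min_samples: int = 10):
        self.outcomes = deque(maxlen=window)   # True = passed, False = failed
        self.max_failure_rate = max_failure_rate
        self.min_samples = min_samples

    def record(self, passed: bool) -> None:
        self.outcomes.append(passed)

    @property
    def failure_rate(self) -> float:
        if not self.outcomes:
            return 0.0
        return self.outcomes.count(False) / len(self.outcomes)

    def should_halt(self) -> bool:
        # Avoid halting on a handful of early samples.
        return len(self.outcomes) >= self.min_samples and self.failure_rate > self.max_failure_rate
```

A pipeline would call `record` after every guard check and consult `should_halt` before starting the next unit of work. Because the window is bounded, old failures age out once the underlying problem is fixed.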
| Name | Kind | When | Maturity |
|---|---|---|---|
| Post-parse validators in code | pattern | hard constraints enforced after parsing, never delegated to the model | commodity |
| Open Policy Agent | library | rule sets maintained by non-ML teams as versioned policy | established |