TECHNIQUE

Fallback & escalation design

Guardrails & Safety

3 APPLICATIONS
3 OBSERVED OPERATORS
01

State of Practice

CROSS-VALIDATED — 5 OPERATORS

In observed practice, fallback and escalation design takes the form of explicit degradation paths: cached safe responses, alternate models or endpoints, rollback to the old system, human review, and suppression of low-confidence output.

Observed Practices

Operators define automatic fallback paths when the primary AI-serving path degrades: Salesforce serves safe cached responses, Slack falls back to different models or healthy endpoints, and Wix falls back to the old routing system when wait times exceed expectations.

3 of 6 operators
Salesforce, Slack, Wix
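
The fallback paths described above can be pictured as an ordered ladder of handlers, where each rung is tried only if the one before it fails. The following is a minimal sketch under stated assumptions, not any operator's implementation: `serve_with_fallback`, the three rung functions, and `SAFE_CACHE` are all hypothetical names introduced for illustration.

```python
from typing import Callable, Optional, Sequence, Tuple

Handler = Callable[[str], str]


def serve_with_fallback(
    request: str, ladder: Sequence[Tuple[str, Handler]]
) -> Tuple[str, str]:
    """Try each rung in order; return (rung_name, response) from the first success."""
    last_error: Optional[Exception] = None
    for name, handler in ladder:
        try:
            return name, handler(request)
        except Exception as exc:  # any failure moves on to the next rung
            last_error = exc
    raise RuntimeError("all fallback rungs failed") from last_error


# Hypothetical rungs, for illustration only.
def primary_model(request: str) -> str:
    # Simulates a degraded primary serving path.
    raise TimeoutError("primary endpoint unhealthy")


def alternate_model(request: str) -> str:
    return f"alternate answer to: {request}"


SAFE_CACHE = {"reset password": "See the password reset guide."}


def cached_safe_response(request: str) -> str:
    # Raises KeyError when nothing safe is cached for this request.
    return SAFE_CACHE[request]


LADDER = [
    ("primary", primary_model),
    ("alternate", alternate_model),
    ("cache", cached_safe_response),
]

rung, answer = serve_with_fallback("reset password", LADDER)
```

Returning the rung name alongside the response lets callers log which path actually served the request, which is the kind of signal a cache-usage or fallback-rate monitor would need.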

Operators escalate or restrict LLM output when the system lacks adequate certainty: Agoda routes cases for human review when the LLM lacks full context, while Meta avoids showing low-confidence recommendations and accepts lower reach for higher precision.

2 of 6 operators
Agoda, Meta
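
One way to express this kind of gating is a small decision function that checks context first and confidence second. This is a hedged sketch only: the `ModelOutput` fields, the `Action` names, and the 0.8 threshold are assumptions made for illustration and are not taken from Agoda or Meta.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    SHOW = "show"
    HUMAN_REVIEW = "human_review"
    SUPPRESS = "suppress"


@dataclass
class ModelOutput:
    text: str
    confidence: float        # assumed to be a calibrated score in [0, 1]
    has_full_context: bool   # assumed to be set by an upstream context check


def gate(output: ModelOutput, show_threshold: float = 0.8) -> Action:
    """Decide what to do with an LLM output before it reaches a user."""
    if not output.has_full_context:
        # Missing context: escalate to a person rather than guess.
        return Action.HUMAN_REVIEW
    if output.confidence < show_threshold:
        # Low confidence: show nothing, trading reach for precision.
        return Action.SUPPRESS
    return Action.SHOW
```

Raising `show_threshold` suppresses more outputs and lowers reach; lowering it does the opposite, so the value is a product decision as much as a technical one.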

Operators keep humans in the loop at explicit review or decision points: Agoda has a reviewer validate and publish generated incident reports, and Meta reports human oversight at key strategic decision points.

2 of 6 operators
Agoda, Meta
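
An explicit review point can be enforced in code by making publication impossible without a recorded approval. The sketch below assumes a generated report is held as a draft until a named reviewer signs off; `Draft`, `approve`, and `publish` are hypothetical names, not a description of any operator's tooling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Draft:
    body: str
    approved_by: Optional[str] = None  # None until a human reviewer signs off


def approve(draft: Draft, reviewer: str) -> Draft:
    """Return a copy of the draft with the reviewer recorded."""
    return Draft(body=draft.body, approved_by=reviewer)


def publish(draft: Draft) -> str:
    """Publish only drafts that carry a human approval."""
    if draft.approved_by is None:
        raise PermissionError("draft has not been approved by a reviewer")
    return f"PUBLISHED (approved by {draft.approved_by}): {draft.body}"
```

Calling `publish(approve(Draft("Incident summary"), "reviewer-1"))` succeeds, while `publish(Draft("Incident summary"))` raises `PermissionError`, so the review step cannot be skipped by accident.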

Operators attach fallback/escalation to operational thresholds or monitoring: Salesforce tracks L1/L2 cache usage and triggers PagerDuty alerts when services shift to L2 cache usage, Meta halts or pauses runs when compute thresholds are reached, and Wix falls back if waiting times exceed expectations.

3 of 6 operators
Salesforce, Meta, Wix
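
A threshold-driven fallback of this kind can be sketched as a guard that watches a rolling metric, flips into fallback when the limit is crossed, and raises an alert once. The class below is an illustrative assumption: `WaitTimeGuard`, its rolling-average rule, and the `alert` callback are hypothetical, and a real deployment would call a paging or monitoring service in place of the callback.

```python
from collections import deque
from typing import Callable, Deque


class WaitTimeGuard:
    """Latch into fallback when the rolling average wait exceeds a limit."""

    def __init__(
        self, limit_seconds: float, window: int, alert: Callable[[str], None]
    ) -> None:
        self.limit_seconds = limit_seconds
        self.samples: Deque[float] = deque(maxlen=window)  # keeps the last `window` samples
        self.alert = alert
        self.fallback_active = False

    def record(self, wait_seconds: float) -> bool:
        """Record one observation; return True while fallback is active."""
        self.samples.append(wait_seconds)
        average = sum(self.samples) / len(self.samples)
        if average > self.limit_seconds and not self.fallback_active:
            self.fallback_active = True
            # Alert once on the transition, not on every later sample.
            self.alert(
                f"average wait {average:.1f}s exceeds "
                f"{self.limit_seconds:.1f}s; falling back"
            )
        return self.fallback_active

    def reset(self) -> None:
        """Leave fallback explicitly, for example after an operator confirms recovery."""
        self.fallback_active = False
        self.samples.clear()
```

The guard stays in fallback until `reset()` is called, which avoids flapping between the new and old paths when the metric hovers around the limit.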

Operators use rollback or containment during production changes: Slack used feature flags and instant rollback during backend migration, and Wix adopted fallback to the old system during routing-model deployment.

2 of 6 operators
Slack, Wix
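
Flag-guarded rollout with a retained old path might look like the sketch below. It is a simplified illustration under stated assumptions: `FeatureFlags` is a hypothetical in-memory stand-in for a real flag service, and the flag name `new_routing_model` and both router functions are invented for the example.

```python
from typing import Callable, Dict


class FeatureFlags:
    """In-memory flag store; a real system would back this with a flag service."""

    def __init__(self) -> None:
        self._flags: Dict[str, bool] = {}

    def enable(self, name: str) -> None:
        self._flags[name] = True

    def disable(self, name: str) -> None:
        self._flags[name] = False

    def is_enabled(self, name: str) -> bool:
        return self._flags.get(name, False)  # unknown flags default to off


def route(
    ticket: str,
    flags: FeatureFlags,
    new_router: Callable[[str], str],
    old_router: Callable[[str], str],
) -> str:
    """Use the new router only while its flag is on; otherwise use the old one."""
    if flags.is_enabled("new_routing_model"):
        try:
            return new_router(ticket)
        except Exception:
            # Contain the failure: this request still gets served by the old system.
            return old_router(ticket)
    return old_router(ticket)


# Hypothetical routers, for illustration only.
def old_router(ticket: str) -> str:
    return f"old:{ticket}"


def new_router(ticket: str) -> str:
    return f"new:{ticket}"
```

Because the old router stays deployed and the flag defaults to off, rolling back is a single `flags.disable("new_routing_model")` call with no redeploy.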

Where Operators Converge

Among the operators with observed fallback/escalation evidence, escalation is tied to named conditions rather than left implicit: backend outage or cache shift, missing context, model degradation or limits, low confidence, compute thresholds, or waiting-time thresholds.

Where Operators Diverge

Operators differ on what the system falls back or escalates to.

APPROACH 01

Serve cached safe responses instead of depending on backend services.

Salesforce

APPROACH 02

Switch to alternate models, regions, endpoints, or capacity tiers.

Slack

APPROACH 03

Route to human review or human oversight.

Agoda, Meta

APPROACH 04

Suppress or stop the automated action when guardrails are hit.

Meta

APPROACH 05

Revert to the previous production system.

Wix

Operators differ on the trigger used for fallback or escalation.

APPROACH 01

Backend dependency outage or cache-usage shift.

Salesforce

APPROACH 02

Missing context for the LLM.

Agoda

APPROACH 03

Model degradation, regional health, or capacity limits.

Slack

APPROACH 04

Low confidence or resource thresholds.

Meta

APPROACH 05

Waiting-time threshold in the customer-care queue.

Wix

Watch Items

Context gaps and low-confidence outputs limit automation: Agoda prefers to over-escalate to human review when context is missing, and Meta withholds low-confidence recommendations, sacrificing reach for precision.

Fallback thresholds depend on monitoring and data integrity: Salesforce alerts when cache usage patterns shift, while Wix says statistics and data need regular updates and heavy validation for integrity and data drift.

Capacity and dependency failures are explicit fallback drivers: Salesforce reports improved availability during full backend outages, and Slack designed for regional outages, GPU scarcity, surge spillover, and rerouting around unhealthy endpoints.

02

Implementation Menu

CURATED DEFAULTS
Name                            Kind     Maturity
Confidence-gated human handoff  pattern  established
Graceful degradation ladder     pattern  established

03

Observed in Production

3 APPS