
TECHNIQUE

Context & memory management

Data & Context Engineering

4 APPLICATIONS
8 OBSERVED OPERATORS
01

State of Practice

CROSS-VALIDATED — 11 OPERATORS

Observed practice: operators make context and memory an explicit production subsystem—retrieving, persisting, pruning, partitioning, and auditing context around LLM calls rather than relying on the model alone.

Observed Practices

Retrieve relevant prior or domain context from purpose-built stores before or during LLM calls, including vector indexes, semantic layers, knowledge graphs, historical experiment stores, memory stores, and key-value observation stores.

8 of 11 evidenced operators in this pool.
Agoda, LinkedIn, Rippling, Airbnb, Meta, New Computer, Cleric, Alibaba Cloud
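The retrieval step described above can be sketched as follows. This is a minimal illustration, not any operator's implementation: the bag-of-words `embed` function stands in for a real embedding model, and `retrieve` and `build_prompt` are hypothetical names introduced here.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption: a real system calls an embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, store: list[str], k: int = 2) -> list[str]:
    """Return the k stored documents most similar to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, store: list[str], k: int = 2) -> str:
    """Inject retrieved context ahead of the question before the LLM call."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, store, k))
    return f"Relevant context:\n{context}\n\nQuestion: {query}"
```

In practice the in-memory list would be replaced by whichever purpose-built store the deployment uses; the shape of the step (rank, take top k, inject) stays the same.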

Maintain runtime state or memory across steps in multi-call workflows, rather than treating each model call as isolated.

9 of 11 evidenced operators in this pool.
Rippling, LinkedIn, Airbnb, Shopify, Meta, Slack, Uber, Cleric, Alibaba Cloud
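One way to picture state carried across calls is a small state object that every step reads from and writes to. The sketch below is an assumption-laden illustration: `WorkflowState`, `run_workflow`, and the `call_model` callable are hypothetical names, and `call_model` stands in for whatever LLM client a deployment uses.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class WorkflowState:
    """Runtime state shared by every step of a multi-call workflow."""
    goal: str
    history: list[tuple[str, str]] = field(default_factory=list)

    def record(self, step: str, output: str) -> None:
        """Persist one step's output so later steps can see it."""
        self.history.append((step, output))

    def as_context(self) -> str:
        """Render the goal and all prior step outputs as prompt context."""
        lines = [f"Goal: {self.goal}"]
        lines += [f"[{step}] {output}" for step, output in self.history]
        return "\n".join(lines)


def run_workflow(goal: str, steps: list[str],
                 call_model: Callable[[str], str]) -> WorkflowState:
    """Run steps in order; each call receives the accumulated state."""
    state = WorkflowState(goal)
    for step in steps:
        prompt = f"{state.as_context()}\n\nNext step: {step}"
        state.record(step, call_model(prompt))
    return state
```

Because the model is passed in as a plain callable, the loop can be exercised with a stub in tests before wiring in a real client.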

Compress, prune, or replace bulky context with compact references so long-running workflows can stay within practical context limits.

4 of 11 evidenced operators in this pool.
Rippling, LinkedIn, Alibaba Cloud, Slack
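Pruning to a budget is the simplest of the three options named above. The sketch below keeps the most recent messages that fit; the four-characters-per-token estimate is a rough assumption, and `estimate_tokens` and `prune_to_budget` are hypothetical names, so a real tokenizer should replace the estimate.

```python
def estimate_tokens(text: str) -> int:
    """Rough size estimate (assumption: about 4 characters per token)."""
    return max(1, len(text) // 4)


def prune_to_budget(messages: list[str], budget: int) -> list[str]:
    """Keep the newest messages whose combined estimate fits the budget."""
    kept: list[str] = []
    used = 0
    # Walk from newest to oldest and stop at the first message that does not fit.
    for msg in reversed(messages):
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

Dropping the oldest messages first is only one policy; summarizing them or replacing them with references preserves more information at higher cost.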

Partition context by scope, type, channel, or domain so agents do not receive one undifferentiated memory blob.

6 of 11 evidenced operators in this pool.
LinkedIn, Slack, Cleric, Airbnb, New Computer, Rippling
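Partitioning can be as simple as tagging each memory item with a scope and letting each agent request only the scopes it needs. The following is a minimal sketch under that assumption; `ScopedMemory` and the scope names in the usage example are hypothetical.

```python
from collections import defaultdict


class ScopedMemory:
    """Memory items grouped by scope (for example user, channel, or domain)."""

    def __init__(self) -> None:
        self._by_scope: dict[str, list[str]] = defaultdict(list)

    def add(self, scope: str, item: str) -> None:
        """Store an item under exactly one scope."""
        self._by_scope[scope].append(item)

    def context_for(self, scopes: list[str]) -> str:
        """Render only the requested scopes, each under its own label."""
        sections = []
        for scope in scopes:
            items = self._by_scope.get(scope, [])
            if items:
                body = "\n".join(f"- {item}" for item in items)
                sections.append(f"[{scope}]\n{body}")
        return "\n\n".join(sections)
```

For example, an agent answering a user question might call `context_for(["user"])` and never see items filed under an incident channel.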

Use traces, evaluations, feedback, or review workflows to inspect whether context and memory are working in production.

7 of 11 evidenced operators in this pool.
Rippling, LinkedIn, New Computer, Cleric, Slack, Airbnb, Shopify

Store successful outcomes or learned patterns back into memory so later workflows can reuse them.

4 of 11 evidenced operators in this pool.
Meta, Cleric, LinkedIn, New Computer
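The write-back loop can be sketched as a store that accepts only successful outcomes and recalls them for similar later tasks. This is an illustrative assumption rather than any operator's design: `OutcomeMemory` is a hypothetical name, and the word-overlap match stands in for whatever similarity measure a real system would use.

```python
class OutcomeMemory:
    """Stores solutions from successful runs for reuse by later workflows."""

    def __init__(self) -> None:
        self._patterns: list[tuple[str, str]] = []  # (task description, solution)

    def record(self, task: str, solution: str, success: bool) -> bool:
        """Persist the solution only if the run succeeded; report whether it was stored."""
        if success:
            self._patterns.append((task, solution))
        return success

    def recall(self, task: str, min_overlap: int = 2) -> list[str]:
        """Return stored solutions whose task shares at least min_overlap words."""
        words = set(task.lower().split())
        return [
            solution
            for stored_task, solution in self._patterns
            if len(words & set(stored_task.lower().split())) >= min_overlap
        ]
```

Gating on success keeps failed attempts out of memory; a production system would likely also need expiry or review, since a once-successful pattern can go stale.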

Where Operators Converge

Across the evidenced deployments, context and memory management is implemented as explicit application infrastructure around LLM calls—loaders, stores, retrieval orchestration, journals, variable stores, traces, or gateway/platform support—not as a model-only capability.

Where Operators Diverge

Operators differ in what they treat as the primary memory unit.

APPROACH 01

Conversation or workflow transcript memory carried forward within a session.

LinkedIn, Shopify

APPROACH 02

Structured working memory or compact references instead of raw text carry-forward.

Slack, Rippling, Alibaba Cloud

APPROACH 03

Long-term learned memory about users, environments, investigations, or experiments.

New Computer, Cleric, LinkedIn, Meta

Operators differ in how they retrieve context.

APPROACH 01

Vector or semantic retrieval over prior cases, conversations, or generated context.

Agoda, LinkedIn, New Computer

APPROACH 02

Hybrid retrieval, filtering, reranking, or domain routing before injecting context.

Rippling, New Computer, Airbnb, LinkedIn

APPROACH 03

Key-based retrieval of large observations or named runtime variables instead of passing full observations through the prompt.

Alibaba Cloud, Rippling
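Key-based retrieval can be illustrated with a store that holds the full observation and hands the prompt only a short reference. The sketch below is an assumption about the general shape, not a description of either operator's system; `ObservationStore` and its methods are hypothetical names.

```python
import hashlib


class ObservationStore:
    """Keeps large observations out of the prompt; the prompt carries a key instead."""

    def __init__(self, preview_chars: int = 80) -> None:
        self._data: dict[str, str] = {}
        self.preview_chars = preview_chars

    def put(self, observation: str) -> str:
        """Store the full observation and return a short content-derived key."""
        digest = hashlib.sha256(observation.encode("utf-8")).hexdigest()[:12]
        key = f"obs_{digest}"
        self._data[key] = observation
        return key

    def get(self, key: str) -> str:
        """Fetch the full observation, for example when a tool needs it."""
        return self._data[key]

    def reference(self, key: str) -> str:
        """Compact stand-in to place in the prompt instead of the raw text."""
        obs = self._data[key]
        preview = obs[: self.preview_chars]
        return f"<{key} chars={len(obs)} preview={preview!r}>"
```

A log dump of several megabytes then costs the prompt only the length of its reference, while tools can still dereference the key to read the full content.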

Operators differ in how context quality is controlled.

APPROACH 01

Human validation, guidance, or escalation when context is incomplete or decisions are strategic.

Agoda, Meta, Cleric

APPROACH 02

Automated or semi-automated traces, evaluations, critic review, or feedback APIs tied to execution traces.

Rippling, New Computer, Cleric, Slack, LinkedIn

Watch Items

Context size is a recurring production constraint: operators reported context windows filling, investigations producing megabytes of output, enormous logs/code/database results, and the need to prune or compress token usage.

Incomplete or inaccurate context changes operating behavior: Agoda says missing prior alerts or historical signals should trigger human review; LinkedIn describes semantic search to refine context when inaccuracies arise; New Computer found retrieval method choice varies by case.

Sensitive production or user data in context requires controls: Alibaba Cloud flags privacy risk from transmitting production confidential data to external APIs; Uber routes agent tool calls through policy enforcement and redaction; Agoda’s context includes sensitive user, payment, and booking data.

Long-running or stateful environments make memory correctness harder: Slack reports investigations spanning hundreds of inference requests; Meta persists state across workflows spanning days or weeks; Cleric notes production environments are stateful and dynamic and cannot be reproduced later.

02

Implementation Menu

CURATED DEFAULTS
Name                                     Kind       Maturity
Conversation summarization compaction    pattern    established
Vector-backed long-term memory           pattern    established
Letta (MemGPT)                           library    emerging
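As an illustration of the first default, conversation summarization compaction replaces older turns with a single summary and keeps recent turns verbatim. The sketch below assumes a `summarize` callable (hypothetical, typically an LLM call in practice) and a hypothetical `compact` function; it does not reflect the Letta API.

```python
from typing import Callable


def compact(messages: list[str], keep_recent: int,
            summarize: Callable[[list[str]], str]) -> list[str]:
    """Replace all but the newest keep_recent messages with one summary message."""
    if keep_recent < 0:
        raise ValueError("keep_recent must be non-negative")
    if len(messages) <= keep_recent:
        return list(messages)  # nothing old enough to compact
    split = len(messages) - keep_recent
    older, recent = messages[:split], messages[split:]
    summary = f"Summary of earlier conversation: {summarize(older)}"
    return [summary] + recent
```

Summaries are lossy, so deployments that need exact earlier details usually pair compaction with a retrievable store of the original turns.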
03

Observed in Production

4 APPS