TECHNIQUE

LLM-friendly knowledge representation

Retrieval & Grounding

4 APPLICATIONS
6 OBSERVED OPERATORS
01

State of Practice

CROSS-VALIDATED — 6 OPERATORS

LLM-friendly knowledge representation is emerging as an intermediate context layer: operators convert internal knowledge into scoped, structured, retrievable artifacts before agents use it.

Observed Practices

Convert heterogeneous source material into LLM-readable structured text or task artifacts: markdown-normalized content, markdown tables, playbooks, review profiles, Q&A summaries, or agent-optimized docs/skills.

5 of 6 deployed/pilot operators in this pool; announced Meta evidence not counted.
Dropbox, Uber, LinkedIn, DoorDash, Wix
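
As one illustration of the conversion practice above, tabular source data can be rendered as a markdown table before it is indexed or placed in a prompt. This is a minimal sketch, not any operator's pipeline; the function name `rows_to_markdown_table` and the sample rows are hypothetical.

```python
from typing import Sequence


def rows_to_markdown_table(header: Sequence[str], rows: Sequence[Sequence[object]]) -> str:
    """Render a header and rows as a pipe-delimited markdown table."""

    def cell(value: object) -> str:
        # Escape pipes and flatten newlines so each row stays on one line.
        return str(value).replace("|", "\\|").replace("\n", " ").strip()

    lines = [
        "| " + " | ".join(cell(h) for h in header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for row in rows:
        if len(row) != len(header):
            raise ValueError("row width does not match header width")
        lines.append("| " + " | ".join(cell(v) for v in row) + " |")
    return "\n".join(lines)


# Hypothetical sample data, for illustration only.
table = rows_to_markdown_table(
    ["Plan", "Seats", "Notes"],
    [["Team", 10, "billed monthly"], ["Enterprise", 500, "SSO | audit logs"]],
)
print(table)
```

Escaping the pipe character keeps a cell such as "SSO | audit logs" from being read as two columns once the table is retrieved and parsed downstream.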

Keep multiple representations of the same knowledge for retrieval and generation, instead of relying on a single vector index: lexical indexes, dense vectors, chunks, contextual graphs, document summaries, FAQs, chronological logs, semantic indexes, and memory layers.

3 of 6 deployed/pilot operators in this pool; announced Meta evidence not counted.
Dropbox, Uber, LinkedIn
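
When several representations are kept side by side, their ranked results have to be merged at query time. The sketch below uses reciprocal rank fusion, a common merging technique chosen here for illustration; the source does not say which fusion method any operator uses. The document IDs and the constant `k=60` are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def reciprocal_rank_fusion(rankings: Iterable[List[str]], k: int = 60) -> List[str]:
    """Merge ranked ID lists; each hit contributes 1 / (k + rank) to its ID."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from two representations of the same corpus.
lexical_hits = ["doc-faq-12", "doc-runbook-3", "doc-summary-7"]
dense_hits = ["doc-summary-7", "doc-faq-12", "doc-chunk-41"]

fused = reciprocal_rank_fusion([lexical_hits, dense_hits])
print(fused)
```

Rank-based fusion avoids comparing raw scores across indexes, which are usually on different scales (for example, lexical relevance scores versus cosine similarity).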

Scope or curate context before the main LLM sees it, using routing, source narrowing, meta-tools, or curated skills/profiles to avoid irrelevant context.

4 of 6 deployed/pilot operators in this pool; announced Meta evidence not counted.
DoorDash, Uber, LinkedIn, Wix
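
Scoping before the main LLM call can be as simple as a router that selects which curated rule set to load. The following is a minimal sketch under stated assumptions: the pack names, the rules, and routing by file extension are all hypothetical and do not describe any operator's implementation.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ContextPack:
    name: str
    rules: Tuple[str, ...]


# Hypothetical curated packs; real packs would live in versioned files.
PACKS: Dict[str, ContextPack] = {
    "python-review": ContextPack(
        "python-review",
        ("Flag bare except clauses.", "Require type hints on public functions."),
    ),
    "sql-review": ContextPack("sql-review", ("Reject SELECT * in production queries.",)),
}

# Hypothetical routing table: file suffix -> pack name.
ROUTES: Dict[str, str] = {".py": "python-review", ".sql": "sql-review"}


def select_packs(changed_files: List[str]) -> List[ContextPack]:
    """Return each relevant pack once, in order of first appearance."""
    selected: Dict[str, ContextPack] = {}
    for path in changed_files:
        pack_name = ROUTES.get(PurePosixPath(path).suffix)
        if pack_name is not None and pack_name not in selected:
            selected[pack_name] = PACKS[pack_name]
    return list(selected.values())


def build_context(changed_files: List[str]) -> str:
    """Assemble only the selected rules into the text handed to the LLM."""
    lines: List[str] = []
    for pack in select_packs(changed_files):
        lines.append(f"## {pack.name}")
        lines.extend(f"- {rule}" for rule in pack.rules)
    return "\n".join(lines)


context = build_context(["svc/app.py", "README.md", "svc/util.py"])
print(context)
```

Files with no matching route (here `README.md`) contribute nothing, so the prompt carries only the rules relevant to the change.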

Expose represented knowledge through agent-access layers such as APIs, tools, MCP servers, Knowledge Vault lookups, packaged skills, or playbooks-as-tools.

4 of 6 deployed/pilot operators in this pool; announced Meta evidence not counted.
Dropbox, Grab, LinkedIn, Wix

Use LLM-powered preprocessing to turn raw or semi-structured inputs into more retrievable artifacts, such as markdown tables, long-term memory summaries, episodic extractions, Q&A pairs, hierarchical summaries, and facets.

2 of 6 deployed/pilot operators in this pool; announced Meta evidence not counted.
Uber, LinkedIn

Measure representation quality or usage with evals, logs, retrieval metrics, or production acceptance so representation changes can be compared over time.

4 of 6 deployed/pilot operators in this pool; announced Meta evidence not counted.
DoorDash, LinkedIn, Wix, Dropbox
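
One way to compare representation changes over time is to score each variant against a fixed labeled query set. The sketch below computes mean recall@k for two hypothetical variants; the query IDs, document IDs, and variant names are invented for illustration, and the source does not specify which metrics operators use.

```python
from typing import Dict, List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        raise ValueError("each query needs at least one relevant document")
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_recall_at_k(run: Dict[str, List[str]], labels: Dict[str, Set[str]], k: int) -> float:
    """Average recall@k over every labeled query; missing queries score 0."""
    return sum(recall_at_k(run.get(q, []), rel, k) for q, rel in labels.items()) / len(labels)


# Hypothetical labeled queries and two retrieval runs to compare.
labels = {"q1": {"d1", "d2"}, "q2": {"d5"}}
baseline = {"q1": ["d3", "d1", "d9"], "q2": ["d7", "d8", "d5"]}
candidate = {"q1": ["d1", "d2", "d3"], "q2": ["d5", "d7", "d8"]}

baseline_score = mean_recall_at_k(baseline, labels, k=2)
candidate_score = mean_recall_at_k(candidate, labels, k=2)
print(f"baseline={baseline_score:.2f} candidate={candidate_score:.2f}")
```

Keeping the labeled set fixed between runs is what makes the two scores comparable; changing labels and representation at the same time hides which one moved the number.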

Where Operators Converge

Every deployed/pilot operator in this pool adds an organization- or task-specific context layer instead of expecting the LLM to work from generic model knowledge alone.

The represented knowledge is connected to agent runtime through retrieval, tools, APIs, workflows, skills, or document-fetch paths; none of the deployed/pilot examples present representation as a static document-only exercise.

Where Operators Diverge

Operators differ on what they package as the primary LLM-friendly artifact.

APPROACH 01

Retrieval corpus and index artifacts: normalize or enrich documents, then retrieve chunks, summaries, FAQs, graph representations, lexical hits, or vectors.

Dropbox, Grab, Uber

APPROACH 02

Task instructions and curated guides: encode review rules, step-by-step playbooks, or condensed skills/docs for agents to follow.

DoorDash, LinkedIn, Wix

APPROACH 03

Stateful memory: represent prior interactions and activity as conversational, episodic, semantic, and procedural memory exposed through tool abstractions.

LinkedIn

Operators differ on how they keep the LLM context small enough to be useful.

APPROACH 01

Narrow retrieval at query time using query optimization, source identification, post-processing, ranking, or document-set restriction.

Uber, Dropbox

APPROACH 02

Pre-curate task-specific context packs so the agent loads only the relevant rules, skills, playbooks, or meta-tools.

DoorDash, LinkedIn, Wix

APPROACH 03

Compress history into memory summaries or semantic indexes so recent and similar interactions can be retrieved without replaying all raw dialogue.

LinkedIn

Watch Items

Generic or raw context remains a failure mode: operators report generic/noisy AI reviews, coding assistants that lack company context, agent audiences outgrowing human-oriented docs, and answer-quality challenges in internal support bots.

Context bloat and fragmented access paths are active risks: LinkedIn explicitly reduces a giant tool list, Wix reports MCP fragmentation causing more calls/latency/turns, and DoorDash routes to load only the relevant rules.

Freshness is not automatic: Grab added auto-updates for Knowledge Vaults, Wix says regular evaluations maintain skill freshness, and LinkedIn performs asynchronous indexing for high-volume activity.

Structured and non-text content often needs special representation work before retrieval: Uber converts tables to markdown, Dropbox calls out media-specific understanding/transcription needs, and Wix compares one-shot markdown pages against fragmented MCP access.

02

Implementation Menu

CURATED DEFAULTS
Name                             Kind     Maturity
Contextual chunk headers         pattern  established
Docling                          library  emerging
Markdown normalization pipeline  pattern  commodity
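
The contextual chunk headers pattern listed above can be sketched as follows: each chunk is prefixed with its document title and section path before embedding or indexing, so it stays interpretable when retrieved in isolation. The `Chunk` type, the header layout, and the sample text are assumptions made for illustration, not a reference implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    doc_title: str
    section_path: List[str]
    text: str


def with_context_header(chunk: Chunk) -> str:
    """Prefix a chunk with where it came from; the layout is a hypothetical choice."""
    section = " > ".join(chunk.section_path) if chunk.section_path else "(top level)"
    header = f"Document: {chunk.doc_title}\nSection: {section}"
    return f"{header}\n\n{chunk.text.strip()}"


# Hypothetical chunk from an internal runbook.
chunk = Chunk(
    doc_title="Payments Runbook",
    section_path=["Incidents", "Rollback"],
    text="  Disable the feature flag, then redeploy the previous build.  ",
)

indexed_text = with_context_header(chunk)
print(indexed_text)
```

Without the header, a sentence like the one above gives a retriever no signal about which system or procedure it belongs to.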
03

Observed in Production

4 APPS