TECHNIQUE

LLM-friendly knowledge representation

Retrieval & Grounding

8 APPLICATIONS

9 OBSERVED OPERATORS

State of Practice

CROSS-VALIDATED — 8 OPERATORS

Across deployed/pilot operators, LLM-friendly knowledge representation is mostly a packaging-and-filtering discipline: teams convert enterprise content into task-specific artifacts, index them for retrieval, and deliberately narrow what the LLM sees.

Observed Practices

Package source knowledge into task-specific artifacts instead of handing the model raw enterprise sprawl: Dropbox normalizes files to markdown and builds graph-derived knowledge bundles; Uber converts Google Doc tables to markdown and stores document titles/summaries/FAQs; Pinterest compiles table schemas, table summaries, and historical queries; DoorDash builds per-domain review profiles; LinkedIn encodes organizational procedures as playbooks; Wix uses AI-optimized docs/skills and feedback-generated knowledge or prompt-instruction documents; Culligan Quench organizes enablement content around sales use.

7 of 8 deployed/pilot operators in the pool; Meta is announced and not counted.

Dropbox, Uber, Pinterest, DoorDash, LinkedIn, Wix, Culligan Quench
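As a rough illustration of the packaging step, the sketch below renders already-extracted table rows as a markdown table. It is a deterministic stand-in, not any operator's pipeline: Uber is described as using an LLM for this conversion, and the function name `rows_to_markdown` and the sample data are assumptions made for this example.

```python
from typing import List, Sequence


def rows_to_markdown(header: Sequence[str], rows: Sequence[Sequence[object]]) -> str:
    """Render a header and rows as a pipe-delimited markdown table."""

    def cell(value: object) -> str:
        # Escape pipes and flatten newlines so each row stays on one line.
        return str(value).replace("|", "\\|").replace("\n", " ").strip()

    lines: List[str] = [
        "| " + " | ".join(cell(h) for h in header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for row in rows:
        # Truncate long rows and pad short ones so every row matches the header width.
        padded = list(row)[: len(header)] + [""] * (len(header) - len(row))
        lines.append("| " + " | ".join(cell(v) for v in padded) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    print(rows_to_markdown(["Region", "SLA"], [["EMEA", "4h"], ["APAC"]]))
```

Padding short rows keeps the column count stable, which matters because a ragged markdown table is easy for a model to misread.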

Persist represented knowledge in retrieval stores, usually embeddings/vector stores, and often pair that with lexical or metadata retrieval: Dropbox stores embeddings, chunks, and contextual graph representations and uses BM25 plus dense vectors; Pinterest generates embeddings for table summaries and historical queries in OpenSearch; Uber indexes chunks into a vector store and also adds BM25; LinkedIn stores conversation embeddings alongside chronological memory; Wix adds embeddings for new knowledge documents and retrieves top-scoring documents from a vector database.

5 of 8 deployed/pilot operators in the pool; Meta is announced and not counted.

Dropbox, Pinterest, Uber, LinkedIn, Wix
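The source does not say how operators combine lexical and dense results. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than as any operator's method. The inputs are ranked lists of document ids that a BM25 index and a vector index would already have produced; the ids and the default `k=60` are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked id lists; each hit contributes 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    lexical = ["a", "b", "c"]  # hypothetical BM25 ranking
    dense = ["b", "d", "a"]    # hypothetical vector-search ranking
    print(reciprocal_rank_fusion([lexical, dense]))
```

RRF only needs ranks, so it avoids having to calibrate BM25 scores against cosine similarities, which live on different scales.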

Narrow the model’s context to the subset relevant to the task, domain, source, or user intent: DoorDash routes PRs to only relevant review rules; Uber QueryGPT maps prompts to business-domain workspaces and prunes schemas; Uber Genie uses source identification to restrict retrieval; Pinterest retrieves top-N tables and asks an LLM to choose top-K; LinkedIn hides thousands of tools behind a small set of meta-tools to reduce context bloat; Wix tests curated skills or MCP-packaged skills as constrained context.

5 of 8 deployed/pilot operators in the pool; Meta is announced and not counted.

DoorDash, Uber, Pinterest, LinkedIn, Wix
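The narrowing described above can be pictured as two stages: route the prompt to a domain, then prune what that domain exposes. The sketch below uses keyword overlap for both stages purely for illustration; the workspace names, tables, and keyword sets are hypothetical, and a production system might use an LLM or classifier for either step.

```python
import re
from typing import Dict, List, Optional, Set

# Hypothetical workspaces: trigger keywords plus the table schemas they expose.
WORKSPACES: Dict[str, dict] = {
    "orders": {
        "keywords": {"order", "orders", "delivery", "refund"},
        "tables": {"orders": ["order_id", "customer_id", "status", "total", "created_at"]},
    },
    "billing": {
        "keywords": {"invoice", "invoices", "payment", "charge"},
        "tables": {"invoices": ["invoice_id", "customer_id", "amount", "due_date"]},
    },
}


def tokens(text: str) -> Set[str]:
    # Lowercase alphanumeric tokens; underscores act as separators.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pick_workspace(prompt: str) -> Optional[str]:
    """Stage 1: choose the workspace whose keywords overlap the prompt most."""
    words = tokens(prompt)
    overlap = {name: len(words & ws["keywords"]) for name, ws in WORKSPACES.items()}
    best = max(overlap, key=overlap.get)
    return best if overlap[best] > 0 else None


def prune_columns(columns: List[str], prompt: str) -> List[str]:
    """Stage 2: keep id columns plus columns sharing a token with the prompt."""
    words = tokens(prompt)
    return [c for c in columns if c.endswith("_id") or tokens(c) & words]


def build_context(prompt: str) -> str:
    name = pick_workspace(prompt)
    if name is None:
        return ""  # nothing matched; the caller decides how to handle it
    tables = WORKSPACES[name]["tables"]
    return "\n".join(
        f"{table}({', '.join(prune_columns(cols, prompt))})" for table, cols in tables.items()
    )


if __name__ == "__main__":
    print(build_context("total refund status by order"))
```

Only the pruned schema string reaches the model, so the prompt stays small even when the underlying catalog is large.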

Use LLMs to create or enrich the knowledge representation before retrieval or generation: Uber converts extracted table contents into markdown tables; Pinterest forwards prompts to an LLM to create table/query summaries and uses an LLM to select relevant tables; Wix uses LLMs for feature extraction from owner feedback; LinkedIn uses LLMs to summarize learned patterns, extract episodic activities, and compress conversational memory.

4 of 8 deployed/pilot operators in the pool; Meta is announced and not counted.

Uber, Pinterest, Wix, LinkedIn

Make represented knowledge updateable rather than static: Grab auto-updates users’ Knowledge Vaults; Wix classifies owner feedback into new knowledge, prompt instructions, or nothing and stores new documents plus embeddings; LinkedIn asynchronously indexes user activity and periodically compresses conversational memory; Dropbox stages graph relationships asynchronously before creating knowledge bundles.

4 of 8 deployed/pilot operators in the pool; Meta is announced and not counted.

Grab, Wix, LinkedIn, Dropbox
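A feedback loop of the kind attributed to Wix above (classify feedback as new knowledge, a prompt instruction, or nothing) might be wired roughly as follows. This is a sketch under stated assumptions: the classifier and embedder are injected callables standing in for an LLM and an embedding model, and `toy_classify` and `toy_embed` are hypothetical placeholders with no resemblance to a real model.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Tuple


class FeedbackKind(Enum):
    KNOWLEDGE = "knowledge"
    INSTRUCTION = "instruction"
    NOTHING = "nothing"


@dataclass
class KnowledgeState:
    # Each document is stored with its embedding so it can be retrieved later.
    documents: List[Tuple[str, List[float]]] = field(default_factory=list)
    instructions: List[str] = field(default_factory=list)


def ingest_feedback(
    state: KnowledgeState,
    feedback: str,
    classify: Callable[[str], FeedbackKind],
    embed: Callable[[str], List[float]],
) -> FeedbackKind:
    """Route one piece of feedback into the store that matches its kind."""
    kind = classify(feedback)
    if kind is FeedbackKind.KNOWLEDGE:
        state.documents.append((feedback, embed(feedback)))
    elif kind is FeedbackKind.INSTRUCTION:
        state.instructions.append(feedback)
    # FeedbackKind.NOTHING is intentionally dropped.
    return kind


def toy_classify(feedback: str) -> FeedbackKind:
    # Placeholder heuristic standing in for an LLM classifier.
    lowered = feedback.lower().strip()
    if lowered.startswith(("always ", "never ")):
        return FeedbackKind.INSTRUCTION
    if len(lowered.split()) >= 4:
        return FeedbackKind.KNOWLEDGE
    return FeedbackKind.NOTHING


def toy_embed(text: str) -> List[float]:
    # Placeholder vector standing in for a real embedding model.
    return [float(len(text)), float(len(text.split()))]
```

Keeping instructions separate from retrievable documents reflects the distinction in the text: instructions change the prompt on every call, while knowledge documents are fetched only when relevant.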

Expose knowledge representations through tools, workflows, or agent interfaces when the task requires action: Dropbox wraps its index as a “super tool”; LinkedIn exposes playbooks as MCP tools; Grab combines LLM calls, Python execution, and Knowledge Vault lookups in structured workflows; DoorDash runs review/fix agents in remote VMs with full repository context.

4 of 8 deployed/pilot operators in the pool; Meta is announced and not counted.

Dropbox, LinkedIn, Grab, DoorDash
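One way to expose a large catalog through a small tool surface, in the spirit of the meta-tool approach attributed to LinkedIn earlier, is a registry that offers only a search call and an invoke call. The sketch below is an assumption about how such a registry could look, not a description of any operator's system; the class name, method names, and keyword matching are all hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple


class ToolRegistry:
    """Holds many tools but exposes only two meta-tools to the model."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tuple[str, Callable[..., Any]]] = {}

    def register(self, name: str, description: str, fn: Callable[..., Any]) -> None:
        self._tools[name] = (description, fn)

    def search_tools(self, query: str, limit: int = 3) -> List[Dict[str, str]]:
        # Meta-tool 1: return a few matching tools instead of the full catalog.
        words = set(query.lower().split())
        scored = []
        for name, (description, _) in self._tools.items():
            hits = len(words & set(description.lower().split()))
            if hits:
                scored.append((hits, name, description))
        scored.sort(key=lambda item: (-item[0], item[1]))
        return [{"name": n, "description": d} for _, n, d in scored[:limit]]

    def call_tool(self, name: str, **kwargs: Any) -> Any:
        # Meta-tool 2: invoke a tool the model discovered through search.
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name][1](**kwargs)


if __name__ == "__main__":
    registry = ToolRegistry()
    # Hypothetical tools for illustration only.
    registry.register("restart_service", "restart a service by name", lambda service: f"restarted {service}")
    registry.register("rotate_keys", "rotate api keys for a service", lambda service: f"rotated {service}")
    print(registry.search_tools("restart the service"))
    print(registry.call_tool("restart_service", service="api"))
```

The model's context then carries two tool definitions plus a handful of search results, rather than every tool description at once.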

Where Operators Converge

Every deployed/pilot operator in the pool represents domain knowledge in a form tailored to its AI workflow: sales spots, review profiles, knowledge vaults, schemas/summaries, chunks/graphs, memories, playbooks, skills, or feedback-derived documents.

Where Operators Diverge

Operators differ on the primary unit of knowledge they make LLM-friendly.

APPROACH 01

Represent source content as normalized documents, chunks, summaries, graph context, or retrieval artifacts.

Dropbox, Uber, Grab

APPROACH 02

Represent data-warehouse knowledge as schemas, table summaries, SQL samples, workspaces, or pruned columns.

Pinterest, Uber

APPROACH 03

Represent operational know-how as curated instructions, review profiles, playbooks, or skills.

DoorDash, LinkedIn, Wix

APPROACH 04

Represent personalization as memory or feedback-derived documents that are retrieved later.

LinkedIn, Wix

Operators differ on how they choose what knowledge reaches the model at runtime.

APPROACH 01

Embedding/vector or hybrid retrieval selects relevant chunks, tables, memories, or documents.

Dropbox, Pinterest, Uber, LinkedIn, Wix

APPROACH 02

Routers, agents, or LLM steps first narrow the search space by domain, source, table set, rule set, or tool surface.

DoorDash, Pinterest, Uber, LinkedIn

APPROACH 03

Users validate, alter, or teach the representation before it is reused.

Pinterest, Uber, Wix

Operators differ on how they keep the representation current and governed.

APPROACH 01

Offline or asynchronous enrichment/indexing jobs update retrieval artifacts outside the online interaction.

Dropbox, Pinterest, Uber, LinkedIn

APPROACH 02

Feedback or memory ingestion changes future retrieval and prompts.

Wix, LinkedIn

APPROACH 03

Evaluation and instrumentation monitor whether curated representations remain useful.

DoorDash, LinkedIn, Uber, Wix

Watch Items

Raw enterprise formats are not automatically LLM-friendly: Uber moved from PDFs to Google Docs for more accurate extraction and uses LLM enrichment to turn tables into markdown; Dropbox says files are normalized to markdown and that images need multimodal understanding that goes beyond a CLIP-style starting point.

Too much or poorly targeted context degrades usefulness: DoorDash started from noisy, generic AI review comments; LinkedIn says hiding a giant tool list reduces context bloat and improves accuracy; Uber narrows QueryGPT’s RAG search radius by business domain; Wix found MCP-style fragmentation can mean more calls, more inference latency, and more turns.

Curated representations require maintenance and measurement: Uber says golden question-to-SQL mappings required manual upfront investment; Wix says regular evaluations maintain skill freshness; DoorDash uses evals plus production acceptance to prevent cost reductions from quietly reducing quality; LinkedIn instruments every CAPT tool and playbook invocation.

Implementation Menu

CURATED DEFAULTS

Name	Kind	When	Maturity
Contextual chunk headers	pattern	chunks lose meaning without document and section context prepended	established
Docling	library	layout-aware conversion of PDFs and office docs to structured markdown	emerging
Markdown normalization pipeline	pattern	heterogeneous sources need one canonical LLM-readable format	commodity
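The "Contextual chunk headers" pattern in the table above can be sketched as follows: each chunk gets the document title and section name prepended before it is embedded or indexed. The function name, header layout, and word-based splitting are assumptions chosen for brevity; real pipelines often split on tokens or sentence boundaries instead.

```python
from typing import List


def chunk_with_headers(title: str, section: str, text: str, max_words: int = 50) -> List[str]:
    """Split text into word-bounded chunks, each prefixed with its context."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    # The header restores context that a bare chunk would otherwise lose.
    header = f"Document: {title}\nSection: {section}\n\n"
    words = text.split()
    chunks: List[str] = []
    for start in range(0, len(words), max_words):
        body = " ".join(words[start:start + max_words])
        chunks.append(header + body)
    return chunks


if __name__ == "__main__":
    # Hypothetical sample content for illustration only.
    for chunk in chunk_with_headers("Refund policy", "Eligibility", "one two three four five", max_words=2):
        print(chunk)
        print("---")
```

The header costs a few tokens per chunk, a trade-off that tends to pay off when section bodies are ambiguous on their own (for example, "Eligibility" text that never names the policy it belongs to).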

Observed in Production

8 APPS

LLM Application Quality Assurance

AI Sales Engagement Room and Role Play Certification

AI Security Decision Audit and Incident Report Generation

AI-Assisted Product and Developer Collaboration Workflows

Code and Query Defect Validation and Repair

LLM SQL and Knowledge Base Quality Evaluation

LLM-Assisted Code Review, Test Migration, and Agent Evaluation

Security and Privacy Policy On-Call Support Copilot