TECHNIQUE

Hybrid retrieval

Retrieval & Grounding

5 APPLICATIONS
7 OBSERVED OPERATORS
01

State of Practice

CROSS-VALIDATED — 7 OPERATORS

Hybrid retrieval is deployed as a signal-combination pattern: operators pair semantic/vector retrieval with lexical/BM25, structured filters, metadata, ranking, or source-selection rather than relying on embeddings alone.

Observed Practices

Combine semantic/vector retrieval with at least one non-vector retrieval signal or constraint, rather than using embeddings alone.

All 7 deployed operators in the pool pair semantic, vector, or image embeddings with at least one other signal: traditional signals, keyword/BM25/sparse retrieval, metadata filters, strict filters, or source constraints.
Atlassian, Canva, Dropbox, LinkedIn, New Computer, TraceIQ, Uber

Add lexical, keyword, BM25, or sparse indexes alongside dense/vector retrieval for exact-match and complementary recall.

4 of 7 deployed operators explicitly show BM25, sparse, keyword, or lexical retrieval combined with dense/vector retrieval.
Dropbox, New Computer, TraceIQ, Uber

Use structured fields, metadata, filters, permissions, or product signals as part of retrieval and ranking.

7 of 7 deployed operators show structured constraints or signals used with retrieval.
Atlassian, Canva, Dropbox, LinkedIn, New Computer, TraceIQ, Uber

Shape the query or select sources before retrieval when user intent is ambiguous or multi-step.

4 of 7 deployed operators show query rewriting, query construction, intent classification, source identification, or query expansion before retrieval.
Dropbox, LinkedIn, TraceIQ, Uber

Add ranking, reranking, or post-retrieval ordering after the initial retrieval step.

4 of 7 deployed operators explicitly show ranking, reranking, cross-encoder reranking, or separate retrieval-and-ranking stages.
Atlassian, Dropbox, LinkedIn, TraceIQ

Evaluate retrieval variants with offline metrics, labeled examples, human inspection, or online experiments before choosing a retrieval setup.

4 of 7 deployed operators show explicit retrieval evaluation or experiment workflows.
Atlassian, Canva, Dropbox, New Computer
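The offline-metric side of the evaluation practice above can be sketched as follows. This is a minimal illustration, not any operator's actual workflow: the query IDs, document IDs, labels, and the two result sets (`DENSE_ONLY`, `HYBRID`) are hypothetical, and recall@k and mean reciprocal rank are chosen here only as two common examples of offline retrieval metrics.

```python
from typing import Dict, List, Set


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """1 / position of the first relevant result, or 0.0 if none is returned."""
    for position, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / position
    return 0.0


def evaluate(runs: Dict[str, List[str]], labels: Dict[str, Set[str]], k: int) -> Dict[str, float]:
    """Average recall@k and reciprocal rank over all labeled queries."""
    n = len(labels)
    recall = sum(recall_at_k(runs.get(q, []), rel, k) for q, rel in labels.items()) / n
    mrr = sum(reciprocal_rank(runs.get(q, []), rel) for q, rel in labels.items()) / n
    return {"recall@k": recall, "mrr": mrr}


# Hypothetical labeled examples: query ID -> set of relevant document IDs.
LABELS = {"q1": {"d1", "d4"}, "q2": {"d7"}}

# Hypothetical ranked results from two retrieval variants under comparison.
DENSE_ONLY = {"q1": ["d2", "d1", "d3"], "q2": ["d5", "d6", "d7"]}
HYBRID = {"q1": ["d1", "d4", "d2"], "q2": ["d7", "d5", "d6"]}

dense_scores = evaluate(DENSE_ONLY, LABELS, k=2)
hybrid_scores = evaluate(HYBRID, LABELS, k=2)
print("dense only:", dense_scores)
print("hybrid:", hybrid_scores)
```

Running every candidate setup against the same labeled set keeps the comparison like-for-like; human inspection and online experiments, which the operators also report, would sit alongside a harness of this kind rather than be replaced by it.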

Where Operators Converge

Across deployed operators, hybrid retrieval means embeddings are not treated as sufficient by themselves; each observed deployment adds lexical/BM25/sparse retrieval, structured filters, metadata, permissions, traditional signals, or source-selection logic.

Observed deployments ground retrieval in operator-specific corpora or product data: workplace apps, documents, jobs, feeds, memories, images, or internal knowledge sources.

Where Operators Diverge

Operators differ on what non-vector layer they pair with semantic retrieval.

APPROACH 01

Lexical, keyword, BM25, or sparse retrieval is paired with dense/vector retrieval.

Dropbox, New Computer, TraceIQ, Uber

APPROACH 02

Structured fields, metadata filters, strict filters, or traditional product signals are paired with neural/embedding retrieval.

Atlassian, Canva, LinkedIn

Operators differ on where retrieval intelligence is placed in the system.

APPROACH 01

A unified index or single retrieval tool hides multiple sources behind one retrieval interface.

Dropbox, LinkedIn

APPROACH 02

A query engine, search agent, or pre-processing agents classify intent, rewrite queries, or restrict sources before retrieval.

Dropbox, LinkedIn, TraceIQ, Uber

Hybrid retrieval is applied to different corpus types and modalities.

APPROACH 01

Workplace and internal-document retrieval across apps, wikis, docs, messages, and knowledge sources.

Atlassian, Dropbox, TraceIQ, Uber

APPROACH 02

Product-object retrieval for jobs or feed content using profiles, engagement data, filters, and embeddings.

LinkedIn

APPROACH 03

Image retrieval using image embeddings plus metadata filters.

Canva

APPROACH 04

Personal memory retrieval using semantic, keyword, BM25, and meta-field filter techniques.

New Computer
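Several of the approaches above pair embedding retrieval with metadata or strict filters. A minimal sketch of that pairing is shown below, assuming exact-match filters applied before similarity ranking. The `Doc` type, the `filtered_search` helper, the metadata fields, and the two-dimensional vectors are all hypothetical and stand in for a real vector store; they are not taken from any operator's system.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Doc:
    doc_id: str
    embedding: Sequence[float]
    metadata: Dict[str, str]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_search(
    query_vec: Sequence[float],
    docs: List[Doc],
    required: Dict[str, str],
    top_k: int = 3,
) -> List[str]:
    """Keep only docs whose metadata matches every required field, then rank by similarity."""
    candidates = [
        d for d in docs
        if all(d.metadata.get(field) == value for field, value in required.items())
    ]
    candidates.sort(key=lambda d: cosine(query_vec, d.embedding), reverse=True)
    return [d.doc_id for d in candidates[:top_k]]


# Hypothetical corpus with toy 2-dimensional embeddings.
DOCS = [
    Doc("a", [1.0, 0.0], {"source": "wiki", "team": "search"}),
    Doc("b", [0.9, 0.1], {"source": "chat", "team": "search"}),
    Doc("c", [0.6, 0.8], {"source": "wiki", "team": "search"}),
    Doc("d", [0.0, 1.0], {"source": "wiki", "team": "ads"}),
]

query = [1.0, 0.1]
results = filtered_search(query, DOCS, {"source": "wiki", "team": "search"})
unfiltered = filtered_search(query, DOCS, {})
print(results)
print(unfiltered)
```

In this toy data, document `b` is the closest vector to the query but is excluded by the `source` filter, which is the behavior a strict filter is meant to guarantee. Whether filtering happens before or after the vector search is a design choice this sketch does not settle.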

Watch Items

Embedding-only retrieval is repeatedly called insufficient: operators add hybrid signals because raw semantic/vector retrieval misses exact matches, filters, policy constraints, or query intent.

Context and tool sprawl can degrade retrieval or agent performance, so operators trim context, route to specialized search agents, optimize ambiguous queries, and narrow source sets.

Access control, sensitive-data exposure, and guardrails remain retrieval-path concerns, not only generation concerns.

Quality, latency, and cost are managed as retrieval tradeoffs: operators report latency budgets, compute requirements, and model choices made to balance the three.
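The source narrowing mentioned in the watch items above can be illustrated with a deliberately simple router. This is a sketch under stated assumptions: the source names, the keyword sets, and the `select_sources` function are hypothetical, and a keyword rule is used here only as a stand-in for the intent classifiers, query rewriters, or search agents that operators describe.

```python
import re
from typing import Dict, List, Set

# Hypothetical mapping from retrieval source to trigger keywords.
SOURCE_KEYWORDS: Dict[str, Set[str]] = {
    "wiki": {"policy", "guide", "runbook"},
    "tickets": {"bug", "incident", "outage"},
    "chat": {"thread", "message", "said"},
}

# Fallback when no keyword matches: search everything.
DEFAULT_SOURCES: List[str] = ["wiki", "tickets", "chat"]


def select_sources(query: str) -> List[str]:
    """Return the sources whose keywords appear in the query, or all sources if none match."""
    tokens = set(re.findall(r"[a-z0-9]+", query.lower()))
    matched = [source for source, keywords in SOURCE_KEYWORDS.items() if tokens & keywords]
    return matched or list(DEFAULT_SOURCES)


print(select_sources("latest outage runbook?"))
print(select_sources("quarterly plans"))
```

The first query is restricted to the two sources whose keywords it mentions, while the second falls back to all sources. Falling back to the full set on a miss trades some precision for recall; a stricter router could instead ask for clarification.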

02

Implementation Menu

CURATED DEFAULTS
Name                              Kind     Maturity
OpenSearch hybrid (BM25 + dense)  service  established
Reciprocal rank fusion            pattern  commodity
Vespa                             service  established
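Reciprocal rank fusion, listed above as a pattern, merges ranked lists by summing 1 / (k + rank) for each document across the lists. The sketch below is a minimal standalone version, assuming the commonly used constant k = 60; the function name, the document IDs, and the two input rankings are hypothetical, and this is not the implementation used by OpenSearch, Vespa, or any operator named here.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one list.

    Each document scores the sum of 1 / (k + rank) over the lists it appears in,
    with rank starting at 1. Ties are broken by document ID for determinism.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical outputs of a lexical (BM25) retriever and a dense retriever.
bm25_ranking = ["d3", "d1", "d2"]
dense_ranking = ["d1", "d4", "d3"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)  # ['d1', 'd3', 'd4', 'd2']
```

Because it uses only ranks, this fusion needs no score normalization between the lexical and dense retrievers, which is one reason it is a common default. Documents found by both retrievers (`d1`, `d3`) rise above those found by only one.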
03

Observed in Production

5 APPS