TECHNIQUE
Retrieval & Grounding
Hybrid retrieval is deployed as a signal-combination pattern: operators pair semantic/vector retrieval with lexical/BM25, structured filters, metadata, ranking, or source-selection rather than relying on embeddings alone.
Combine semantic/vector retrieval with at least one non-vector retrieval signal or constraint, rather than using embeddings alone.
7 of 7 deployed operators in the pool show hybrid retrieval with semantic/vector/image embeddings plus traditional signals, keyword/BM25/sparse retrieval, metadata filters, strict filters, or source constraints.
Add lexical, keyword, BM25, or sparse indexes alongside dense/vector retrieval for exact-match and complementary recall.
4 of 7 deployed operators explicitly show BM25, sparse, keyword, or lexical retrieval combined with dense/vector retrieval.
Use structured fields, metadata, filters, permissions, or product signals as part of retrieval and ranking.
7 of 7 deployed operators show structured constraints or signals used with retrieval.
Shape the query or select sources before retrieval when user intent is ambiguous or multi-step.
4 of 7 deployed operators show query rewriting, query construction, intent classification, source identification, or query expansion before retrieval.
Add ranking, reranking, or post-retrieval ordering after the initial retrieval step.
4 of 7 deployed operators explicitly show ranking, reranking, cross-encoder reranking, or separate retrieval-and-ranking stages.
Evaluate retrieval variants with offline metrics, labeled examples, human inspection, or online experiments before choosing a retrieval setup.
4 of 7 deployed operators show explicit retrieval evaluation or experiment workflows.
Across deployed operators, hybrid retrieval means embeddings are not treated as sufficient by themselves; each observed deployment adds lexical/BM25/sparse retrieval, structured filters, metadata, permissions, traditional signals, or source-selection logic.
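The evaluation practice can be made concrete with a small offline harness. The sketch below is one possible shape, not any operator's actual workflow: it assumes each retrieval variant is a callable returning a ranked list of document ids (the `variants` mapping and its entries are hypothetical), scores it on labeled queries with recall@k and mean reciprocal rank (MRR), and records wall-clock latency so quality and speed can be compared side by side.

```python
import time
from statistics import mean


def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant ids that appear in the top k results."""
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant result, or 0 if none is retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(variants, labeled_queries, k=5):
    """Compare retrieval variants on labeled queries.

    variants: name -> callable(query) returning a ranked list of doc ids (hypothetical).
    labeled_queries: list of (query, set of relevant doc ids); each set must be non-empty.
    """
    report = {}
    for name, search in variants.items():
        recalls, rrs, latencies = [], [], []
        for query, relevant in labeled_queries:
            start = time.perf_counter()
            retrieved = search(query)
            latencies.append((time.perf_counter() - start) * 1000.0)  # milliseconds
            recalls.append(recall_at_k(retrieved, relevant, k))
            rrs.append(reciprocal_rank(retrieved, relevant))
        report[name] = {
            f"recall@{k}": mean(recalls),
            "mrr": mean(rrs),
            "max_latency_ms": max(latencies),
        }
    return report
```

In practice the labeled set would come from human judgments or logged relevance, and offline winners would still be checked with online experiments, as the practice above suggests.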
Observed deployments ground retrieval in operator-specific corpora or product data: workplace apps, documents, jobs, feeds, memories, images, or internal knowledge sources.
Operators differ on what non-vector layer they pair with semantic retrieval.
APPROACH 01
Lexical, keyword, BM25, or sparse retrieval is paired with dense/vector retrieval.
APPROACH 02
Structured fields, metadata filters, strict filters, or traditional product signals are paired with neural/embedding retrieval.
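Both approaches can be combined in one pipeline. The toy sketch below assumes a tiny in-memory corpus, made-up three-dimensional vectors standing in for a real embedding model, and a hand-written BM25 scorer with typical parameters (k1 = 1.5, b = 0.75). It applies a metadata filter first, ranks the survivors lexically and by cosine similarity, and merges the two lists with reciprocal rank fusion (k = 60, a commonly used constant). All document ids, fields, and vectors are hypothetical.

```python
import math
from collections import Counter

# Hypothetical corpus: text, a toy embedding, and a metadata field for filtering.
DOCS = {
    "d1": {"text": "reset password for vpn account", "vec": [0.9, 0.1, 0.0], "source": "wiki"},
    "d2": {"text": "error code E1234 when connecting to vpn", "vec": [0.7, 0.3, 0.1], "source": "tickets"},
    "d3": {"text": "quarterly sales report draft", "vec": [0.0, 0.2, 0.9], "source": "docs"},
    "d4": {"text": "how to change your login credentials", "vec": [0.85, 0.2, 0.05], "source": "wiki"},
}


def tokenize(text):
    return text.lower().split()


def bm25_scores(query, doc_ids, k1=1.5, b=0.75):
    """Okapi BM25; corpus statistics come from the candidates only (a simplification)."""
    docs = {d: tokenize(DOCS[d]["text"]) for d in doc_ids}
    n = len(docs)
    avgdl = sum(len(toks) for toks in docs.values()) / n
    df = Counter(term for toks in docs.values() for term in set(toks))
    scores = {}
    for d, toks in docs.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[d] = score
    return scores


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rrf(rankings, k=60):
    """Reciprocal rank fusion: each list contributes 1 / (k + rank) per document."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return [doc_id for doc_id, _ in fused.most_common()]


def hybrid_search(query, query_vec, allowed_sources, top_k=3):
    # 1. Structured constraint: metadata filter before any scoring.
    candidates = [d for d, doc in DOCS.items() if doc["source"] in allowed_sources]
    if not candidates:
        return []
    # 2. Lexical signal: BM25 ranking, keeping only documents with a term match.
    lex = bm25_scores(query, candidates)
    lex_rank = sorted((d for d in candidates if lex[d] > 0), key=lex.get, reverse=True)
    # 3. Dense signal: cosine similarity to the (toy) query embedding.
    dense_rank = sorted(candidates, key=lambda d: cosine(query_vec, DOCS[d]["vec"]), reverse=True)
    # 4. Fuse both rankings without training a fusion model.
    return rrf([lex_rank, dense_rank])[:top_k]
```

With these toy vectors, `hybrid_search("E1234 vpn", [0.8, 0.2, 0.0], {"wiki", "tickets"})` returns `["d2", "d1", "d4"]`: the dense ranking alone puts `d4` first, but fusion promotes `d2`, the only document containing the exact error code, while the filter keeps `d3` out entirely.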
Operators differ on where retrieval intelligence is placed in the system.
APPROACH 01
A unified index or single retrieval tool hides multiple sources behind one retrieval interface.
APPROACH 02
A query engine, search agent, or pre-processing agents classify intent, rewrite queries, or restrict sources before retrieval.
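The pre-retrieval approach can be sketched as a small planning step that runs before any index is queried. In the sketch below, regular expressions stand in for an intent classifier or LLM call, and the source names, routing rules, and abbreviation table are all hypothetical. The step rewrites the query, keeps identifier-like tokens for exact lexical matching, and narrows the source set, falling back to every source when no rule fires.

```python
import re
from dataclasses import dataclass

# Hypothetical routing rules; a deployed system might use a trained classifier instead.
SOURCE_RULES = {
    "tickets": re.compile(r"\b(error|bug|incident|outage)\b", re.IGNORECASE),
    "hr_wiki": re.compile(r"\b(pto|vacation|benefits|payroll)\b", re.IGNORECASE),
    "code": re.compile(r"\b(repo|function|stack trace|api)\b", re.IGNORECASE),
}
ALL_SOURCES = frozenset(SOURCE_RULES) | {"docs"}
# Hypothetical expansion table used to rewrite abbreviations before retrieval.
EXPANSIONS = {"pto": "paid time off"}


@dataclass
class RetrievalPlan:
    query: str          # rewritten query sent to the retrievers
    sources: frozenset  # sources the retrieval step is restricted to
    exact_terms: list   # identifier-like tokens to require in lexical matching


def plan_query(raw_query: str) -> RetrievalPlan:
    query = " ".join(raw_query.split())  # normalize whitespace
    # Query expansion: append the long form of known abbreviations.
    expansions = [EXPANSIONS[t] for t in query.lower().split() if t in EXPANSIONS]
    rewritten = " ".join([query, *expansions])
    # Identifier-like tokens (e.g. error codes) are kept for exact lexical matching,
    # since dense retrieval can miss exact matches.
    exact_terms = re.findall(r"\b[A-Z]+\d+\b", query)
    matched = frozenset(name for name, rule in SOURCE_RULES.items() if rule.search(query))
    # No rule fired: treat the intent as ambiguous and search every source.
    return RetrievalPlan(rewritten, matched or ALL_SOURCES, exact_terms)
```

For example, `plan_query("VPN error E1234")` restricts retrieval to `tickets` and keeps `E1234` as an exact term. The same plan could equally sit behind a unified retrieval interface; the difference between the two approaches is where this logic lives, not whether it exists.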
Hybrid retrieval is applied to different corpus types and modalities.
APPROACH 01
Workplace and internal-document retrieval across apps, wikis, docs, messages, and knowledge sources.
APPROACH 02
Product-object retrieval for jobs or feed content using profiles, engagement data, filters, and embeddings.
APPROACH 03
Image retrieval using image embeddings plus metadata filters.
APPROACH 04
Personal memory retrieval using semantic, keyword, BM25, and meta-field filter techniques.
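For product-object retrieval such as jobs, one simple way to pair embeddings with a strict filter and an engagement signal is sketched below. The linear blend, the `alpha` weight, the `click_rate` field, and the `Job` type are all assumptions made for illustration; the source does not say how any operator combines these signals.

```python
import math
from dataclasses import dataclass


@dataclass
class Job:
    job_id: str
    location: str
    embedding: list
    click_rate: float  # hypothetical engagement signal in [0, 1]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def search_jobs(jobs, query_vec, location, alpha=0.8, top_k=10):
    """Strict filter first, then rank by a blend of similarity and a product signal.

    alpha is a hypothetical tuning weight; alpha=1.0 ranks by embeddings alone.
    """
    eligible = [j for j in jobs if j.location == location]  # strict filter, never relaxed
    scored = [
        (alpha * cosine(query_vec, j.embedding) + (1 - alpha) * j.click_rate, j.job_id)
        for j in eligible
    ]
    return [job_id for _, job_id in sorted(scored, reverse=True)[:top_k]]
```

Setting `alpha=1.0` recovers embedding-only ranking, which makes it easy to measure what the product signal adds before committing to a weight.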
Embedding-only retrieval is repeatedly called insufficient: operators add hybrid signals because raw semantic/vector retrieval misses exact matches, filters, policy constraints, or query intent.
Context and tool sprawl can degrade retrieval or agent performance, so operators trim context, route to specialized search agents, optimize ambiguous queries, and narrow source sets.
Access control, sensitive-data exposure, and guardrails remain retrieval-path concerns, not only generation concerns.
Quality, latency, and cost are managed as retrieval tradeoffs: operators report latency budgets and compute requirements, and choose models to balance quality, latency, and cost.
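The context-trimming and access-control concerns can both be handled on the retrieval path, before anything reaches a prompt. The sketch below assumes each retrieved chunk carries the groups allowed to read its source and a sensitivity flag (the `Chunk` type and its fields are hypothetical), and approximates token cost by whitespace word count; a real deployment would use the tokenizer of the model it actually calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    allowed_groups: frozenset  # groups permitted to read the source document
    sensitive: bool = False


def guard_and_pack(ranked_chunks, user_groups, max_tokens, allow_sensitive=False):
    """Drop chunks the user may not see, then fill a context budget in rank order."""
    groups = frozenset(user_groups)
    packed, used = [], 0
    for chunk in ranked_chunks:
        if not (chunk.allowed_groups & groups):
            continue  # user lacks permission for this source
        if chunk.sensitive and not allow_sensitive:
            continue  # sensitive data stays out unless explicitly allowed
        cost = len(chunk.text.split())  # rough token estimate (assumption)
        if used + cost > max_tokens:
            continue  # would overflow the budget; a shorter later chunk may still fit
        packed.append(chunk.text)
        used += cost
    return packed
```

Running the permission check before packing means an unauthorized chunk can never consume budget or appear in the generated answer, which is why access control belongs in retrieval rather than only in generation.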
| Name | Kind | When to use | Maturity |
|---|---|---|---|
| OpenSearch hybrid (BM25 + dense) | service | existing Elasticsearch/OpenSearch infra gains dense fusion without a new store | established |
| Reciprocal rank fusion | pattern | merging sparse and dense result lists without training a fusion model | commodity |
| Vespa | service | native hybrid ranking profiles at high query volume | established |