TECHNIQUE
Retrieval & Grounding
Reranking is deployed mainly as a second-stage narrowing step: operators first retrieve or shortlist candidates, then use embedding, cross-encoder, or LLM rankers to reorder or prune them before answer generation, investigation, or the display of search results.
Run reranking after an initial retrieval or candidate-shortlisting step, rather than as the first-stage search.
5 of 6 operators in the pool: Dropbox, Grab, Meta, Rippling, and TraceIQ.
Use reranking to put the most relevant candidates at the top or aggressively prune context before downstream use.
4 of 6 operators in the pool: Dropbox, Grab, Meta, and Rippling.
Pair lexical, sparse, dense-vector, heuristic, or semantic retrieval with a heavier reranking stage.
5 of 6 operators in the pool: Dropbox, Grab, Meta, Rippling, and TraceIQ.
Evaluate reranking or retrieval-ranking quality with offline experiments, relevance metrics, backtesting, or confidence controls.
3 of 6 operators in the pool: Dropbox, Grab, and Meta.
Across the operators with explicit reranking evidence, reranking is used as a second-stage candidate reducer or reorderer after a cheaper retrieval, search, or shortlist stage.
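The two-stage shape described above can be sketched in a few lines. This is a minimal illustration, not any operator's implementation: `first_stage`, `rerank`, and `jaccard` are hypothetical names, token overlap stands in for a real lexical or vector search, and Jaccard similarity stands in for a heavier cross-encoder or LLM scorer.

```python
from typing import Callable, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase whitespace tokenizer; a deliberately crude assumption."""
    return set(text.lower().split())


def first_stage(query: str, corpus: List[str], k: int) -> List[str]:
    """Cheap shortlist: order documents by how many query tokens they share."""
    q = tokenize(query)
    ordered = sorted(corpus, key=lambda doc: len(q & tokenize(doc)), reverse=True)
    return ordered[:k]


def rerank(query: str, candidates: List[str],
           score_fn: Callable[[str, str], float], top_n: int) -> List[str]:
    """Second stage: score only the shortlist with the more expensive scorer."""
    scored = [(score_fn(query, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_n]]


def jaccard(query: str, doc: str) -> float:
    """Toy stand-in for a cross-encoder or LLM relevance score."""
    q, d = tokenize(query), tokenize(doc)
    return len(q & d) / len(q | d)


corpus = [
    "reset your password from the login page",
    "password policy and rotation rules for admins",
    "how to reset a forgotten password",
    "office parking rules",
]
query = "reset password"
shortlist = first_stage(query, corpus, k=3)        # cheap pass over the whole corpus
top = rerank(query, shortlist, jaccard, top_n=2)   # expensive pass over 3 items only
print(top)
```

The point of the split is cost: the expensive scorer runs on `k` candidates rather than on the whole corpus, so `k` becomes the main knob trading recall against reranking cost.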
Operators use different reranker types.
APPROACH 01
Embedding-feature or embedding-model rerankers re-sort retrieved chunks/search results.
APPROACH 02
LLM-based rankers rank a candidate shortlist.
APPROACH 03
Cross-encoder reranking is used with hybrid retrieval.
APPROACH 04
Rerankers are reported as context pruners, without naming the model class.
The thing being reranked differs by use case.
APPROACH 01
Search/document chunks for RAG answers and work-content search.
APPROACH 02
Structured-data entity matches from vector similarity search.
APPROACH 03
Potential code changes for root-cause investigations.
APPROACH 04
Agent context for a cross-domain business ontology and RAG agents.
Operators expose different compression targets or output sizes.
APPROACH 01
Re-sort retrieved chunks so the most relevant chunks are at the top.
APPROACH 02
Feed a shortlist of 15 entity matches from FAISS to an LLM for ranking.
APPROACH 03
Reduce hundreds of code-change candidates to a top-five list, using prompts capped at 20 changes at a time and repeated aggregation.
APPROACH 04
Prune context size by 100 to 500x.
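As a rough illustration of what a pruning ratio in that range means, the sketch below keeps the highest-ranked chunks until a token budget is reached and reports the resulting compression. The function name `prune_to_budget`, the whitespace token counter, and the chunk sizes are assumptions made for the example, not figures from any operator.

```python
from typing import Callable, List, Tuple


def prune_to_budget(
    ranked_chunks: List[str],
    budget_tokens: int,
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> Tuple[List[str], int]:
    """Keep chunks in rank order until the next one would exceed the budget.

    Assumes ranked_chunks is already sorted best-first by a reranker.
    """
    kept: List[str] = []
    used = 0
    for chunk in ranked_chunks:
        cost = count_tokens(chunk)
        if used + cost > budget_tokens:
            break
        kept.append(chunk)
        used += cost
    return kept, used


# Toy input: 500 reranked chunks of 100 whitespace tokens each (50,000 tokens).
ranked_chunks = [" ".join(["tok"] * 100) for _ in range(500)]
total_tokens = sum(len(c.split()) for c in ranked_chunks)

kept, used = prune_to_budget(ranked_chunks, budget_tokens=250)
ratio = total_tokens / used
print(len(kept), used, ratio)
```

Under these toy numbers, 50,000 candidate tokens shrink to 200, a 250x reduction, which sits inside the 100 to 500x range quoted above.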
Candidate volume and context-window limits shape reranking design; Meta explicitly used election-style ranking to handle context-window limits, and Rippling reports aggressive reranker pruning of context by 100 to 500x.
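One plausible reading of batched, election-style ranking is a tournament: rank at most 20 candidates per prompt, keep the winners of each batch, and repeat until a single prompt can pick the final five. The sketch below assumes that reading; it is not Meta's published implementation. `rank_batch` is a hypothetical callable standing in for one LLM ranking prompt, and the toy scorer replaces the model.

```python
from typing import Callable, List

Ranker = Callable[[List[int]], List[int]]  # returns the batch sorted best-first


def election_rank(candidates: List[int], rank_batch: Ranker,
                  batch_size: int = 20, keep: int = 5) -> List[int]:
    """Reduce a large pool to `keep` items without exceeding `batch_size` per call."""
    if not 0 < keep < batch_size:
        raise ValueError("keep must be positive and smaller than batch_size")
    pool = list(candidates)
    # Each round shrinks the pool, because every full batch drops some candidates.
    while len(pool) > batch_size:
        winners: List[int] = []
        for start in range(0, len(pool), batch_size):
            batch = pool[start:start + batch_size]
            winners.extend(rank_batch(batch)[:keep])
        pool = winners
    return rank_batch(pool)[:keep]


# Toy stand-in for the LLM: a fixed, distinct score per candidate id.
def score(candidate_id: int) -> int:
    return (candidate_id * 37) % 301


batch_sizes_seen: List[int] = []


def toy_rank_batch(batch: List[int]) -> List[int]:
    batch_sizes_seen.append(len(batch))  # record prompt sizes for inspection
    return sorted(batch, key=score, reverse=True)


candidates = list(range(300))  # "hundreds" of candidate changes
top_five = election_rank(candidates, toy_rank_batch)
print(top_five, max(batch_sizes_seen))
```

With a perfectly consistent scorer the tournament returns the same top five as a global sort. A real LLM ranker is noisier, so a strong candidate can be eliminated in an early batch; that is a limitation of this sketch.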
LLM reranking can add latency; Grab says real-world applications must consider the additional latency introduced by the extra LLM query.
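A simple way to make that latency visible is to time the rerank stage separately from retrieval. The sketch below is an assumption about how one might measure it: `retrieve` and `rerank` are hypothetical callables, and the `time.sleep` call merely simulates a slow model query.

```python
import statistics
import time
from typing import Callable, List


def added_latency(
    queries: List[str],
    retrieve: Callable[[str], List[str]],
    rerank: Callable[[str, List[str]], List[str]],
) -> List[float]:
    """Return the wall-clock seconds spent in the rerank stage for each query."""
    samples: List[float] = []
    for query in queries:
        candidates = retrieve(query)          # first stage is not timed here
        start = time.perf_counter()
        rerank(query, candidates)             # only the extra stage is timed
        samples.append(time.perf_counter() - start)
    return samples


def fake_retrieve(query: str) -> List[str]:
    return [f"{query} doc {i}" for i in range(15)]


def fake_rerank(query: str, candidates: List[str]) -> List[str]:
    time.sleep(0.01)                          # simulated model round trip
    return sorted(candidates)


samples = added_latency(["a", "b", "c"], fake_retrieve, fake_rerank)
median_s = statistics.median(samples)
print(f"median added latency: {median_s * 1000:.1f} ms, max: {max(samples) * 1000:.1f} ms")
```

Reporting the median and the maximum (or a high percentile on larger samples) helps show whether the extra query fits the latency budget of the application.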
Reranking relevance is data- and query-dependent; Grab reports dependence on data quality, complexity, use case, and query patterns, while Meta avoids low-confidence recommendations by sacrificing reach for precision.
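The reach-for-precision trade-off can be checked offline by backtesting a confidence threshold against labeled outcomes. The sketch below is illustrative only: `precision_and_reach` is a hypothetical helper, and the labeled examples are made-up toy values, not results from any operator.

```python
from typing import List, Optional, Tuple

Example = Tuple[float, bool]  # (ranker confidence, was the top result correct?)


def precision_and_reach(examples: List[Example],
                        threshold: float) -> Tuple[Optional[float], float]:
    """Answer only when confidence >= threshold, then measure the trade-off.

    reach     = fraction of queries that receive a recommendation
    precision = fraction of issued recommendations that were correct
    """
    answered = [correct for confidence, correct in examples if confidence >= threshold]
    reach = len(answered) / len(examples)
    precision = sum(answered) / len(answered) if answered else None
    return precision, reach


# Made-up backtest data for illustration.
labeled = [
    (0.95, True), (0.90, True), (0.80, True), (0.70, False),
    (0.60, True), (0.40, False), (0.30, False), (0.20, True),
]

no_gate = precision_and_reach(labeled, threshold=0.0)
gated = precision_and_reach(labeled, threshold=0.75)
print(no_gate, gated)
```

On this toy data, raising the threshold to 0.75 lifts precision from 0.625 to 1.0 while reach falls from 1.0 to 0.375. Because relevance depends on the data and the query patterns, the threshold would need to be re-derived on each deployment's own labeled set.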
| Name | Kind | When | Maturity |
|---|---|---|---|
| bge-reranker | library | self-hosted cross-encoder reranking with GPU available | established |
| Cohere Rerank | service | managed reranking without hosting a model | established |
| ms-marco MiniLM cross-encoder | library | CPU-friendly baseline reranker for modest volumes | commodity |