TECHNIQUE

Supervised fine-tuning

Model Adaptation

7 APPLICATIONS

8 OBSERVED OPERATORS

State of Practice

CROSS-VALIDATED — 8 OPERATORS

Across the cited pool of deployed and pilot systems, supervised fine-tuning is used for application-specific adaptation: operators fine-tune on task- or domain-specific data, embed the model inside larger production pipelines, and add evaluation, safety, latency, or review controls around it.

Observed Practices

Fine-tune on task- or domain-specific data rather than relying on a generic base model: examples include LinkedIn's proprietary professional-domain data, Criteo's latest ad-interaction data, Uber's invoice datasets, Pinterest's GPT-generated labels for Qwen, Podium's traced conversations, Thumbtack's message samples, Atlassian's Jira-like work data, and Meta's historical investigations with known root causes.

8 of 8 operators with cited deployed/pilot evidence in this pool; LinkedIn counted once despite multiple teardowns.

LinkedIn, Criteo, Uber, Pinterest, Podium, Thumbtack, Atlassian, Meta
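
As a minimal sketch of this practice, the snippet below turns task-specific records into prompt/completion pairs and serializes them as JSON Lines, a layout that many fine-tuning tools accept. The field names (`prompt`, `completion`), the helper functions, and the invoice-style records are hypothetical illustrations, not any cited operator's schema.

```python
import json

def to_sft_example(record: dict) -> dict:
    """Map one raw task record to a prompt/completion pair (hypothetical schema)."""
    prompt = f"Extract the total amount from this invoice text:\n{record['text']}"
    return {"prompt": prompt, "completion": record["label"]}

def build_sft_lines(records: list[dict]) -> list[str]:
    """Serialize labeled records as JSON Lines, skipping unlabeled ones."""
    lines = []
    for record in records:
        if not record.get("label"):
            continue  # supervised fine-tuning needs a target for every example
        lines.append(json.dumps(to_sft_example(record), ensure_ascii=False))
    return lines

# Illustrative records only; real data would come from the operator's own domain.
raw_records = [
    {"text": "Invoice 001. Total due: 120.50 USD", "label": "120.50 USD"},
    {"text": "Invoice 002. Amount missing", "label": ""},
]
sft_lines = build_sft_lines(raw_records)
```

Dropping unlabeled records up front keeps the training file limited to examples that carry a supervised target.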

Use supervised fine-tuning to improve a specific production behavior: semantic job search or embeddings, ad-performance prediction, invoice extraction, journey relevance, agent conversation endings, policy-violation detection, enterprise semantic search, or root-cause ranking.

8 of 8 operators with cited deployed/pilot evidence in this pool; LinkedIn counted once despite multiple teardowns.

LinkedIn, Criteo, Uber, Pinterest, Podium, Thumbtack, Atlassian, Meta

Keep the fine-tuned model inside a larger system rather than deploying it as a standalone component: observed surrounding components include tool calling, RAG, retrieval/ranking layers, OCR and post-processing, clustering/diversification, observability datasets, rule/CNN prefilters, hybrid retrieval signals, and heuristic retrievers.

8 of 8 operators with cited deployed/pilot evidence in this pool; LinkedIn counted once despite multiple teardowns.

LinkedIn, Criteo, Uber, Pinterest, Podium, Thumbtack, Atlassian, Meta

Validate fine-tuned models with offline, online, benchmark, A/B, or backtesting workflows before or during production use.

8 of 8 operators with cited deployed/pilot evidence in this pool; LinkedIn counted once despite multiple teardowns.

LinkedIn, Criteo, Uber, Pinterest, Podium, Thumbtack, Atlassian, Meta
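
One way to picture the offline part of this practice is a simple promotion gate: score a baseline and a fine-tuned model on the same held-out labels and promote only if the gain clears a margin. The function names, the `min_gain` threshold, and the toy labels are assumptions for illustration; the cited operators use their own metrics and workflows.

```python
def accuracy(predictions: list[str], labels: list[str]) -> float:
    """Fraction of held-out examples where the prediction matches the label."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and aligned")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

def should_promote(baseline_preds: list[str], tuned_preds: list[str],
                   labels: list[str], min_gain: float = 0.02) -> bool:
    """Offline gate: promote only if the tuned model beats baseline by min_gain."""
    gain = accuracy(tuned_preds, labels) - accuracy(baseline_preds, labels)
    return gain >= min_gain

# Toy held-out set; min_gain is a hypothetical margin, not a recommended value.
labels = ["spam", "ok", "ok", "spam"]
baseline_preds = ["ok", "ok", "ok", "spam"]   # 3 of 4 correct
tuned_preds = ["spam", "ok", "ok", "spam"]    # 4 of 4 correct
promote = should_promote(baseline_preds, tuned_preds, labels)
```

An offline gate like this would typically precede, not replace, the online or A/B checks mentioned above.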

Pair fine-tuning with production-efficiency controls when scale, latency, or cost are explicit constraints: observed controls include LoRA, FlashAttention/custom kernels, mixed precision, DeepSpeed ZeRO, distributed GPU training, continuous lightweight fine-tuning, ONNX/Triton serving, in-house Qwen inference, and smaller/cost-effective domain models.

5 of 8 operators with cited deployed/pilot evidence in this pool.

LinkedIn, Criteo, Pinterest, Atlassian, Meta
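
To make the appeal of LoRA concrete, the arithmetic below compares trainable parameters for one weight matrix under full fine-tuning versus a low-rank update. LoRA freezes the original `d_out x d_in` matrix and trains two factors of shapes `d_out x r` and `r x d_in`. The dimensions and rank here are illustrative assumptions, not values reported by any operator.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Trainable parameters when the whole weight matrix is updated."""
    return d_in * d_out

def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for the two low-rank factors B (d_out x r) and A (r x d_in)."""
    return rank * (d_in + d_out)

# Hypothetical layer size and rank, chosen only to show the scale of the saving.
d_in = d_out = 4096
rank = 8
full = full_params(d_in, d_out)         # 16,777,216
lora = lora_params(d_in, d_out, rank)   # 65,536
ratio = lora / full                     # 1/256 of the full matrix
```

Under these assumed sizes, the adapter trains under half a percent of the layer's parameters, which is why it suits modest GPU budgets.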

Add human review, safety checks, or confidence gates around fine-tuned-model outputs in higher-risk workflows.

4 of 8 operators with cited deployed/pilot evidence in this pool.

Uber, Pinterest, Thumbtack, Meta
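
A confidence gate of the kind described here can be sketched as a small routing function: outputs above a threshold proceed automatically, and the rest go to a reviewer. The `Prediction` type, the `route` helper, and the 0.9 threshold are hypothetical, and the sketch assumes the model's confidence score is reasonably calibrated.

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    label: str
    confidence: float  # model score in [0, 1]; assumed to be calibrated

def route(prediction: Prediction, threshold: float = 0.9) -> str:
    """Return 'auto' to act on the output or 'human_review' to escalate it."""
    if prediction.confidence >= threshold:
        return "auto"
    return "human_review"

# Illustrative predictions; the threshold would be tuned per workflow and risk level.
predictions = [Prediction("violation", 0.97), Prediction("violation", 0.62)]
decisions = [route(p) for p in predictions]
```

Keeping the threshold as a parameter lets a higher-risk workflow send more cases to review without changing the model.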

Where Operators Converge

Every cited operator applies supervised fine-tuning to a concrete application domain or task, not as an abstract model-improvement exercise.

Every cited operator surrounds the fine-tuned model with application infrastructure such as retrieval, ranking, orchestration, post-processing, observability, evaluation, or review layers.

Where Operators Diverge

Operators place the fine-tuned model in different parts of the application stack.

APPROACH 01

Fine-tuned embeddings or semantic retrieval/search models.

LinkedIn, Atlassian

APPROACH 02

Fine-tuned rankers, scorers, or classifiers for decisions such as ad outcomes, policy violations, or root-cause candidates.

Criteo, Thumbtack, Meta

APPROACH 03

Fine-tuned LLMs used inside document-processing, journey-relevance, or agent-conversation workflows.

Uber, Pinterest, Podium

Training data sources differ substantially.

APPROACH 01

Use production or historical operational records as training data.

Criteo, Podium, Thumbtack, Meta

APPROACH 02

Use human annotations, curated validation sets, or domain review labels.

LinkedIn, Uber, Thumbtack

APPROACH 03

Use synthetic, LLM-generated, or public domain-like datasets for fine-tuning.

Pinterest, Atlassian

Fine-tuning cadence and training pattern vary.

APPROACH 01

Continuous or frequent fine-tuning/retraining to keep models fresh with latest data.

Criteo

APPROACH 02

Multi-step instruction tuning plus preference/safety alignment for domain-adapted foundation models.

APPROACH 03

Task-specific fine-tuning followed by evaluation for a bounded use case.

Uber, Pinterest, Podium, Thumbtack, Atlassian, Meta

Watch Items

Cost, latency, and serving scale remain active constraints around fine-tuned models; operators report high compute costs, complex deployment pipelines, billion-inference-per-second latency targets, quality/latency/cost tradeoffs, and context-window-driven ranking workarounds.

Fine-tuning depends on label quality, dataset freshness, and coverage of edge cases; operators report continuous retraining needs, costly or inconsistent human evaluation, absence of reliable reference labels, and the need to expand/update datasets.

Several operators avoid fully automatic action in critical paths by adding human review, safety checks, or confidence filtering.

Operators use prefilters, retrievers, or batching strategies to restrict what the fine-tuned model must process when volume or context size is too large.
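
The last watch item can be sketched as two cheap steps that run before the fine-tuned model: a rule-based prefilter that discards irrelevant candidates, then greedy batching under a size budget. The helper names, the keyword rule, and the character-count budget (a stand-in for tokens) are assumptions for illustration only.

```python
def prefilter(candidates: list[str], keyword: str) -> list[str]:
    """Cheap rule-based prefilter: keep only candidates mentioning the keyword."""
    return [c for c in candidates if keyword.lower() in c.lower()]

def batch_by_budget(items: list[str], budget: int) -> list[list[str]]:
    """Greedily pack items into batches whose total length stays within budget.

    Length is counted in characters as a stand-in for tokens (an assumption).
    An item longer than the budget still gets a batch of its own.
    """
    batches: list[list[str]] = []
    current: list[str] = []
    used = 0
    for item in items:
        size = len(item)
        if current and used + size > budget:
            batches.append(current)
            current, used = [], 0
        current.append(item)
        used += size
    if current:
        batches.append(current)
    return batches

# Hypothetical candidate list, e.g. change descriptions to rank for an incident.
changes = ["fix cache bug", "update docs", "cache eviction change", "refactor cache layer"]
kept = prefilter(changes, "cache")
batches = batch_by_budget(kept, budget=35)
```

Each batch would then be passed to the fine-tuned model separately, so no single call exceeds the assumed context budget.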

Implementation Menu

CURATED DEFAULTS

Name	Kind	When to use	Maturity
LoRA via PEFT	library	adapting open-weights models on modest GPU budgets	established
Axolotl	library	config-driven fine-tuning runs without custom training code	established
Provider fine-tuning APIs	service	no training infra; tune a hosted model on prepared examples	established

Observed in Production

7 APPS

LLM Application Quality Assurance

AI-Assisted Content and Metadata Data Collection

AI-Assisted Product and Developer Collaboration Workflows

Automated Quality Image Tagging and Cataloging

Compute-Efficient Media Preview and Qwen Journey Inference Optimization

Enterprise Search Synthetic Evaluation Data Generation

Monorepo Incident Root Cause Identification