Preference optimization

Model Adaptation

0 APPLICATIONS

0 OBSERVED OPERATORS

Implementation Menu

CURATED DEFAULTS

Name	Kind	When	Maturity
DPO via TRL	library	pairwise preference data exists; simpler and more stable than RLHF	established
KTO	pattern	only thumbs-up/down signals exist, not pairwise comparisons	emerging
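
The "When" column above turns on the shape of the feedback data, which a minimal sketch can make concrete. The helper names below (`suggest_method`, `pairs_to_unpaired`, `dpo_loss`) are hypothetical and not part of any library. The field names (`prompt`/`chosen`/`rejected` for pairwise data, `prompt`/`completion`/`label` for thumbs-up/down data) are assumed to match the layout TRL's DPO and KTO trainers conventionally expect; check this against the TRL version in use. The loss is the standard per-example DPO objective computed on already-summed log-probabilities, not TRL's own implementation.

```python
import math


def suggest_method(record):
    """Hypothetical helper: map one feedback record to a row of the table above."""
    if "chosen" in record and "rejected" in record:
        return "DPO"  # pairwise comparison of two completions for one prompt
    if "completion" in record and "label" in record:
        return "KTO"  # a single completion with a thumbs-up/down flag
    raise ValueError("record is neither pairwise nor thumbs-up/down feedback")


def pairs_to_unpaired(pairs):
    """Split pairwise records into unpaired ones (chosen -> True, rejected -> False)."""
    unpaired = []
    for p in pairs:
        unpaired.append({"prompt": p["prompt"], "completion": p["chosen"], "label": True})
        unpaired.append({"prompt": p["prompt"], "completion": p["rejected"], "label": False})
    return unpaired


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * (policy margin - reference margin)).

    Inputs are sequence log-probabilities; beta=0.1 is an illustrative value.
    """
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    x = beta * (policy_margin - ref_margin)
    # -log(sigmoid(x)) == log(1 + exp(-x)), evaluated in a numerically stable way
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


if __name__ == "__main__":
    pair = {"prompt": "Summarize:", "chosen": "Short and accurate.", "rejected": "Off topic."}
    print(suggest_method(pair))
    print(suggest_method(pairs_to_unpaired([pair])[0]))
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

The loss needs only log-probabilities from the policy and a frozen reference model, with no separately trained reward model or sampling loop, which is one reading of the "simpler" claim in the DPO row. Pairwise data can always be split into thumbs-up/down records as `pairs_to_unpaired` does, but the reverse is not possible in general, which is why KTO is listed for the case where only unpaired signals exist.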

0 APPS

No published applications observed using this technique yet.

Teardown coverage accrues forward: the taxonomy is the map, and the count is the honest state of it.