TECHNIQUE
Decision Optimization
| Name | Kind | When to use | Maturity |
|---|---|---|---|
| Stable-Baselines3 | library | reliable single-agent RL baselines (PPO/SAC) against a simulator | established |
| Ray RLlib | library | distributed RL training and multi-agent setups at cluster scale | established |
| Contextual bandits (Vowpal Wabbit) | library | online decisioning with off-policy evaluation where full RL is overkill | established |
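
To make the last row concrete, here is a minimal sketch of online decisioning with off-policy evaluation, written in plain Python rather than against Vowpal Wabbit's API. Everything in it is an assumption made for illustration: the toy environment (`simulate_reward`), the function names, and the choice of epsilon-greedy exploration with an inverse propensity scoring (IPS) estimator. It shows the shape of the idea, not how any of the listed libraries implement it.

```python
import random


def simulate_reward(context, action, rng):
    """Hypothetical environment: the action equal to the context pays off more often."""
    p = 0.8 if action == context else 0.2
    return 1.0 if rng.random() < p else 0.0


def run_epsilon_greedy(n_rounds=5000, n_contexts=3, n_actions=3, epsilon=0.1, seed=0):
    """Learn per-context action values online and log the propensity of each choice."""
    rng = random.Random(seed)
    counts = [[0] * n_actions for _ in range(n_contexts)]
    means = [[0.0] * n_actions for _ in range(n_contexts)]
    logged = []  # tuples of (context, action, reward, propensity)

    for _ in range(n_rounds):
        context = rng.randrange(n_contexts)
        greedy = max(range(n_actions), key=lambda a: means[context][a])

        # Explore uniformly with probability epsilon, otherwise exploit.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = greedy

        # Probability that this policy would have picked `action` in this round.
        propensity = epsilon / n_actions
        if action == greedy:
            propensity += 1.0 - epsilon

        reward = simulate_reward(context, action, rng)

        # Incremental mean update for the chosen (context, action) pair.
        counts[context][action] += 1
        means[context][action] += (reward - means[context][action]) / counts[context][action]

        logged.append((context, action, reward, propensity))

    return means, logged


def ips_value(logged, target_policy):
    """Estimate the average reward of a deterministic target policy from logged data."""
    total = 0.0
    for context, action, reward, propensity in logged:
        if target_policy(context) == action:
            total += reward / propensity
    return total / len(logged)


if __name__ == "__main__":
    means, logged = run_epsilon_greedy()
    # Evaluate the policy "pick the action equal to the context" without deploying it.
    print(ips_value(logged, lambda context: context))
```

In this toy setup the evaluated policy has a true expected reward of 0.8, so the IPS estimate should land near that value. The point of logging propensities is that a candidate policy can be scored from existing decision logs, which is a large part of why bandits are attractive when a full RL simulator is not available.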
No published applications observed using this technique yet.
Teardown coverage accrues going forward: the taxonomy is the map of techniques, and the count is an honest record of how many have been covered so far.