TECHNIQUE
Evaluation
| Name | Kind | When | Maturity |
|---|---|---|---|
| promptfoo | library | declarative golden-set evals wired into CI | established |
| Ragas | library | RAG-specific metrics: faithfulness, context precision/recall | established |
| pytest-style eval harness | pattern | evals as code in the existing test runner and review flow | commodity |