Notes
Notes on retrieval, evals, observability, and the engineering that begins once the demo turns out to be the easy part.
LLM evals for continuous delivery: turn production failures into automated tests, grade traces with task-specific graders, and block bad releases with eval-driven gates.
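As a rough illustration of that loop, here is a minimal sketch, not the post's actual implementation. It assumes a production failure is stored as a prompt plus a substring the answer must contain, that the grader is a simple substring check, and that a release is blocked when the pass rate falls below a threshold. The names `EvalCase`, `grade`, `pass_rate`, `release_gate`, the 0.9 threshold, and the stub models are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class EvalCase:
    """A regression case captured from a production failure (hypothetical schema)."""
    case_id: str
    prompt: str
    must_contain: str  # substring a correct answer is assumed to include


def grade(case: EvalCase, output: str) -> bool:
    """Task-specific grader; here just a case-insensitive substring check."""
    return case.must_contain.lower() in output.lower()


def pass_rate(cases: Sequence[EvalCase], generate: Callable[[str], str]) -> float:
    """Run every case through the candidate model and return the fraction that pass."""
    if not cases:
        raise ValueError("eval suite is empty")
    passed = sum(grade(case, generate(case.prompt)) for case in cases)
    return passed / len(cases)


def release_gate(
    cases: Sequence[EvalCase],
    generate: Callable[[str], str],
    threshold: float = 0.9,  # assumed cutoff; tune per task
) -> bool:
    """Return True if the candidate may ship, False if the release should be blocked."""
    return pass_rate(cases, generate) >= threshold


# Two cases that stand in for failures observed in production.
CASES = [
    EvalCase("refund-001", "What is the refund window?", "30 days"),
    EvalCase("ship-002", "Do you ship to Canada?", "yes"),
]


def candidate_ok(prompt: str) -> str:
    """Stub model that answers both cases acceptably."""
    return "Yes, and refunds are accepted within 30 days."


def candidate_bad(prompt: str) -> str:
    """Stub model that regresses on both cases."""
    return "I am not sure."
```

In a CI job, a `False` from `release_gate` would translate into a non-zero exit code so the pipeline stops before deploy. Real graders are usually richer than a substring match, but the gate logic stays the same.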