Agent Outcome Rubrics Are Just Prompts: How to Write Grading Criteria That Actually Fail Bad Output
Anthropic shipped the grading loop -- the rubric is now the hardest prompt you'll write
2 articles
Anthropic shipped the grading loop -- the rubric is now the hardest prompt you'll write
Why agents invoke wrong tools and the prompt structure that fixes it