Agent Outcome Rubrics Are Just Prompts: How to Write Grading Criteria That Actually Fail Bad Output
Anthropic shipped the grading loop -- the rubric is now the hardest prompt you'll write