# Cookbook
Practical recipes for evaluating text outputs with AutoRubric. Each recipe solves a specific real-world scenario with focused code snippets and complete runnable examples.
## Recipe Index
### Tier 1: Foundation
Start here if you're new to AutoRubric.
| Recipe | Domain | What You'll Learn |
|---|---|---|
| Your First Evaluation | Tech Support | Basic rubric creation and grading |
| Managing Datasets | Medical Triage | Loading, saving, and splitting datasets |
### Tier 2: Reliability
Improve grading consistency and accuracy.
| Recipe | Domain | What You'll Learn |
|---|---|---|
| Few-Shot Calibration | Legal Contracts | Calibrating judges with labeled examples |
| Ensemble Judging | Job Applications | Multi-judge voting for high-stakes decisions |
| Handling CANNOT_ASSESS | RAG Responses | Strategies for uncertain verdicts |
### Tier 3: Advanced Evaluation
Sophisticated evaluation techniques.
| Recipe | Domain | What You'll Learn |
|---|---|---|
| Multi-Choice Rubrics | Restaurant Reviews | Ordinal/nominal scales with Likert ratings |
| Extended Thinking | Security Assessments | Deep reasoning for complex evaluations |
| Length Penalty | Executive Summaries | Penalizing verbose responses |
### Tier 4: Validation & Production
Deploy with confidence.
| Recipe | Domain | What You'll Learn |
|---|---|---|
| Evaluating Rubric Quality | Peer Review | Meta-rubrics to validate and improve rubrics |
| Automated Rubric Improvement | EV Analysis | LLM-driven iterative refinement of rubrics |
| Judge Validation | Content Moderation | Measuring agreement with human labels |
| Synthetic Ground Truth | Product Descriptions | Bootstrapping labels from strong models |
| Batch Evaluation | Customer Feedback | Checkpointing, resumption, and cost tracking |
### Tier 5: Specialized
Advanced patterns for specific needs.
| Recipe | Domain | What You'll Learn |
|---|---|---|
| Per-Item Rubrics | Coding Interviews | Different rubrics for different items |
| Cost Optimization | News Fact-Checking | Caching and model selection strategies |
| Configuration Management | Academic Papers | Sharing reproducible configs across teams |
## Quick Start
If you haven't installed AutoRubric yet, run `pip install autorubric`.
Set up your API key for your preferred provider:
```bash
export OPENAI_API_KEY=your_key_here
# or
export ANTHROPIC_API_KEY=your_key_here
# or
export GEMINI_API_KEY=your_key_here
```
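Before running a recipe, it can help to confirm that at least one of these variables is actually visible to Python. The sketch below is a hypothetical helper (`configured_providers` is not part of AutoRubric); it only checks that the variables listed above are set and non-empty, not that the keys are valid.

```python
import os

# Environment variables from the Quick Start; this mapping is an illustrative assumption,
# not an AutoRubric configuration object.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
}


def configured_providers(env=None):
    """Return the providers whose API key variable is set and non-empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var, "").strip()]


if __name__ == "__main__":
    found = configured_providers()
    if found:
        print("API keys found for:", ", ".join(found))
    else:
        print("No provider API key is set; export one of:", ", ".join(PROVIDER_KEYS.values()))
```

Passing a plain dict as `env` makes the check easy to test without touching the real environment.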
Then jump into Your First Evaluation to get started.
## Recipe Format
Each recipe follows a consistent structure:
- The Scenario - A realistic problem you might face
- What You'll Learn - Key features and concepts covered
- The Solution - Step-by-step implementation with focused code snippets
- Key Takeaways - Summary of important points
- Appendix: Complete Code - Full runnable script you can copy-paste
## Prerequisites
All recipes assume:
- Python 3.10+
- AutoRubric installed (`pip install autorubric`)
- An API key for at least one supported provider
- Basic familiarity with async/await (recipes use `asyncio.run()` for simplicity)
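If async/await is unfamiliar, the general shape is a set of coroutines driven from synchronous code by a single `asyncio.run()` call. The sketch below illustrates only that pattern: `grade_one` is a hypothetical stand-in for an async grading call and is not AutoRubric's API, and the semaphore-based concurrency cap is an illustrative choice rather than something the recipes prescribe.

```python
import asyncio


# Hypothetical stand-in for an async grading call; this is NOT AutoRubric's API.
async def grade_one(response: str) -> dict:
    await asyncio.sleep(0)  # placeholder for a network round trip to an LLM provider
    return {"response": response, "length": len(response)}


async def grade_all(responses: list[str], max_concurrency: int = 4) -> list[dict]:
    """Grade responses concurrently, capping in-flight calls with a semaphore."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(response: str) -> dict:
        async with semaphore:
            return await grade_one(response)

    # asyncio.gather returns results in the same order as its inputs.
    return await asyncio.gather(*(guarded(r) for r in responses))


def main() -> None:
    # asyncio.run creates an event loop, runs the coroutine to completion, and closes the loop.
    results = asyncio.run(grade_all(["Restart the router.", "Try again later."]))
    for result in results:
        print(result)


if __name__ == "__main__":
    main()
```

Keeping `asyncio.run()` at the top level, called once, is what lets a script stay synchronous everywhere else.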