Cookbook

Practical recipes for evaluating text outputs with AutoRubric. Each recipe solves a specific real-world scenario with focused code snippets and complete runnable examples.

Recipe Index

Tier 1: Foundation

Start here if you're new to AutoRubric.

Recipe	Domain	What You'll Learn
Your First Evaluation	Tech Support	Basic rubric creation and grading
Managing Datasets	Medical Triage	Loading, saving, and splitting datasets
Working with Explanations	Essay Feedback	Accessing and formatting per-criterion reasons

Tier 2: Reliability

Improve grading consistency and accuracy.

Recipe	Domain	What You'll Learn
Few-Shot Calibration	Legal Contracts	Calibrating judges with labeled examples
Ensemble Judging	Job Applications	Multi-judge voting for high-stakes decisions
Handling CANNOT_ASSESS	RAG Responses	Strategies for uncertain verdicts

Tier 3: Advanced Evaluation

Sophisticated evaluation techniques.

Recipe	Domain	What You'll Learn
Multi-Choice Rubrics	Restaurant Reviews	Ordinal/nominal scales with Likert ratings
Extended Thinking	Security Assessments	Deep reasoning for complex evaluations
Length Penalty	Executive Summaries	Penalizing verbose responses

Tier 4: Validation & Production

Deploy with confidence.

Recipe	Domain	What You'll Learn
Evaluating Rubric Quality	Peer Review	Meta-rubrics to validate and improve rubrics
Automated Rubric Improvement	EV Analysis	LLM-driven iterative refinement of rubrics
Judge Validation	Content Moderation	Measuring agreement with human labels
Synthetic Ground Truth	Product Descriptions	Bootstrapping labels from strong models
Batch Evaluation	Customer Feedback	Checkpointing, resumption, and cost tracking

Tier 5: Specialized

Advanced patterns for specific needs.

Recipe	Domain	What You'll Learn
Per-Item Rubrics	Coding Interviews	Different rubrics for different items
Cost Optimization	News Fact-Checking	Caching and model selection strategies
Configuration Management	Academic Papers	Sharing reproducible configs across teams
Evaluating Agent Skills	Peer Review	Evaluating a skill by comparing results with and without it

Quick Start

If you haven't installed AutoRubric yet:

pip install autorubric

Set up your API key for your preferred provider:

export OPENAI_API_KEY=your_key_here
# or
export ANTHROPIC_API_KEY=your_key_here
# or
export GEMINI_API_KEY=your_key_here
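
Before running a recipe, it can help to confirm that at least one of these variables is visible to Python. The following is a minimal sketch using only the standard library; the helper name `configured_providers` is hypothetical and not part of AutoRubric, and the variable names are taken from the export lines above.

```python
import os

# Variable names taken from the export commands above.
PROVIDER_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GEMINI_API_KEY")


def configured_providers(env=None):
    """Return the provider key names that are set to a non-empty value.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in PROVIDER_KEYS if env.get(name, "").strip()]


if __name__ == "__main__":
    found = configured_providers()
    if found:
        print("API keys found:", ", ".join(found))
    else:
        print("No API key set; export one of:", ", ".join(PROVIDER_KEYS))
```

The check only confirms that a variable is set, not that the key is valid for the provider.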

Then jump into Your First Evaluation to get started.

Recipe Format

Each recipe follows a consistent structure:

The Scenario - A realistic problem you might face
What You'll Learn - Key features and concepts covered
The Solution - Step-by-step implementation with focused code snippets
Key Takeaways - Summary of important points
Appendix: Complete Code - Full runnable script you can copy-paste

Prerequisites

All recipes assume:

Python 3.11+
AutoRubric installed (pip install autorubric)
An API key for at least one supported provider
Basic familiarity with async/await (recipes use asyncio.run() for simplicity)
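
For readers less familiar with async/await, the `asyncio.run()` pattern mentioned above looks roughly like the sketch below. It uses only the standard library; `grade_stub` is a hypothetical placeholder for an asynchronous grading call and does not reflect AutoRubric's actual API.

```python
import asyncio
import sys


async def grade_stub(text: str) -> dict:
    """Hypothetical stand-in for an async grading call; not AutoRubric's API."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return {"text": text, "length": len(text)}


async def main() -> list[dict]:
    # Run several awaitables concurrently; results come back in input order.
    return await asyncio.gather(*(grade_stub(t) for t in ["first", "second"]))


if __name__ == "__main__":
    # Mirrors the Python 3.11+ prerequisite listed above.
    if sys.version_info < (3, 11):
        raise SystemExit("Python 3.11+ is required")
    print(asyncio.run(main()))
```

`asyncio.run()` creates an event loop, runs the coroutine to completion, and closes the loop, which is why a script needs only one call to it at the entry point.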