Working with Grading Explanations¶
Learn how to access, display, and use per-criterion explanations from rubric grading.
The Scenario¶
You're building an automated essay feedback system. Students submit essays, and you want to provide not just a score, but per-criterion feedback explaining why each requirement was met or not met. AutoRubric's grading produces these explanations automatically — you just need to access them.
What You'll Learn¶
- Accessing `reason` from grading results
- Formatting explanations for student feedback
- Working with ensemble explanations (combined judge reasons)
- Filtering and categorizing reasons programmatically
The Solution¶
```mermaid
flowchart LR
    A[Submission] --> B[CriterionGrader]
    B --> C{Mode}
    C -->|Single Judge| D[One Reason per Criterion]
    C -->|Ensemble| E[Multiple Judge Reasons]
    E --> F[Aggregated final_reason]
    D --> G[CriterionReport]
    F --> H[EnsembleCriterionReport]
```
Step 1: Grade and Access Explanations¶
Every grading result contains a `report` — a list of `CriterionReport` objects, each with a `reason` field:
```python
import asyncio

from autorubric import Rubric, LLMConfig
from autorubric.graders import CriterionGrader

rubric = Rubric.from_dict([
    {"name": "causes", "weight": 30.0, "requirement": "Identifies at least 2 major causes of the Industrial Revolution"},
    {"name": "effects", "weight": 30.0, "requirement": "Describes at least 2 major effects of the Industrial Revolution"},
    {"name": "structure", "weight": 12.0, "requirement": "Clear essay structure with introduction and logical flow"},
    {"name": "errors", "weight": -15.0, "requirement": "Contains significant factual errors"},
])

grader = CriterionGrader(
    llm_config=LLMConfig(model="openai/gpt-4.1-mini", temperature=0.0)
)

async def main():
    result = await rubric.grade(
        to_grade="The Industrial Revolution began in Britain around 1760...",
        grader=grader,
        query="Explain the causes and effects of the Industrial Revolution.",
    )
    for cr in result.report:
        verdict = cr.verdict.value if cr.verdict else "N/A"
        name = cr.name or "unnamed"
        print(f"[{verdict}] {name}: {cr.reason}")

asyncio.run(main())
```
Single vs. Ensemble Explanations
With a single judge, `reason` is the judge's direct explanation. With an ensemble,
`final_reason` concatenates all judges' reasons with a pipe separator, and individual
verdicts are accessible through `cr.votes`. Choose an ensemble when you need multiple
perspectives or higher reliability.
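As a quick illustration of the pipe-joined format, the sketch below splits a hard-coded `final_reason` string (the text is made up; a real one comes from an ensemble grading result):

```python
# A hypothetical final_reason in the " | "-joined format described above.
final_reason = "Cites steam power and coal | Identifies only one cause | Lists two clear causes"

# Recover one explanation per judge.
judge_reasons = final_reason.split(" | ")
for i, reason in enumerate(judge_reasons, start=1):
    print(f"Judge {i}: {reason}")
```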
Step 2: Format as Student Feedback¶
Structure the explanations into a readable feedback report:
```python
def format_feedback(result):
    lines = [f"Overall Score: {result.score:.0%}\n"]
    met = [cr for cr in result.report if cr.verdict and cr.verdict.value == "MET" and cr.weight > 0]
    unmet = [cr for cr in result.report if cr.verdict and cr.verdict.value == "UNMET" and cr.weight > 0]
    errors = [cr for cr in result.report if cr.verdict and cr.verdict.value == "MET" and cr.weight < 0]
    if met:
        lines.append("Strengths:")
        for cr in met:
            lines.append(f"  + {cr.name}: {cr.reason}")
    if unmet:
        lines.append("\nAreas for Improvement:")
        for cr in unmet:
            lines.append(f"  - {cr.name}: {cr.reason}")
    if errors:
        lines.append("\nErrors Found:")
        for cr in errors:
            lines.append(f"  ! {cr.name}: {cr.reason}")
    return "\n".join(lines)
```
Step 3: Ensemble Explanations¶
When using ensemble judging, `final_reason` combines all judges' explanations with a pipe (`|`) separator:
```python
from autorubric.graders import CriterionGrader, JudgeSpec

grader = CriterionGrader(
    judges=[
        JudgeSpec(LLMConfig(model="openai/gpt-4.1-mini"), "gpt"),
        JudgeSpec(LLMConfig(model="anthropic/claude-sonnet-4-5-20250929"), "claude"),
    ],
    aggregation="majority",
)

result = await rubric.grade(to_grade=essay, grader=grader, query=prompt)

for cr in result.report:
    # Individual judge reasons are pipe-separated
    judge_reasons = cr.final_reason.split(" | ")
    print(f"[{cr.final_verdict.value}] {cr.criterion.name}")
    for i, reason in enumerate(judge_reasons):
        print(f"  Judge {i + 1}: {reason}")
    # Individual votes are also available
    for vote in cr.votes:
        print(f"  {vote.judge_id}: {vote.verdict.value} — {vote.reason}")
```
Step 4: Programmatic Filtering¶
Extract specific explanations for downstream use:
```python
def get_unmet_feedback(result):
    """Extract reasons for criteria that were not met."""
    return {
        cr.name: cr.reason
        for cr in result.report
        if cr.verdict and cr.verdict.value == "UNMET" and cr.weight > 0
    }

def get_error_explanations(result):
    """Extract explanations for detected errors (negative-weight criteria that were MET)."""
    return {
        cr.name: cr.reason
        for cr in result.report
        if cr.verdict and cr.verdict.value == "MET" and cr.weight < 0
    }
```
Negative-Weight Criteria and MET Verdicts
For negative-weight criteria like `errors`, a MET verdict means the undesirable behavior
was detected: the submission contains the problem described in the requirement. The
`reason` then explains what the error is, not what was done well. Filter these
separately when building feedback reports.
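To make this concrete, here is a minimal sketch using a `SimpleNamespace` stand-in (not a real `CriterionReport`) that routes a MET negative-weight criterion into an "issue" bucket instead of presenting its reason as praise:

```python
from types import SimpleNamespace

# Stand-in for a graded negative-weight criterion (illustration only).
cr = SimpleNamespace(
    name="errors",
    weight=-15.0,
    verdict=SimpleNamespace(value="MET"),
    reason="States the Industrial Revolution began in 1900; it began around 1760.",
)

# For negative weights, MET flags a detected problem.
label = "issue detected" if cr.weight < 0 and cr.verdict.value == "MET" else "ok"
print(f"{cr.name} ({label}): {cr.reason}")
```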
Key Takeaways¶
| Concept | Single Judge | Ensemble |
|---|---|---|
| Reason | `cr.reason` — judge's direct explanation | `cr.final_reason` — all judges' reasons joined with `\|` |
| Individual votes | One verdict in `cr.votes` | Multiple verdicts in `cr.votes`, one per judge |
| Verdict | `cr.verdict` — the judge's verdict | `cr.final_verdict` — aggregated verdict (e.g., majority vote) |
| Criterion access | `cr.name`, `cr.weight` | `cr.criterion.name`, `cr.criterion.weight` |
| Negative-weight MET | Reason explains the detected problem | Each judge's reason for detecting the problem |
| Access pattern | `cr.reason` directly | Split `cr.final_reason` on `\|` or iterate `cr.votes` |
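If the same feedback code must handle both report types, the two access patterns in the table can be collapsed into one helper. This is a sketch that duck-types with `getattr` on the attribute names listed above; it is illustrative, not part of the AutoRubric API:

```python
from types import SimpleNamespace

def get_reason(cr) -> str:
    """Return the explanation text from either report type."""
    final = getattr(cr, "final_reason", None)
    return final if final is not None else cr.reason

def get_verdict(cr):
    """Return the (aggregated) verdict from either report type."""
    final = getattr(cr, "final_verdict", None)
    return final if final is not None else cr.verdict

# Stand-ins shaped like each report type (not real AutoRubric objects):
single = SimpleNamespace(reason="Requirement clearly satisfied.", verdict="MET")
ensemble = SimpleNamespace(final_reason="Satisfied | Mostly satisfied",
                           final_verdict="MET")

print(get_reason(single))    # Requirement clearly satisfied.
print(get_reason(ensemble))  # Satisfied | Mostly satisfied
```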
Going Further¶
- Ensemble Judging — Get multiple perspectives on each criterion
- Extended Thinking — Enable deeper reasoning for complex evaluations
- API Reference: Core Grading — Full `CriterionReport` and `EnsembleCriterionReport` docs
Appendix: Complete Code¶
"""Working with Grading Explanations - Essay Feedback System"""
import asyncio
from pathlib import Path
from autorubric import LLMConfig
from autorubric.dataset import RubricDataset
from autorubric.graders import CriterionGrader
DATASET_PATH = Path(__file__).parent / "examples" / "data" / "essay_grading_dataset.json"
def format_feedback(result):
"""Format grading result as student-readable feedback."""
lines = [f"Overall Score: {result.score:.0%}\n"]
met = [cr for cr in result.report if cr.verdict and cr.verdict.value == "MET" and cr.weight > 0]
unmet = [cr for cr in result.report if cr.verdict and cr.verdict.value == "UNMET" and cr.weight > 0]
errors = [cr for cr in result.report if cr.verdict and cr.verdict.value == "MET" and cr.weight < 0]
if met:
lines.append("Strengths:")
for cr in met:
lines.append(f" + {cr.name}: {cr.reason}")
if unmet:
lines.append("\nAreas for Improvement:")
for cr in unmet:
lines.append(f" - {cr.name}: {cr.reason}")
if errors:
lines.append("\nErrors Found:")
for cr in errors:
lines.append(f" ! {cr.name}: {cr.reason}")
return "\n".join(lines)
async def main():
dataset = RubricDataset.from_file(DATASET_PATH)
grader = CriterionGrader(
llm_config=LLMConfig(model="openai/gpt-4.1-mini", temperature=0.0)
)
item = dataset.items[0]
rubric = dataset.get_item_rubric(0)
prompt = dataset.get_item_prompt(0)
print(f"Prompt: {prompt}")
print(f"Submission: {item.description}")
print("=" * 70)
result = await rubric.grade(
to_grade=item.submission,
grader=grader,
query=prompt,
)
print(format_feedback(result))
if __name__ == "__main__":
asyncio.run(main())