Working with Grading Explanations¶
Learn how to access, display, and use per-criterion explanations from rubric grading.
The Scenario¶
You're building an automated essay feedback system. Students submit essays, and you want to provide not just a score, but per-criterion feedback explaining why each requirement was met or not met. AutoRubric's grading produces these explanations automatically — you just need to access them.
What You'll Learn¶
- Accessing `final_reason` from grading results
- Formatting explanations for student feedback
- Working with ensemble explanations (combined judge reasons)
- Filtering and categorizing reasons programmatically
The Solution¶
```mermaid
flowchart LR
    A[Submission] --> B[CriterionGrader]
    B --> C{Mode}
    C -->|Single Judge| D[One Reason per Criterion]
    C -->|Ensemble| E[Multiple Judge Reasons]
    E --> F[Aggregated final_reason]
    D --> G[EnsembleCriterionReport]
    F --> G
```
Step 1: Grade and Access Explanations¶
Every grading result contains a `report` — a list of `EnsembleCriterionReport` objects, each with a `final_reason` field:
```python
import asyncio

from autorubric import Rubric, LLMConfig
from autorubric.graders import CriterionGrader

rubric = Rubric.from_dict([
    {"name": "causes", "weight": 30.0, "requirement": "Identifies at least 2 major causes of the Industrial Revolution"},
    {"name": "effects", "weight": 30.0, "requirement": "Describes at least 2 major effects of the Industrial Revolution"},
    {"name": "structure", "weight": 12.0, "requirement": "Clear essay structure with introduction and logical flow"},
    {"name": "errors", "weight": -15.0, "requirement": "Contains significant factual errors"},
])

grader = CriterionGrader(
    llm_config=LLMConfig(model="openai/gpt-4.1-mini", temperature=0.0)
)


async def main():
    result = await rubric.grade(
        to_grade="The Industrial Revolution began in Britain around 1760...",
        grader=grader,
        query="Explain the causes and effects of the Industrial Revolution.",
    )
    for cr in result.report:
        verdict = cr.final_verdict.value if cr.final_verdict else "N/A"
        name = cr.criterion.name or "unnamed"
        print(f"[{verdict}] {name}: {cr.final_reason}")


asyncio.run(main())
```
Single vs. Ensemble Explanations

With a single judge, `final_reason` is the judge's direct explanation. With an ensemble,
`final_reason` concatenates all judges' reasons with a pipe separator, and individual
verdicts are accessible through `cr.votes`. Choose an ensemble when you need multiple
perspectives or higher reliability.
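To make the pipe convention concrete, here is a minimal standalone sketch of how pipe-joined reasons split back into per-judge explanations. The tiny `Vote` dataclass is a hypothetical stand-in for illustration, not AutoRubric's vote type:

```python
from dataclasses import dataclass


@dataclass
class Vote:
    # Hypothetical stand-in for illustration; not AutoRubric's vote class.
    judge_id: str
    verdict: str
    reason: str


votes = [
    Vote("gpt", "MET", "Names coal and steam power as causes."),
    Vote("claude", "MET", "Two causes identified: mechanization and capital."),
]

# Ensemble-style final_reason: individual reasons joined with " | ".
final_reason = " | ".join(v.reason for v in votes)
print(final_reason)

# Splitting on the separator recovers each judge's explanation.
per_judge = final_reason.split(" | ")
```

This round-trip only works as long as no individual reason contains the literal separator, which is why iterating `cr.votes` is the more robust access path.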
Step 2: Format as Student Feedback¶
Structure the explanations into a readable feedback report:
```python
def format_feedback(result):
    lines = [f"Overall Score: {result.score:.0%}\n"]
    met = [cr for cr in result.report if cr.final_verdict and cr.final_verdict.value == "MET" and cr.criterion.weight > 0]
    unmet = [cr for cr in result.report if cr.final_verdict and cr.final_verdict.value == "UNMET" and cr.criterion.weight > 0]
    errors = [cr for cr in result.report if cr.final_verdict and cr.final_verdict.value == "MET" and cr.criterion.weight < 0]
    if met:
        lines.append("Strengths:")
        for cr in met:
            lines.append(f"  + {cr.criterion.name}: {cr.final_reason}")
    if unmet:
        lines.append("\nAreas for Improvement:")
        for cr in unmet:
            lines.append(f"  - {cr.criterion.name}: {cr.final_reason}")
    if errors:
        lines.append("\nErrors Found:")
        for cr in errors:
            lines.append(f"  ! {cr.criterion.name}: {cr.final_reason}")
    return "\n".join(lines)
```
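You can preview the formatted feedback without calling an LLM by driving `format_feedback` with lightweight stand-in objects. The sketch below repeats the function so it runs on its own; the `SimpleNamespace` stand-ins are illustrative substitutes for real report entries, not AutoRubric types:

```python
from types import SimpleNamespace


# format_feedback as defined above, repeated so this snippet is self-contained.
def format_feedback(result):
    lines = [f"Overall Score: {result.score:.0%}\n"]
    met = [cr for cr in result.report if cr.final_verdict and cr.final_verdict.value == "MET" and cr.criterion.weight > 0]
    unmet = [cr for cr in result.report if cr.final_verdict and cr.final_verdict.value == "UNMET" and cr.criterion.weight > 0]
    errors = [cr for cr in result.report if cr.final_verdict and cr.final_verdict.value == "MET" and cr.criterion.weight < 0]
    if met:
        lines.append("Strengths:")
        lines.extend(f"  + {cr.criterion.name}: {cr.final_reason}" for cr in met)
    if unmet:
        lines.append("\nAreas for Improvement:")
        lines.extend(f"  - {cr.criterion.name}: {cr.final_reason}" for cr in unmet)
    if errors:
        lines.append("\nErrors Found:")
        lines.extend(f"  ! {cr.criterion.name}: {cr.final_reason}" for cr in errors)
    return "\n".join(lines)


def report_entry(name, weight, verdict, reason):
    # Stand-in for a criterion report entry; real results use EnsembleCriterionReport.
    return SimpleNamespace(
        criterion=SimpleNamespace(name=name, weight=weight),
        final_verdict=SimpleNamespace(value=verdict),
        final_reason=reason,
    )


fake_result = SimpleNamespace(
    score=0.72,
    report=[
        report_entry("causes", 30.0, "MET", "Identifies coal and steam power."),
        report_entry("effects", 30.0, "UNMET", "Only one effect is described."),
        report_entry("errors", -15.0, "MET", "Misdates the first railway."),
    ],
)

text = format_feedback(fake_result)
print(text)
```

The output groups the MET positive criteria under "Strengths:", the UNMET ones under "Areas for Improvement:", and the MET negative-weight criteria under "Errors Found:".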
Step 3: Ensemble Explanations¶
When using ensemble judging, `final_reason` combines all judges' explanations with a pipe (`|`) separator:
```python
from autorubric.graders import CriterionGrader, JudgeSpec

grader = CriterionGrader(
    judges=[
        JudgeSpec(LLMConfig(model="openai/gpt-4.1-mini"), "gpt"),
        JudgeSpec(LLMConfig(model="anthropic/claude-sonnet-4-5-20250929"), "claude"),
    ],
    aggregation="majority",
)

result = await rubric.grade(to_grade=essay, grader=grader, query=prompt)

for cr in result.report:
    # Individual judge reasons are pipe-separated
    judge_reasons = cr.final_reason.split(" | ")
    print(f"[{cr.final_verdict.value}] {cr.criterion.name}")
    for i, reason in enumerate(judge_reasons):
        print(f"  Judge {i + 1}: {reason}")

    # Individual votes are also available
    for vote in cr.votes:
        print(f"  {vote.judge_id}: {vote.verdict.value} — {vote.reason}")
```
Step 4: Programmatic Filtering¶
Extract specific explanations for downstream use:
```python
def get_unmet_feedback(result):
    """Extract reasons for criteria that were not met."""
    return {
        cr.criterion.name: cr.final_reason
        for cr in result.report
        if cr.final_verdict and cr.final_verdict.value == "UNMET" and cr.criterion.weight > 0
    }


def get_error_explanations(result):
    """Extract explanations for detected errors (negative-weight criteria that were MET)."""
    return {
        cr.criterion.name: cr.final_reason
        for cr in result.report
        if cr.final_verdict and cr.final_verdict.value == "MET" and cr.criterion.weight < 0
    }
```
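A quick way to sanity-check these filters is to feed them stand-in report entries. The snippet below repeats `get_unmet_feedback` so it runs standalone; the `SimpleNamespace` objects are illustrative substitutes, not AutoRubric types:

```python
from types import SimpleNamespace


# get_unmet_feedback as defined above, repeated so this snippet is self-contained.
def get_unmet_feedback(result):
    return {
        cr.criterion.name: cr.final_reason
        for cr in result.report
        if cr.final_verdict and cr.final_verdict.value == "UNMET" and cr.criterion.weight > 0
    }


def entry(name, weight, verdict, reason):
    # Illustrative stand-in; real reports are EnsembleCriterionReport objects.
    return SimpleNamespace(
        criterion=SimpleNamespace(name=name, weight=weight),
        final_verdict=SimpleNamespace(value=verdict),
        final_reason=reason,
    )


result = SimpleNamespace(report=[
    entry("causes", 30.0, "MET", "Two causes given."),
    entry("effects", 30.0, "UNMET", "Effects are not discussed."),
    entry("errors", -15.0, "MET", "Claims the revolution began in 1900."),
])

unmet = get_unmet_feedback(result)
print(unmet)  # {'effects': 'Effects are not discussed.'}
```

The weight-sign guard matters: the MET `errors` entry is excluded here because it is a detected problem, not an unmet requirement.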
Negative-Weight Criteria and MET Verdicts

For negative-weight criteria like `errors`, a MET verdict means the undesirable behavior
was detected: the submission contains the problem described in the requirement. The
`final_reason` then explains what the error is, not what was done well. Filter these
separately when building feedback reports.
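To see why a MET negative-weight criterion lowers the score, here is some purely illustrative arithmetic assuming a normalized weighted-sum scheme (MET contributes a criterion's weight, UNMET contributes nothing); AutoRubric's actual scoring formula may differ:

```python
# Illustrative only: assumes a normalized weighted-sum score.
# AutoRubric's actual scoring formula may differ.
weights = {"causes": 30.0, "effects": 30.0, "structure": 12.0, "errors": -15.0}
verdicts = {"causes": "MET", "effects": "MET", "structure": "MET", "errors": "MET"}

earned = sum(w for name, w in weights.items() if verdicts[name] == "MET")
max_positive = sum(w for w in weights.values() if w > 0)
score = earned / max_positive
print(round(score, 3))  # 0.792 — the MET "errors" criterion drags the score down
```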
Key Takeaways¶
| Concept | Single Judge | Ensemble |
|---|---|---|
| `final_reason` | Judge's direct explanation | All judges' reasons joined with `\|` |
| Individual votes | One verdict in `cr.votes` | Multiple verdicts in `cr.votes`, one per judge |
| `final_verdict` | Same as the judge's verdict | Aggregated verdict (e.g., majority vote) |
| Negative-weight MET | Reason explains the detected problem | Each judge's reason for detecting the problem |
| Access pattern | `cr.final_reason` directly | Split on `\|` or iterate `cr.votes` |
Going Further¶
- Ensemble Judging — Get multiple perspectives on each criterion
- Extended Thinking — Enable deeper reasoning for complex evaluations
- API Reference: Core Grading — Full `CriterionReport` and `EnsembleCriterionReport` docs
Appendix: Complete Code¶
"""Working with Grading Explanations - Essay Feedback System"""
import asyncio
from pathlib import Path
from autorubric import LLMConfig
from autorubric.dataset import RubricDataset
from autorubric.graders import CriterionGrader
DATASET_PATH = Path(__file__).parent / "examples" / "data" / "essay_grading_dataset.json"
def format_feedback(result):
"""Format grading result as student-readable feedback."""
lines = [f"Overall Score: {result.score:.0%}\n"]
met = [cr for cr in result.report if cr.final_verdict and cr.final_verdict.value == "MET" and cr.criterion.weight > 0]
unmet = [cr for cr in result.report if cr.final_verdict and cr.final_verdict.value == "UNMET" and cr.criterion.weight > 0]
errors = [cr for cr in result.report if cr.final_verdict and cr.final_verdict.value == "MET" and cr.criterion.weight < 0]
if met:
lines.append("Strengths:")
for cr in met:
lines.append(f" + {cr.criterion.name}: {cr.final_reason}")
if unmet:
lines.append("\nAreas for Improvement:")
for cr in unmet:
lines.append(f" - {cr.criterion.name}: {cr.final_reason}")
if errors:
lines.append("\nErrors Found:")
for cr in errors:
lines.append(f" ! {cr.criterion.name}: {cr.final_reason}")
return "\n".join(lines)
async def main():
dataset = RubricDataset.from_file(DATASET_PATH)
grader = CriterionGrader(
llm_config=LLMConfig(model="openai/gpt-4.1-mini", temperature=0.0)
)
item = dataset.items[0]
rubric = dataset.get_item_rubric(0)
prompt = dataset.get_item_prompt(0)
print(f"Prompt: {prompt}")
print(f"Submission: {item.description}")
print("=" * 70)
result = await rubric.grade(
to_grade=item.submission,
grader=grader,
query=prompt,
)
print(format_feedback(result))
if __name__ == "__main__":
asyncio.run(main())