
Working with Grading Explanations

Learn how to access, display, and use per-criterion explanations from rubric grading.

The Scenario

You're building an automated essay feedback system. Students submit essays, and you want to provide not just a score, but per-criterion feedback explaining why each requirement was met or not met. AutoRubric's grading produces these explanations automatically — you just need to access them.

What You'll Learn

  • Accessing reason from grading results
  • Formatting explanations for student feedback
  • Working with ensemble explanations (combined judge reasons)
  • Filtering and categorizing reasons programmatically

The Solution

flowchart LR
    A[Submission] --> B[CriterionGrader]
    B --> C{Mode}
    C -->|Single Judge| D[One Reason per Criterion]
    C -->|Ensemble| E[Multiple Judge Reasons]
    E --> F[Aggregated final_reason]
    D --> G[CriterionReport]
    F --> H[EnsembleCriterionReport]

Step 1: Grade and Access Explanations

Every grading result contains a report — a list of CriterionReport objects, each with a reason field:

import asyncio
from autorubric import Rubric, LLMConfig
from autorubric.graders import CriterionGrader

rubric = Rubric.from_dict([
    {"name": "causes", "weight": 30.0, "requirement": "Identifies at least 2 major causes of the Industrial Revolution"},
    {"name": "effects", "weight": 30.0, "requirement": "Describes at least 2 major effects of the Industrial Revolution"},
    {"name": "structure", "weight": 12.0, "requirement": "Clear essay structure with introduction and logical flow"},
    {"name": "errors", "weight": -15.0, "requirement": "Contains significant factual errors"},
])

grader = CriterionGrader(
    llm_config=LLMConfig(model="openai/gpt-4.1-mini", temperature=0.0)
)

async def main():
    result = await rubric.grade(
        to_grade="The Industrial Revolution began in Britain around 1760...",
        grader=grader,
        query="Explain the causes and effects of the Industrial Revolution.",
    )

    for cr in result.report:
        verdict = cr.verdict.value if cr.verdict else "N/A"
        name = cr.name or "unnamed"
        print(f"[{verdict}] {name}: {cr.reason}")

asyncio.run(main())

Single vs. Ensemble Explanations

With a single judge, reason is the judge's direct explanation. With an ensemble, final_reason concatenates all judges' reasons with a pipe separator, and individual verdicts are accessible through cr.votes. Choose ensemble when you need multiple perspectives or higher reliability.
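If you write helpers that must handle both single-judge and ensemble reports, one option is to fall back from the ensemble field to the single-judge field. A minimal sketch, assuming only the reason / final_reason attribute names used in this guide; the helper name is hypothetical:

def get_explanation(cr):
    """Return the explanation text for either report type."""
    # Ensemble reports carry final_reason; single-judge reports carry reason.
    return getattr(cr, "final_reason", None) or cr.reason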

Step 2: Format as Student Feedback

Structure the explanations into a readable feedback report:

def format_feedback(result):
    lines = [f"Overall Score: {result.score:.0%}\n"]

    met = [cr for cr in result.report if cr.verdict and cr.verdict.value == "MET" and cr.weight > 0]
    unmet = [cr for cr in result.report if cr.verdict and cr.verdict.value == "UNMET" and cr.weight > 0]
    errors = [cr for cr in result.report if cr.verdict and cr.verdict.value == "MET" and cr.weight < 0]

    if met:
        lines.append("Strengths:")
        for cr in met:
            lines.append(f"  + {cr.name}: {cr.reason}")

    if unmet:
        lines.append("\nAreas for Improvement:")
        for cr in unmet:
            lines.append(f"  - {cr.name}: {cr.reason}")

    if errors:
        lines.append("\nErrors Found:")
        for cr in errors:
            lines.append(f"  ! {cr.name}: {cr.reason}")

    return "\n".join(lines)
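To produce the student-facing report, pass the result from Step 1 into the formatter (a usage sketch; call it inside the async main() from Step 1, after rubric.grade(...) returns):

print(format_feedback(result))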

Step 3: Ensemble Explanations

When using ensemble judging, final_reason combines all judges' explanations with a pipe (|) separator:

from autorubric import LLMConfig
from autorubric.graders import CriterionGrader, JudgeSpec

grader = CriterionGrader(
    judges=[
        JudgeSpec(LLMConfig(model="openai/gpt-4.1-mini"), "gpt"),
        JudgeSpec(LLMConfig(model="anthropic/claude-sonnet-4-5-20250929"), "claude"),
    ],
    aggregation="majority",
)

# Inside an async function, as in Step 1
result = await rubric.grade(to_grade=essay, grader=grader, query=prompt)

for cr in result.report:
    # Individual judge reasons are pipe-separated
    judge_reasons = cr.final_reason.split(" | ")
    print(f"[{cr.final_verdict.value}] {cr.criterion.name}")
    for i, reason in enumerate(judge_reasons):
        print(f"  Judge {i + 1}: {reason}")

    # Individual votes are also available
    for vote in cr.votes:
        print(f"  {vote.judge_id}: {vote.verdict.value}{vote.reason}")

Step 4: Programmatic Filtering

Extract specific explanations for downstream use:

def get_unmet_feedback(result):
    """Extract reasons for criteria that were not met."""
    return {
        cr.name: cr.reason
        for cr in result.report
        if cr.verdict and cr.verdict.value == "UNMET" and cr.weight > 0
    }

def get_error_explanations(result):
    """Extract explanations for detected errors (negative-weight criteria that were MET)."""
    return {
        cr.name: cr.reason
        for cr in result.report
        if cr.verdict and cr.verdict.value == "MET" and cr.weight < 0
    }

Negative-Weight Criteria and MET Verdicts

For negative-weight criteria like errors, a MET verdict means the undesirable behavior was detected: the submission contains the problem described in the requirement. The reason then explains what the error is, not what was done well. Filter these separately when building feedback reports.
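Putting the two filters together, a short usage sketch that turns the extracted reasons into a revision checklist (the wording and layout are illustrative):

unmet = get_unmet_feedback(result)
errors = get_error_explanations(result)

checklist = [f"Improve '{name}': {reason}" for name, reason in unmet.items()]
checklist += [f"Fix '{name}': {reason}" for name, reason in errors.items()]
print("\n".join(checklist))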

Key Takeaways

| Concept | Single Judge | Ensemble |
| --- | --- | --- |
| Reason | cr.reason, the judge's direct explanation | cr.final_reason, all judges' reasons joined with a pipe separator |
| Individual votes | One verdict in cr.votes | Multiple verdicts in cr.votes, one per judge |
| Verdict | cr.verdict, the judge's verdict | cr.final_verdict, aggregated verdict (e.g., majority vote) |
| Criterion access | cr.name, cr.weight | cr.criterion.name, cr.criterion.weight |
| Negative-weight MET | Reason explains the detected problem | Each judge's reason for detecting the problem |
| Access pattern | cr.reason directly | Split cr.final_reason on the pipe separator or iterate cr.votes |


Appendix: Complete Code

"""Working with Grading Explanations - Essay Feedback System"""

import asyncio
from pathlib import Path

from autorubric import LLMConfig
from autorubric.dataset import RubricDataset
from autorubric.graders import CriterionGrader

DATASET_PATH = Path(__file__).parent / "examples" / "data" / "essay_grading_dataset.json"


def format_feedback(result):
    """Format grading result as student-readable feedback."""
    lines = [f"Overall Score: {result.score:.0%}\n"]

    met = [cr for cr in result.report if cr.verdict and cr.verdict.value == "MET" and cr.weight > 0]
    unmet = [cr for cr in result.report if cr.verdict and cr.verdict.value == "UNMET" and cr.weight > 0]
    errors = [cr for cr in result.report if cr.verdict and cr.verdict.value == "MET" and cr.weight < 0]

    if met:
        lines.append("Strengths:")
        for cr in met:
            lines.append(f"  + {cr.name}: {cr.reason}")

    if unmet:
        lines.append("\nAreas for Improvement:")
        for cr in unmet:
            lines.append(f"  - {cr.name}: {cr.reason}")

    if errors:
        lines.append("\nErrors Found:")
        for cr in errors:
            lines.append(f"  ! {cr.name}: {cr.reason}")

    return "\n".join(lines)


async def main():
    dataset = RubricDataset.from_file(DATASET_PATH)

    grader = CriterionGrader(
        llm_config=LLMConfig(model="openai/gpt-4.1-mini", temperature=0.0)
    )

    item = dataset.items[0]
    rubric = dataset.get_item_rubric(0)
    prompt = dataset.get_item_prompt(0)

    print(f"Prompt: {prompt}")
    print(f"Submission: {item.description}")
    print("=" * 70)

    result = await rubric.grade(
        to_grade=item.submission,
        grader=grader,
        query=prompt,
    )

    print(format_feedback(result))


if __name__ == "__main__":
    asyncio.run(main())