
Meta-Rubric Evaluation

Utilities for assessing the quality of grading rubrics using meta-rubrics (rubrics for evaluating rubrics).

Overview

The autorubric.meta module provides tools for evaluating the quality of a rubric before using it for actual grading. This helps identify issues such as vague criteria, anti-patterns, or misalignment with the task being evaluated.

Two evaluation modes are supported:

Mode | Function | Use Case
Standalone | evaluate_rubric_standalone() | Evaluate rubric quality in isolation (clarity, structure, LLM-friendliness)
In-Context | evaluate_rubric_in_context() | Evaluate rubric quality relative to a specific task prompt

Quick Example

from autorubric import LLMConfig, Rubric
from autorubric.meta import evaluate_rubric_standalone, evaluate_rubric_in_context

llm_config = LLMConfig(model="openai/gpt-4.1-mini", temperature=0.0)
rubric = Rubric.from_file("my_rubric.json")

# Standalone evaluation with terminal output
result = await evaluate_rubric_standalone(rubric, llm_config, display="stdout")
print(f"Rubric quality score: {result.score:.2f}")

# In-context evaluation with HTML report
result = await evaluate_rubric_in_context(
    rubric,
    task_prompt="Write a comprehensive summary of the research paper.",
    llm_config=llm_config,
    display="html",
    output_html_path="rubric_evaluation.html",
)
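
Both functions are coroutines, so the snippet above assumes an environment with top-level await (such as a notebook). In a plain script you would typically wrap the call with asyncio.run; a minimal sketch reusing the same setup:

import asyncio

from autorubric import LLMConfig, Rubric
from autorubric.meta import evaluate_rubric_standalone

async def main() -> None:
    llm_config = LLMConfig(model="openai/gpt-4.1-mini", temperature=0.0)
    rubric = Rubric.from_file("my_rubric.json")

    # Run the standalone evaluation with no display output.
    result = await evaluate_rubric_standalone(rubric, llm_config)
    print(f"Rubric quality score: {result.score:.2f}")

if __name__ == "__main__":
    asyncio.run(main())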

Display Modes

Both evaluation functions accept a display parameter:

Value | Behavior
None | No display output (default)
"stdout" | Rich terminal output with colored tables
"html" | Generates styled HTML report (requires output_html_path)

Terminal Output

result = await evaluate_rubric_standalone(rubric, llm_config, display="stdout")

Produces formatted output with:

  • Score summary panel
  • Per-section criterion tables with color-coded statuses
  • Issues found section with full feedback

HTML Output

result = await evaluate_rubric_in_context(
    rubric, task_prompt, llm_config,
    display="html",
    output_html_path="report.html"
)

Generates a self-contained HTML file with:

  • Dark theme styling
  • Visual grouping by rubric sections
  • Color-coded status badges (green for MET/CLEAR, red for UNMET/DETECTED)
  • Full issue descriptions

Meta-Rubric Criteria

Standalone Meta-Rubric

Evaluates intrinsic rubric quality across these sections:

Section | Focus
Clarity & Precision | Clear requirements, specific language, unidimensional criteria
Structure & Design | Appropriate count, balanced weights, orthogonal criteria
LLM-Friendliness | Independent verification, objective assessment
Anti-Patterns | Double-barreled criteria, vague wording, circular definitions, excessive overlap

In-Context Meta-Rubric

Includes all standalone criteria plus:

Section | Focus
Construct Alignment | Task alignment, coverage of key aspects, appropriate emphasis
Discriminative Power | Distinguishes quality levels, avoids trivial criteria
Anti-Patterns (In-Context) | Irrelevant criteria, missing critical aspects

Accessing Meta-Rubrics Directly

You can load the meta-rubrics as Rubric objects for inspection or custom use:

from autorubric.meta import get_standalone_meta_rubric, get_in_context_meta_rubric

standalone = get_standalone_meta_rubric()
print(f"Standalone meta-rubric: {len(standalone.rubric)} criteria")

in_context = get_in_context_meta_rubric()
print(f"In-context meta-rubric: {len(in_context.rubric)} criteria")

evaluate_rubric_standalone

Evaluate a rubric's quality in isolation.

evaluate_rubric_standalone async

evaluate_rubric_standalone(rubric: Rubric, llm_config: LLMConfig, *, display: DisplayMode | None = None, output_html_path: Path | str | None = None) -> EnsembleEvaluationReport

Evaluate a rubric's quality in isolation using the standalone meta-rubric.

This evaluates the rubric's intrinsic quality without considering any specific task context. It checks for clarity, structure, LLM-friendliness, and common anti-patterns.
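
For example, the returned score can serve as a simple quality gate before a rubric is used for grading (a sketch; the 0.8 threshold is an arbitrary illustration, not a library default):

report = await evaluate_rubric_standalone(rubric, llm_config)
if report.score < 0.8:
    # A low meta-rubric score suggests issues such as vague or overlapping
    # criteria; inspect the per-criterion verdicts before using this rubric.
    print(f"Rubric may need revision (score={report.score:.2f}, raw_score={report.raw_score})")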

PARAMETERS

rubric (Rubric)
    The rubric to evaluate.

llm_config (LLMConfig)
    LLM configuration for the evaluation.

display (DisplayMode | None, default None)
    Output format: None for no display, "stdout" for terminal output, "html" for an HTML file.

output_html_path (Path | str | None, default None)
    Path for the HTML output (required when display="html").

RETURNS

EnsembleEvaluationReport
    Report with score, raw_score, and per-criterion verdicts.

RAISES

ValueError
    If display="html" but output_html_path is not provided.


evaluate_rubric_in_context

Evaluate a rubric's quality relative to a task prompt.

evaluate_rubric_in_context async

evaluate_rubric_in_context(rubric: Rubric, task_prompt: str, llm_config: LLMConfig, *, display: DisplayMode | None = None, output_html_path: Path | str | None = None) -> EnsembleEvaluationReport

Evaluate a rubric's quality in the context of a specific task.

This evaluates both intrinsic rubric quality and how well the rubric aligns with the given task prompt. It checks for task alignment, coverage of key aspects, and task-specific anti-patterns.
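
Because the in-context meta-rubric includes all standalone criteria plus alignment-focused ones, comparing the two scores can help separate intrinsic rubric issues from task-alignment issues (a sketch reusing the quick-example setup):

standalone_report = await evaluate_rubric_standalone(rubric, llm_config)
in_context_report = await evaluate_rubric_in_context(
    rubric,
    task_prompt="Write a comprehensive summary of the research paper.",
    llm_config=llm_config,
)
print(f"Intrinsic quality: {standalone_report.score:.2f}")
print(f"Quality with task alignment: {in_context_report.score:.2f}")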

PARAMETERS

rubric (Rubric)
    The rubric to evaluate.

task_prompt (str)
    The task prompt the rubric is designed to evaluate.

llm_config (LLMConfig)
    LLM configuration for the evaluation.

display (DisplayMode | None, default None)
    Output format: None for no display, "stdout" for terminal output, "html" for an HTML file.

output_html_path (Path | str | None, default None)
    Path for the HTML output (required when display="html").

RETURNS

EnsembleEvaluationReport
    Report with score, raw_score, and per-criterion verdicts.

RAISES

ValueError
    If display="html" but output_html_path is not provided.


get_standalone_meta_rubric

Load the standalone meta-rubric.

get_standalone_meta_rubric

get_standalone_meta_rubric() -> Rubric

Load the standalone meta-rubric for evaluating rubrics in isolation.


get_in_context_meta_rubric

Load the in-context meta-rubric.

get_in_context_meta_rubric

get_in_context_meta_rubric() -> Rubric

Load the in-context meta-rubric for evaluating rubrics with task context.