CANNOT_ASSESS Handling

Configuration for handling criteria that cannot be assessed due to insufficient evidence.

Overview

When a judge lacks evidence to determine whether a criterion is met, it may return CANNOT_ASSESS instead of MET or UNMET. This module provides configuration options for how these uncertain verdicts affect scoring.

Research Background

A recurring recommendation across LLM-as-a-judge research is to include an explicit "cannot assess / insufficient information" option. Forcing binary verdicts when evidence is insufficient leads to unreliable evaluations. Min et al. (2023) demonstrate in FActScore that atomic fact verification must explicitly handle cases where claims cannot be verified.

Usage

from autorubric import CannotAssessConfig, CannotAssessStrategy, LLMConfig
from autorubric.graders import CriterionGrader

# Default: skip unassessable criteria (adjust denominator)
grader = CriterionGrader(
    llm_config=LLMConfig(model="openai/gpt-4.1-mini"),
)

# Be conservative: treat cannot-assess as failure
grader = CriterionGrader(
    llm_config=LLMConfig(model="openai/gpt-4.1-mini"),
    cannot_assess_config=CannotAssessConfig(strategy=CannotAssessStrategy.FAIL),
)

# Give partial credit (30%)
grader = CriterionGrader(
    llm_config=LLMConfig(model="openai/gpt-4.1-mini"),
    cannot_assess_config=CannotAssessConfig(
        strategy=CannotAssessStrategy.PARTIAL,
        partial_credit=0.3
    ),
)

Strategies

Strategy | Description
SKIP | Exclude from scoring (adjust denominator); this is the default
ZERO | Treat as 0 contribution (same as UNMET)
PARTIAL | Treat as partial credit (configurable fraction)
FAIL | Treat as worst case (UNMET for positive weights, MET for negative weights)
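To make the difference between the strategies concrete, the following is a minimal, self-contained sketch. It does not use autorubric: `Strategy` and `score` are hypothetical stand-ins, and the scoring formula (earned weight divided by the sum of positive weights) is an assumption made for illustration, as is applying the partial fraction to negative weights as a partial penalty. The library's actual arithmetic may differ.

```python
from enum import Enum


class Strategy(str, Enum):
    """Hypothetical stand-in for CannotAssessStrategy."""
    SKIP = "skip"
    ZERO = "zero"
    PARTIAL = "partial"
    FAIL = "fail"


def score(criteria, strategy=Strategy.SKIP, partial_credit=0.5):
    """Score a list of (weight, verdict) pairs.

    Assumed formula: sum of earned weights / sum of positive weights
    of the criteria that take part in scoring.
    """
    earned = 0.0
    possible = 0.0
    for weight, verdict in criteria:
        if verdict == "CANNOT_ASSESS":
            if strategy is Strategy.SKIP:
                # Excluded entirely: neither numerator nor denominator changes.
                continue
            if strategy is Strategy.ZERO:
                contribution = 0.0
            elif strategy is Strategy.PARTIAL:
                # Assumption: the fraction also scales negative weights.
                contribution = partial_credit * weight
            else:
                # FAIL: worst case, i.e. UNMET for positive weights
                # (no credit) and MET for negative weights (full penalty).
                contribution = weight if weight < 0 else 0.0
        else:
            contribution = weight if verdict == "MET" else 0.0
        earned += contribution
        if weight > 0:
            possible += weight
    return earned / possible if possible > 0 else 0.0


criteria = [
    (2.0, "MET"),
    (1.0, "UNMET"),
    (1.0, "CANNOT_ASSESS"),
    (-0.5, "CANNOT_ASSESS"),  # negative weight: a penalty criterion
]

for s in Strategy:
    print(s.name, round(score(criteria, s, partial_credit=0.3), 4))
```

Under these assumptions, SKIP yields the highest score because the unassessable criteria drop out of the denominator, while FAIL yields the lowest because the penalty criterion is counted as triggered.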

CannotAssessConfig

Bases: BaseModel

Configuration for handling CANNOT_ASSESS verdicts.

Attributes

strategy (CannotAssessStrategy): How to handle CANNOT_ASSESS verdicts in score calculation. Default is SKIP, which excludes unassessable criteria from scoring.

partial_credit (float): Fraction of weight to award when strategy is PARTIAL. Must be between 0.0 and 1.0. Default is 0.5.

Example

Default: skip unassessable criteria

config = CannotAssessConfig()

Be conservative: treat cannot-assess as failure

config = CannotAssessConfig(strategy=CannotAssessStrategy.FAIL)

Give partial credit

config = CannotAssessConfig(
    strategy=CannotAssessStrategy.PARTIAL,
    partial_credit=0.3,
)


CannotAssessStrategy

Bases: str, Enum

Strategy for handling CANNOT_ASSESS verdicts in score calculation.

  • SKIP: Exclude the criterion from scoring entirely (adjust denominator)
  • ZERO: Treat as 0 contribution (same as UNMET for positive criteria)
  • PARTIAL: Treat as partial credit (configurable fraction)
  • FAIL: Treat as worst case (UNMET for positive, MET for negative)

CannotAssessMode

Used in metrics computation to specify how CANNOT_ASSESS verdicts should be handled when comparing against ground truth.

CannotAssessMode module-attribute

CannotAssessMode = Literal['exclude', 'as_unmet', 'as_category']

How to handle CANNOT_ASSESS verdicts in metric calculations.

  • "exclude": Skip items with CANNOT_ASSESS verdicts from metric calculation (default)
  • "as_unmet": Treat CANNOT_ASSESS as UNMET for agreement calculation
  • "as_category": Treat CANNOT_ASSESS as a distinct third category (3-class classification)
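The three modes can be illustrated with a small agreement calculation. This is a sketch under stated assumptions, not the library's metrics code: the `agreement` function is hypothetical, verdicts are modeled as plain strings, and "exclude" is assumed to drop an item when either the prediction or the ground truth is CANNOT_ASSESS.

```python
from typing import Literal, Optional, Sequence

CannotAssessMode = Literal["exclude", "as_unmet", "as_category"]

CA = "CANNOT_ASSESS"


def agreement(
    predicted: Sequence[str],
    truth: Sequence[str],
    mode: CannotAssessMode = "exclude",
) -> Optional[float]:
    """Fraction of items where the judge verdict matches ground truth.

    Hypothetical helper; returns None if no items remain to compare.
    """
    pairs = list(zip(predicted, truth))
    if mode == "exclude":
        # Drop any item where either side is CANNOT_ASSESS.
        pairs = [(p, t) for p, t in pairs if CA not in (p, t)]
    elif mode == "as_unmet":
        # Collapse CANNOT_ASSESS into UNMET on both sides.
        pairs = [
            ("UNMET" if p == CA else p, "UNMET" if t == CA else t)
            for p, t in pairs
        ]
    elif mode != "as_category":
        raise ValueError(f"unknown mode: {mode!r}")
    # In "as_category" mode the labels are compared as-is (3 classes).
    if not pairs:
        return None
    return sum(p == t for p, t in pairs) / len(pairs)


predicted = ["MET", CA, "UNMET", CA]
truth = ["MET", "UNMET", "UNMET", "MET"]

for m in ("exclude", "as_unmet", "as_category"):
    print(m, agreement(predicted, truth, m))
```

On this toy data the reported agreement varies considerably with the chosen mode, so it is worth stating the mode alongside any reported metric.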

References

Min, S., Krishna, K., Lyu, X., Lewis, M., Yih, W., Koh, P. W., Iyyer, M., Zettlemoyer, L., and Hajishirzi, H. (2023). FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 12076–12100.