Class ToolCorrectnessEvaluator

java.lang.Object
dev.dokimos.core.BaseEvaluator
dev.dokimos.core.evaluators.agents.ToolCorrectnessEvaluator
All Implemented Interfaces:
Evaluator

public class ToolCorrectnessEvaluator extends BaseEvaluator
Checks whether the agent used the expected set of tools.

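This is a glass-box evaluator for tool proficiency that compares the agent's actual tool calls against the expected tool calls. The score is the F1 score over the two sets of tool names, balancing precision (the fraction of called tools that were expected) and recall (the fraction of expected tools that were called). No LLM is required.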
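
To make the scoring concrete, here is a minimal sketch of an F1 computation over two sets of tool names. It is an illustration only, not the evaluator's actual implementation: the class `ToolNameF1Sketch`, the method `f1`, the tool names, and the handling of empty sets are all assumptions chosen for this example.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration; not part of the dokimos API.
public class ToolNameF1Sketch {

    /** Computes F1 between the expected and actual sets of tool names. */
    static double f1(Set<String> expected, Set<String> actual) {
        // Assumption: if no tools were expected and none were called,
        // treat the result as a perfect match.
        if (expected.isEmpty() && actual.isEmpty()) {
            return 1.0;
        }

        // Tool names that appear in both sets.
        Set<String> overlap = new HashSet<>(expected);
        overlap.retainAll(actual);
        if (overlap.isEmpty()) {
            return 0.0; // also covers the case where one side is empty
        }

        // Precision: share of called tools that were expected.
        double precision = (double) overlap.size() / actual.size();
        // Recall: share of expected tools that were called.
        double recall = (double) overlap.size() / expected.size();

        // F1 is the harmonic mean of precision and recall.
        return 2 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        Set<String> expected = Set.of("search", "calculator", "weather");
        Set<String> actual = Set.of("search", "calculator", "email");
        // Two of three names overlap on each side, so precision and
        // recall are both 2/3 and the F1 score is roughly 0.667.
        System.out.println(f1(expected, actual));
    }
}
```

Because the comparison is over sets, calling the same tool several times or in a different order does not change the result of this sketch.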

Supports multiple match modes:

  • NAMES_ONLY — compares tool name sets (default)
  • NAMES_AND_ORDER — also checks invocation order
  • NAMES_AND_ARGS — full structural comparison including arguments

Method Details