Class ToolArgumentHallucinationEvaluator

java.lang.Object
dev.dokimos.core.BaseEvaluator
dev.dokimos.core.evaluators.agents.ToolArgumentHallucinationEvaluator
All Implemented Interfaces:
Evaluator

public class ToolArgumentHallucinationEvaluator extends BaseEvaluator
Uses a judge LLM to assess whether tool call argument values are factually grounded in the user's input and preceding tool call results.

This is a glass-box evaluator for tool proficiency. For each tool call, the judge evaluates whether argument values can be derived from the user's request or from the results of earlier tool calls in the same execution. This supports multi-step agent workflows where later tool arguments are derived from earlier tool results (e.g., a search returns URLs, then a fetch tool uses one of those URLs).

When ToolCall.result() is populated, the result is included as grounding context for subsequent tool calls. When ToolCall.result() is null, only the user input is considered as grounding context.
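
The grounding rule above can be illustrated with a minimal sketch. It is not the evaluator's actual implementation: the Call record and the contexts method are hypothetical stand-ins, and the sketch assumes that a call with a null result simply adds nothing to the context seen by later calls.

```java
import java.util.ArrayList;
import java.util.List;

class GroundingContextSketch {
    // Hypothetical stand-in for a tool call; NOT the library's ToolCall type.
    record Call(String name, String arguments, String result) {}

    // For each call, returns the texts its arguments may be grounded in:
    // the user input plus the non-null results of all earlier calls.
    static List<List<String>> contexts(String userInput, List<Call> calls) {
        List<List<String>> perCall = new ArrayList<>();
        List<String> seen = new ArrayList<>();
        seen.add(userInput);
        for (Call call : calls) {
            // Snapshot the context before this call's own result is added.
            perCall.add(List.copyOf(seen));
            if (call.result() != null) {
                seen.add(call.result()); // assumption: null results are skipped
            }
        }
        return perCall;
    }

    public static void main(String[] args) {
        List<Call> calls = List.of(
            new Call("search", "query=docs", "https://example.com/a"),
            new Call("fetch", "url=https://example.com/a", null));
        // The fetch call's context includes the URL returned by search.
        System.out.println(contexts("Find the docs", calls));
    }
}
```

In this sketch the fetch call's URL argument would count as grounded because it appears in the earlier search result, even though the user never typed it.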

The score is the fraction of tool calls that the judge finds non-hallucinated, ranging from 0.0 (every call hallucinated) to 1.0 (no call hallucinated).
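
As a worked example of the score, the following sketch computes the fraction from a list of per-call verdicts. The ScoreSketch class and its score method are hypothetical and exist only for illustration. The description above does not say what the evaluator returns when there are no tool calls, so the sketch rejects an empty list rather than guess.

```java
import java.util.List;

class ScoreSketch {
    // Each element is the judge's verdict for one tool call:
    // true means the call's arguments were judged hallucinated.
    static double score(List<Boolean> hallucinated) {
        if (hallucinated.isEmpty()) {
            // Assumption: behavior for zero tool calls is unspecified here.
            throw new IllegalArgumentException("no tool calls to score");
        }
        long grounded = hallucinated.stream().filter(h -> !h).count();
        return (double) grounded / hallucinated.size();
    }

    public static void main(String[] args) {
        // Three of four calls are grounded.
        System.out.println(score(List.of(false, false, true, false))); // prints 0.75
    }
}
```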