dev.dokimos.springai.alibaba.SpringAiAlibabaSupport

public final class SpringAiAlibabaSupport extends Object

Utilities for evaluating Spring AI Alibaba (graph/agent) runs with the Dokimos agent evaluators.

Spring AI Alibaba's graph runtime carries its whole conversation as standard Spring AI message types (AssistantMessage, ToolResponseMessage, and UserMessage) under the OverAllState key "messages". Because those are the exact types Dokimos already converts in SpringAiSupport, this class is a thin adapter: it unwraps the List<Message> from the graph state and folds the full multi-turn conversation into a single AgentTrace, delegating tool-call and tool-definition conversion to SpringAiSupport.

Per-turn windowing

Tool-call results are correlated per turn: each AssistantMessage that issues tool calls is matched only against the ToolResponseMessages that follow it, up to the next AssistantMessage. This avoids silently binding a tool call to the wrong result if a sub-agent or loop reuses a tool-call id across turns.
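The windowing rule above can be illustrated with a minimal, dependency-free sketch. The types Msg, User, Assistant, ToolResponse and Resolved are hypothetical stand-ins for the Spring AI message classes and the Dokimos ToolCall, and correlate is an illustrative helper, not the library's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: hypothetical stand-ins, not the Spring AI or Dokimos types. */
final class WindowingSketch {
    interface Msg {}
    record User(String text) implements Msg {}
    /** callIds holds the ids of the tool calls issued in this assistant turn. */
    record Assistant(String text, List<String> callIds) implements Msg {}
    record ToolResponse(String callId, String result) implements Msg {}
    /** A tool call paired with its result; result is null when the window has no match. */
    record Resolved(String callId, String result) {}

    static List<Resolved> correlate(List<Msg> conversation) {
        List<Resolved> resolved = new ArrayList<>();
        for (int i = 0; i < conversation.size(); i++) {
            if (conversation.get(i) instanceof Assistant turn && !turn.callIds().isEmpty()) {
                // Window: tool responses after this turn, up to the next assistant message.
                Map<String, String> window = new HashMap<>();
                for (int j = i + 1; j < conversation.size(); j++) {
                    Msg next = conversation.get(j);
                    if (next instanceof Assistant) {
                        break;
                    }
                    if (next instanceof ToolResponse response) {
                        window.putIfAbsent(response.callId(), response.result());
                    }
                }
                // Resolve each call only against its own window.
                for (String id : turn.callIds()) {
                    resolved.add(new Resolved(id, window.get(id)));
                }
            }
        }
        return resolved;
    }
}
```

Because each window is rebuilt per assistant turn, a tool-call id that is reused in a later turn resolves to the later response instead of the earlier one.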

Judges and tasks

This class deliberately does not provide asJudge or a plain asyncTask: Spring AI Alibaba agents run on a standard Spring AI ChatModel/ChatClient, so use SpringAiSupport.asJudge(org.springframework.ai.chat.model.ChatModel) and SpringAiSupport.asyncTask(org.springframework.ai.chat.client.ChatClient) directly.

Measured tasks

Metrics capture is the one important exception. A graph run does not expose Spring AI Usage: a ReactAgent run returns a bare AssistantMessage (or the OverAllState message list), and a Spring AI AssistantMessage carries no typed token usage. Usage lives only on a ChatResponse, which the graph fold never surfaces. So unlike SpringAiSupport.measuredAsyncTask(org.springframework.ai.chat.client.ChatClient, String, dev.dokimos.core.PriceTable), which reads usage off the ChatResponse, the agent path here cannot read usage from its own result.

measuredAsyncTask(Function, String, PriceTable) therefore follows the decoupled carrier pattern: it times wall-clock latency around the run automatically and lets you supply the token counts you obtain from your own Alibaba setup (a usage callback on the underlying ChatModel, the response metadata, and so on) via SpringAiAlibabaSupport.AlibabaAgentResponse. Cost is computed through an optional PriceTable. When you drive a plain ChatClient instead of a graph, prefer SpringAiSupport.measuredAsyncTask(...) so usage is captured automatically.
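As a rough illustration of the decoupled carrier idea, the following sketch times a caller-supplied function and copies through whatever token counts the caller attached. AgentResponse, Measured and measure are hypothetical names chosen for this example. They only mirror the shape described above and are not the Dokimos types.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/** Sketch only: hypothetical types that mirror the carrier pattern, not the Dokimos API. */
final class CarrierSketch {
    /** Output text plus optional token counts supplied by the caller (null when unknown). */
    record AgentResponse(String text, Integer tokensIn, Integer tokensOut) {}
    /** The carrier's fields plus the wall-clock latency measured around the call. */
    record Measured(String text, Integer tokensIn, Integer tokensOut, long latencyMs) {}

    static CompletableFuture<Measured> measure(Function<String, AgentResponse> agentCall, String input) {
        if (agentCall == null) {
            throw new IllegalArgumentException("agentCall must not be null");
        }
        // supplyAsync without an executor runs on the common ForkJoinPool.
        return CompletableFuture.supplyAsync(() -> {
            long start = System.nanoTime();
            AgentResponse response = agentCall.apply(input);
            long latencyMs = (System.nanoTime() - start) / 1_000_000;
            // Token counts are passed through untouched; missing counts stay null.
            return new Measured(response.text(), response.tokensIn(), response.tokensOut(), latencyMs);
        });
    }
}
```

The design point is that the timing wrapper never inspects the agent's result for usage. The caller is the only source of token counts, so a missing count simply stays null.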

Example


 ReactAgent agent = ReactAgent.builder()
         .name("assistant")
         .chatClient(chatClient)
         .tools(toolCallbacks)
         .build();

 AgentTrace trace = SpringAiAlibabaSupport.toAgentTrace(
         agent, Map.of("messages", List.of(new UserMessage("..."))), null);

 EvalTestCase testCase = trace.toTestCase(
         "user question",
         SpringAiAlibabaSupport.toToolDefinitions(toolCallbacks));

 EvalResult result = ToolCallValidityEvaluator.builder().build().evaluate(testCase);

Nested Class Summary

Nested Classes

Modifier and Type

Class

Description

static final record

SpringAiAlibabaSupport.AlibabaAgentResponse

A Spring AI Alibaba agent run's output text paired with optional token usage, consumed by measuredAsyncTask(Function, String, PriceTable).
Field Summary

Fields

Modifier and Type

Field

Description

static final String

MESSAGES_KEY

The Spring AI Alibaba graph-state key under which the message list is stored.
Method Summary

Modifier and Type

Method

Description

static AsyncTask

measuredAsyncTask(Function<Example,SpringAiAlibabaSupport.AlibabaAgentResponse> agentCall, String model, PriceTable prices)

Creates a measured AsyncTask that runs a Spring AI Alibaba agent, captures latency automatically, and computes cost when both a PriceTable and a model id are supplied, populating the run's metrics cards.

static AsyncTask

measuredAsyncTask(Function<Example,SpringAiAlibabaSupport.AlibabaAgentResponse> agentCall, String model, PriceTable prices, Executor executor)

Creates a measured AsyncTask that runs a Spring AI Alibaba agent on the supplied Executor, so you control and isolate concurrency. The executor must be non-null.

static List<org.springframework.ai.chat.messages.Message>

messages(com.alibaba.cloud.ai.graph.OverAllState state)

Extracts the raw Spring AI Message list from a graph state.

static AgentTrace

toAgentTrace(com.alibaba.cloud.ai.graph.agent.ReactAgent agent, Map<String,Object> inputs, com.alibaba.cloud.ai.graph.RunnableConfig config)

Runs a ReactAgent's compiled graph and folds the resulting state into a single AgentTrace.

static AgentTrace

toAgentTrace(com.alibaba.cloud.ai.graph.OverAllState state)

Folds the full multi-turn conversation of a graph run into a single AgentTrace.

static AgentTrace

toAgentTrace(Optional<com.alibaba.cloud.ai.graph.OverAllState> state)

Folds the optional graph state returned by CompiledGraph.invoke(Map) into a single AgentTrace.

static List<ToolCall>

toToolCalls(com.alibaba.cloud.ai.graph.OverAllState state)

Extracts every tool call across all turns of a graph run, correlating each call to its result with per-turn windowing.

static List<ToolDefinition>

toToolDefinitions(List<org.springframework.ai.tool.ToolCallback> callbacks)

Converts the ToolCallbacks an agent was built with into Dokimos ToolDefinitions, so tool calls can be evaluated against the tools the agent had available.

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- MESSAGES_KEY
  
  public static final String MESSAGES_KEY
  
  The Spring AI Alibaba graph-state key under which the message list is stored.
  See Also:
  
  Constant Field Values
Method Details
- messages
  
  public static List<org.springframework.ai.chat.messages.Message> messages(com.alibaba.cloud.ai.graph.OverAllState state)
  
  Extracts the raw Spring AI Message list from a graph state.
  Null-tolerant: a null state, an absent "messages" key, or a value that is not a List yields an empty list. Elements that are not Messages (including unknown subtypes that do not implement Message) are skipped. Never throws.
  
  Parameters:
  
  state - the graph state (may be null)
  
  Returns:
  
  the messages in order, or an empty list
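The null-tolerant contract described for messages can be sketched without any Spring dependency. Here a plain Map stands in for OverAllState, and Msg and UserMsg are hypothetical stand-ins for the Spring AI Message types. This is an illustration of the described behaviour, not the library's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Sketch only: a Map stands in for OverAllState, Msg for the Spring AI Message type. */
final class MessagesSketch {
    interface Msg { String text(); }
    record UserMsg(String text) implements Msg {}

    static List<Msg> messages(Map<String, Object> state) {
        if (state == null) {
            return List.of();                       // null state
        }
        Object raw = state.get("messages");
        if (!(raw instanceof List<?> list)) {
            return List.of();                       // absent key or non-List value
        }
        List<Msg> out = new ArrayList<>();
        for (Object element : list) {
            if (element instanceof Msg message) {   // anything else, including null, is skipped
                out.add(message);
            }
        }
        return List.copyOf(out);
    }
}
```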
- toToolCalls
  
  public static List<ToolCall> toToolCalls(com.alibaba.cloud.ai.graph.OverAllState state)
  
  Extracts every tool call across all turns of a graph run, correlating each call to its result with per-turn windowing.
  For each AssistantMessage that issues tool calls, the following ToolResponseMessages (up to the next AssistantMessage) form the window used to resolve results, via SpringAiSupport.toToolCalls(AssistantMessage, List). Calls with no matching response in their window have a null result.
  
  Parameters:
  
  state - the graph state (may be null)
  
  Returns:
  
  the tool calls in execution order, or an empty list
- toAgentTrace
  
  public static AgentTrace toAgentTrace(com.alibaba.cloud.ai.graph.OverAllState state)
  
  Folds the full multi-turn conversation of a graph run into a single AgentTrace.
  The trace's tool calls come from toToolCalls(OverAllState) (per-turn windowing); its final response is the text of the last AssistantMessage in the conversation, when that text is non-blank.
  
  Parameters:
  
  state - the graph state (may be null)
  
  Returns:
  
  an agent trace, never null
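One reading of the final-response rule is sketched below: find the last assistant message, and treat its text as the final response only when it is non-blank. Turn and finalResponse are hypothetical names, and the choice not to fall back to an earlier assistant message when the last one is blank is an assumption drawn from the wording above, not confirmed library behaviour.

```java
import java.util.List;
import java.util.Optional;

/** Sketch only: Turn is a hypothetical stand-in for a Spring AI message. */
final class FinalResponseSketch {
    record Turn(String role, String text) {}

    static Optional<String> finalResponse(List<Turn> conversation) {
        // Walk backwards to the last assistant message in the conversation.
        for (int i = conversation.size() - 1; i >= 0; i--) {
            Turn turn = conversation.get(i);
            if ("assistant".equals(turn.role())) {
                String text = turn.text();
                // Assumption: a blank last assistant message yields no final response.
                return (text == null || text.isBlank()) ? Optional.empty() : Optional.of(text);
            }
        }
        return Optional.empty();   // no assistant message at all
    }
}
```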
- toAgentTrace
  
  public static AgentTrace toAgentTrace(Optional<com.alibaba.cloud.ai.graph.OverAllState> state)
  
  Folds the optional graph state returned by CompiledGraph.invoke(Map) into a single AgentTrace.
  An empty optional yields an empty trace (no tool calls, no final response).
  
  Parameters:
  
  state - the optional graph state (may be null)
  
  Returns:
  
  an agent trace, never null
- toAgentTrace
  
  public static AgentTrace toAgentTrace(com.alibaba.cloud.ai.graph.agent.ReactAgent agent, Map<String,Object> inputs, com.alibaba.cloud.ai.graph.RunnableConfig config) throws com.alibaba.cloud.ai.graph.exception.GraphStateException
  
  Runs a ReactAgent's compiled graph and folds the resulting state into a single AgentTrace.
  This is the full-fidelity one-liner: it invokes the compiled graph (which preserves every intermediate tool call) rather than a lossy single-shot call. CompiledGraph.invoke(Map, RunnableConfig) returns Optional<OverAllState>, which is folded by toAgentTrace(Optional).
  
  Parameters:
  
  agent - the agent whose compiled graph is run, never null
  
  inputs - the graph inputs (for example the initial "messages" list), never null
  
  config - the run configuration, or null to invoke without one
  
  Returns:
  
  an agent trace, never null
  
  Throws:
  
  com.alibaba.cloud.ai.graph.exception.GraphStateException - if the agent's graph cannot be compiled
- toToolDefinitions
  
  public static List<ToolDefinition> toToolDefinitions(List<org.springframework.ai.tool.ToolCallback> callbacks)
  
  Converts the ToolCallbacks an agent was built with into Dokimos ToolDefinitions, so tool calls can be evaluated against the tools the agent had available.
  Delegates to SpringAiSupport.toToolDefinitions(List) after pulling each callback's getToolDefinition(). A null or empty list yields an empty list.
  
  Parameters:
  
  callbacks - the tool callbacks supplied to the agent (may be null)
  
  Returns:
  
  the Dokimos tool definitions, or an empty list
- measuredAsyncTask
  
  public static AsyncTask measuredAsyncTask(Function<Example,SpringAiAlibabaSupport.AlibabaAgentResponse> agentCall, String model, PriceTable prices)
  Creates a measured AsyncTask that runs a Spring AI Alibaba agent, captures latency automatically, and computes cost when both a PriceTable and a model id are supplied, populating the run's metrics cards.
  Decoupled-carrier counterpart to SpringAiSupport.measuredAsyncTask(org.springframework.ai.chat.client.ChatClient, String, PriceTable): a graph/agent run does not expose Spring AI Usage (see the class-level "Measured tasks" note), so the supplied agentCall returns a SpringAiAlibabaSupport.AlibabaAgentResponse carrying the output text plus any token counts you extracted from your own setup. Wall-clock latency is measured around the call; cost is computed via prices.costUsd(model, tokensIn, tokensOut) when both prices and model are non-null. Missing token counts leave those fields null (only the Latency card lights); a null prices or model leaves cost null. The output text is written under the default output key.
  The blocking call runs on the common ForkJoinPool; for isolated, true concurrency use measuredAsyncTask(Function, String, PriceTable, Executor) with a pool you size to the experiment's parallelism. Never throws on missing metrics.
  Example:
  
   ReactAgent agent = ReactAgent.builder().name("assistant").chatClient(chatClient).build();
  
   AsyncTask task = SpringAiAlibabaSupport.measuredAsyncTask(
           example -> {
               AssistantMessage out = agent.call(example.input());
               // pull token counts from your own usage callback / response metadata:
               return new AlibabaAgentResponse(out.getText(), promptTokens, completionTokens);
           },
           "qwen-max",
           prices);
  
  Parameters:
  
  agentCall - a function that runs the agent for an Example and returns its response, never null
  
  model - the model id used as the PriceTable lookup key, or null to skip pricing
  
  prices - the price lookup, or null to capture tokens and latency only
  
  Returns:
  
  an AsyncTask suitable for Experiment.builder().asyncTask(...)
  
  Throws:
  
  IllegalArgumentException - if agentCall is null
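The null-handling rules for cost can be sketched as a small pure function. Everything here is hypothetical: Rate, the per-million-token pricing scheme, the rate values used with it, and the choice to return null when either token count or the model's rate is missing are assumptions made for illustration. The real PriceTable.costUsd may price and round differently.

```java
import java.math.BigDecimal;
import java.util.Map;

/** Sketch only: a hypothetical price lookup, not the Dokimos PriceTable. */
final class CostSketch {
    /** Assumed pricing scheme: USD per one million input and output tokens. */
    record Rate(BigDecimal inPerMillion, BigDecimal outPerMillion) {}

    static BigDecimal costUsd(Map<String, Rate> prices, String model, Integer tokensIn, Integer tokensOut) {
        // Any missing ingredient leaves cost null rather than throwing.
        if (prices == null || model == null || tokensIn == null || tokensOut == null) {
            return null;
        }
        Rate rate = prices.get(model);
        if (rate == null) {
            return null;   // assumption: an unknown model id yields no cost
        }
        BigDecimal inputCost = rate.inPerMillion().multiply(BigDecimal.valueOf(tokensIn.longValue()));
        BigDecimal outputCost = rate.outPerMillion().multiply(BigDecimal.valueOf(tokensOut.longValue()));
        // Dividing by 1,000,000 always terminates, so no rounding mode is needed.
        return inputCost.add(outputCost).divide(BigDecimal.valueOf(1_000_000));
    }
}
```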
- measuredAsyncTask
  
  public static AsyncTask measuredAsyncTask(Function<Example,SpringAiAlibabaSupport.AlibabaAgentResponse> agentCall, String model, PriceTable prices, Executor executor)
  
  Creates a measured AsyncTask that runs a Spring AI Alibaba agent on the supplied Executor so you control and isolate concurrency, with a non-null executor required.
  
  Parameters:
  
  agentCall - a function that runs the agent for an Example and returns its response, never null
  
  model - the model id used as the PriceTable lookup key, or null to skip pricing
  
  prices - the price lookup, or null to capture tokens and latency only
  
  executor - the executor each blocking call runs on, never null
  
  Returns:
  
  an AsyncTask suitable for Experiment.builder().asyncTask(...)
  
  Throws:
  
  IllegalArgumentException - if agentCall or executor is null
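Why a dedicated executor helps can be shown with plain java.util.concurrent. The sketch below runs blocking calls on a fixed pool sized to the desired parallelism instead of the common ForkJoinPool. ExecutorSketch and runAll are hypothetical names, and the blocking function stands in for an agent call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/** Sketch only: runs blocking calls on a caller-sized pool, not the Dokimos implementation. */
final class ExecutorSketch {
    static List<String> runAll(List<String> inputs, Function<String, String> blockingCall, int parallelism) {
        // A dedicated pool isolates these blocking calls from the common ForkJoinPool.
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<CompletableFuture<String>> futures = new ArrayList<>();
            for (String input : inputs) {
                futures.add(CompletableFuture.supplyAsync(() -> blockingCall.apply(input), pool));
            }
            // Join in submission order so results line up with inputs.
            List<String> results = new ArrayList<>();
            for (CompletableFuture<String> future : futures) {
                results.add(future.join());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

With the real API, the same pool would be passed as the executor argument, sized to the experiment's parallelism, and shut down once the experiment finishes.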
