Class SpringAiAlibabaSupport

java.lang.Object
dev.dokimos.springai.alibaba.SpringAiAlibabaSupport

public final class SpringAiAlibabaSupport extends Object
Utilities for evaluating Spring AI Alibaba (graph/agent) runs with the Dokimos agent evaluators.

Spring AI Alibaba's graph runtime carries its whole conversation as standard Spring AI message types (AssistantMessage, ToolResponseMessage, and UserMessage) under the OverAllState key "messages". Because those are the exact types Dokimos already converts in SpringAiSupport, this class is a thin adapter: it unwraps the List<Message> from the graph state and folds the full multi-turn conversation into a single AgentTrace, delegating tool-call and tool-definition conversion to SpringAiSupport.

Per-turn windowing

Tool-call results are correlated per turn: each AssistantMessage that issues tool calls is matched only against the ToolResponseMessages that follow it, up to the next AssistantMessage. This avoids silently binding a tool call to the wrong result if a sub-agent or loop reuses a tool-call id across turns.
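
The windowing rule can be illustrated with a minimal, self-contained sketch. The types below (Msg, Call, Assistant, ToolResponse, Resolved) are hypothetical stand-ins for the Spring AI and Dokimos types, and correlate is an illustrative helper, not the library's implementation. It shows why a tool-call id reused in a later turn still resolves to the result from its own turn.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustration only: hypothetical stand-in types, not the Spring AI or Dokimos classes. */
final class PerTurnWindowingSketch {

    interface Msg {}
    record Call(String id, String name) {}
    record Assistant(List<Call> calls) implements Msg {}
    record ToolResponse(String id, String data) implements Msg {}
    record Resolved(String id, String name, String result) {}

    /** Resolves each call only against the tool responses in its own turn. */
    static List<Resolved> correlate(List<Msg> conversation) {
        List<Resolved> out = new ArrayList<>();
        for (int i = 0; i < conversation.size(); i++) {
            if (!(conversation.get(i) instanceof Assistant assistant)) {
                continue;
            }
            // Window: responses after this assistant message, up to the next assistant message.
            Map<String, String> window = new HashMap<>();
            for (int j = i + 1; j < conversation.size(); j++) {
                Msg next = conversation.get(j);
                if (next instanceof Assistant) {
                    break;
                }
                if (next instanceof ToolResponse r) {
                    window.putIfAbsent(r.id(), r.data());
                }
            }
            // A call with no response in its window keeps a null result.
            for (Call c : assistant.calls()) {
                out.add(new Resolved(c.id(), c.name(), window.get(c.id())));
            }
        }
        return out;
    }
}
```

With two turns that both use the id "call-1", the first call resolves to the first turn's response and the second call to the second turn's response, rather than both binding to whichever response a global id lookup happened to find.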

Judges and tasks

This class deliberately does not provide asJudge or a plain asyncTask: Spring AI Alibaba agents run on a standard Spring AI ChatModel/ChatClient, so use SpringAiSupport.asJudge(org.springframework.ai.chat.model.ChatModel) and SpringAiSupport.asyncTask(org.springframework.ai.chat.client.ChatClient) directly.

Measured tasks

For metrics capture there is one important exception. A graph run does not expose Spring AI Usage: a ReactAgent run returns a bare AssistantMessage (or the OverAllState message list), and a Spring AI AssistantMessage carries no typed token usage. Usage lives only on a ChatResponse, which the graph fold never surfaces. So unlike SpringAiSupport.measuredAsyncTask(org.springframework.ai.chat.client.ChatClient, String, dev.dokimos.core.PriceTable) (which reads usage off ChatResponse), the agent path here cannot read usage from its own result.

measuredAsyncTask(Function, String, PriceTable) therefore follows the decoupled carrier pattern: it auto-times wall-clock latency around the run and lets you supply the token counts you obtain from your own Alibaba setup (a usage callback on the underlying ChatModel, the response metadata, etc.) via SpringAiAlibabaSupport.AlibabaAgentResponse, computing cost through an optional PriceTable. When you drive a plain ChatClient instead of a graph, prefer SpringAiSupport.measuredAsyncTask(...) so usage is captured automatically.

Example


 ReactAgent agent = ReactAgent.builder()
         .name("assistant")
         .chatClient(chatClient)
         .tools(toolCallbacks)
         .build();

 // Inputs map first, then a null RunnableConfig to invoke without one.
 AgentTrace trace = SpringAiAlibabaSupport.toAgentTrace(
         agent, Map.of("messages", List.of(new UserMessage("..."))), null);

 EvalTestCase testCase = trace.toTestCase(
         "user question",
         SpringAiAlibabaSupport.toToolDefinitions(toolCallbacks));

 EvalResult result = ToolCallValidityEvaluator.builder().build().evaluate(testCase);
 
  • Field Details

    • MESSAGES_KEY

      public static final String MESSAGES_KEY
      The Spring AI Alibaba graph-state key ("messages") under which the message list is stored.
  • Method Details

    • messages

      public static List<org.springframework.ai.chat.messages.Message> messages(com.alibaba.cloud.ai.graph.OverAllState state)
      Extracts the raw Spring AI Message list from a graph state.

      Null-tolerant: a null state, an absent "messages" key, or a value that is not a List yields an empty list. Elements that are not Messages (including unknown subtypes that do not implement Message) are skipped. Never throws.

      Parameters:
      state - the graph state (may be null)
      Returns:
      the messages in order, or an empty list
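
      The null-tolerance contract described above can be sketched as follows. This is an illustration under stated assumptions: a plain Map<String, Object> stands in for OverAllState, and Message and UserMessage are hypothetical local types rather than the Spring AI classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustration only: a Map stands in for the graph state; types are hypothetical. */
final class NullTolerantMessagesSketch {

    interface Message { String text(); }
    record UserMessage(String text) implements Message {}

    static List<Message> messages(Map<String, Object> state) {
        if (state == null) {
            return List.of();                 // null state
        }
        Object raw = state.get("messages");
        if (!(raw instanceof List<?> list)) {
            return List.of();                 // absent key or non-List value
        }
        List<Message> out = new ArrayList<>();
        for (Object element : list) {
            if (element instanceof Message m) {
                out.add(m);                   // keep messages, skip everything else
            }
        }
        return out;
    }
}
```

      Every failure mode collapses to an empty list, so callers can chain on the result without null checks or try/catch.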
    • toToolCalls

      public static List<ToolCall> toToolCalls(com.alibaba.cloud.ai.graph.OverAllState state)
      Extracts every tool call across all turns of a graph run, correlating each call to its result with per-turn windowing.

      For each AssistantMessage that issues tool calls, the following ToolResponseMessages (up to the next AssistantMessage) form the window used to resolve results, via SpringAiSupport.toToolCalls(AssistantMessage, List). Calls with no matching response in their window have a null result.

      Parameters:
      state - the graph state (may be null)
      Returns:
      the tool calls in execution order, or an empty list
    • toAgentTrace

      public static AgentTrace toAgentTrace(com.alibaba.cloud.ai.graph.OverAllState state)
      Folds the full multi-turn conversation of a graph run into a single AgentTrace.

      The trace's tool calls come from toToolCalls(OverAllState) (per-turn windowing); its final response is the text of the last AssistantMessage in the conversation, when that text is non-blank.

      Parameters:
      state - the graph state (may be null)
      Returns:
      an agent trace, never null
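
      The final-response rule can be sketched in isolation. The types here are hypothetical stand-ins, and the sketch assumes one reading of the description: only the last AssistantMessage is considered, and a blank text there yields no final response rather than falling back to an earlier message.

```java
import java.util.List;
import java.util.Optional;

/** Illustration only: hypothetical stand-in types, not the Spring AI classes. */
final class FinalResponseSketch {

    interface Msg {}
    record User(String text) implements Msg {}
    record Assistant(String text) implements Msg {}
    record ToolResponse(String data) implements Msg {}

    static Optional<String> finalResponse(List<Msg> conversation) {
        // Walk backwards to find the last assistant message.
        for (int i = conversation.size() - 1; i >= 0; i--) {
            if (conversation.get(i) instanceof Assistant a) {
                String text = a.text();
                // Assumption: a blank last message means "no final response".
                return (text == null || text.isBlank()) ? Optional.empty() : Optional.of(text);
            }
        }
        return Optional.empty();
    }
}
```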
    • toAgentTrace

      public static AgentTrace toAgentTrace(Optional<com.alibaba.cloud.ai.graph.OverAllState> state)
      Folds the optional graph state returned by CompiledGraph.invoke(Map) into a single AgentTrace.

      An empty optional yields an empty trace (no tool calls, no final response).

      Parameters:
      state - the optional graph state (may be null)
      Returns:
      an agent trace, never null
    • toAgentTrace

      public static AgentTrace toAgentTrace(com.alibaba.cloud.ai.graph.agent.ReactAgent agent, Map<String,Object> inputs, com.alibaba.cloud.ai.graph.RunnableConfig config) throws com.alibaba.cloud.ai.graph.exception.GraphStateException
      Runs a ReactAgent's compiled graph and folds the resulting state into a single AgentTrace.

      This is the full-fidelity one-liner: it invokes the compiled graph (which preserves every intermediate tool call) rather than a lossy single-shot call. CompiledGraph.invoke(Map, RunnableConfig) returns Optional<OverAllState>, which is folded by toAgentTrace(Optional).

      Parameters:
      agent - the agent whose compiled graph is run, never null
      inputs - the graph inputs (for example the initial "messages" list), never null
      config - the run configuration, or null to invoke without one
      Returns:
      an agent trace, never null
      Throws:
      com.alibaba.cloud.ai.graph.exception.GraphStateException - if the agent's graph cannot be compiled
    • toToolDefinitions

      public static List<ToolDefinition> toToolDefinitions(List<org.springframework.ai.tool.ToolCallback> callbacks)
      Converts the ToolCallbacks an agent was built with into Dokimos ToolDefinitions, so tool calls can be evaluated against the tools the agent had available.

      Delegates to SpringAiSupport.toToolDefinitions(List) after pulling each callback's getToolDefinition(). A null or empty list yields an empty list.

      Parameters:
      callbacks - the tool callbacks supplied to the agent (may be null)
      Returns:
      the Dokimos tool definitions, or an empty list
    • measuredAsyncTask

      public static AsyncTask measuredAsyncTask(Function<Example,SpringAiAlibabaSupport.AlibabaAgentResponse> agentCall, String model, PriceTable prices)
      Creates a measured AsyncTask that runs a Spring AI Alibaba agent, captures latency automatically, and computes cost when both a PriceTable and a model id are supplied. These values light up the run's metrics cards.

      Decoupled-carrier counterpart to SpringAiSupport.measuredAsyncTask(org.springframework.ai.chat.client.ChatClient, String, PriceTable): a graph/agent run does not expose Spring AI Usage (see the class-level "Measured tasks" note), so the supplied agentCall returns a SpringAiAlibabaSupport.AlibabaAgentResponse carrying the output text plus any token counts you extracted from your own setup. Wall-clock latency is measured around the call; cost is computed via prices.costUsd(model, tokensIn, tokensOut) when both prices and model are non-null. Missing token counts leave those fields null (only the Latency card lights up); a null prices or model leaves cost null. The output text is written under the default output key.

      The blocking call runs on the common ForkJoinPool; for isolated, true concurrency use measuredAsyncTask(Function, String, PriceTable, Executor) with a pool you size to the experiment's parallelism. Never throws on missing metrics.
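
      The decoupled carrier pattern described above can be sketched without any Dokimos or Spring types. AgentResponse, Prices, Measured, and measure are hypothetical names chosen for illustration. Skipping the cost when a token count is missing is an assumption of this sketch, not a documented behavior.

```java
import java.util.function.Function;

/** Illustration only: hypothetical types mirroring the carrier pattern, not the Dokimos API. */
final class MeasuredCallSketch {

    /** Carrier: output text plus caller-supplied token counts (either may be null). */
    record AgentResponse(String text, Long tokensIn, Long tokensOut) {}

    /** Stand-in for a price lookup keyed by model id. */
    interface Prices {
        double costUsd(String model, long tokensIn, long tokensOut);
    }

    record Measured(String output, long latencyMillis, Long tokensIn, Long tokensOut, Double costUsd) {}

    static <I> Measured measure(Function<I, AgentResponse> agentCall, I input, String model, Prices prices) {
        if (agentCall == null) {
            throw new IllegalArgumentException("agentCall must not be null");
        }
        // Wall-clock latency is timed around the blocking call.
        long start = System.nanoTime();
        AgentResponse response = agentCall.apply(input);
        long latencyMillis = (System.nanoTime() - start) / 1_000_000;

        // Cost needs both a price lookup and a model id; this sketch also requires both counts.
        Double cost = null;
        if (prices != null && model != null
                && response.tokensIn() != null && response.tokensOut() != null) {
            cost = prices.costUsd(model, response.tokensIn(), response.tokensOut());
        }
        return new Measured(response.text(), latencyMillis,
                response.tokensIn(), response.tokensOut(), cost);
    }
}
```

      Keeping the token counts on the carrier means the timing wrapper never needs to know how usage was obtained, which is what lets the same wrapper work for any agent setup.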

      Example:

      
       ReactAgent agent = ReactAgent.builder().name("assistant").chatClient(chatClient).build();
       AsyncTask task = SpringAiAlibabaSupport.measuredAsyncTask(
               example -> {
                   AssistantMessage out = agent.call(example.input());
                   // pull token counts from your own usage callback / response metadata:
                   return new AlibabaAgentResponse(out.getText(), promptTokens, completionTokens);
               },
               "qwen-max",
               prices);
       
      Parameters:
      agentCall - a function that runs the agent for an Example and returns its response, never null
      model - the model id used as the PriceTable lookup key, or null to skip pricing
      prices - the price lookup, or null to capture tokens and latency only
      Returns:
      an AsyncTask suitable for Experiment.builder().asyncTask(...)
      Throws:
      IllegalArgumentException - if agentCall is null
    • measuredAsyncTask

      public static AsyncTask measuredAsyncTask(Function<Example,SpringAiAlibabaSupport.AlibabaAgentResponse> agentCall, String model, PriceTable prices, Executor executor)
      Creates a measured AsyncTask that runs a Spring AI Alibaba agent on the supplied Executor, so you control and isolate concurrency. The executor must be non-null.
      Parameters:
      agentCall - a function that runs the agent for an Example and returns its response, never null
      model - the model id used as the PriceTable lookup key, or null to skip pricing
      prices - the price lookup, or null to capture tokens and latency only
      executor - the executor each blocking call runs on, never null
      Returns:
      an AsyncTask suitable for Experiment.builder().asyncTask(...)
      Throws:
      IllegalArgumentException - if agentCall or executor is null
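
      The reason for supplying your own executor can be shown with a general sketch of running blocking calls on a dedicated pool. It uses only the standard java.util.concurrent API. The runAll helper and its parameters are hypothetical, and the sketch does not claim to mirror how the library schedules work internally.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/** Illustration only: a dedicated pool sized to the desired parallelism. */
final class DedicatedExecutorSketch {

    static List<String> runAll(List<String> inputs, Function<String, String> blockingCall, int parallelism) {
        // A fixed pool isolates these blocking calls from the common ForkJoinPool.
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<CompletableFuture<String>> futures = inputs.stream()
                    .map(input -> CompletableFuture.supplyAsync(() -> blockingCall.apply(input), pool))
                    .toList();
            // join() in submission order keeps results aligned with inputs.
            return futures.stream().map(CompletableFuture::join).toList();
        } finally {
            pool.shutdown();
        }
    }
}
```

      Because agent calls block on network I/O, a pool sized to the experiment's parallelism avoids starving other work that shares the common pool.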