Skip to main content

Server datasets

Store your test data on the server once, version it, and point your tests at a specific version by URI. No more copying the same examples into every test.

Each run records the exact dataset version it used. That is what lets a regression gate compare like for like.

How it works

A dataset is a named container. The data lives in versions.

  • Versions are numbered from 1.
  • A version is immutable once written.
  • Adding examples never edits an existing version. It creates the next one.
  • The alias latest always resolves to the highest version number.

Browse your datasets under Datasets in the web UI. The list shows each dataset's latest version and item count. Open one to see its versions and page through the items in a version.

Create a dataset and add a version

Create an empty dataset, then add a version with its items.

# 1. Create an empty dataset
curl -X POST http://localhost:8080/api/v1/datasets \
-H 'Content-Type: application/json' \
-d '{ "name": "qa-regression", "description": "Customer support QA set" }'

# 2. Add version 1 with its items
curl -X POST http://localhost:8080/api/v1/datasets/qa-regression/versions \
-H 'Content-Type: application/json' \
-d '{
"description": "Initial import",
"items": [
{
"inputs": { "question": "What is the capital of France?" },
"expectedOutputs": { "answer": "Paris" },
"metadata": { "category": "geography" }
}
]
}'

Each item needs inputs. The expectedOutputs and metadata fields are optional.

All dataset endpoints

MethodPathWhat it does
POST/api/v1/datasetsCreate an empty dataset
POST/api/v1/datasets/{name}/versionsAdd a new version with its items
GET/api/v1/datasetsList datasets with their latest version
GET/api/v1/datasets/{name}One dataset with all its versions
GET/api/v1/datasets/{name}/versions/{version}One version (latest or a number)
GET/api/v1/datasets/{name}/versions/{version}/itemsPage through a version's items
DELETE/api/v1/datasets/{name}Delete a dataset and all its versions

Write operations need an EDITOR role when authentication is on. See Authentication.

To grow a dataset from real run results instead of hand-writing items, see Review and curation.

Point your tests at a server dataset

Add the dokimos-server-client dependency to your test classpath.

<dependency>
<groupId>dev.dokimos</groupId>
<artifactId>dokimos-server-client</artifactId>
<version>${dokimos.version}</version>
<scope>test</scope>
</dependency>

The dependency registers a resolver for dataset:// URIs. Anywhere Dokimos resolves a dataset (the registry, or the JUnit @DatasetSource annotation) can now point at the server.

Resolve a dataset in code

Call the registry with a dataset:// URI.

import dev.dokimos.core.Dataset;
import dev.dokimos.core.DatasetResolverRegistry;

Dataset dataset = DatasetResolverRegistry.getInstance()
.resolve("dataset://qa-regression@3");

Resolve a dataset in a JUnit test

Use @DatasetSource on a parameterized test. Pin to @latest to always pull the newest version.

@ParameterizedTest
@DatasetSource("dataset://qa-regression@latest")
void evaluatesAnswers(Example example) {
String answer = aiService.generate(example.input());
Assertions.assertEval(example.toTestCase(answer), evaluators);
}

URI format

The URI is dataset://<name>@<version>. The version is a positive integer or latest.

The version is required. A pinned test always states the exact data it ran against.

Configure the server connection

The resolver reads two environment variables.

VariableWhat it is
DOKIMOS_SERVER_URLBase URL of the server to fetch from
DOKIMOS_API_KEYBearer key, when the server requires one

When DOKIMOS_SERVER_URL is unset, the resolver stays inert and resolves nothing. The same test then runs offline against file-based datasets. You do not need to configure the server to run your tests locally.

Offline cache

Resolved datasets are cached at ~/.dokimos/datasets-cache/<name>@<version>/items.json.

  • A pinned version is fetched network-first and falls back to its cached copy when the server is briefly unreachable. A transient outage does not break a CI run that already pulled that version once.
  • The latest alias is always fetched fresh. Once it resolves to a concrete version, that version is cached too.
  • A 4xx response or a parse error is surfaced directly, not masked by the cache. Those are not transient.

Next steps

For AI agentsView as Markdown