Server datasets
Store your test data on the server once, version it, and point your tests at a specific version by URI. No more copying the same examples into every test.
Each run records the exact dataset version it used. That is what lets a regression gate compare like for like.
How it works
A dataset is a named container. The data lives in versions.
- Versions are numbered from 1.
- A version is immutable once written.
- Adding examples never edits an existing version. It creates the next one.
- The alias latest always resolves to the highest version number.
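The rules above can be sketched as a small in-memory model. This is an illustration only, not the server's code: the class name VersionedDatasetSketch and its methods are hypothetical, and items are plain strings to keep it short. Each call to addVersion stores exactly the items it is given as the next version.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of the versioning rules; not the server's implementation.
final class VersionedDatasetSketch {
    // Index i holds version i + 1. Each entry is an unmodifiable snapshot.
    private final List<List<String>> versions = new ArrayList<>();

    /** Stores the given items as the next version and returns its number. */
    int addVersion(List<String> items) {
        versions.add(List.copyOf(items)); // copy, so later changes to 'items' cannot leak in
        return versions.size();
    }

    /** Resolves "latest" or a positive integer to the items of that version. */
    List<String> resolve(String version) {
        int number = "latest".equals(version) ? versions.size() : Integer.parseInt(version);
        if (number < 1 || number > versions.size()) {
            throw new IllegalArgumentException("No such version: " + version);
        }
        return versions.get(number - 1);
    }

    public static void main(String[] args) {
        VersionedDatasetSketch dataset = new VersionedDatasetSketch();
        dataset.addVersion(List.of("q1"));
        dataset.addVersion(List.of("q1", "q2"));
        System.out.println(dataset.resolve("1"));      // prints [q1]
        System.out.println(dataset.resolve("latest")); // prints [q1, q2]
    }
}
```

Adding version 2 leaves version 1 untouched, and latest moves to 2 without any test that pins @1 noticing.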
Browse your datasets under Datasets in the web UI. The list shows each dataset's latest version and item count. Open one to see its versions and page through the items in a version.
Create a dataset and add a version
Create an empty dataset, then add a version with its items.
# 1. Create an empty dataset
curl -X POST http://localhost:8080/api/v1/datasets \
  -H 'Content-Type: application/json' \
  -d '{ "name": "qa-regression", "description": "Customer support QA set" }'

# 2. Add version 1 with its items
curl -X POST http://localhost:8080/api/v1/datasets/qa-regression/versions \
  -H 'Content-Type: application/json' \
  -d '{
    "description": "Initial import",
    "items": [
      {
        "inputs": { "question": "What is the capital of France?" },
        "expectedOutputs": { "answer": "Paris" },
        "metadata": { "category": "geography" }
      }
    ]
  }'
Each item needs inputs. The expectedOutputs and metadata fields are optional.
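If you would rather script the second step from the JVM, the same request can be built with the JDK's built-in HTTP client (Java 11 or later). This is a minimal sketch, not part of Dokimos: the class DatasetApiSketch and its methods are hypothetical, and it assumes the dataset name is URL-safe and that authentication is off.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper that mirrors the curl call for adding a version.
final class DatasetApiSketch {
    /** Builds the POST that adds a new version to an existing dataset. */
    static HttpRequest addVersionRequest(String baseUrl, String datasetName, String jsonBody) {
        // Assumes datasetName needs no URL encoding, as with "qa-regression".
        URI uri = URI.create(baseUrl + "/api/v1/datasets/" + datasetName + "/versions");
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    /** Sends the request and returns the raw response body. Needs a running server. */
    static String send(HttpRequest request) throws IOException, InterruptedException {
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() >= 400) {
            throw new IllegalStateException("Server returned " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) {
        // Only inputs is required on each item; expectedOutputs is optional.
        String body = "{ \"description\": \"Initial import\", \"items\": ["
                + "{ \"inputs\": { \"question\": \"What is the capital of France?\" },"
                + " \"expectedOutputs\": { \"answer\": \"Paris\" } } ] }";
        HttpRequest request = addVersionRequest("http://localhost:8080", "qa-regression", body);
        System.out.println(request.method() + " " + request.uri());
        // Call send(request) once the server is running.
    }
}
```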
All dataset endpoints
| Method | Path | What it does |
|---|---|---|
| POST | /api/v1/datasets | Create an empty dataset |
| POST | /api/v1/datasets/{name}/versions | Add a new version with its items |
| GET | /api/v1/datasets | List datasets with their latest version |
| GET | /api/v1/datasets/{name} | One dataset with all its versions |
| GET | /api/v1/datasets/{name}/versions/{version} | One version (latest or a number) |
| GET | /api/v1/datasets/{name}/versions/{version}/items | Page through a version's items |
| DELETE | /api/v1/datasets/{name} | Delete a dataset and all its versions |
Write operations need an EDITOR role when authentication is on. See Authentication.
To grow a dataset from real run results instead of hand-writing items, see Review and curation.
Point your tests at a server dataset
Add the dokimos-server-client dependency to your test classpath.
<dependency>
  <groupId>dev.dokimos</groupId>
  <artifactId>dokimos-server-client</artifactId>
  <version>${dokimos.version}</version>
  <scope>test</scope>
</dependency>
The dependency registers a resolver for dataset:// URIs. Anywhere Dokimos resolves a dataset (the registry, or the JUnit @DatasetSource annotation) can now point at the server.
Resolve a dataset in code
Call the registry with a dataset:// URI.
Java:

import dev.dokimos.core.Dataset;
import dev.dokimos.core.DatasetResolverRegistry;

Dataset dataset = DatasetResolverRegistry.getInstance()
    .resolve("dataset://qa-regression@3");

Kotlin:

import dev.dokimos.core.Dataset
import dev.dokimos.core.DatasetResolverRegistry

val dataset: Dataset = DatasetResolverRegistry.getInstance()
    .resolve("dataset://qa-regression@3")
Resolve a dataset in a JUnit test
Use @DatasetSource on a parameterized test. Use @latest to always pull the newest version, or a number such as @3 to pin the test to one version.
@ParameterizedTest
@DatasetSource("dataset://qa-regression@latest")
void evaluatesAnswers(Example example) {
    String answer = aiService.generate(example.input());
    Assertions.assertEval(example.toTestCase(answer), evaluators);
}
URI format
The URI is dataset://<name>@<version>. The version is a positive integer or latest.
The version is required. A pinned test always states the exact data it ran against.
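To make the format concrete, here is a sketch of how such a URI could be validated and split. It is not the resolver's actual parser: the class DatasetUriSketch is hypothetical, and rejecting leading zeros in the version is an assumption of this sketch.

```java
// Hypothetical parser for dataset://<name>@<version>; not the Dokimos resolver.
final class DatasetUriSketch {
    private static final String SCHEME = "dataset://";

    final String name;
    final String version; // "latest" or a positive integer, kept as text

    private DatasetUriSketch(String name, String version) {
        this.name = name;
        this.version = version;
    }

    static DatasetUriSketch parse(String uri) {
        if (uri == null || !uri.startsWith(SCHEME)) {
            throw new IllegalArgumentException("Expected a dataset:// URI: " + uri);
        }
        String rest = uri.substring(SCHEME.length());
        int at = rest.lastIndexOf('@');
        // The version is required, so a missing '@', empty name, or empty version is an error.
        if (at <= 0 || at == rest.length() - 1) {
            throw new IllegalArgumentException("Expected dataset://<name>@<version>: " + uri);
        }
        String version = rest.substring(at + 1);
        // Assumption of this sketch: positive integers are written without leading zeros.
        if (!version.equals("latest") && !version.matches("[1-9][0-9]*")) {
            throw new IllegalArgumentException("Version must be a positive integer or latest: " + uri);
        }
        return new DatasetUriSketch(rest.substring(0, at), version);
    }

    /** True when the URI names a fixed version rather than the latest alias. */
    boolean isPinned() {
        return !version.equals("latest");
    }

    public static void main(String[] args) {
        DatasetUriSketch uri = parse("dataset://qa-regression@3");
        System.out.println(uri.name + " version " + uri.version + " pinned=" + uri.isPinned());
    }
}
```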
Configure the server connection
The resolver reads two environment variables.
| Variable | What it is |
|---|---|
| DOKIMOS_SERVER_URL | Base URL of the server to fetch from |
| DOKIMOS_API_KEY | Bearer key, when the server requires one |
When DOKIMOS_SERVER_URL is unset, the resolver stays inert and resolves nothing. The same test then runs offline against file-based datasets. You do not need to configure the server to run your tests locally.
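The "inert when unset" behaviour can be pictured like this. The sketch is not the client's code: ServerConfigSketch is a hypothetical class, and treating a blank value the same as an unset one is an assumption.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical reading of the two variables; not the dokimos-server-client code.
final class ServerConfigSketch {
    final String serverUrl;
    final Optional<String> apiKey;

    private ServerConfigSketch(String serverUrl, Optional<String> apiKey) {
        this.serverUrl = serverUrl;
        this.apiKey = apiKey;
    }

    /** Returns empty when DOKIMOS_SERVER_URL is missing, so callers can skip the server. */
    static Optional<ServerConfigSketch> fromEnvironment(Map<String, String> env) {
        String url = env.get("DOKIMOS_SERVER_URL");
        // Assumption: a blank value counts as unset.
        if (url == null || url.isBlank()) {
            return Optional.empty();
        }
        Optional<String> key = Optional.ofNullable(env.get("DOKIMOS_API_KEY"))
                .filter(value -> !value.isBlank());
        return Optional.of(new ServerConfigSketch(url, key));
    }

    public static void main(String[] args) {
        Optional<ServerConfigSketch> config = fromEnvironment(System.getenv());
        System.out.println(config.isPresent()
                ? "Resolving dataset:// URIs against " + config.get().serverUrl
                : "DOKIMOS_SERVER_URL is unset; the server resolver stays inert");
    }
}
```

Passing the environment in as a map keeps the logic testable without touching real environment variables.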
Offline cache
Resolved datasets are cached at ~/.dokimos/datasets-cache/<name>@<version>/items.json.
- A pinned version is fetched network-first and falls back to its cached copy when the server is briefly unreachable. A transient outage does not break a CI run that already pulled that version once.
- The latest alias is always fetched fresh. Once it resolves to a concrete version, that version is cached too.
- A 4xx response or a parse error is surfaced directly, not masked by the cache. Those are not transient.
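The three rules above fit in a few lines of control flow. This sketch is illustrative, not the resolver's implementation: CachePolicySketch, Fetched, and Fetcher are hypothetical names, an in-memory map stands in for the cache directory, and it assumes transient failures surface as IOException while 4xx and parse errors surface as unchecked exceptions.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the cache policy; not the Dokimos resolver.
final class CachePolicySketch {
    /** Result of a fetch: the concrete version number and the raw items JSON. */
    static final class Fetched {
        final int version;
        final String itemsJson;

        Fetched(int version, String itemsJson) {
            this.version = version;
            this.itemsJson = itemsJson;
        }
    }

    /** Assumed contract: IOException is transient; a RuntimeException is a 4xx or parse error. */
    interface Fetcher {
        Fetched fetch(String name, String version) throws IOException;
    }

    // In-memory stand-in for ~/.dokimos/datasets-cache/<name>@<version>/items.json.
    private final Map<String, String> cache = new HashMap<>();
    private final Fetcher fetcher;

    CachePolicySketch(Fetcher fetcher) {
        this.fetcher = fetcher;
    }

    /** Network-first. RuntimeExceptions from the fetcher are not caught, so they surface directly. */
    String resolve(String name, String version) throws IOException {
        try {
            Fetched fetched = fetcher.fetch(name, version);
            // Cache under the concrete number, so a resolved "latest" is cached too.
            cache.put(name + "@" + fetched.version, fetched.itemsJson);
            return fetched.itemsJson;
        } catch (IOException transientFailure) {
            // Only pinned versions may fall back; "latest" is never served from the cache.
            String cached = "latest".equals(version) ? null : cache.get(name + "@" + version);
            if (cached == null) {
                throw transientFailure;
            }
            return cached;
        }
    }

    public static void main(String[] args) throws IOException {
        boolean[] serverDown = {false};
        CachePolicySketch resolver = new CachePolicySketch((name, version) -> {
            if (serverDown[0]) throw new IOException("server unreachable");
            return new Fetched(3, "[{\"inputs\":{}}]");
        });
        resolver.resolve("qa-regression", "3"); // network hit, fills the cache
        serverDown[0] = true;
        System.out.println(resolver.resolve("qa-regression", "3")); // served from the cache
    }
}
```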
Next steps
- Review and curation: turn real run failures into new dataset versions
- CI regression gate: fail a build when a run regresses against a dataset version