JUnit 5 Integration

Dokimos works with JUnit 5's parameterized tests so you can test LLM applications the same way you test regular code - with fast-failing tests that catch regressions.

Why Use JUnit 5 Integration?

Fast feedback during development - Tests fail immediately when an output doesn't meet your criteria. You don't have to wait for a full evaluation run to finish.

CI/CD quality gates - Fail your build if critical test cases don't pass, just like you would with regular unit tests.

Familiar tooling - Use the JUnit tools you already know: test runners, IDE integration, and reporting.

When to use JUnit tests:

  • Testing critical examples that should never break
  • Quick validation during development
  • CI/CD pipelines where you want to fail fast
  • Test-driven development of LLM features

When to use experiments instead:

  • Analyzing performance across large datasets
  • Comparing different models or configurations
  • Generating detailed reports with metrics
  • Exploratory evaluation of new features

See Experiments vs JUnit Testing for more details.

Setup

Add the JUnit 5 integration dependency:

<dependency>
    <groupId>dev.dokimos</groupId>
    <artifactId>dokimos-junit5</artifactId>
    <version>${dokimos.version}</version>
    <scope>test</scope>
</dependency>

Basic Usage

Using @DatasetSource

Load datasets with the @DatasetSource annotation:

import dev.dokimos.junit5.DatasetSource;
import dev.dokimos.core.*;
import org.junit.jupiter.params.ParameterizedTest;

@ParameterizedTest
@DatasetSource("classpath:datasets/support-qa.json")
void shouldAnswerSupportQuestions(Example example) {
    // Generate answer from your LLM
    String answer = supportBot.generate(example.input());

    // Create test case
    EvalTestCase testCase = example.toTestCase(answer);

    // Assert evaluators pass (fails test if they don't)
    Assertions.assertEval(testCase, evaluators);
}

JUnit runs this test once for each example in the dataset. If any evaluator doesn't pass its threshold, the test fails.
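
The supportBot and evaluators references here are shared test fixtures; the Complete Example below shows one way to initialize them in a @BeforeAll method.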

Loading Datasets

From the classpath (for example, files under src/test/resources):

@DatasetSource("classpath:datasets/support-qa.json")

From file system:

@DatasetSource("file:testdata/support-qa.json")

Inline for quick tests:

@DatasetSource(json = """
    {
      "examples": [
        {"input": "Reset password", "expectedOutput": "Click Forgot Password"},
        {"input": "Track order", "expectedOutput": "Check Order History"}
      ]
    }
    """)

Using assertEval

Assertions.assertEval() runs your evaluators and fails the test if any don't pass:

Assertions.assertEval(testCase, evaluators);

When a test fails, you get a clear error message:

Evaluation 'Answer Quality' failed: score=0.65 (threshold=0.80)
Reason: The answer is incomplete and doesn't mention the 30-day policy.
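
Because the failure is reported as a regular JUnit assertion failure, it shows up in IDE test runners and Surefire reports like any other failing test. A minimal sketch of that behavior, assuming assertEval fails the test the way standard JUnit assertions do (failingTestCase and evaluators are placeholders):

// Illustration only: a failing evaluation surfaces as a normal JUnit assertion failure.
AssertionError failure = org.junit.jupiter.api.Assertions.assertThrows(
        AssertionError.class,
        () -> Assertions.assertEval(failingTestCase, evaluators));
// failure.getMessage() carries the evaluator name, score, threshold, and reason shown above.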

Complete Example

import dev.dokimos.junit5.DatasetSource;
import dev.dokimos.core.*;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.params.ParameterizedTest;
import java.util.List;

class CustomerSupportTest {

    private static List<Evaluator> evaluators;
    private static CustomerSupportBot supportBot;

    @BeforeAll
    static void setup() {
        // API key supplied via an environment variable, as in the CI example below
        supportBot = new CustomerSupportBot(System.getenv("OPENAI_API_KEY"));
        // judgeModel is whatever LLM client you use to score outputs
        JudgeLM judge = prompt -> judgeModel.generate(prompt);

        evaluators = List.of(
            LLMJudgeEvaluator.builder()
                .name("Answer Quality")
                .criteria("Is the answer helpful and addresses the user's question?")
                .threshold(0.80)
                .judge(judge)
                .build(),
            RegexEvaluator.builder()
                .name("No Placeholders")
                .pattern(".*\\[.*\\].*") // Catch [PLACEHOLDER] text
                .threshold(0.0)          // Should NOT match
                .build()
        );
    }

    @ParameterizedTest(name = "[{index}] {0}")
    @DatasetSource("classpath:datasets/support-qa-v3.json")
    void shouldAnswerSupportQuestions(Example example) {
        String response = supportBot.generate(example.input());
        EvalTestCase testCase = example.toTestCase(response);
        Assertions.assertEval(testCase, evaluators);
    }
}
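
To run just this class locally, the standard Surefire filter works: mvn test -Dtest=CustomerSupportTest.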

Advanced Usage

Testing RAG Systems

For RAG applications, include the retrieved context in your test case:

@ParameterizedTest
@DatasetSource("classpath:datasets/product-docs-qa.json")
void shouldAnswerFromDocumentation(Example example) {
    // Retrieve the top 5 relevant documents
    List<String> docs = vectorStore.search(example.input(), 5);

    // Generate answer with RAG
    String answer = ragSystem.generate(example.input(), docs);

    // Include retrieved context in the test case
    EvalTestCase testCase = example.toTestCase(Map.of(
        "output", answer,
        "retrievedContext", docs
    ));

    // Check both quality and faithfulness
    Assertions.assertEval(testCase, List.of(
        LLMJudgeEvaluator.builder()
            .name("Answer Quality")
            .criteria("Is the answer helpful?")
            .threshold(0.8)
            .judge(judge)
            .build(),
        FaithfulnessEvaluator.builder()
            .threshold(0.85)
            .judge(judge)
            .contextKey("retrievedContext")
            .build()
    ));
}
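
The contextKey passed to FaithfulnessEvaluator matches the retrievedContext key used when building the test case, which tells the evaluator where to find the retrieved documents.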

Readable Test Names

Customize how tests appear in output:

@ParameterizedTest(name = "{index}: {0}")
@DatasetSource("classpath:datasets/support-qa.json")
void shouldAnswerQuestions(Example example) {
    // Output: "1: How do I reset my password?"
}
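
Here {index} is the invocation number JUnit assigns to each example and {0} is the first method argument, so the text after the colon comes from Example's string representation (typically the input).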

CI/CD Integration

Maven

Run tests in your CI pipeline:

mvn test
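
Make sure any credentials your application and judge model need (such as OPENAI_API_KEY) are available as environment variables in CI; the GitHub Actions example below shows one way to provide them.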

GitHub Actions

name: LLM Tests

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3

      - name: Set up JDK 21
        uses: actions/setup-java@v3
        with:
          java-version: '21'
          distribution: 'temurin'

      - name: Run LLM Tests
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: mvn test

      - name: Publish Test Report
        if: always()
        uses: dorny/test-reporter@v1
        with:
          name: JUnit Tests
          path: target/surefire-reports/*.xml
          reporter: java-junit
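
The OPENAI_API_KEY value comes from the repository's encrypted Actions secrets, so configure it in your repository settings rather than committing it to the workflow file.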

Test Reports

JUnit generates standard test reports that integrate with CI tools:

target/surefire-reports/
├── TEST-CustomerSupportTest.xml
└── CustomerSupportTest.txt
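
These are the same XML files the dorny/test-reporter step in the workflow above publishes via the target/surefire-reports/*.xml path.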

Best Practices

Keep datasets in version control - Store them alongside your code so tests are reproducible.

Start with critical examples - Don't try to test everything. Focus on the most important cases that should never break.

Use clear test names - Make it obvious what each test is checking.

Separate CI and comprehensive testing - Use a smaller dataset for CI (roughly 10-20 examples) and run full evaluations separately; see the sketch at the end of this section.

Test at multiple levels - Combine unit tests (JUnit) with comprehensive evaluations (Experiments) for best coverage.
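
One way to implement that CI/comprehensive split is with plain JUnit 5 tags: keep a small, tagged smoke suite that CI runs on every build, and cover the full dataset with experiments or a scheduled run. A minimal sketch under those assumptions (SupportSmokeTest, support-qa-smoke.json, and the trimmed dataset size are hypothetical; CustomerSupportBot and the evaluator reuse the examples above):

import dev.dokimos.junit5.DatasetSource;
import dev.dokimos.core.*;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Tag;
import org.junit.jupiter.params.ParameterizedTest;
import java.util.List;

// Small, critical-cases-only suite intended for every CI build.
@Tag("llm-smoke")
class SupportSmokeTest {

    private static CustomerSupportBot supportBot;
    private static List<Evaluator> evaluators;

    @BeforeAll
    static void setup() {
        supportBot = new CustomerSupportBot(System.getenv("OPENAI_API_KEY")); // hypothetical system under test
        evaluators = List.of(
            RegexEvaluator.builder()
                .name("No Placeholders")
                .pattern(".*\\[.*\\].*") // catch leftover [PLACEHOLDER] text
                .threshold(0.0)          // should NOT match
                .build()
        );
    }

    @ParameterizedTest(name = "[{index}] {0}")
    @DatasetSource("classpath:datasets/support-qa-smoke.json") // hypothetical trimmed dataset, ~10-20 examples
    void criticalQuestionsStillPass(Example example) {
        Assertions.assertEval(example.toTestCase(supportBot.generate(example.input())), evaluators);
    }
}

Maven Surefire can then select only the tagged suite in CI with mvn test -Dgroups=llm-smoke, while the heavier untagged tests run elsewhere.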