Class RunComparison

java.lang.Object
dev.dokimos.core.comparison.RunComparison

public final class RunComparison extends Object
Regression-comparison engine that compares a baseline set of runs against a candidate set.

Each side may contain one or more runs (repetitions). Items are grouped by an item-identity key, aggregated across repetitions into a per-item pass-probability and per-evaluator mean, then paired across sides by key. The engine emits per-evaluator and overall deltas classified as IMPROVED, REGRESSED, or UNCHANGED, each backed by a significance test.
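The grouping and pairing described above can be illustrated with a minimal sketch. It is not the engine's actual code: the class name PairingSketch, the method names, and the representation of a run as a map from item key to pass/fail are all hypothetical. Skipping keys that appear on only one side is also an assumption, since the handling of unpaired items is not stated here.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PairingSketch {

    /** Fraction of repetitions in which each item passed, keyed by item identity. */
    static Map<String, Double> passProbability(List<Map<String, Boolean>> runs) {
        Map<String, int[]> counts = new TreeMap<>(); // key -> {passes, observations}
        for (Map<String, Boolean> run : runs) {
            for (Map.Entry<String, Boolean> e : run.entrySet()) {
                int[] c = counts.computeIfAbsent(e.getKey(), k -> new int[2]);
                if (e.getValue()) {
                    c[0]++;
                }
                c[1]++;
            }
        }
        Map<String, Double> result = new TreeMap<>();
        for (Map.Entry<String, int[]> e : counts.entrySet()) {
            result.put(e.getKey(), (double) e.getValue()[0] / e.getValue()[1]);
        }
        return result;
    }

    /** Per-item delta (candidate minus baseline) for keys present on both sides. */
    static Map<String, Double> pairedDeltas(Map<String, Double> baseline,
                                            Map<String, Double> candidate) {
        Map<String, Double> deltas = new TreeMap<>();
        for (Map.Entry<String, Double> e : baseline.entrySet()) {
            Double c = candidate.get(e.getKey());
            if (c != null) { // assumption: unpaired keys are skipped
                deltas.put(e.getKey(), c - e.getValue());
            }
        }
        return deltas;
    }
}
```

With two baseline repetitions in which item "a" passes once and fails once, its baseline pass-probability is 0.5; if the single candidate run passes "a", the paired delta for "a" is 0.5.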

For single-run binary outcomes the pass-rate test uses McNemar's test with continuity correction; otherwise it uses a paired sign-flip permutation test with a bootstrap percentile confidence interval. A change is flagged only when |delta| > epsilon and the test is significant at alpha.
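As a rough illustration of the single-run binary case, the sketch below computes McNemar's chi-square statistic with continuity correction, (|b - c| - 1)^2 / (b + c), from the two discordant-pair counts and applies the two-part flagging rule. The class McNemarSketch, its method names, and the Change enum are hypothetical. Treating "significant at alpha" as p < alpha, treating a higher pass rate as an improvement, and using a polynomial approximation of erfc for the one-degree-of-freedom p-value are all assumptions of this sketch, not confirmed details of the engine.

```java
public class McNemarSketch {

    enum Change { IMPROVED, REGRESSED, UNCHANGED }

    /** Approximate complementary error function (fractional error about 1.2e-7). */
    static double erfc(double x) {
        double z = Math.abs(x);
        double t = 1.0 / (1.0 + 0.5 * z);
        double ans = t * Math.exp(-z * z - 1.26551223
                + t * (1.00002368 + t * (0.37409196 + t * (0.09678418
                + t * (-0.18628806 + t * (0.27886807 + t * (-1.13520398
                + t * (1.48851587 + t * (-0.82215223 + t * 0.17087277)))))))));
        return x >= 0 ? ans : 2.0 - ans;
    }

    /** Chi-square statistic with continuity correction; b and c are discordant counts. */
    static double statistic(int b, int c) {
        if (b + c == 0) {
            return 0.0; // no discordant pairs, so there is nothing to test
        }
        double d = Math.abs(b - c) - 1.0;
        return d * d / (b + c);
    }

    /** Upper-tail p-value of a chi-square variable with one degree of freedom. */
    static double pValue(int b, int c) {
        return Math.min(1.0, erfc(Math.sqrt(statistic(b, c) / 2.0)));
    }

    /**
     * regressions: items passing in baseline and failing in candidate.
     * improvements: items failing in baseline and passing in candidate.
     */
    static Change classify(int regressions, int improvements, int pairs,
                           double epsilon, double alpha) {
        double delta = (double) (improvements - regressions) / pairs;
        double p = pValue(regressions, improvements);
        // Both conditions must hold for a change to be flagged.
        if (Math.abs(delta) <= epsilon || p >= alpha) {
            return Change.UNCHANGED;
        }
        return delta > 0 ? Change.IMPROVED : Change.REGRESSED;
    }
}
```

For example, with 40 paired items, 2 regressions, and 10 improvements, the statistic is 49/12 (about 4.08) and the p-value is roughly 0.043, so the change would be flagged at alpha = 0.05 provided epsilon is below the pass-rate delta of 0.2.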

Randomized procedures are deterministic for a fixed seed and evaluator set. The shared Random is consumed in evaluator-name order, so adding or removing an evaluator can shift the p-values of the others.
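The seed behaviour can be illustrated with a minimal sign-flip permutation sketch that draws from one shared java.util.Random while iterating evaluators in name order. The class SignFlipSketch, its method names, the iteration count parameter, and the add-one p-value estimate are hypothetical choices for illustration; the engine's actual resampling scheme may differ. The sketch assumes each evaluator has at least one paired delta.

```java
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

public class SignFlipSketch {

    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    /** Two-sided Monte Carlo p-value for a nonzero mean of paired deltas. */
    static double pValue(double[] deltas, int iterations, Random rng) {
        double observed = Math.abs(mean(deltas));
        int atLeastAsExtreme = 0;
        for (int i = 0; i < iterations; i++) {
            double sum = 0.0;
            for (double d : deltas) {
                // Under the null hypothesis each delta is equally likely to have either sign.
                sum += rng.nextBoolean() ? d : -d;
            }
            if (Math.abs(sum / deltas.length) >= observed - 1e-12) {
                atLeastAsExtreme++;
            }
        }
        // Add-one estimate keeps the p-value strictly positive.
        return (atLeastAsExtreme + 1.0) / (iterations + 1.0);
    }

    /** Runs the test for each evaluator in name order using one shared Random. */
    static Map<String, Double> pValuesInNameOrder(Map<String, double[]> deltasByEvaluator,
                                                  int iterations, long seed) {
        Random shared = new Random(seed);
        Map<String, Double> out = new TreeMap<>();
        for (Map.Entry<String, double[]> e : new TreeMap<>(deltasByEvaluator).entrySet()) {
            out.put(e.getKey(), pValue(e.getValue(), iterations, shared));
        }
        return out;
    }
}
```

In this sketch, repeating a call with the same seed and the same evaluator map reproduces every p-value exactly. Inserting an evaluator whose name sorts earlier advances the shared generator before the later evaluators are tested, so their p-values are computed from a different part of the random stream.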