Package dev.dokimos.core.gate
Class RegressionGate
java.lang.Object
dev.dokimos.core.gate.RegressionGate
Compares a candidate experiment result against a committed baseline and produces a
GateVerdict. This is the comparison step only; reading the baseline file and turning a verdict
into a test failure are handled by RegressionGateRunner and Assertions.
The gate fails if either of two independent guards fires:
- Guard 1 (broad): the RunComparison engine reports a significant aggregate pass-rate drop or any significantly regressed evaluator (RunComparisonResult.hasRegressions()). It stays quiet on judge noise but cannot see a localized break on a small dataset.
- Guard 2 (localized-severe): any shared item whose worst raw per-evaluator score drop falls below -severityMargin. This catches a single item that breaks hard even when the aggregate change is not significant.
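The localized-severe check can be illustrated with a minimal sketch. It does not use the dokimos types: the class SeverityGuardSketch, the methods worstDrop and fires, and the representation of scores as a map from evaluator name to score are all hypothetical, chosen only to show the comparison against -severityMargin. How the library itself stores per-item scores is not described here.

```java
import java.util.Map;

// Illustrative only: a stand-in for the localized-severe guard, not the library's code.
public class SeverityGuardSketch {

    /**
     * Returns the worst (most negative) candidate-minus-baseline score delta
     * across the evaluators present on both sides of one shared item.
     * Returns 0.0 when no evaluator dropped.
     */
    public static double worstDrop(Map<String, Double> baselineScores,
                                   Map<String, Double> candidateScores) {
        double worst = 0.0;
        for (Map.Entry<String, Double> entry : baselineScores.entrySet()) {
            Double candidateScore = candidateScores.get(entry.getKey());
            if (candidateScore == null) {
                // An evaluator missing on the candidate side is a coverage-loss
                // question, which this sketch deliberately leaves out.
                continue;
            }
            worst = Math.min(worst, candidateScore - entry.getValue());
        }
        return worst;
    }

    /** The guard fires when the worst drop falls below -severityMargin. */
    public static boolean fires(double worstDrop, double severityMargin) {
        return worstDrop < -severityMargin;
    }

    public static void main(String[] args) {
        // One evaluator barely moves, the other drops hard on this item.
        double drop = worstDrop(Map.of("relevance", 0.9, "faithfulness", 0.8),
                                Map.of("relevance", 0.85, "faithfulness", 0.2));
        System.out.println("guard fires with margin 0.5: " + fires(drop, 0.5));
    }
}
```

Under these assumptions, a single item whose score on one evaluator falls by more than the margin trips the guard regardless of how the aggregate pass rate moved, which is the gap Guard 1 leaves open.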
Coverage-loss conditions fail the gate independently of failOnRegression, because Guard 1 cannot see them:
- a removed evaluator (per onRemovedEvaluator);
- a removed item (only when failOnRemovedItems is set);
- under dataset_item_id pairing, a candidate item with no id, which pairs against nothing and would hide a regression on it (always a FAIL).
A baseline id absent from the candidate always warns; whether it also fails is governed by failOnRemovedItems. A threshold change between sides only warns.
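The item-level coverage rules above can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: CoverageLossSketch, Outcome, and check are hypothetical names, items are reduced to their ids (null meaning "no id"), and the removed-evaluator and threshold-change rules are left out because their configuration values are not spelled out here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: models the id-pairing coverage rules with plain strings.
public class CoverageLossSketch {

    /** Result of the sketch: whether the gate should fail, plus collected warnings. */
    public record Outcome(boolean fail, List<String> warnings) {}

    public static Outcome check(List<String> baselineIds,
                                List<String> candidateIds,
                                boolean failOnRemovedItems) {
        boolean fail = false;
        List<String> warnings = new ArrayList<>();
        Set<String> present = new HashSet<>();

        for (String id : candidateIds) {
            if (id == null) {
                // A candidate item without an id pairs against nothing: always a FAIL.
                fail = true;
            } else {
                present.add(id);
            }
        }

        for (String id : baselineIds) {
            if (!present.contains(id)) {
                // A baseline id missing from the candidate always warns ...
                warnings.add("baseline item missing from candidate: " + id);
                // ... and fails only when the caller opted in.
                if (failOnRemovedItems) {
                    fail = true;
                }
            }
        }
        return new Outcome(fail, warnings);
    }

    public static void main(String[] args) {
        Outcome outcome = check(List.of("q1", "q2"), Arrays.asList("q1"), false);
        System.out.println(outcome);
    }
}
```

The sketch keeps warnings and the fail flag separate so that a removed item can surface as a warning without changing the verdict when failOnRemovedItems is off.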
Method Summary
Modifier and Type | Method | Description
static GateVerdict | evaluate(ExperimentResult candidate, BaselineFile baseline, GateConfig config) | Compares a candidate against a baseline and returns the gate verdict.
Method Details
evaluate
public static GateVerdict evaluate(ExperimentResult candidate, BaselineFile baseline, GateConfig config)
Compares a candidate against a baseline and returns the gate verdict.
Parameters:
candidate - the candidate experiment result
baseline - the committed baseline to compare against
config - the gate configuration
Returns:
the verdict (status PASS or FAIL; never NO_BASELINE, which the assertion layer produces when no baseline file exists)