Record Class ExperimentResult
- Record Components:
name - the experiment name
description - the experiment description
metadata - experiment metadata
runResults - results for each run of the experiment
When an experiment is run multiple times, this class aggregates the results across all runs.
Methods like averageScore(String) and passRate() return values averaged across all runs.
-
Constructor Summary
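Constructors
Constructor | Description
ExperimentResult(String name, String description, Map<String, Object> metadata, List<RunResult> runResults) | Creates an instance of an ExperimentResult record class.
-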
Method Summary
Modifier and Type | Method | Description
double | averageScore(String evaluatorName) | Returns the average score for the specified evaluator across all runs.
String | description() | Returns the value of the description record component.
final boolean | equals(Object o) | Indicates whether some other object is "equal to" this one.
 | evaluatorNames() | Returns the names of all evaluators used in this experiment.
void | exportCsv(Path path) | Exports the experiment result to a CSV file.
void | exportHtml(Path path) | Exports the experiment result to an HTML file.
void | exportJson(Path path) | Exports the experiment result to a JSON file.
void | exportMarkdown(Path path) | Exports the experiment result to a Markdown file.
double | failCount() | Returns the average number of items that failed across all runs.
final int | hashCode() | Returns a hash code value for this object.
 | itemResults() | Returns all item results across all runs.
Map<String, Object> | metadata() | Returns the value of the metadata record component.
String | name() | Returns the value of the name record component.
double | passCount() | Returns the average number of items that passed across all runs.
double | passRate() | Returns the average pass rate across all runs.
int | runCount() | Returns the number of runs performed.
List<RunResult> | runResults() | Returns the value of the runResults record component.
 | runs() | Returns the individual run results for detailed analysis.
double | scoreStdDev(String evaluatorName) | Returns the sample standard deviation of scores for the specified evaluator across runs.
String | toCsv() | Returns the experiment result as a CSV string.
String | toHtml() | Returns the experiment result as an HTML string.
String | toJson() | Returns the experiment result as a JSON string.
String | toMarkdown() | Returns the experiment result as a Markdown string.
final String | toString() | Returns a string representation of this record class.
int | totalCount() | Returns the total number of items evaluated per run.
-
Constructor Details
-
ExperimentResult
public ExperimentResult(String name, String description, Map<String, Object> metadata, List<RunResult> runResults)
Creates an instance of an ExperimentResult record class.
- Parameters:
name - the value for the name record component
description - the value for the description record component
metadata - the value for the metadata record component
runResults - the value for the runResults record component
-
-
Method Details
-
itemResults
Returns all item results across all runs.
For single-run experiments, this returns the same results as accessing the first run's item results directly.
- Returns:
- all item results flattened across runs
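The flattening described for itemResults can be sketched as follows. This is a minimal illustration, not the library's implementation: the Item and Run records are hypothetical stand-ins, since the real item and run result types are not shown on this page.

```java
import java.util.List;

class ItemResultsSketch {
    // Hypothetical stand-ins for the library's item and run result types.
    record Item(String id, boolean passed) {}
    record Run(List<Item> items) {}

    // Concatenates the item results of every run, keeping run order.
    static List<Item> flatten(List<Run> runs) {
        return runs.stream()
                .flatMap(run -> run.items().stream())
                .toList();
    }

    public static void main(String[] args) {
        Run first = new Run(List.of(new Item("a", true), new Item("b", false)));
        Run second = new Run(List.of(new Item("a", true), new Item("b", true)));

        // Two runs of two items each yield four flattened results.
        System.out.println(flatten(List.of(first, second)).size()); // prints 4

        // With a single run, the flattened list equals that run's items.
        System.out.println(flatten(List.of(first)).equals(first.items())); // prints true
    }
}
```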
-
runs
Returns the individual run results for detailed analysis.
- Returns:
- list of run results
-
runCount
public int runCount()
Returns the number of runs performed.
- Returns:
- the run count
-
totalCount
public int totalCount()
Returns the total number of items evaluated per run.
This returns the count from the first run. All runs evaluate the same dataset, so this value is consistent across runs.
- Returns:
- the total count per run
-
passCount
public double passCount()
Returns the average number of items that passed across all runs.
- Returns:
- the average pass count
-
failCount
public double failCount()
Returns the average number of items that failed across all runs.
- Returns:
- the average fail count
-
passRate
public double passRate()
Returns the average pass rate across all runs.
- Returns:
- the pass rate between 0.0 and 1.0
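Because passCount, failCount, and passRate are averages over runs, the counts can be fractional, which is why they are typed double. The sketch below illustrates this with a hypothetical Run record. It assumes the pass rate is the mean of per-run rates; this page does not state the exact formula, but when every run evaluates the same number of items this equals the average pass count divided by the total count.

```java
import java.util.List;

class PassRateSketch {
    // Hypothetical per-run summary: items passed out of items evaluated.
    record Run(int passed, int total) {}

    // Mean number of passed items per run.
    static double averagePassCount(List<Run> runs) {
        return runs.stream().mapToInt(Run::passed).average().orElse(0.0);
    }

    // Mean number of failed items per run.
    static double averageFailCount(List<Run> runs) {
        return runs.stream().mapToInt(r -> r.total() - r.passed()).average().orElse(0.0);
    }

    // Mean of the per-run pass rates, each between 0.0 and 1.0.
    static double averagePassRate(List<Run> runs) {
        return runs.stream()
                .mapToDouble(r -> r.total() == 0 ? 0.0 : (double) r.passed() / r.total())
                .average()
                .orElse(0.0);
    }

    public static void main(String[] args) {
        // Two runs over the same four items: one passes 1 item, the other 2.
        List<Run> runs = List.of(new Run(1, 4), new Run(2, 4));
        System.out.println(averagePassCount(runs)); // prints 1.5
        System.out.println(averageFailCount(runs)); // prints 2.5
        System.out.println(averagePassRate(runs));  // prints 0.375
    }
}
```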
-
averageScore
Returns the average score for the specified evaluator across all runs.
This first computes the average score within each run, then averages those values across all runs.
- Parameters:
evaluatorName - the evaluator's name
- Returns:
- the computed average score
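The two-step averaging described for averageScore can be sketched like this. The Score and Run records are hypothetical stand-ins, and returning 0.0 when an evaluator has no scores is an assumption of this sketch rather than documented behavior.

```java
import java.util.List;

class AverageScoreSketch {
    // Hypothetical stand-ins: one score per item per evaluator, grouped by run.
    record Score(String evaluator, double value) {}
    record Run(List<Score> scores) {}

    // Step 1: average the evaluator's scores within a single run.
    static double runAverage(Run run, String evaluatorName) {
        return run.scores().stream()
                .filter(s -> s.evaluator().equals(evaluatorName))
                .mapToDouble(Score::value)
                .average()
                .orElse(0.0); // assumption: no scores means 0.0
    }

    // Step 2: average the per-run averages across all runs.
    static double averageScore(List<Run> runs, String evaluatorName) {
        return runs.stream()
                .mapToDouble(run -> runAverage(run, evaluatorName))
                .average()
                .orElse(0.0);
    }

    public static void main(String[] args) {
        Run first = new Run(List.of(new Score("accuracy", 1.0), new Score("accuracy", 0.5)));
        Run second = new Run(List.of(new Score("accuracy", 0.5), new Score("accuracy", 0.0)));

        // Per-run averages are 0.75 and 0.25, so the overall average is 0.5.
        System.out.println(averageScore(List.of(first, second), "accuracy")); // prints 0.5
    }
}
```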
-
scoreStdDev
Returns the sample standard deviation of scores for the specified evaluator across runs.
This measures how much the average score varies between runs. A high standard deviation suggests instability in your task or evaluator outputs.
Uses sample standard deviation (N-1 denominator) since runs represent a sample of potential outcomes, not the complete population.
For single-run experiments, this returns 0.0.
- Parameters:
evaluatorName - the evaluator's name
- Returns:
- the standard deviation, or 0.0 for single-run experiments
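As a rough illustration of the calculation described above, the sketch below computes a sample standard deviation (N-1 denominator) over a list of per-run average scores and returns 0.0 when there are fewer than two runs. The method name sampleStdDev and the input shape are assumptions of this example, not part of the documented API.

```java
import java.util.List;

class ScoreStdDevSketch {
    // Sample standard deviation of per-run average scores (N-1 denominator).
    static double sampleStdDev(List<Double> runAverages) {
        int n = runAverages.size();
        if (n < 2) {
            return 0.0; // a single run has no between-run variation to measure
        }
        double mean = runAverages.stream()
                .mapToDouble(Double::doubleValue)
                .average()
                .orElse(0.0);
        double sumOfSquares = runAverages.stream()
                .mapToDouble(v -> (v - mean) * (v - mean))
                .sum();
        return Math.sqrt(sumOfSquares / (n - 1));
    }

    public static void main(String[] args) {
        // Mean is 4.0, squared deviations sum to 8.0, and 8.0 / (3 - 1) = 4.0.
        System.out.println(sampleStdDev(List.of(2.0, 4.0, 6.0))); // prints 2.0
        System.out.println(sampleStdDev(List.of(0.9)));           // prints 0.0
    }
}
```

Dividing by N instead of N-1 would give the population standard deviation, which understates the spread when only a few runs are available.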
-
evaluatorNames
Returns the names of all evaluators used in this experiment.
- Returns:
- set of evaluator names
-
exportJson
Exports the experiment result to a JSON file.
- Parameters:
path - the file path to write to
-
exportHtml
Exports the experiment result to an HTML file.
- Parameters:
path - the file path to write to
-
exportMarkdown
Exports the experiment result to a Markdown file.
- Parameters:
path - the file path to write to
-
exportCsv
Exports the experiment result to a CSV file.
- Parameters:
path - the file path to write to
-
toJson
Returns the experiment result as a JSON string.
- Returns:
- JSON representation
-
toHtml
Returns the experiment result as an HTML string.
- Returns:
- HTML representation
-
toMarkdown
Returns the experiment result as a Markdown string.
- Returns:
- Markdown representation
-
toCsv
Returns the experiment result as a CSV string.
- Returns:
- CSV representation
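Each export method has a string-returning counterpart (exportCsv and toCsv, exportJson and toJson, and so on). One plausible relationship, shown here purely as an assumption, is that the export method writes the corresponding string to the given path. The Result record, its CSV layout, and the checked IOException are all hypothetical; this page documents none of them.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ExportSketch {
    // Hypothetical stand-in for a result that can render itself as CSV.
    record Result(String name, double passRate) {
        // Illustrative CSV layout only; the real column set is not documented here.
        String toCsv() {
            return "name,passRate\n" + name + "," + passRate + "\n";
        }

        // Assumed behavior: write the string form to the target file as UTF-8.
        void exportCsv(Path path) throws IOException {
            Files.writeString(path, toCsv(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Result result = new Result("demo", 0.75);
        Path file = Files.createTempFile("experiment", ".csv");
        result.exportCsv(file);

        // The file content matches the in-memory string form.
        System.out.println(Files.readString(file).equals(result.toCsv())); // prints true
        Files.deleteIfExists(file);
    }
}
```

The string forms are convenient for logging or embedding in a larger report, while the export forms suit writing artifacts to disk.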
-
toString
Returns a string representation of this record class. The representation contains the name of the class, followed by the name and value of each of the record components.
-
hashCode
public final int hashCode()
Returns a hash code value for this object. The value is derived from the hash code of each of the record components.
-
equals
Indicates whether some other object is "equal to" this one. The objects are equal if the other object is of the same class and if all the record components are equal. All components in this record class are compared with Objects::equals(Object,Object).
-
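The toString, hashCode, and equals behavior described above is the standard behavior of Java record classes. The sketch below demonstrates it on a small hypothetical Summary record with component types similar to those of ExperimentResult. It does not use the library itself.

```java
import java.util.List;
import java.util.Map;

class RecordSemanticsSketch {
    // Hypothetical record used only to show the generated methods.
    record Summary(String name, Map<String, Object> metadata, List<String> runs) {}

    public static void main(String[] args) {
        Summary a = new Summary("demo", Map.of("seed", 42), List.of("run-1"));
        Summary b = new Summary("demo", Map.of("seed", 42), List.of("run-1"));

        // Distinct instances with equal components are equal and share a hash code.
        System.out.println(a == b);                       // prints false
        System.out.println(a.equals(b));                  // prints true
        System.out.println(a.hashCode() == b.hashCode()); // prints true

        // toString lists the class name followed by each component name and value.
        System.out.println(a); // prints Summary[name=demo, metadata={seed=42}, runs=[run-1]]
    }
}
```

Because components are compared with Objects::equals, equality of the metadata map and the run list depends on the equals implementations of the values they contain.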
name
Returns the value of the name record component.
- Returns:
- the value of the name record component
-
description
Returns the value of the description record component.
- Returns:
- the value of the description record component
-
metadata
Returns the value of the metadata record component.
- Returns:
- the value of the metadata record component
-
runResults
Returns the value of the runResults record component.
- Returns:
- the value of the runResults record component
-