Dokimos | LLM Evaluation Framework for Java

Load test cases from JSON or CSV files, or create them programmatically. Run the same dataset across experiments or JUnit tests.
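As a rough illustration of what a file-backed dataset boils down to, here is a minimal plain-Java sketch. It does not use the Dokimos API: `DatasetSketch`, `Example`, and `fromCsv` are hypothetical names invented for this example, and the two-column `input,expectedOutput` layout is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT the Dokimos API: a dataset is a list of test cases.
public class DatasetSketch {

    /** One test case: the prompt sent to the model and the answer we expect. */
    public record Example(String input, String expectedOutput) {}

    /** Parses "input,expectedOutput" rows, skipping the header row and blank lines. */
    public static List<Example> fromCsv(String csv) {
        List<Example> examples = new ArrayList<>();
        String[] lines = csv.split("\\R"); // split on any line terminator
        for (int i = 1; i < lines.length; i++) { // row 0 is the header
            if (lines[i].isBlank()) {
                continue;
            }
            // Naive split on the first comma only: quoted fields are not supported.
            String[] cols = lines[i].split(",", 2);
            if (cols.length < 2) {
                throw new IllegalArgumentException("Row " + i + " needs two columns");
            }
            examples.add(new Example(cols[0].trim(), cols[1].trim()));
        }
        return examples;
    }

    public static void main(String[] args) {
        // Inline CSV stands in for a file; the same rows could also be built in code.
        String csv = "input,expectedOutput\nWhat is 2+2?,4\nCapital of France?,Paris\n";
        fromCsv(csv).forEach(System.out::println);
    }
}
```

Because the dataset is just a list of values, the same list can feed both a standalone experiment run and a parameterized test.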

Use built-in and LLM-based evaluators out of the box.
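To make the idea of an evaluator concrete, the sketch below runs a stubbed model over a small dataset and scores each answer with a case-insensitive exact-match check. Every name here (`EvaluatorSketch`, `Evaluator`, `EXACT_MATCH`, `run`, `passRate`) is hypothetical and chosen for illustration; it is not the Dokimos API, and an LLM-based evaluator would replace the string comparison with a call to a judge model.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch, NOT the Dokimos API: score model outputs against expectations.
public class EvaluatorSketch {

    public record Example(String input, String expected) {}

    public record Result(Example example, String actual, boolean passed) {}

    /** Decides whether one model output is acceptable for the expected answer. */
    public interface Evaluator {
        boolean passes(String expected, String actual);
    }

    /** Simple rule-based evaluator: case-insensitive match after trimming whitespace. */
    public static final Evaluator EXACT_MATCH =
            (expected, actual) -> expected.trim().equalsIgnoreCase(actual.trim());

    /** Calls the model once per example and records the evaluator's verdict. */
    public static List<Result> run(List<Example> dataset,
                                   UnaryOperator<String> model,
                                   Evaluator evaluator) {
        return dataset.stream().map(ex -> {
            String actual = model.apply(ex.input());
            return new Result(ex, actual, evaluator.passes(ex.expected(), actual));
        }).toList();
    }

    /** Fraction of results that passed; 0.0 for an empty run. */
    public static double passRate(List<Result> results) {
        if (results.isEmpty()) {
            return 0.0;
        }
        return (double) results.stream().filter(Result::passed).count() / results.size();
    }

    public static void main(String[] args) {
        List<Example> dataset = List.of(
                new Example("What is 2+2?", "4"),
                new Example("Capital of France?", "Paris"));
        // Stub standing in for a real LLM call.
        UnaryOperator<String> model = in -> in.contains("France") ? "paris" : "5";
        List<Result> results = run(dataset, model, EXACT_MATCH);
        results.forEach(System.out::println);
        System.out.println("Pass rate: " + passRate(results));
    }
}
```

Keeping the evaluator behind a small interface means a rule-based check and a model-graded check can be swapped without changing the code that runs the dataset.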

Works with JUnit for parameterized testing and with LangChain4j for evaluating AI Services. Integrate it into existing CI/CD pipelines.