The Run feature lets you test your prompts against production test cases stored in Datasets, using your configured evaluators, and returns detailed results you can use to optimize your AI responses.
Prerequisites
Before you can run evaluations, ensure you have a prompt to test, a Dataset containing your test cases, and at least one configured evaluator.
Running an Evaluation
Once your evaluators are ready, click the Evaluate button to start an evaluation run.
The system processes your prompts against the dataset using the evaluators you set up. When the evaluation completes, the results are displayed.
Background Processing and Concurrent Executions
Evaluations automatically run in the background, in the cloud, so you can safely leave the page and return later to check the results. You can also keep working on other prompts while your evaluations complete.
You can also run up to 5 evaluations concurrently. Each time you click Evaluate, the platform launches a new execution in parallel with any runs already in progress.
TIP: Background processing and concurrent executions help you make efficient use of your time during large-scale testing. Use them when running tests with large models, rate-limited models, or large datasets: instead of waiting several minutes for one evaluation to finish before starting the next, launch them in parallel.
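If you orchestrate evaluation runs from a script rather than the UI, the same concurrency model applies. The sketch below illustrates the pattern under stated assumptions: `run_evaluation`, its parameters, and the simulated processing time are hypothetical stand-ins for whatever triggers a run in your setup, and the platform's 5-run cap is modeled with a semaphore.

```python
import asyncio

# Hypothetical placeholder: each call stands in for clicking Evaluate once.
# The function name, arguments, and sleep-based delay are illustrative only.
async def run_evaluation(prompt_id: str, dataset_id: str) -> str:
    await asyncio.sleep(2)  # stands in for the background processing time
    return f"results for {prompt_id} on {dataset_id}"

# The platform allows up to 5 concurrent evaluation runs, so a semaphore
# of 5 keeps scripted launches within that limit.
MAX_CONCURRENT_RUNS = 5

async def run_all(jobs: list[tuple[str, str]]) -> list[str]:
    semaphore = asyncio.Semaphore(MAX_CONCURRENT_RUNS)

    async def bounded_run(prompt_id: str, dataset_id: str) -> str:
        async with semaphore:
            return await run_evaluation(prompt_id, dataset_id)

    # Launch everything at once; the semaphore gates actual concurrency.
    return await asyncio.gather(*(bounded_run(p, d) for p, d in jobs))

if __name__ == "__main__":
    jobs = [(f"prompt-{i}", "dataset-main") for i in range(8)]
    print(asyncio.run(run_all(jobs)))
```

With 8 jobs and a limit of 5, the first 5 runs start immediately and the remaining 3 start as earlier ones finish, mirroring how parallel launches behave in the UI.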