The Run feature lets you test your prompts against production test cases stored in Datasets, using your configured evaluators, and returns detailed results you can use to optimize your AI responses.
Prerequisites
Before you can run evaluations, ensure you have a prompt to test, a Dataset containing your test cases, and at least one configured evaluator.
Running an Evaluation
Once your evaluators are ready, click the Evaluate button to start an evaluation run.
The system processes your prompts against the dataset using the evaluators you set up. When the evaluation completes, the results are displayed.
Background Processing and Concurrent Executions
Evaluations automatically run in the background, in the cloud, so you can safely leave the page and return later to check the results. You can also keep working on other prompts while your evaluations complete.
You can also run up to 5 evaluations concurrently. Each time you click Evaluate, the platform launches a new execution in parallel with any runs already in progress.
TIP: Background processing and concurrent executions help you make efficient use of your time during large-scale testing. Use them when running tests with large models, rate-limited models, or large datasets: instead of waiting several minutes for one evaluation to finish before starting the next, launch them in parallel.
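If you orchestrate evaluation runs from a script rather than the UI, the same concurrency model applies. The sketch below illustrates the pattern under stated assumptions: `run_evaluation`, its parameters, and the simulated processing time are hypothetical stand-ins for whatever triggers a run in your setup, and the platform's 5-run cap is modeled with a semaphore.

```python
import asyncio

# Hypothetical placeholder: each call stands in for clicking Evaluate once.
# The function name, arguments, and sleep-based delay are illustrative only.
async def run_evaluation(prompt_id: str, dataset_id: str) -> str:
    await asyncio.sleep(2)  # stands in for the background processing time
    return f"results for {prompt_id} on {dataset_id}"

# The platform allows up to 5 concurrent evaluation runs, so a semaphore
# of 5 keeps scripted launches within that limit.
MAX_CONCURRENT_RUNS = 5

async def run_all(jobs: list[tuple[str, str]]) -> list[str]:
    semaphore = asyncio.Semaphore(MAX_CONCURRENT_RUNS)

    async def bounded_run(prompt_id: str, dataset_id: str) -> str:
        async with semaphore:
            return await run_evaluation(prompt_id, dataset_id)

    # Launch everything at once; the semaphore gates actual concurrency.
    return await asyncio.gather(*(bounded_run(p, d) for p, d in jobs))

if __name__ == "__main__":
    jobs = [(f"prompt-{i}", "dataset-main") for i in range(8)]
    print(asyncio.run(run_all(jobs)))
```

With 8 jobs and a limit of 5, the first 5 runs start immediately and the remaining 3 start as earlier ones finish, mirroring how parallel launches behave in the UI.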