
What are Tests?

Tests are automated regression checks that validate your agent’s behavior. Unlike Evaluations (which run after every real call), Tests run on demand in the Advanced Editor so you can verify the agent responds correctly to specific scenarios before you deploy changes.
Run tests after modifying prompts, knowledge base, or actions to catch regressions before they affect real customers.

Accessing Tests

Tests are located in the right panel of the Advanced Editor. Click the Tests tab (clipboard icon) to access them.

Tests vs Evaluations

| Feature | Tests | Evaluations |
| --- | --- | --- |
| When they run | On-demand in the editor | After every real call |
| Purpose | Validate specific scenarios before deploy | Measure call quality over time |
| Setup | Define conversation flow + success criteria | Define yes/no questions |
| Output | Pass/fail per test | Pass/fail per evaluation per call |

Test Interface Overview

The Tests panel is a tabbed interface with two main sections: the Tests tab, which holds your test suite with filtering, creation, and execution controls, and the History tab, which records past runs.

Header Actions

The header includes quick access to:
  • Context Variables (</>) - Set dynamic variables for test execution
  • Simulate (💬) - Run one-off conversation simulations
  • Batch Controls - Run tests multiple times for reliability testing

Test Types

Scenario Tests

A Scenario test defines a fixed conversation flow. You specify the exact messages (user and agent) and a success condition. The system evaluates whether the agent’s response meets your criteria.
| Property | Description | Example |
| --- | --- | --- |
| Chat history | Alternating user/agent messages | User: “Hola” → Agent: “Buenos días…” |
| Success condition | What the agent should achieve | “El agente responde de manera profesional y útil” |
| Success examples | Sample responses that should pass | “Buenos días, ¿en qué puedo ayudarle?” |
| Failure examples | Sample responses that should fail | “No tengo idea” |
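A Scenario test is essentially structured data: a fixed transcript plus criteria for the evaluator. A minimal sketch of how such a test could be represented (the field names are illustrative, not the product’s actual schema):

```python
# Hypothetical representation of a Scenario test.
# Field names (chat_history, success_condition, ...) are illustrative only.
from dataclasses import dataclass, field

@dataclass
class ScenarioTest:
    name: str
    chat_history: list[tuple[str, str]]  # alternating (role, message) pairs
    success_condition: str               # what the agent should achieve
    success_examples: list[str] = field(default_factory=list)
    failure_examples: list[str] = field(default_factory=list)

greeting_test = ScenarioTest(
    name="Greeting - Professional Tone",
    chat_history=[("user", "Hola")],
    success_condition="El agente responde de manera profesional y útil",
    success_examples=["Buenos días, ¿en qué puedo ayudarle?"],
    failure_examples=["No tengo idea"],
)
```

The success and failure examples act as few-shot anchors for the evaluator, so a test with both is usually graded more consistently than one with only a success condition.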

Simulation Tests

A Simulation test uses an AI to simulate a user. You define a persona, goal, and first message. The system runs a full conversation and evaluates the outcome.
| Property | Description | Example |
| --- | --- | --- |
| First message | How the simulated user starts | “Hola, buenos días” |
| Persona | Simulated user’s profile | “Cliente interesado en información” |
| Goal | What the simulated user wants | “Obtener información sobre el servicio” |
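Conceptually, a Simulation test seeds a conversation with the first message and then alternates simulated-user and agent turns until a turn limit is reached. A minimal sketch under assumed names (the key names and helper functions are illustrative, not the product’s API):

```python
# Hypothetical Simulation test definition; key names are illustrative.
simulation_test = {
    "first_message": "Hola, buenos días",            # how the simulated user starts
    "persona": "Cliente interesado en información",  # simulated user's profile
    "goal": "Obtener información sobre el servicio", # what the simulated user wants
    "turn_limit": 10,
}

def run_simulation(test, agent_turn, user_turn):
    """Alternate agent and simulated-user messages until the turn limit."""
    transcript = [("user", test["first_message"])]
    while len(transcript) < test["turn_limit"]:
        transcript.append(("agent", agent_turn(transcript)))
        if len(transcript) < test["turn_limit"]:
            transcript.append(("user", user_turn(transcript)))
    return transcript

# Stub turn functions stand in for the real agent and the AI-simulated user.
transcript = run_simulation(
    simulation_test,
    agent_turn=lambda t: "Claro, ¿qué desea saber?",
    user_turn=lambda t: "¿Cuánto cuesta el servicio?",
)
print(len(transcript))  # → 10, capped at the turn limit
```

In the real feature, the simulated user’s replies are generated by an AI conditioned on the persona and goal, and the final transcript is what the outcome evaluation judges.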

Creating a Test

  1. Open the Tests panel: click the Tests tab in the right panel and ensure you’re on the Tests tab (not History).
  2. Click New: click the “New” button to start the creation wizard.
  3. Wizard step 1 (Name & Type): enter a descriptive name and choose Scenario or Simulation.
  4. Wizard step 2 (Conversation): for Scenario, add the conversation flow (user/agent messages); for Simulation, set the first message, persona, and goal.
  5. Wizard step 3 (Criteria): define the success condition and add success/failure examples to guide the evaluation.
  6. Save: the test is created and appears in the list.

Context Variables

Tests can use dynamic variables that replace placeholders in your agent’s prompts during test execution. Access the Context Variables panel by clicking the </> icon in the header.

Built-in Variables

| Variable | Purpose | Example |
| --- | --- | --- |
| fecha_y_hora_actual | Current date/time in Spanish | “Hoy es Miércoles 12 de Febrero de 2026 a las 14:30” |

Custom Variables

You can also set values for your agent’s input parameters, which will be injected during test execution.
  1. Open Context Variables: click the </> icon in the Tests header. A badge shows how many variables have values set.
  2. Set built-in variables: click the refresh icon next to fecha_y_hora_actual to update it to the current time.
  3. Set input parameters: enter values for any input parameters your agent expects (customer name, order number, etc.).
  4. Run tests: variables are automatically injected into the agent’s context during execution.
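The injection step above can be sketched as simple placeholder substitution. This is a minimal illustration, assuming a `{{name}}` placeholder syntax; the helper name and template format are not the product’s actual implementation:

```python
import re

def inject_variables(prompt: str, variables: dict) -> str:
    """Replace {{name}} placeholders with the values set in the
    Context Variables panel; unset placeholders are left untouched."""
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: str(variables.get(m.group(1), m.group(0))),
        prompt,
    )

prompt = "Saluda al cliente {{customer_name}}. {{fecha_y_hora_actual}}"
variables = {
    "customer_name": "Ana",
    "fecha_y_hora_actual": "Hoy es Miércoles 12 de Febrero de 2026 a las 14:30",
}
print(inject_variables(prompt, variables))
# → Saluda al cliente Ana. Hoy es Miércoles 12 de Febrero de 2026 a las 14:30
```

Leaving unset placeholders untouched (rather than blanking them) makes missing variables easy to spot in test output.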
Use fecha_y_hora_actual when your agent greets with “Buenos días” or references the current date. Tests run with the values you set, not the live time.

Batch Testing and Reliability

Multiple Repetitions

Run tests multiple times to assess reliability and identify intermittent failures:
  1. Use the +/- controls next to the “Run All” button to set repetition count (1-10)
  2. Click Run All - the system executes each test the specified number of times
  3. View aggregated results in the History tab
Batch testing helps identify consistency issues. If a test passes 8/10 times, you may need to refine your prompts or success criteria.
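The aggregation that batch runs perform can be sketched as follows; this is an illustration of the “8/10 passed” roll-up and flakiness flagging, not the product’s implementation:

```python
def summarize_batch(results: dict[str, list[bool]]) -> dict[str, str]:
    """Roll repeated runs of each test into the 'passed/total' format
    shown on test cards, flagging intermittent (flaky) tests."""
    summary = {}
    for test_name, outcomes in results.items():
        passed = sum(outcomes)
        label = f"{passed}/{len(outcomes)} passed"
        if 0 < passed < len(outcomes):  # neither always passing nor always failing
            label += " (flaky)"
        summary[test_name] = label
    return summary

batch = {
    "Greeting - Professional Tone": [True] * 10,
    "Pricing objection": [True] * 8 + [False] * 2,
}
print(summarize_batch(batch))
```

A test that flips between pass and fail across repetitions usually points at a vague success condition or a non-deterministic prompt, not at a hard regression.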

Batch Progress Tracking

During batch execution:
  • Progress indicator shows current run (e.g., “Running 2/5”)
  • Individual test cards update with interim results
  • History tab populates with each completed run

Running Tests

Filter Tests

Use filter buttons to focus on specific test states:
  • All - Show every test
  • Pass - Only tests that passed last run (with pass count)
  • Fail - Only tests that failed last run (with fail count)

Run All Tests

Click Run All to execute every test (or all visible tests if filtered). When batch count > 1, button shows “Run x[count]”.

Run Selected Tests

  1. Use checkboxes to select specific tests
  2. Click Run All (runs only selected when any are checked)

Run Individual Tests

Click the play icon (▶) on any test card to run just that test.

Simulate Conversation

The Simulate feature lets you run real-time conversation simulations without creating permanent tests:
  1. Open the simulator: click the chat bubble icon (💬) in the Tests header.
  2. Configure parameters: set the first message, persona, goal, and turn limit (default 10).
  3. Start the simulation: click to begin and watch the conversation unfold in real time with streaming messages.
  4. Review results: see the full conversation history and the final outcome evaluation.
Simulate is perfect for exploratory testing. Use regular Scenario/Simulation tests when you need repeatable, automated validation.

History Tab

The History tab provides comprehensive test run analytics:

Run Overview

| Column | Description |
| --- | --- |
| Date | When the run was executed |
| Status | Pass count, fail count, or running status |
| Batch Info | Repetition info for batch runs |
| Actions | View details, retry failed, retry all |

Detailed Results

Click any run to open the detail modal:
  • Individual Test Results - Pass/fail status with evaluation reasoning
  • Agent Responses - Full agent output for each test
  • Conversation History - Complete message flow (for simulation tests)
  • Retry Options - Re-run failed tests or entire batch

Trend Analysis

For batch runs with multiple repetitions:
  • Success Rate - Percentage of tests passing across all repetitions
  • Consistency Metrics - Identify tests with intermittent failures
  • Performance Patterns - Spot degradation over time

Results and Analysis

Test Cards

Each test card displays:
| Status | Meaning |
| --- | --- |
| Pass | Agent response met the success criteria |
| Fail | Agent response did not meet the criteria |
| Running | Test is currently executing |
| Pending | Not run yet |

Inline Details

Click a test card to expand inline details:
  • Agent Response - What the agent actually said
  • Evaluation Reason - Why it passed or failed
  • Full Conversation - Complete message history (for simulations)

Batch Results

For batch runs, test cards show aggregated results:
  • Success Rate - “8/10 passed” format
  • Latest Result - Most recent execution status
  • Trend Indicator - Visual indicator of consistency

Copilot Integration

Failed tests automatically generate improvement suggestions for Copilot:
  1. Run tests: execute your test suite and identify failing tests.
  2. Generate suggestions: the system analyzes failures and creates targeted improvement prompts.
  3. Open Copilot: click the generated prompt to open Copilot with context about the specific failures.
  4. Apply improvements: Copilot suggests prompt modifications based on the failure analysis.
  5. Re-test: run the tests again to validate the improvements.
Copilot integration requires test failures to generate meaningful suggestions. Ensure your success criteria are specific enough to catch real issues.

Managing Tests

Edit a Test

  1. Click the menu (⋮) on a test card → Edit
  2. Modify conversation flow, success criteria, or examples
  3. Save changes

Clone a Test

  1. Click the menu (⋮) → Clone
  2. An exact copy of the test is created
  3. Useful for creating variations, such as testing different personas with the same criteria

Delete a Test

  1. Click the menu (⋮) → Delete
  2. Confirm deletion (irreversible)

Organize Tests

  • Name Clearly - Use descriptive names like “Greeting - Professional Tone”
  • Group by Feature - Create tests for each major capability
  • Version Control - Clone tests when making significant criteria changes

Advanced Features

Real-time Streaming

Simulation tests stream responses in real-time:
  • Watch conversations develop naturally
  • See agent thinking process
  • Identify response delays or issues

Dynamic Variable Injection

Context variables are injected dynamically:
  • Set once, apply to all tests
  • Override per-test if needed
  • Track variable impact on results

Historical Comparison

Compare test results across runs:
  • Identify regression patterns
  • Track improvement over time
  • Validate deployment readiness

Best Practices

  • Create tests for your most important flows: greetings, main use case, objections, and compliance.
  • Provide clear examples so the LLM evaluator understands what “good” and “bad” responses look like.
  • If your agent uses context variables or input parameters, set them in the context panel before running tests.
  • Run tests multiple times to identify consistency issues and improve reliability.
  • Run tests before publishing changes, and use the Copilot integration for continuous improvement.

Troubleshooting

Tests Won’t Run

  1. Check context variables: ensure required variables like fecha_y_hora_actual are set.
  2. Verify success criteria: make sure success conditions are specific and measurable.
  3. Review agent configuration: confirm your agent has proper prompts and a knowledge base set up.

Inconsistent Results

  • Increase Batch Size - Run tests 3-5 times to identify patterns
  • Refine Success Criteria - Make conditions more specific
  • Check Variable Values - Ensure context variables are set correctly

Failed Evaluations

  • Review Examples - Add more success/failure examples
  • Simplify Criteria - Break complex conditions into multiple tests
  • Use Copilot - Let AI suggest improvements based on failures

Next Steps

  • Evaluations - Set up post-call quality metrics
  • Playground - Test your agent interactively
  • Copilot - Get AI help improving your prompts
  • Input Params - Configure variables for tests and calls