What are Tests?
Tests are automated regression checks that validate your agent's behavior. Unlike Evaluations (which run after every real call), Tests run on demand in the Advanced Editor to verify that your agent responds correctly to specific scenarios before you deploy changes. Run tests after modifying prompts, the knowledge base, or actions to catch regressions before they affect real customers.
Accessing Tests
Tests are located in the right panel of the Advanced Editor. Click the Tests tab (clipboard icon) to access them.
Tests vs Evaluations
| Feature | Tests | Evaluations |
|---|---|---|
| When they run | On-demand in the editor | After every real call |
| Purpose | Validate specific scenarios before deploy | Measure call quality over time |
| Setup | Define conversation flow + success criteria | Define yes/no questions |
| Output | Pass/fail per test | Pass/fail per evaluation per call |
Test Interface Overview
The Tests panel features a tabbed interface with two main sections:
- Tests - Your test suite, with filtering, creation, and execution controls
- History - A record of past test runs
Header Actions
The header includes quick access to:
- Context Variables (</>) - Set dynamic variables for test execution
- Simulate (💬) - Run one-off conversation simulations
- Batch Controls - Run tests multiple times for reliability testing
Test Types
Scenario Tests
A Scenario test defines a fixed conversation flow. You specify the exact messages (user and agent) and a success condition. The system evaluates whether the agent's response meets your criteria.

| Property | Description | Example |
|---|---|---|
| Chat history | Alternating user/agent messages | User: "Hello" → Agent: "Good morning…" |
| Success condition | What the agent should achieve | "The agent responds professionally and helpfully" |
| Success examples | Sample responses that should pass | "Good morning, how can I help you?" |
| Failure examples | Sample responses that should fail | "I have no idea" |
Simulation Tests
A Simulation test uses an AI to simulate a user. You define a persona, goal, and first message. The system runs a full conversation and evaluates the outcome.

| Property | Description | Example |
|---|---|---|
| First message | How the simulated user starts | "Hello, good morning" |
| Persona | Simulated user's profile | "Customer interested in information" |
| Goal | What the simulated user wants | "Get information about the service" |
Creating a Test
Step 1 - Open Tests Panel
Click the Tests tab in the right panel and ensure you’re on the Tests tab (not History).
Step 2 - Conversation
For Scenario: Add the conversation flow (user/agent messages). For Simulation: Set first message, persona, and goal.
Step 3 - Criteria
Define the success condition and add success/failure examples to guide the evaluation.
Context Variables
Tests can use dynamic variables that replace placeholders in your agent's prompts during test execution. Access the Context Variables panel by clicking the </> icon in the header.
Built-in Variables
| Variable | Purpose | Example |
|---|---|---|
| fecha_y_hora_actual | Current date/time in Spanish | "Hoy es Miércoles 12 de Febrero de 2026 a las 14:30" ("Today is Wednesday, February 12, 2026 at 14:30") |
Custom Variables
You can also set values for your agent's input parameters, which will be injected during test execution.
Open Context Variables
Click the </> icon in the Tests header. A badge shows how many variables have values set.
Set Built-in Variables
Click the refresh icon next to fecha_y_hora_actual to update it to the current time.
Set Input Parameters
Enter values for any input parameters your agent expects (customer name, order number, etc.).
Batch Testing and Reliability
Multiple Repetitions
Run tests multiple times to assess reliability and identify intermittent failures:
- Use the +/- controls next to the Run All button to set the repetition count (1-10)
- Click Run All - the system executes each test the specified number of times
- View aggregated results in the History tab
Batch testing helps identify consistency issues. If a test passes 8/10 times, you may need to refine your prompts or success criteria.
Batch Progress Tracking
During batch execution:
- Progress indicator shows the current run (e.g., "Running 2/5")
- Individual test cards update with interim results
- History tab populates with each completed run
Running Tests
Filter Tests
Use the filter buttons to focus on specific test states:
- All - Show every test
- Pass - Only tests that passed last run (with pass count)
- Fail - Only tests that failed last run (with fail count)
Run All Tests
Click Run All to execute every test (or all visible tests if filtered). When the batch count is greater than 1, the button shows "Run x[count]".
Run Selected Tests
- Use checkboxes to select specific tests
- Click Run All (runs only selected when any are checked)
Run Individual Tests
Click the play icon (▶) on any test card to run just that test.
Simulate Conversation
The Simulate feature lets you run real-time conversation simulations without creating permanent tests.
Start Simulation
Click to begin - watch the conversation unfold in real-time with streaming messages.
Simulate is perfect for exploratory testing. Use regular Scenario/Simulation tests when you need repeatable, automated validation.
History Tab
The History tab provides comprehensive test-run analytics.
Run Overview
| Column | Description |
|---|---|
| Date | When the run was executed |
| Status | Pass count, fail count, or running status |
| Batch Info | Shows repetition info for batch runs |
| Actions | View details, retry failed, retry all |
Detailed Results
Click any run to open the detail modal:
- Individual Test Results - Pass/fail status with evaluation reasoning
- Agent Responses - Full agent output for each test
- Conversation History - Complete message flow (for simulation tests)
- Retry Options - Re-run failed tests or entire batch
Trend Analysis
For batch runs with multiple repetitions:
- Success Rate - Percentage of tests passing across all repetitions
- Consistency Metrics - Identify tests with intermittent failures
- Performance Patterns - Spot degradation over time
Results and Analysis
Test Cards
Each test card displays:

| Icon | Status | Meaning |
|---|---|---|
| ✓ | Pass | Agent response met success criteria |
| ✗ | Fail | Agent response did not meet criteria |
| ⟳ | Running | Test is currently executing |
| ○ | Pending | Not run yet |
Inline Details
Click a test card to expand inline details:
- Agent Response - What the agent actually said
- Evaluation Reason - Why it passed or failed
- Full Conversation - Complete message history (for simulations)
Batch Results
For batch runs, test cards show aggregated results:
- Success Rate - "8/10 passed" format
- Latest Result - Most recent execution status
- Trend Indicator - Visual indicator of consistency
Copilot Integration
Failed tests automatically generate improvement suggestions for Copilot.
Managing Tests
Edit a Test
- Click the menu (⋮) on a test card → Edit
- Modify conversation flow, success criteria, or examples
- Save changes
Clone a Test
- Click the menu (⋮) → Clone
- Creates an exact copy for creating variations
- Useful for testing different personas with same criteria
Delete a Test
- Click the menu (⋮) → Delete
- Confirm deletion (irreversible)
Organize Tests
- Name Clearly - Use descriptive names like “Greeting - Professional Tone”
- Group by Feature - Create tests for each major capability
- Version Control - Clone tests when making significant criteria changes
Advanced Features
Real-time Streaming
Simulation tests stream responses in real time:
- Watch conversations develop naturally
- See agent thinking process
- Identify response delays or issues
Dynamic Variable Injection
Context variables are injected dynamically:
- Set once, apply to all tests
- Override per-test if needed
- Track variable impact on results
Historical Comparison
Compare test results across runs:
- Identify regression patterns
- Track improvement over time
- Validate deployment readiness
Best Practices
Test Critical Paths
Create tests for your most important flows: greetings, main use case, objections, and compliance.
Use Success/Failure Examples
Provide clear examples so the LLM evaluator understands what “good” and “bad” responses look like.
Set Context Variables
If your agent uses context variables or input parameters, set them in the Context Variables panel before running tests.
Leverage Batch Testing
Run tests multiple times to identify consistency issues and improve reliability.
Monitor Trends
Use the History tab to track test performance over time and catch regressions early.
Integrate with Development
Run tests before publishing changes and use Copilot integration for continuous improvement.
Troubleshooting
Tests Won’t Run
Inconsistent Results
- Increase Batch Size - Run tests 3-5 times to identify patterns
- Refine Success Criteria - Make conditions more specific
- Check Variable Values - Ensure context variables are set correctly
Failed Evaluations
- Review Examples - Add more success/failure examples
- Simplify Criteria - Break complex conditions into multiple tests
- Use Copilot - Let AI suggest improvements based on failures
Next Steps
Evaluations
Set up post-call quality metrics
Playground
Test your agent interactively
Copilot
Get AI help improving your prompts
Input Params
Configure variables for tests and calls

