# Tests

> Create and run automated regression tests to protect your AI agent's quality

## What are Tests?

Tests are automated regression tests that validate your agent's behavior. Unlike **Evaluations**, which run after every real call, Tests run on demand in the Advanced Editor to verify that your agent responds correctly to specific scenarios before you deploy changes.

<Info>
  Run tests after modifying prompts, knowledge base, or actions to catch regressions before they affect real customers.
</Info>

## Accessing Tests

Tests are located in the right panel of the Advanced Editor. Click the **Tests** tab (clipboard icon) to access them.

## Tests vs Evaluations

| Feature       | Tests                                       | Evaluations                       |
| ------------- | ------------------------------------------- | --------------------------------- |
| When they run | On-demand in the editor                     | After every real call             |
| Purpose       | Validate specific scenarios before deploy   | Measure call quality over time    |
| Setup         | Define conversation flow + success criteria | Define yes/no questions           |
| Output        | Pass/fail per test                          | Pass/fail per evaluation per call |

## Test Interface Overview

The Tests panel is organized into two tabs:

<Tabs>
  <Tab title="Tests">
    Your test suite with filtering, creation, and execution controls
  </Tab>

  <Tab title="History">
    Past test runs with detailed results and trend analysis
  </Tab>
</Tabs>

### Header Actions

The header includes quick access to:

* **Context Variables** (`</>`) - Set dynamic variables for test execution
* **Simulate** (💬) - Run one-off conversation simulations
* **Batch Controls** - Run tests multiple times for reliability testing

## Test Types

### Scenario Tests

A **Scenario** test defines a fixed conversation flow. You specify the exact messages (user and agent) and a success condition. The system evaluates whether the agent's response meets your criteria.

| Property          | Description                       | Example                                           |
| ----------------- | --------------------------------- | ------------------------------------------------- |
| Chat history      | Alternating user/agent messages   | User: "Hola" → Agent: "Buenos días..."            |
| Success condition | What the agent should achieve     | "El agente responde de manera profesional y útil" |
| Success examples  | Sample responses that should pass | "Buenos días, ¿en qué puedo ayudarle?"            |
| Failure examples  | Sample responses that should fail | "No tengo idea"                                   |
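The sketch below makes the four properties concrete in TypeScript and shows one way they could be assembled into grading instructions for an LLM evaluator. `ScenarioTest`, its field names, and `buildEvaluatorPrompt` are hypothetical illustrations, not Meetzy's data model or API, and the prompt wording is an assumption.

```typescript
// Hypothetical shape of a Scenario test; names are illustrative only.
type Role = "user" | "agent";

interface ChatMessage {
  role: Role;
  content: string;
}

interface ScenarioTest {
  name: string;
  chatHistory: ChatMessage[]; // fixed, alternating user/agent messages
  successCondition: string; // what the agent should achieve
  successExamples: string[]; // sample responses that should pass
  failureExamples: string[]; // sample responses that should fail
}

// Combines the conversation, the response under test, and the criteria into
// one grading prompt (assumed wording, not the product's evaluator prompt).
function buildEvaluatorPrompt(test: ScenarioTest, agentResponse: string): string {
  const transcript = test.chatHistory
    .map((m) => `${m.role === "user" ? "User" : "Agent"}: ${m.content}`)
    .join("\n");
  const asList = (items: string[]) => items.map((item) => `- ${item}`).join("\n");
  return [
    `Conversation:\n${transcript}`,
    `Agent response to grade:\n${agentResponse}`,
    `Success condition: ${test.successCondition}`,
    `Responses that should PASS:\n${asList(test.successExamples)}`,
    `Responses that should FAIL:\n${asList(test.failureExamples)}`,
    "Answer PASS or FAIL and give a one-sentence reason.",
  ].join("\n\n");
}

// Example using the values from the table above.
const greetingTest: ScenarioTest = {
  name: "Greeting - Professional Tone",
  chatHistory: [{ role: "user", content: "Hola" }],
  successCondition: "El agente responde de manera profesional y útil",
  successExamples: ["Buenos días, ¿en qué puedo ayudarle?"],
  failureExamples: ["No tengo idea"],
};
```

Placing the examples next to the condition gives the evaluator concrete reference points for what a good and a bad response look like.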

### Simulation Tests

A **Simulation** test uses an AI to simulate a user. You define a persona, goal, and first message. The system runs a full conversation and evaluates the outcome.

| Property      | Description                   | Example                                 |
| ------------- | ----------------------------- | --------------------------------------- |
| First message | How the simulated user starts | "Hola, buenos días"                     |
| Persona       | Simulated user's profile      | "Cliente interesado en información"     |
| Goal          | What the simulated user wants | "Obtener información sobre el servicio" |
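A Simulation test has different inputs. This sketch models the three properties above and shows how they might be turned into instructions for the model that plays the user. `SimulationTest` and `buildSimulatedUserPrompt` are hypothetical names, and the instruction text is an assumption.

```typescript
// Hypothetical shape of a Simulation test; field names are illustrative only.
interface SimulationTest {
  name: string;
  firstMessage: string; // how the simulated user opens the conversation
  persona: string; // who the simulated user is
  goal: string; // what the simulated user is trying to achieve
}

// Instructions for the model that role-plays the user (assumed format).
function buildSimulatedUserPrompt(test: SimulationTest): string {
  return [
    `You are role-playing a customer. Persona: ${test.persona}.`,
    `Your goal: ${test.goal}.`,
    `You already opened with: "${test.firstMessage}".`,
    "Reply with one short message per turn, as this person would.",
  ].join("\n");
}

// Example using the values from the table above.
const infoRequest: SimulationTest = {
  name: "Information request",
  firstMessage: "Hola, buenos días",
  persona: "Cliente interesado en información",
  goal: "Obtener información sobre el servicio",
};
```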

## Creating a Test

<Steps>
  <Step title="Open Tests Panel">
    Click the **Tests** tab in the right panel, then make sure the inner **Tests** tab (not **History**) is selected.
  </Step>

  <Step title="Click New">
    Click the "New" button to start the creation wizard.
  </Step>

  <Step title="Step 1 - Name & Type">
    Enter a descriptive name and choose Scenario or Simulation.
  </Step>

  <Step title="Step 2 - Conversation">
    For a Scenario test, add the conversation flow (user and agent messages). For a Simulation test, set the first message, persona, and goal.
  </Step>

  <Step title="Step 3 - Criteria">
    Define the success condition and add success/failure examples to guide the evaluation.
  </Step>

  <Step title="Save">
    The test is created and appears in the list.
  </Step>
</Steps>

## Context Variables

Tests can use **dynamic variables** that replace placeholders in your agent's prompts during test execution. Access the Context Variables panel by clicking the **`</>`** icon in the header.

### Built-in Variables

| Variable              | Purpose                      | Example                                              |
| --------------------- | ---------------------------- | ---------------------------------------------------- |
| `fecha_y_hora_actual` | Current date/time in Spanish | "Hoy es Jueves 12 de Febrero de 2026 a las 14:30"    |
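To show how the documented "Hoy es ..." format is put together, the sketch below builds a string in that shape from a JavaScript `Date`. `formatFechaYHoraActual` is a hypothetical helper; it assumes the local time zone, a 24-hour clock, unpadded hours, and zero-padded minutes, none of which the page specifies.

```typescript
// Spanish day and month names, capitalized to match the documented example.
const DIAS = ["Domingo", "Lunes", "Martes", "Miércoles", "Jueves", "Viernes", "Sábado"];
const MESES = [
  "Enero", "Febrero", "Marzo", "Abril", "Mayo", "Junio",
  "Julio", "Agosto", "Septiembre", "Octubre", "Noviembre", "Diciembre",
];

// Builds "Hoy es <día> <n> de <mes> de <año> a las <h>:<mm>" from local time.
// Assumptions: 24-hour clock, hours not padded, minutes padded to two digits.
function formatFechaYHoraActual(d: Date): string {
  const dia = DIAS[d.getDay()]; // getDay(): 0 = Sunday
  const mes = MESES[d.getMonth()]; // getMonth(): 0 = January
  const minutos = String(d.getMinutes()).padStart(2, "0");
  return `Hoy es ${dia} ${d.getDate()} de ${mes} de ${d.getFullYear()} a las ${d.getHours()}:${minutos}`;
}
```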

### Custom Variables

You can also set values for your agent's input parameters; these values are injected during test execution.

<Steps>
  <Step title="Open Context Variables">
    Click the **`</>`** icon in the Tests header. A badge shows how many variables have values set.
  </Step>

  <Step title="Set Built-in Variables">
    Click the refresh icon next to `fecha_y_hora_actual` to update its value to the current date and time.
  </Step>

  <Step title="Set Input Parameters">
    Enter values for any input parameters your agent expects (customer name, order number, etc.).
  </Step>

  <Step title="Run Tests">
    Variables are automatically injected into the agent's context during execution.
  </Step>
</Steps>
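The injection step behaves like template substitution. The sketch below assumes placeholders use the `{{name}}` form this page uses for `fecha_y_hora_actual`; `injectVariables` and the parameters `nombre_cliente` and `numero_pedido` are hypothetical, and leaving unmatched placeholders untouched is a choice of this sketch rather than documented behavior.

```typescript
// Replaces {{name}} placeholders with values from `vars`.
// Names are matched with \w+, so only ASCII letters, digits, and "_" are supported.
function injectVariables(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match: string, key: string) => {
    // Own properties only, so names like "constructor" are not resolved.
    const value = Object.prototype.hasOwnProperty.call(vars, key) ? vars[key] : undefined;
    return value ?? match; // keep unknown placeholders visible
  });
}

// Hypothetical agent prompt and context values.
const template = "Saluda a {{nombre_cliente}}. {{fecha_y_hora_actual}}. Pedido: {{numero_pedido}}";
const filled = injectVariables(template, {
  nombre_cliente: "Ana",
  fecha_y_hora_actual: "Hoy es Jueves 12 de Febrero de 2026 a las 14:30",
});
```

Leaving an unset placeholder in place makes a missing variable easy to spot in the agent's output instead of silently producing an empty string.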

<Tip>
  Set `fecha_y_hora_actual` whenever your agent's greeting (such as "Buenos días") or replies depend on the current date or time. Tests use the value you set, not the live clock, so refresh it before running time-sensitive tests.
</Tip>

## Batch Testing and Reliability

### Multiple Repetitions

Run tests multiple times to assess reliability and identify intermittent failures:

1. Use the **+/-** controls next to the **Run All** button to set the repetition count (1-10).
2. Click **Run All**. Each test runs the specified number of times.
3. View the aggregated results in the **History** tab.

<Info>
  Batch testing exposes inconsistent behavior. If a test passes only 8 out of 10 times, you may need to refine your prompts or your success criteria.
</Info>

### Batch Progress Tracking

During batch execution:

* A progress indicator shows the current run (e.g., "Running 2/5")
* Individual test cards update with interim results
* The History tab fills in as each run completes
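The repetition and progress behavior can be sketched as a loop that clamps the count to the documented 1-10 range, reports progress in the "Running 2/5" style, and tallies passes per test. `runBatch` and its callbacks are hypothetical, and running repetitions one after another is an assumption made to keep the sketch simple.

```typescript
type Outcome = "pass" | "fail";

interface BatchTally {
  testName: string;
  passed: number; // repetitions that passed
  total: number; // repetitions executed
}

// Runs every test `repetitions` times (clamped to 1-10) and counts passes.
function runBatch(
  testNames: string[],
  repetitions: number,
  runOnce: (testName: string) => Outcome, // hypothetical single-run executor
  onProgress: (label: string) => void, // e.g. updates the progress indicator
): BatchTally[] {
  const n = Math.min(10, Math.max(1, Math.floor(repetitions)));
  const tallies = testNames.map((testName) => ({ testName, passed: 0, total: 0 }));
  for (let run = 1; run <= n; run++) {
    onProgress(`Running ${run}/${n}`);
    for (const tally of tallies) {
      tally.total += 1;
      if (runOnce(tally.testName) === "pass") tally.passed += 1;
    }
  }
  return tallies;
}
```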

## Running Tests

### Filter Tests

Use filter buttons to focus on specific test states:

* **All** - Show every test
* **Pass** - Only tests that passed on their last run (the button shows the pass count)
* **Fail** - Only tests that failed on their last run (the button shows the fail count)

### Run All Tests

Click **Run All** to run every test, or only the visible tests when a filter is active. When the repetition count is greater than 1, the button reads "Run x\[count]".

### Run Selected Tests

1. Use the checkboxes to select specific tests.
2. Click **Run All**. When any tests are checked, only the selected tests run.
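The filter and selection rules combine as follows: the filter decides which tests are visible, and **Run All** runs the checked tests if any are checked, otherwise every visible test. In the sketch below, `TestCard`, `visibleTests`, `testsToRun`, and `runButtonLabel` are hypothetical, and intersecting the selection with the visible tests is an assumption the page does not settle.

```typescript
type LastResult = "pass" | "fail" | "pending"; // "pending" = never run
type ResultFilter = "all" | "pass" | "fail";

interface TestCard {
  id: string;
  lastResult: LastResult;
}

// Tests shown under the active filter button.
function visibleTests(tests: TestCard[], filter: ResultFilter): TestCard[] {
  return filter === "all" ? tests : tests.filter((t) => t.lastResult === filter);
}

// Checked tests take precedence; with nothing checked, every visible test runs.
// Restricting the selection to visible tests is an assumption of this sketch.
function testsToRun(tests: TestCard[], filter: ResultFilter, checked: Set<string>): TestCard[] {
  const visible = visibleTests(tests, filter);
  return checked.size === 0 ? visible : visible.filter((t) => checked.has(t.id));
}

// Label of the run button for a given repetition count.
function runButtonLabel(repetitions: number): string {
  return repetitions > 1 ? `Run x${repetitions}` : "Run All";
}
```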

### Run Individual Tests

Click the play icon (▶) on any test card to run just that test.

## Simulate Conversation

The **Simulate** feature lets you run a one-off conversation simulation in real time without saving it as a test:

<Steps>
  <Step title="Open Simulator">
    Click the chat bubble icon (💬) in the Tests header.
  </Step>

  <Step title="Configure Parameters">
    Set the first message, persona, goal, and turn limit (default: 10).
  </Step>

  <Step title="Start Simulation">
    Start the simulation and watch the conversation unfold as messages stream in.
  </Step>

  <Step title="Review Results">
    Review the full conversation history and the evaluation of the final outcome.
  </Step>
</Steps>
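Conceptually, a simulation alternates between the AI-played user and your agent until the turn limit is reached, then evaluates the transcript. The sketch below counts one user message plus one agent reply as a turn and lets the simulated user end early by returning `null`; both are assumptions, and `simulate`, `nextUserMessage`, `agentReply`, and `evaluate` are hypothetical stand-ins for the underlying model calls.

```typescript
interface Turn {
  role: "user" | "agent";
  content: string;
}

interface SimulationResult {
  transcript: Turn[];
  passed: boolean;
}

// Alternates simulated-user and agent messages until the simulated user stops
// (returns null) or the turn limit is hit, then evaluates the whole transcript.
function simulate(
  firstMessage: string,
  nextUserMessage: (transcript: Turn[]) => string | null,
  agentReply: (transcript: Turn[]) => string,
  evaluate: (transcript: Turn[]) => boolean,
  turnLimit = 10,
): SimulationResult {
  const transcript: Turn[] = [];
  let userMessage: string | null = firstMessage;
  let turns = 0;
  while (userMessage !== null && turns < turnLimit) {
    transcript.push({ role: "user", content: userMessage });
    transcript.push({ role: "agent", content: agentReply(transcript) });
    userMessage = nextUserMessage(transcript);
    turns += 1;
  }
  return { transcript, passed: evaluate(transcript) };
}
```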

<Info>
  Simulate is best suited to exploratory testing. Use saved Scenario or Simulation tests when you need repeatable, automated validation.
</Info>

## History Tab

The **History** tab records every test run and its results:

### Run Overview

| Column     | Description                               |
| ---------- | ----------------------------------------- |
| Date       | When the run was executed                 |
| Status     | Pass count, fail count, or running status |
| Batch Info | Shows repetition info for batch runs      |
| Actions    | View details, retry failed, retry all     |

### Detailed Results

Click any run to open the detail modal:

* **Individual Test Results** - Pass/fail status with evaluation reasoning
* **Agent Responses** - Full agent output for each test
* **Conversation History** - Complete message flow (for simulation tests)
* **Retry Options** - Re-run the failed tests or the entire batch

### Trend Analysis

For batch runs with multiple repetitions:

* **Success Rate** - Percentage of tests passing across all repetitions
* **Consistency Metrics** - Identify tests with intermittent failures
* **Performance Patterns** - Spot degradation over time
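These metrics can be derived from per-test pass counts. In the sketch below, the success rate is passed executions divided by all executions, and a test is intermittent when it passed in some repetitions but not all; both definitions are assumptions about how the History tab aggregates, and `summarizeBatch` is a hypothetical helper.

```typescript
interface TestTally {
  testName: string;
  passed: number; // repetitions that passed
  total: number; // repetitions executed
}

interface BatchSummary {
  successRate: number; // percentage of all executions that passed
  intermittent: string[]; // passed in some repetitions, failed in others
  alwaysFailing: string[]; // never passed
}

function summarizeBatch(tallies: TestTally[]): BatchSummary {
  const passed = tallies.reduce((sum, t) => sum + t.passed, 0);
  const total = tallies.reduce((sum, t) => sum + t.total, 0);
  return {
    successRate: total === 0 ? 0 : (100 * passed) / total,
    intermittent: tallies.filter((t) => t.passed > 0 && t.passed < t.total).map((t) => t.testName),
    alwaysFailing: tallies.filter((t) => t.total > 0 && t.passed === 0).map((t) => t.testName),
  };
}
```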

## Results and Analysis

### Test Cards

Each test card displays one of the following statuses:

| Icon | Status  | Meaning                              |
| ---- | ------- | ------------------------------------ |
| ✓    | Pass    | Agent response met success criteria  |
| ✗    | Fail    | Agent response did not meet criteria |
| ⟳    | Running | Test is currently executing          |
| ○    | Pending | Not run yet                          |

### Inline Details

Click a test card to expand inline details:

* **Agent Response** - What the agent actually said
* **Evaluation Reason** - Why it passed or failed
* **Full Conversation** - Complete message history (for simulations)

### Batch Results

For batch runs, test cards show aggregated results:

* **Success Rate** - "8/10 passed" format
* **Latest Result** - Most recent execution status
* **Trend Indicator** - Visual indicator of consistency
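A card's batch summary pairs a pass count in the "8/10 passed" format with the most recent status. The sketch assumes results are stored oldest first and counts only finished repetitions toward the total; `CardStatus` mirrors the four states listed above, and `cardSummary` is a hypothetical helper.

```typescript
type CardStatus = "pass" | "fail" | "running" | "pending";

interface CardSummary {
  successRate: string; // e.g. "8/10 passed"
  latest: CardStatus; // status of the most recent execution
}

// Summarizes one test's repetitions; `results` is assumed to be oldest first.
function cardSummary(results: CardStatus[]): CardSummary {
  const finished = results.filter((r) => r === "pass" || r === "fail");
  const passed = finished.filter((r) => r === "pass").length;
  return {
    successRate: `${passed}/${finished.length} passed`,
    latest: results[results.length - 1] ?? "pending", // nothing run yet
  };
}
```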

## Copilot Integration

When tests fail, the system automatically generates improvement prompts that you can open in Copilot:

<Steps>
  <Step title="Run Tests">
    Execute your test suite and identify failing tests.
  </Step>

  <Step title="Generate Suggestions">
    The system analyzes failures and creates targeted improvement prompts.
  </Step>

  <Step title="Open Copilot">
    Click the generated prompt to open Copilot with context about specific failures.
  </Step>

  <Step title="Apply Improvements">
    Copilot suggests prompt modifications based on test failure analysis.
  </Step>

  <Step title="Re-test">
    Run tests again to validate improvements.
  </Step>
</Steps>

<Warning>
  Copilot integration requires test failures to generate meaningful suggestions. Ensure your success criteria are specific enough to catch real issues.
</Warning>
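The page does not show the generated prompt itself, but the idea can be sketched as collecting each failing test's name, success condition, agent response, and evaluation reason into a single request. `FailedResult` and `buildCopilotPrompt` are hypothetical, and the prompt wording is an assumption. Returning `null` when nothing failed reflects the point above that suggestions depend on failures.

```typescript
interface FailedResult {
  testName: string;
  successCondition: string;
  agentResponse: string;
  evaluationReason: string;
}

// Turns failing results into a single improvement request for Copilot.
function buildCopilotPrompt(failures: FailedResult[]): string | null {
  if (failures.length === 0) return null; // nothing to improve without failures
  const details = failures.map(
    (f, i) =>
      `${i + 1}. ${f.testName}\n` +
      `   Expected: ${f.successCondition}\n` +
      `   Agent said: ${f.agentResponse}\n` +
      `   Why it failed: ${f.evaluationReason}`,
  );
  return (
    "These tests failed. Suggest prompt changes that fix them without breaking other behavior:\n\n" +
    details.join("\n\n")
  );
}
```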

## Managing Tests

### Edit a Test

1. Click the menu (⋮) on a test card → **Edit**
2. Modify conversation flow, success criteria, or examples
3. Save changes

### Clone a Test

1. Click the menu (⋮) → **Clone** to create an exact copy of the test.
2. Edit the copy to create a variation, for example a different persona with the same success criteria.

### Delete a Test

1. Click the menu (⋮) → **Delete**
2. Confirm the deletion. This cannot be undone.

### Organize Tests

* **Name Clearly** - Use descriptive names like "Greeting - Professional Tone"
* **Group by Feature** - Create tests for each major capability
* **Version Control** - Clone tests when making significant criteria changes

## Advanced Features

### Real-time Streaming

Simulation tests stream responses in real time:

* Watch conversations develop naturally
* See the agent's thinking process
* Identify response delays or issues

### Dynamic Variable Injection

Context variables are injected dynamically:

* Set once, apply to all tests
* Override per-test if needed
* Track variable impact on results
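The "set once, override per test" rule amounts to a merge in which per-test values take precedence over the shared context. The sketch shows only that precedence rule; `resolveVariables` and the sample parameters are hypothetical, and the page does not describe where per-test overrides are configured.

```typescript
type Variables = Record<string, string>;

// Shared context variables apply to every test; per-test values win on conflict.
// Neither input object is mutated.
function resolveVariables(shared: Variables, perTest: Variables = {}): Variables {
  return { ...shared, ...perTest };
}

// Hypothetical input parameters: one shared value is overridden for a single test.
const sharedContext: Variables = { nombre_cliente: "Ana", numero_pedido: "A-1001" };
const vipTestContext = resolveVariables(sharedContext, { nombre_cliente: "Luis" });
```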

### Historical Comparison

Compare test results across runs:

* Identify regression patterns
* Track improvement over time
* Validate deployment readiness
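Detecting a regression between two runs comes down to comparing each test's outcome: pass then fail is a regression, fail then pass is a fix. `compareRuns` is a hypothetical helper, and ignoring tests that appear in only one run is an assumption of this sketch.

```typescript
type RunOutcome = "pass" | "fail";
type RunResults = Map<string, RunOutcome>; // test name -> outcome

interface RunComparison {
  regressions: string[]; // passed before, fail now
  fixes: string[]; // failed before, pass now
}

function compareRuns(previous: RunResults, latest: RunResults): RunComparison {
  const regressions: string[] = [];
  const fixes: string[] = [];
  for (const [testName, outcome] of latest) {
    const earlier = previous.get(testName); // undefined if the test is new
    if (earlier === "pass" && outcome === "fail") regressions.push(testName);
    if (earlier === "fail" && outcome === "pass") fixes.push(testName);
  }
  return { regressions, fixes };
}
```

An empty `regressions` list is one reasonable signal, alongside the other checks on this page, that a change is ready to deploy.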

## Best Practices

<AccordionGroup>
  <Accordion title="Test Critical Paths" icon="route">
    Create tests for your most important flows: greetings, main use case, objections, and compliance.
  </Accordion>

  <Accordion title="Use Success/Failure Examples" icon="scale">
    Provide clear examples so the LLM evaluator understands what "good" and "bad" responses look like.
  </Accordion>

  <Accordion title="Set Context Variables" icon="variable">
    If your agent uses `{{fecha_y_hora_actual}}` or input parameters, set them in the Context Variables panel before running tests.
  </Accordion>

  <Accordion title="Leverage Batch Testing" icon="repeat">
    Run tests multiple times to identify consistency issues and improve reliability.
  </Accordion>

  <Accordion title="Monitor Trends" icon="chart-line">
    Use the History tab to track test performance over time and catch regressions early.
  </Accordion>

  <Accordion title="Integrate with Development" icon="code">
    Run tests before publishing changes and use Copilot integration for continuous improvement.
  </Accordion>
</AccordionGroup>

## Troubleshooting

### Tests Won't Run

<Steps>
  <Step title="Check Context Variables">
    Ensure required variables like `fecha_y_hora_actual` are set.
  </Step>

  <Step title="Verify Success Criteria">
    Make sure success conditions are specific and measurable.
  </Step>

  <Step title="Review Agent Configuration">
    Confirm your agent has proper prompts and knowledge base setup.
  </Step>
</Steps>

### Inconsistent Results

* **Increase Repetitions** - Run each test 3-5 times to reveal failure patterns
* **Refine Success Criteria** - Make conditions more specific
* **Check Variable Values** - Ensure context variables are set correctly

### Failed Evaluations

* **Review Examples** - Add more success/failure examples
* **Simplify Criteria** - Break complex conditions into multiple tests
* **Use Copilot** - Let AI suggest improvements based on failures

## Next Steps

<CardGroup cols={2}>
  <Card title="Evaluations" icon="check-double" href="/advanced-editor/evaluations">
    Set up post-call quality metrics
  </Card>

  <Card title="Playground" icon="gamepad" href="/advanced-editor/playground">
    Test your agent interactively
  </Card>

  <Card title="Copilot" icon="robot" href="/advanced-editor/copilot">
    Get AI help improving your prompts
  </Card>

  <Card title="Input Params" icon="sliders" href="/advanced-editor/input-params">
    Configure variables for tests and calls
  </Card>
</CardGroup>
