# Tests

> Create and run automated regression tests to ensure the quality of your AI agent

## What are Tests?

Tests are automated regression tests that validate your agent's behavior. Unlike **Evaluations** (which run after every real call), Tests run on-demand in the Advanced Editor to verify that your agent responds correctly to specific scenarios before you deploy changes.

Run tests after modifying prompts, the knowledge base, or actions to catch regressions before they affect real customers.

## Accessing Tests

Tests are located in the right panel of the Advanced Editor. Click the **Tests** tab (clipboard icon) to access them.

## Tests vs Evaluations

| Feature       | Tests                                       | Evaluations                       |
| ------------- | ------------------------------------------- | --------------------------------- |
| When they run | On-demand in the editor                     | After every real call             |
| Purpose       | Validate specific scenarios before deploy   | Measure call quality over time    |
| Setup         | Define conversation flow + success criteria | Define yes/no questions           |
| Output        | Pass/fail per test                          | Pass/fail per evaluation per call |

## Test Interface Overview

The Tests panel has a tabbed interface with two main sections:

* **Tests** - Your test suite with filtering, creation, and execution controls
* **History** - Past test runs with detailed results and trend analysis

### Header Actions

The header gives quick access to:

* **Context Variables** - Set dynamic variables for test execution
* **Simulate** (💬) - Run one-off conversation simulations
* **Batch Controls** - Run tests multiple times for reliability testing

## Test Types

### Scenario Tests

A **Scenario** test defines a fixed conversation flow. You specify the exact messages (user and agent) and a success condition. The system evaluates whether the agent's response meets your criteria.

| Property          | Description                       | Example                                                                                                       |
| ----------------- | --------------------------------- | ------------------------------------------------------------------------------------------------------------- |
| Chat history      | Alternating user/agent messages   | User: "Hola" → Agent: "Buenos días..." (User: "Hello" → Agent: "Good morning...")                             |
| Success condition | What the agent should achieve     | "El agente responde de manera profesional y útil" ("The agent responds in a professional and helpful manner") |
| Success examples  | Sample responses that should pass | "Buenos días, ¿en qué puedo ayudarle?" ("Good morning, how can I help you?")                                  |
| Failure examples  | Sample responses that should fail | "No tengo idea" ("I have no idea")                                                                            |

### Simulation Tests

A **Simulation** test uses an AI to simulate a user. You define a persona, a goal, and a first message. The system runs a full conversation and evaluates the outcome.

| Property      | Description                   | Example                                                                        |
| ------------- | ----------------------------- | ------------------------------------------------------------------------------ |
| First message | How the simulated user starts | "Hola, buenos días" ("Hello, good morning")                                    |
| Persona       | Simulated user's profile      | "Cliente interesado en información" ("Customer interested in information")     |
| Goal          | What the simulated user wants | "Obtener información sobre el servicio" ("Get information about the service") |

## Creating a Test

1. Click the Tests tab in the right panel and make sure you're on the **Tests** tab (not History).
2. Click the "New" button to start the creation wizard.
3. Enter a descriptive name and choose Scenario or Simulation.
4. For Scenario: add the conversation flow (user/agent messages). For Simulation: set the first message, persona, and goal.
5. Define the success condition and add success/failure examples to guide the evaluation.

Once the wizard is complete, the test is created and appears in the list.

## Context Variables

Tests can use **dynamic variables** that replace placeholders in your agent's prompts during test execution. Open the Context Variables panel by clicking the Context Variables icon in the header.
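
To make the idea of placeholder replacement concrete, here is a minimal sketch of how a value could be substituted into a prompt. It is an illustration only, not the platform's implementation: the double-brace syntax is assumed from the `{{fecha_y_hora_actual}}` form used elsewhere on this page, and `render_prompt` and `customer_name` are hypothetical names.

```python
import re

# Assumption: placeholders look like {{name}}, as in {{fecha_y_hora_actual}}.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template: str, variables: dict[str, str]) -> str:
    """Replace each {{name}} with its value; leave unknown placeholders untouched."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        # Fall back to the original placeholder text when no value is set.
        return variables.get(name, match.group(0))

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical prompt and values, for illustration only.
prompt = "{{fecha_y_hora_actual}}. Greet {{customer_name}} politely."
context = {
    "fecha_y_hora_actual": "Hoy es Miércoles 12 de Febrero de 2026 a las 14:30",
    "customer_name": "Ana",
}
print(render_prompt(prompt, context))
```

Leaving unknown placeholders in place, rather than replacing them with an empty string, makes a forgotten variable easy to spot in the rendered prompt.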

### Built-in Variables

| Variable              | Purpose                      | Example                                                                                                   |
| --------------------- | ---------------------------- | --------------------------------------------------------------------------------------------------------- |
| `fecha_y_hora_actual` | Current date/time in Spanish | "Hoy es Miércoles 12 de Febrero de 2026 a las 14:30" ("Today is Wednesday, February 12, 2026 at 14:30") |

### Custom Variables

You can also set values for your agent's input parameters. These values are injected during test execution.

1. Click the Context Variables icon in the Tests header. A badge shows how many variables have values set.
2. Click the refresh icon next to `fecha_y_hora_actual` to update it to the current time.
3. Enter values for any input parameters your agent expects (customer name, order number, etc.).
4. Run your tests. Variables are automatically injected into the agent's context during execution.

Use `fecha_y_hora_actual` when your agent greets with "Buenos días" ("Good morning") or references the current date. Tests run with the values you set, not the live time.

## Batch Testing and Reliability

### Multiple Repetitions

Run tests multiple times to assess reliability and identify intermittent failures:

1. Use the **+/-** controls next to the "Run All" button to set the repetition count (1-10)
2. Click **Run All** - the system executes each test the specified number of times
3. View aggregated results in the History tab

Batch testing helps identify consistency issues. If a test passes only 8 out of 10 times, you may need to refine your prompts or success criteria.

### Batch Progress Tracking

During batch execution:

* A progress indicator shows the current run (e.g., "Running 2/5")
* Individual test cards update with interim results
* The History tab populates with each completed run

## Running Tests

### Filter Tests

Use the filter buttons to focus on specific test states:

* **All** - Show every test
* **Pass** - Only tests that passed their last run (with pass count)
* **Fail** - Only tests that failed their last run (with fail count)

### Run All Tests

Click **Run All** to execute every test (or all visible tests if filtered).
When the batch count is greater than 1, the button shows "Run x\[count]".

### Run Selected Tests

1. Use the checkboxes to select specific tests
2. Click **Run All** (when any tests are checked, only the selected tests run)

### Run Individual Tests

Click the play icon (▶) on any test card to run just that test.

## Simulate Conversation

The **Simulate** feature lets you run real-time conversation simulations without creating permanent tests:

1. Click the chat bubble icon (💬) in the Tests header.
2. Set the first message, persona, goal, and turn limit (default 10).
3. Start the simulation and watch the conversation unfold in real time with streaming messages.
4. Review the full conversation history and the final outcome evaluation.

Simulate is best suited to exploratory testing. Use regular Scenario/Simulation tests when you need repeatable, automated validation.

## History Tab

The **History** tab provides analytics for your test runs:

### Run Overview

| Column     | Description                               |
| ---------- | ----------------------------------------- |
| Date       | When the run was executed                 |
| Status     | Pass count, fail count, or running status |
| Batch Info | Shows repetition info for batch runs      |
| Actions    | View details, retry failed, retry all     |

### Detailed Results

Click any run to open the detail modal:

* **Individual Test Results** - Pass/fail status with evaluation reasoning
* **Agent Responses** - Full agent output for each test
* **Conversation History** - Complete message flow (for simulation tests)
* **Retry Options** - Re-run failed tests or the entire batch

### Trend Analysis

For batch runs with multiple repetitions:

* **Success Rate** - Percentage of tests passing across all repetitions
* **Consistency Metrics** - Identify tests with intermittent failures
* **Performance Patterns** - Spot degradation over time

## Results and Analysis

### Test Cards

Each test card displays:

| Icon | Status  | Meaning                                  |
| ---- | ------- | ---------------------------------------- |
| ✓    | Pass    | Agent response met success criteria      |
| ✗    | Fail    | Agent response did not meet criteria     |
| ⟳    | Running | Test is currently executing              |
| ○    | Pending | Not run yet                              |

### Inline Details

Click a test card to expand inline details:

* **Agent Response** - What the agent actually said
* **Evaluation Reason** - Why it passed or failed
* **Full Conversation** - Complete message history (for simulations)

### Batch Results

For batch runs, test cards show aggregated results:

* **Success Rate** - "8/10 passed" format
* **Latest Result** - Most recent execution status
* **Trend Indicator** - Visual indicator of consistency

## Copilot Integration

Failed tests automatically generate improvement suggestions for Copilot:

1. Execute your test suite and identify failing tests.
2. The system analyzes the failures and creates targeted improvement prompts.
3. Click the generated prompt to open Copilot with context about the specific failures.
4. Copilot suggests prompt modifications based on the test failure analysis.
5. Run the tests again to validate the improvements.

Copilot integration requires test failures to generate meaningful suggestions. Ensure your success criteria are specific enough to catch real issues.

## Managing Tests

### Edit a Test

1. Click the menu (⋮) on a test card → **Edit**
2. Modify the conversation flow, success criteria, or examples
3. Save changes

### Clone a Test

1. Click the menu (⋮) → **Clone**
2. An exact copy is created, which you can use as a starting point for variations
3. Useful for testing different personas against the same criteria

### Delete a Test

1. Click the menu (⋮) → **Delete**
2. Confirm the deletion (irreversible)

### Organize Tests

* **Name Clearly** - Use descriptive names like "Greeting - Professional Tone"
* **Group by Feature** - Create tests for each major capability
* **Version Control** - Clone tests when making significant criteria changes

## Advanced Features

### Real-time Streaming

Simulation tests stream responses in real time:

* Watch conversations develop naturally
* See the agent's thinking process
* Identify response delays or issues

### Dynamic Variable Injection

Context variables are injected dynamically:

* Set once, apply to all tests
* Override per-test if needed
* Track variable impact on results

### Historical Comparison

Compare test results across runs:

* Identify regression patterns
* Track improvement over time
* Validate deployment readiness

## Best Practices

* Create tests for your most important flows: greetings, main use case, objections, and compliance.
* Provide clear examples so the LLM evaluator understands what "good" and "bad" responses look like.
* If your agent uses `{{fecha_y_hora_actual}}` or input parameters, set them in the context panel before running tests.
* Run tests multiple times to identify consistency issues and improve reliability.
* Use the History tab to track test performance over time and catch regressions early.
* Run tests before publishing changes and use Copilot integration for continuous improvement.

## Troubleshooting

### Tests Won't Run

* Ensure required variables like `fecha_y_hora_actual` are set.
* Make sure success conditions are specific and measurable.
* Confirm your agent has proper prompts and knowledge base setup.
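
When checking for unset variables by hand, it can help to list every placeholder a prompt references and compare it with the values you have entered. The sketch below is only an illustration of that check, not a platform feature: it assumes the `{{name}}` placeholder syntax shown on this page, and `missing_variables`, `order_number`, and `customer_name` are hypothetical names.

```python
import re

# Assumption: placeholders look like {{name}}, as in {{fecha_y_hora_actual}}.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def missing_variables(prompt: str, variables: dict[str, str]) -> list[str]:
    """Return placeholder names used in the prompt that have no non-empty value."""
    missing: list[str] = []
    for name in PLACEHOLDER.findall(prompt):
        has_value = bool(variables.get(name, "").strip())
        if not has_value and name not in missing:
            missing.append(name)
    return missing


# Hypothetical prompt and context values, for illustration only.
prompt = "{{fecha_y_hora_actual}}. Confirm order {{order_number}} for {{customer_name}}."
context = {"customer_name": "Ana", "order_number": ""}
print(missing_variables(prompt, context))  # ['fecha_y_hora_actual', 'order_number']
```

An empty string is treated the same as a missing value here, since a blank field in the context panel would leave the agent without the information it expects.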

### Inconsistent Results

* **Increase Batch Size** - Run tests 3-5 times to identify patterns
* **Refine Success Criteria** - Make conditions more specific
* **Check Variable Values** - Ensure context variables are set correctly

### Failed Evaluations

* **Review Examples** - Add more success/failure examples
* **Simplify Criteria** - Break complex conditions into multiple tests
* **Use Copilot** - Let AI suggest improvements based on failures

## Next Steps

* Set up post-call quality metrics
* Test your agent interactively
* Get AI help improving your prompts
* Configure variables for tests and calls