Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: thomasnordquist <7721625+thomasnordquist@users.noreply.github.com> Co-authored-by: Thomas Nordquist <thomasnordquist@users.noreply.github.com>
LLM Integration Test Results
This document provides example test results and validation criteria for the LLM feature's live API integration.
Test Summary
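The suite is split into two groups: offline tests that always run against mock data, and live integration tests that run only when an API key is configured and RUN_LLM_TESTS=true is set: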
Offline Tests (Always Run)
- Total: 100 tests
- Status: ✅ All passing
- Duration: ~2 seconds
- Requirements: None (mock data)
Live Integration Tests (Opt-in)
- Total: 11 tests
- Status: ⏸️ Pending (requires RUN_LLM_TESTS=true)
- Duration: ~20-30 seconds
- Requirements: OpenAI/Gemini API key
Running Live Tests
Quick Start
# Using the helper script
OPENAI_API_KEY=sk-your-key ./scripts/run-llm-tests.sh
Manual Execution
# Set your API key
export OPENAI_API_KEY=sk-your-key
# Enable live tests
export RUN_LLM_TESTS=true
# Run tests
cd app && yarn test
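The opt-in behaviour can be sketched as a small gate that the live suite consults before it runs. This is a minimal sketch, not the project's actual code: `shouldRunLiveTests` and the `GEMINI_API_KEY` variable name are assumptions made for illustration, while `RUN_LLM_TESTS` and `OPENAI_API_KEY` come from the commands above.

```typescript
// Environment passed in explicitly so the function stays pure and testable.
type Env = Record<string, string | undefined>;

// Hypothetical helper: the live suite runs only when the user opted in
// AND at least one API key is present.
// GEMINI_API_KEY is an assumed name; this document only names OPENAI_API_KEY.
function shouldRunLiveTests(env: Env): boolean {
  const optedIn = env.RUN_LLM_TESTS === "true";
  const hasKey = Boolean(env.OPENAI_API_KEY || env.GEMINI_API_KEY);
  return optedIn && hasKey;
}

// Example: opted in, but no key configured, so the live suite stays skipped.
const runLive = shouldRunLiveTests({ RUN_LLM_TESTS: "true" });
console.log(runLive ? "running live tests" : "skipping live tests");
```

In a Mocha-style test file, a result like this could select between `describe` and `describe.skip`, which would keep the 11 live tests pending unless both conditions hold.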
Expected Output
LLM Integration Tests (Live API)
Home Automation System Detection
✓ should detect zigbee2mqtt topics and propose valid actions (2145ms)
✓ should detect Home Assistant topics and propose valid actions (1892ms)
✓ should detect Tasmota topics and propose valid actions (1756ms)
Proposal Quality Validation
✓ should propose multiple relevant actions for controllable devices (2234ms)
✓ should provide clear, actionable descriptions (1678ms)
✓ should match payload format to detected system (1923ms)
Edge Cases
✓ should not propose actions for read-only sensors (1567ms)
✓ should handle complex nested topic structures (1834ms)
✓ should handle topics with special characters (1456ms)
Question Generation Quality
✓ should generate relevant questions for home automation topics (2012ms)
✓ should generate analytical questions for sensor data (1789ms)
11 passing (20s)
Example Test Cases
Test 1: zigbee2mqtt Device Detection
Input:
Topic: zigbee2mqtt/living_room_light
Value: {"state": "OFF", "brightness": 100}
Question: "How can I turn this light on?"
Expected Proposal:
{
topic: "zigbee2mqtt/living_room_light/set",
payload: '{"state": "ON"}',
qos: 0,
description: "Turn on the living room light"
}
Validation:
- ✅ Topic follows zigbee2mqtt pattern
- ✅ Payload is valid JSON
- ✅ QoS is valid (0)
- ✅ Description is actionable
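The four checks above could be expressed roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, and the topic check assumes a command topic of the form zigbee2mqtt/<device>/set, as in the expected proposal. Whether a description is "actionable" is reduced here to a non-empty check.

```typescript
// Assumption: a zigbee2mqtt command topic is "zigbee2mqtt/<device>/set"
// with no empty segments.
function isZigbee2mqttSetTopic(topic: string): boolean {
  const segments = topic.split("/");
  return (
    segments.length >= 3 &&
    segments[0] === "zigbee2mqtt" &&
    segments[segments.length - 1] === "set" &&
    segments.every((s) => s.length > 0)
  );
}

// Returns true when the payload string parses as JSON.
function isValidJson(payload: string): boolean {
  try {
    JSON.parse(payload);
    return true;
  } catch {
    return false;
  }
}

// The expected proposal from Test 1.
const proposal = {
  topic: "zigbee2mqtt/living_room_light/set",
  payload: '{"state": "ON"}',
  qos: 0,
  description: "Turn on the living room light",
};

const checks = {
  topicMatchesPattern: isZigbee2mqttSetTopic(proposal.topic),
  payloadIsJson: isValidJson(proposal.payload),
  qosIsValid: proposal.qos === 0 || proposal.qos === 1 || proposal.qos === 2,
  descriptionPresent: proposal.description.trim().length > 0,
};
console.log(checks);
```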
Test 2: Multiple Proposals for Dimmable Light
Input:
Topic: zigbee2mqtt/dimmable_light
Value: {"state": "ON", "brightness": 128}
Question: "What can I do with this light?"
Expected Proposals:
[
{
topic: "zigbee2mqtt/dimmable_light/set",
payload: '{"state": "OFF"}',
qos: 0,
description: "Turn off the light"
},
{
topic: "zigbee2mqtt/dimmable_light/set",
payload: '{"brightness": 255}',
qos: 0,
description: "Set brightness to maximum"
}
]
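One way to assert that a response offers "multiple relevant actions" is to require at least two proposals on the device's command topic with no repeated payload. The sketch below is illustrative only: `hasMultipleDistinctActions` and the `Proposal` interface are assumed names, and the real test may apply different criteria.

```typescript
// Shape of a proposal as shown in the examples in this document.
interface Proposal {
  topic: string;
  payload: string;
  qos: number;
  description: string;
}

// Hypothetical check: at least two proposals, all aimed at the expected
// command topic, and no payload appearing twice.
function hasMultipleDistinctActions(
  proposals: Proposal[],
  expectedTopic: string
): boolean {
  if (proposals.length < 2) return false;
  const payloads = proposals.map((p) => p.payload);
  const allDistinct = payloads.every((p, i) => payloads.indexOf(p) === i);
  const allOnTopic = proposals.every((p) => p.topic === expectedTopic);
  return allDistinct && allOnTopic;
}

// The expected proposals from Test 2.
const proposals: Proposal[] = [
  {
    topic: "zigbee2mqtt/dimmable_light/set",
    payload: '{"state": "OFF"}',
    qos: 0,
    description: "Turn off the light",
  },
  {
    topic: "zigbee2mqtt/dimmable_light/set",
    payload: '{"brightness": 255}',
    qos: 0,
    description: "Set brightness to maximum",
  },
];

const ok = hasMultipleDistinctActions(proposals, "zigbee2mqtt/dimmable_light/set");
console.log(ok);
```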
Validation Criteria
Proposal Quality Checklist
For each AI-generated proposal:
Topic:
- Non-empty string
- No wildcards (+ or #)
- Valid topic segments
- Matches detected system pattern
Payload:
- Valid format (for example, well-formed JSON)
- Appropriate for target system
- Size < 10KB
- No injection attempts
QoS:
- Value is 0, 1, or 2
Description:
- Non-empty
- Uses imperative verb
- Clear and concise
- Under 100 characters
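The mechanical parts of this checklist can be sketched as a single validator that returns a list of problems. This is a rough illustration, not the project's implementation. `validateProposal` and the constants are assumed names, "10KB" is read as 10 * 1024, payload size is approximated by string length rather than encoded bytes, and "valid topic segments" is interpreted as "no empty segments". The judgment-based items (system pattern match, injection attempts, imperative verb, clarity) are left out.

```typescript
interface Proposal {
  topic: string;
  payload: string;
  qos: number;
  description: string;
}

// Assumption: "10KB" means 10 * 1024; length counts UTF-16 code units,
// so this only approximates the encoded byte size.
const MAX_PAYLOAD_LENGTH = 10 * 1024;
const MAX_DESCRIPTION_LENGTH = 100;

// Returns an empty array when the proposal passes every mechanical check.
function validateProposal(p: Proposal): string[] {
  const errors: string[] = [];

  if (p.topic.length === 0) {
    errors.push("topic is empty");
  } else {
    if (p.topic.indexOf("+") !== -1 || p.topic.indexOf("#") !== -1) {
      errors.push("topic contains a wildcard");
    }
    // Assumption: a "valid segment" is any non-empty segment.
    if (p.topic.split("/").some((s) => s.length === 0)) {
      errors.push("topic has an empty segment");
    }
  }

  if (p.payload.length >= MAX_PAYLOAD_LENGTH) {
    errors.push("payload is too large");
  }

  if (p.qos !== 0 && p.qos !== 1 && p.qos !== 2) {
    errors.push("qos must be 0, 1, or 2");
  }

  const description = p.description.trim();
  if (description.length === 0) {
    errors.push("description is empty");
  }
  if (description.length >= MAX_DESCRIPTION_LENGTH) {
    errors.push("description is too long");
  }

  return errors;
}

// Example: a wildcard topic and an out-of-range QoS are both reported.
const problems = validateProposal({
  topic: "zigbee2mqtt/+/set",
  payload: '{"state": "ON"}',
  qos: 3,
  description: "Turn on every light",
});
console.log(problems);
```

Returning every failure rather than stopping at the first makes it easier to record edge cases when a live run produces a malformed proposal.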
Best Practices
- Run offline tests in CI - Fast, deterministic, no cost
- Run live tests on schedule - Nightly or weekly
- Use secrets management - Never commit API keys
- Monitor API costs - Track usage
- Document findings - Record edge cases