Add observability for LLM topic context inclusion (#1038)
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: thomasnordquist <7721625+thomasnordquist@users.noreply.github.com> Co-authored-by: Thomas Nordquist <thomasnordquist@users.noreply.github.com>
201
TESTING_WITH_API.md
Normal file
@@ -0,0 +1,201 @@
# Quick Reference: Running LLM Tests with OpenAI API

## Prerequisites

✅ OpenAI API key added to GitHub Copilot environment
✅ Test infrastructure installed (100 offline + 11 live tests)
✅ Helper script available: `scripts/run-llm-tests.sh`

## Usage

### Option 1: Use Helper Script (Recommended)

```bash
# If secret is in environment
./scripts/run-llm-tests.sh

# Or provide explicitly
OPENAI_API_KEY=sk-your-key ./scripts/run-llm-tests.sh
```

### Option 2: Manual Execution

```bash
# Set environment variables
export OPENAI_API_KEY=sk-your-key
export RUN_LLM_TESTS=true

# Run tests
cd app && yarn test
```

### Option 3: Single Command

```bash
# Path is the GitHub Actions runner checkout; adjust for a local clone
cd /home/runner/work/MQTT-Explorer/MQTT-Explorer && \
RUN_LLM_TESTS=true \
OPENAI_API_KEY="${OPENAI_API_KEY}" \
yarn test:app
```

## Expected Results

### Without Live Tests (Default)
```
100 passing (2s)
11 pending
```

### With Live Tests Enabled
```
LLM Integration Tests (Live API)
  Home Automation System Detection
    ✓ should detect zigbee2mqtt topics (2145ms)
    ✓ should detect Home Assistant topics (1892ms)
    ✓ should detect Tasmota topics (1756ms)

  Proposal Quality Validation
    ✓ should propose multiple actions (2234ms)
    ✓ should provide clear descriptions (1678ms)
    ✓ should match system formats (1923ms)

  Edge Cases
    ✓ should not propose for sensors (1567ms)
    ✓ should handle nested topics (1834ms)
    ✓ should handle special chars (1456ms)

  Question Generation
    ✓ should generate relevant questions (2012ms)
    ✓ should generate analytical questions (1789ms)

111 passing (22s)
```

## Validation Points

Each live test validates:

✅ **Topic Format**
- Matches system pattern (zigbee2mqtt, homeassistant, etc.)
- No wildcards
- Valid segments

✅ **Payload Quality**
- Valid JSON (when appropriate)
- Correct format for target system
- No injection attempts

✅ **QoS Value**
- Must be 0, 1, or 2
- Typically 0 for home automation

✅ **Description**
- Actionable (uses imperative verbs)
- Clear and concise
- Under 100 characters
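
The mechanical parts of these checks can be sketched in a few lines. This is a minimal illustration, not the project's actual test code: the `Proposal` interface and `validateProposal` function are hypothetical names, and treating an empty topic segment as invalid is an assumption about what "valid segments" means here. Checks that need judgment (imperative wording, injection attempts, system-specific formats) are left out.

```typescript
// Hypothetical shape of an LLM-generated publish proposal (not taken from the project source).
interface Proposal {
  topic: string
  payload: string
  qos: number
  description: string
}

// Returns a list of problems; an empty list means the proposal passed these checks.
function validateProposal(p: Proposal, expectJson: boolean = false): string[] {
  const problems: string[] = []

  // Topic: a publish topic must not contain the MQTT wildcards '+' or '#'.
  if (/[+#]/.test(p.topic)) problems.push('topic contains a wildcard')

  // Topic: assumption for this sketch, every segment between slashes is non-empty.
  if (p.topic.split('/').some(segment => segment.length === 0)) {
    problems.push('topic has an empty segment')
  }

  // Payload: only parse as JSON when the target system expects JSON.
  if (expectJson) {
    try {
      JSON.parse(p.payload)
    } catch (_err) {
      problems.push('payload is not valid JSON')
    }
  }

  // QoS: MQTT defines exactly three levels.
  if ([0, 1, 2].indexOf(p.qos) === -1) problems.push('qos must be 0, 1, or 2')

  // Description: must be present and shorter than 100 characters.
  if (p.description.trim().length === 0) problems.push('description is empty')
  if (p.description.length >= 100) problems.push('description is not under 100 characters')

  return problems
}
```

Returning a list of problems rather than a boolean makes a failing live test easier to read, since the assertion message can name every rule the proposal broke.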

## Troubleshooting

### Secret Not Available

If you get "No API key found":

```bash
# Check environment without printing the key itself
env | grep -q '^OPENAI_API_KEY=' && echo "Key is set" || echo "Key not found"

# If not set, the secret may need to be:
# 1. Refreshed in the Copilot environment
# 2. Made available to the runtime
# 3. Accessed through a different mechanism
```

### Tests Still Pending

If live tests don't run:

```bash
# Ensure flag is set
export RUN_LLM_TESTS=true
echo "$RUN_LLM_TESTS"  # Should output: true

# Check for API key
[ -n "$OPENAI_API_KEY" ] && echo "Key is set" || echo "Key not found"
```
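
Both conditions above have to hold before the live suite leaves the pending state. A minimal sketch of such a gate follows, assuming the flag must be exactly the string `true`. The `liveTestStatus` helper is a hypothetical name, not the project's actual code, and the real suite may apply a different rule.

```typescript
// Minimal view of the environment, so the helper can be exercised without touching process.env.
type Env = Record<string, string | undefined>

// Hypothetical helper: decides whether the live tests should run and says why not.
function liveTestStatus(env: Env): { run: boolean; reason: string } {
  if (env.RUN_LLM_TESTS !== 'true') {
    return { run: false, reason: 'RUN_LLM_TESTS is not set to "true"' }
  }
  if (!env.OPENAI_API_KEY) {
    return { run: false, reason: 'OPENAI_API_KEY is missing or empty' }
  }
  return { run: true, reason: 'flag and API key are both present' }
}

// In a Mocha suite the result could select between describe and describe.skip, e.g.:
//   const suite = liveTestStatus(process.env).run ? describe : describe.skip
//   suite('LLM Integration Tests (Live API)', () => { /* live tests */ })
```

Mocha reports tests under `describe.skip` as pending, which is consistent with the `11 pending` line in the default run.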

## What Gets Tested

### Home Automation Systems

**zigbee2mqtt:**
```typescript
// Expected proposal
const zigbee2mqttProposal = {
  topic: "zigbee2mqtt/light/set",
  payload: '{"state": "ON"}',
  qos: 0,
  description: "Turn on the light"
}
```

**Home Assistant:**
```typescript
const homeAssistantProposal = {
  topic: "homeassistant/light/lamp/set",
  payload: "ON",
  qos: 0,
  description: "Turn on the lamp"
}
```

**Tasmota:**
```typescript
const tasmotaProposal = {
  topic: "cmnd/device/POWER",
  payload: "ON",
  qos: 0,
  description: "Turn on the device"
}
```
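
The three examples differ mainly in topic layout, which is what a "matches system pattern" check can key on. The sketch below classifies a command topic using only the patterns visible in the examples above. `classifyCommandTopic` is a hypothetical helper, and real deployments may use custom prefixes that it would not recognise.

```typescript
type HomeAutomationSystem = 'zigbee2mqtt' | 'homeassistant' | 'tasmota' | 'unknown'

// Hypothetical classifier based on the example topics in this document.
function classifyCommandTopic(topic: string): HomeAutomationSystem {
  const segments = topic.split('/')
  const first = segments[0]
  const last = segments[segments.length - 1]

  // zigbee2mqtt commands: zigbee2mqtt/<friendly name>/set
  if (first === 'zigbee2mqtt' && segments.length >= 3 && last === 'set') return 'zigbee2mqtt'

  // Home Assistant example: any topic under the homeassistant/ prefix
  if (first === 'homeassistant' && segments.length >= 2) return 'homeassistant'

  // Tasmota commands: cmnd/<device>/<command>
  if (first === 'cmnd' && segments.length === 3) return 'tasmota'

  return 'unknown'
}

console.log(classifyCommandTopic('zigbee2mqtt/light/set'))
console.log(classifyCommandTopic('cmnd/device/POWER'))
```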

### Question Generation

For a topic like `zigbee2mqtt/bedroom_light`:

```typescript
// Expected questions
[
  "How can I turn this light on?",
  "What brightness levels are supported?",
  "Can I adjust the color?",
  "How do I automate this light?"
]
```

## Cost Estimate

Per full test run:
- **API Calls:** ~11 requests
- **Tokens:** ~5,000-8,000 total
- **Cost:** ~$0.001-0.002 USD (assuming GPT-4o Mini, which OpenAI described at launch as more than 60% cheaper than GPT-3.5 Turbo)
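
The cost range can be reproduced with simple arithmetic. The sketch below assumes GPT-4o Mini launch list prices of $0.15 per million input tokens and $0.60 per million output tokens, and an 80/20 split between input and output tokens. Both the prices and the split are assumptions, not figures from the test suite, so check current pricing before relying on them.

```typescript
// Assumed list prices in USD per one million tokens; verify against current pricing.
const INPUT_PRICE_PER_MILLION = 0.15
const OUTPUT_PRICE_PER_MILLION = 0.6

// Estimates the cost of one run; outputShare is the assumed fraction of tokens that are output.
function estimateRunCost(totalTokens: number, outputShare: number = 0.2): number {
  const outputTokens = totalTokens * outputShare
  const inputTokens = totalTokens - outputTokens
  return (inputTokens * INPUT_PRICE_PER_MILLION + outputTokens * OUTPUT_PRICE_PER_MILLION) / 1e6
}

console.log(estimateRunCost(5000).toFixed(4)) // lower end of the token estimate
console.log(estimateRunCost(8000).toFixed(4)) // upper end of the token estimate
```

Under these assumptions, 5,000 and 8,000 tokens come to roughly $0.0012 and $0.0019, which matches the range quoted above.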

## Documentation

- **Test Strategy:** `app/src/services/spec/README.md`
- **Test Results:** `docs/LLM_TEST_RESULTS.md`
- **Helper Script:** `scripts/run-llm-tests.sh`

## Success Criteria

When tests pass, you'll have validated:

✅ AI can detect home automation systems correctly
✅ Generated proposals have valid MQTT topic format
✅ Payloads match system-specific requirements
✅ Descriptions are clear and actionable
✅ Questions are relevant and diverse
✅ No security issues (injection, size limits)

---

**Ready to test?** Run: `./scripts/run-llm-tests.sh`