# Quick Reference: Running LLM Tests with OpenAI API

## Prerequisites

- ✅ OpenAI API key added to the GitHub Copilot environment
- ✅ Test infrastructure installed (100 offline + 11 live tests)
- ✅ Helper script available: `scripts/run-llm-tests.sh`

## Usage

### Option 1: Use Helper Script (Recommended)

```bash
# If the secret is already in the environment
./scripts/run-llm-tests.sh

# Or provide it explicitly
OPENAI_API_KEY=sk-your-key ./scripts/run-llm-tests.sh
```

### Option 2: Manual Execution

```bash
# Set environment variables
export OPENAI_API_KEY=sk-your-key
export RUN_LLM_TESTS=true

# Run tests
cd app && yarn test
```

### Option 3: Single Command

```bash
# The path below is the GitHub Actions runner checkout; adjust it for a local clone
cd /home/runner/work/MQTT-Explorer/MQTT-Explorer && \
RUN_LLM_TESTS=true \
OPENAI_API_KEY="${OPENAI_API_KEY}" \
yarn test:app
```

## Expected Results

### Without Live Tests (Default)

```
100 passing (2s)
11 pending
```

### With Live Tests Enabled

```
LLM Integration Tests (Live API)
  Home Automation System Detection
    ✓ should detect zigbee2mqtt topics (2145ms)
    ✓ should detect Home Assistant topics (1892ms)
    ✓ should detect Tasmota topics (1756ms)

  Proposal Quality Validation
    ✓ should propose multiple actions (2234ms)
    ✓ should provide clear descriptions (1678ms)
    ✓ should match system formats (1923ms)

  Edge Cases
    ✓ should not propose for sensors (1567ms)
    ✓ should handle nested topics (1834ms)
    ✓ should handle special chars (1456ms)

  Question Generation
    ✓ should generate relevant questions (2012ms)
    ✓ should generate analytical questions (1789ms)

111 passing (22s)
```

## Validation Points

Each live test validates:

✅ **Topic Format**
- Matches system pattern (zigbee2mqtt, homeassistant, etc.)
- No wildcards (`+` or `#`)
- Valid segments

✅ **Payload Quality**
- Valid JSON (when appropriate)
- Correct format for target system
- No injection attempts

✅ **QoS Value**
- Must be 0, 1, or 2
- Typically 0 for home automation

✅ **Description**
- Actionable (uses imperative verbs)
- Clear and concise
- Under 100 characters
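
The mechanical parts of the checklist above can be sketched as a small validator. This is only an illustration, not the project's actual test code: the `Proposal` interface and `validateProposal` function are hypothetical names, the field names are taken from the example proposals in this document, and the empty-segment rule is an assumed reading of "valid segments".

```typescript
// Hypothetical shape of a proposal; field names follow the examples in this document.
interface Proposal {
  topic: string;
  payload: string;
  qos: number;
  description: string;
}

// Returns a list of human-readable problems; an empty list means the proposal passed.
function validateProposal(p: Proposal): string[] {
  const problems: string[] = [];

  // Topic format: publish topics must not contain the MQTT wildcards + or #.
  if (/[+#]/.test(p.topic)) problems.push("topic contains a wildcard");
  // Assumption: "valid segments" is read here as "no empty segments" (e.g. "a//b").
  if (p.topic.split("/").some((segment) => segment.length === 0)) {
    problems.push("topic has an empty segment");
  }

  // Payload quality: if the payload looks like JSON, it must parse.
  const trimmed = p.payload.trim();
  if (trimmed.charAt(0) === "{" || trimmed.charAt(0) === "[") {
    try {
      JSON.parse(trimmed);
    } catch {
      problems.push("payload is not valid JSON");
    }
  }

  // QoS must be 0, 1, or 2.
  if ([0, 1, 2].indexOf(p.qos) === -1) problems.push("qos must be 0, 1, or 2");

  // Description must be non-empty and under 100 characters.
  if (p.description.length === 0 || p.description.length >= 100) {
    problems.push("description must be 1-99 characters");
  }

  return problems;
}
```

Checks such as "actionable" or "no injection attempts" are harder to express as simple rules, so they are left out of this sketch.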

## Troubleshooting

### Secret Not Available

If you get "No API key found":

```bash
# Check the environment without printing the secret itself
env | grep -q '^OPENAI_API_KEY=' && echo "Key is set" || echo "Key not found"

# If not set, the secret may need to be:
# 1. Refreshed in the Copilot environment
# 2. Made available to the runtime
# 3. Accessed through a different mechanism
```

### Tests Still Pending

If live tests don't run:

```bash
# Ensure the flag is set
export RUN_LLM_TESTS=true
echo "$RUN_LLM_TESTS"  # Should output: true

# Check for the API key
[ -n "$OPENAI_API_KEY" ] && echo "Key is set" || echo "Key not found"
```
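
The two shell checks above mirror the kind of gate a test suite might apply before calling the API. The sketch below is an assumption about how such a gate could look, not the project's actual code: `liveTestStatus` and the `Env` type are hypothetical, and comparing the flag to the exact string `"true"` is inferred from the commands in this document.

```typescript
// Hypothetical gate: live tests run only when the flag is "true" and a key is present.
type Env = { [name: string]: string | undefined };

function liveTestStatus(env: Env): { run: boolean; reason: string } {
  // Assumption: the flag must be exactly the string "true".
  if (env.RUN_LLM_TESTS !== "true") {
    return { run: false, reason: "RUN_LLM_TESTS is not set to true" };
  }
  const key = env.OPENAI_API_KEY;
  if (key === undefined || key.trim() === "") {
    return { run: false, reason: "OPENAI_API_KEY is missing or empty" };
  }
  return { run: true, reason: "flag and key are both set" };
}
```

In a Node-based runner this could be called as `liveTestStatus(process.env)`. A Mocha suite could then call `this.skip()` when `run` is false, which is one way tests end up reported as pending.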

## What Gets Tested

### Home Automation Systems

**zigbee2mqtt:**
```typescript
// Expected proposal
{
  topic: "zigbee2mqtt/light/set",
  payload: '{"state": "ON"}',
  qos: 0,
  description: "Turn on the light"
}
```

**Home Assistant:**
```typescript
{
  topic: "homeassistant/light/lamp/set",
  payload: "ON",
  qos: 0,
  description: "Turn on the lamp"
}
```

**Tasmota:**
```typescript
{
  topic: "cmnd/device/POWER",
  payload: "ON",
  qos: 0,
  description: "Turn on the device"
}
```
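
To make "matches system pattern" concrete, the three example topics above can be turned into simple regular expressions. This is a rough sketch: `commandTopicPatterns` and `matchSystem` are hypothetical names, and the patterns are derived only from the examples in this document. Real deployments often use custom base topics, so they should not be read as the rules the tests enforce.

```typescript
// Hypothetical patterns derived only from the three example topics above.
type SystemName = "zigbee2mqtt" | "homeassistant" | "tasmota";

const commandTopicPatterns: { system: SystemName; pattern: RegExp }[] = [
  // zigbee2mqtt/<device>/set
  { system: "zigbee2mqtt", pattern: /^zigbee2mqtt\/[^/+#]+\/set$/ },
  // homeassistant/<component>/<object>/set
  { system: "homeassistant", pattern: /^homeassistant\/[^/+#]+\/[^/+#]+\/set$/ },
  // cmnd/<device>/<command>
  { system: "tasmota", pattern: /^cmnd\/[^/+#]+\/[^/+#]+$/ },
];

// Returns the first system whose pattern matches, or null when none does.
function matchSystem(topic: string): SystemName | null {
  for (const entry of commandTopicPatterns) {
    if (entry.pattern.test(topic)) return entry.system;
  }
  return null;
}
```

Each segment class `[^/+#]+` excludes the separator and both MQTT wildcards, so a wildcard topic never matches any of the patterns.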

### Question Generation

For a topic like `zigbee2mqtt/bedroom_light`:

```typescript
// Expected questions
[
  "How can I turn this light on?",
  "What brightness levels are supported?",
  "Can I adjust the color?",
  "How do I automate this light?"
]
```
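
A list like the one above can be given a few structural checks before anyone judges relevance. The sketch below is an assumption about what such checks might look like. `checkQuestions` is a hypothetical helper that only verifies each entry ends with a question mark and is not a duplicate. It does not measure relevance or diversity.

```typescript
// Hypothetical structural checks; relevance itself still needs human or LLM review.
function checkQuestions(questions: string[]): string[] {
  const problems: string[] = [];
  // Normalize so that case and surrounding whitespace do not hide duplicates.
  const normalized = questions.map((q) => q.trim().toLowerCase());

  if (questions.length === 0) problems.push("no questions generated");

  normalized.forEach((q, i) => {
    if (!/\?$/.test(q)) problems.push(`question ${i + 1} does not end with "?"`);
    // A question is a duplicate if the same normalized text appeared earlier.
    if (normalized.indexOf(q) !== i) problems.push(`question ${i + 1} is a duplicate`);
  });

  return problems;
}
```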

## Cost Estimate

Per full test run:
- **API Calls:** ~11 requests
- **Tokens:** ~5,000-8,000 total
- **Cost:** ~$0.001-0.002 USD (assuming GPT-4o mini, which OpenAI priced more than 60% below GPT-3.5 Turbo at launch)
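
The estimate above is tokens multiplied by a per-token price. The sketch below shows the arithmetic under stated assumptions: the prices are GPT-4o mini list prices at launch (USD 0.15 per million input tokens, USD 0.60 per million output tokens) and may have changed. The 5,000/1,000 input/output split is invented for illustration, and `estimateCostUsd` is a hypothetical helper.

```typescript
// Assumed prices in USD per one million tokens (GPT-4o mini list prices at launch);
// check current pricing before relying on these numbers.
const INPUT_USD_PER_MILLION = 0.15;
const OUTPUT_USD_PER_MILLION = 0.6;

// Cost of one run given how many tokens were prompt (input) and completion (output).
function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens * INPUT_USD_PER_MILLION + outputTokens * OUTPUT_USD_PER_MILLION) / 1_000_000
  );
}

// Example split (an assumption): 6,000 tokens total, of which 1,000 are output.
const exampleCost = estimateCostUsd(5_000, 1_000);
```

With that split, the result is about USD 0.00135 (0.00075 for input plus 0.0006 for output), which falls inside the range quoted above.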

## Documentation

- **Test Strategy:** `app/src/services/spec/README.md`
- **Test Results:** `docs/LLM_TEST_RESULTS.md`
- **Helper Script:** `scripts/run-llm-tests.sh`

## Success Criteria

When tests pass, you'll have validated:

- ✅ AI can detect home automation systems correctly
- ✅ Generated proposals have valid MQTT topic format
- ✅ Payloads match system-specific requirements
- ✅ Descriptions are clear and actionable
- ✅ Questions are relevant and diverse
- ✅ No security issues (injection, size limits)

---

**Ready to test?** Run: `./scripts/run-llm-tests.sh`