# Observability (Loki / Promtail / Grafana)

The umbrella chart under [`gitops/observability/`](../observability/) deploys:

- **Loki**: log storage (SingleBinary mode, filesystem PVC, 7-day retention by default).
- **Promtail**: a DaemonSet that ships Kubernetes pod logs (`/var/log/pods`) plus **OneLab file logs** from the same host path the app chart uses (`/opt/onelab/logs` by default).
- **Grafana**: UI for exploring logs; its datasource points at this release’s Loki gateway.

It is synced by the **same** Argo CD Application as the OneLab chart ([`gitops/argocd/application.yaml`](../argocd/application.yaml)): it is the second entry under `sources`, with Helm release name **`onelab-obs`**, so Service names carry that prefix (for example `onelab-obs-loki-gateway`).
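
Because every sub-chart renders under the `onelab-obs` release, you can derive and check the Service names after a sync. The helpers below are a hypothetical sketch, not part of the repo; they assume the release lives in namespace `onelab` (as the port-forward example later in this document does) and that `kubectl` can reach the cluster.

```bash
# Hypothetical helpers: derive Service names from the Helm release name.
# Assumes release "onelab-obs" in namespace "onelab", as in this guide.
OBS_RELEASE=${OBS_RELEASE:-onelab-obs}
OBS_NAMESPACE=${OBS_NAMESPACE:-onelab}

# Print "<release>-<suffix>" for each suffix given, one per line.
obs_service_names() {
  for suffix in "$@"; do
    printf '%s-%s\n' "$OBS_RELEASE" "$suffix"
  done
}

# Report whether each Service exists (needs kubectl and cluster access).
# Returns non-zero if any Service is missing.
obs_check_services() {
  local svc rc=0
  for svc in $(obs_service_names "$@"); do
    if kubectl -n "$OBS_NAMESPACE" get svc "$svc" >/dev/null 2>&1; then
      printf 'ok       %s\n' "$svc"
    else
      printf 'missing  %s\n' "$svc"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, `obs_check_services loki-gateway grafana` should report both Services as `ok` once Argo CD has synced the second source.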

## First-time setup

1. **Change the Grafana admin password** in [`gitops/observability/values.yaml`](../observability/values.yaml) (`grafana.adminPassword`), or switch to `grafana.admin.existingSecret` as described in the upstream Grafana chart.
2. **Align host paths**: if you change `persistence.hostPath.logs` for OneLab, update `promtail.extraVolumes` / `extraVolumeMounts` in the same `values.yaml` so Promtail still reads the shared log directory.
3. **Multi-node**: with `hostPath` logs, each node only holds the files written by OneLab pods scheduled on it. Promtail runs on every node, so those files are still collected after a pod moves to another node.
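
For step 1, one way to keep the password out of Git is a Secret referenced by `grafana.admin.existingSecret`. The sketch below assumes the upstream chart's default keys (`admin-user`, `admin-password`) and namespace `onelab`; the Secret name `onelab-obs-grafana-admin` and both function names are hypothetical. It only renders the manifest (`--dry-run=client -o yaml`), so you can review it before applying.

```bash
# Sketch: Grafana admin Secret for grafana.admin.existingSecret.
# Assumptions: namespace "onelab", hypothetical Secret name below,
# upstream chart default keys admin-user / admin-password.

# Print a random alphanumeric password of the requested length (default 32).
gen_password() {
  local len=${1:-32}
  # Read plenty of random bytes, keep only [A-Za-z0-9], then trim to length.
  head -c 1024 /dev/urandom | LC_ALL=C tr -dc 'A-Za-z0-9' | cut -c "1-$len"
}

# Render (do not apply) the Secret manifest to stdout.
render_grafana_admin_secret() {
  local name=${1:-onelab-obs-grafana-admin} ns=${2:-onelab}
  kubectl -n "$ns" create secret generic "$name" \
    --from-literal=admin-user=admin \
    --from-literal=admin-password="$(gen_password 32)" \
    --dry-run=client -o yaml
}
```

For example, run `render_grafana_admin_secret | kubectl apply -f -`, then set `grafana.admin.existingSecret: onelab-obs-grafana-admin` in `values.yaml` and remove `grafana.adminPassword`.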

## OneLab-only ingestion

Promtail's **`extraRelabelConfigs`** add a rule to the **kubernetes-pods** job that **keeps only** pods in namespace **`onelab`**, so logs from other namespaces no longer reach Loki and Explore only shows OneLab streams. Host file logs under `/opt/onelab/logs` are labeled **`namespace: onelab`** and **`component: host-logs`**, so the same queries return them too.
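
Promtail relabeling follows Prometheus semantics: a `keep` rule drops every target whose `source_labels` value does not match `regex`, and that regex is anchored at both ends. The functions below only emulate the decision for a namespace string and print the host-log selector; the YAML in the comment shows the general shape of such a rule and is an assumption, not a copy of this chart's values. The function names are hypothetical.

```bash
# Emulates the decision of a Promtail relabel rule shaped like:
#
#   - source_labels: [__meta_kubernetes_namespace]
#     regex: onelab
#     action: keep
#
# Relabel regexes are anchored, so "onelab" does not match "onelab-dev".
# grep -x gives the same whole-string anchoring here (ERE, not Go RE2).

# Exit 0 if a pod in namespace $1 would be kept for regex $2 (default onelab).
would_keep_namespace() {
  printf '%s\n' "$1" | grep -Eqx "(${2:-onelab})"
}

# LogQL stream selector that matches only the host file logs.
host_logs_selector() {
  printf '{namespace="onelab", component="host-logs"}\n'
}
```

In Explore, `{namespace="onelab"}` returns pod and file logs together, while the `host_logs_selector` output narrows the result to the file logs only.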

Loki data ingested before this change may still include non-`onelab` streams until **retention** removes them (7 days by default); for a clean index sooner you would need to wipe the Loki PVC, which is destructive and deletes all stored logs.
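
To see whether pre-change streams are still stored, you can list the values of the `namespace` label through Loki's HTTP API (`GET /loki/api/v1/label/namespace/values`). That endpoint takes `start`/`end` as nanosecond Unix timestamps and otherwise covers only a recent window (6 hours), so the sketch passes an explicit `start`. It assumes a port-forward such as `kubectl -n onelab port-forward svc/onelab-obs-loki-gateway 3100:80` (port 80 being the Loki chart's usual gateway default), `jq` on the path, and no tenant header; add one with `curl -H 'X-Scope-OrgID: …'` if your Loki has `auth_enabled`. The function names are hypothetical.

```bash
# Sketch: find non-onelab namespaces still present in Loki.
# Assumptions: Loki reachable at $LOKI_URL, jq installed, no tenant header.
LOKI_URL=${LOKI_URL:-http://localhost:3100}

# Nanosecond Unix timestamp for "$1 days before epoch second $2" (default: now).
ns_days_ago() {
  local days=$1 now=${2:-$(date +%s)}
  echo $(( (now - days * 86400) * 1000000000 ))
}

# Read namespace names (one per line) on stdin; print those other than onelab.
stale_namespaces() {
  grep -vx 'onelab' || true
}

# Fetch namespace label values over the last $1 days (default 7).
loki_namespaces() {
  local days=${1:-7}
  curl -fsS -G "$LOKI_URL/loki/api/v1/label/namespace/values" \
    --data-urlencode "start=$(ns_days_ago "$days")" |
    jq -r '.data[]'
}
```

Running `loki_namespaces 7 | stale_namespaces` prints nothing once retention has dropped everything except `onelab`.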

## Dashboard: **OneLab logs**

Grafana’s **dashboard sidecar** loads the ConfigMap **`…-dashboard-onelab-logs`** (JSON source: `dashboards/onelab-logs.json`). Open **Dashboards → OneLab logs** (`uid` `onelab-logs`); it provides:

- **Component**: multi-select populated by `label_values({namespace="onelab"}, component)`; includes **`host-logs`** for file logs.
- **Line filter**: regex applied to the log line content (`.*` matches all lines).
- **Stat panels**: total lines plus heuristic **error** / **warning** counts, tuned for typical plain-text logs rather than strict JSON parsing.
- **Logs panel**: the log lines matching the selected components and line filter.
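
The two dashboard variables combine into an ordinary LogQL query: a stream selector on `namespace` and `component`, followed by a line filter. The helpers below are a hypothetical sketch of that composition, useful for pasting the same query into Explore or `logcli`; the dashboard JSON may format the multi-select differently, a real stat panel would typically use Grafana's `$__range` instead of a fixed range, and the `(?i)error` pattern is an assumed example rather than the heuristic in `onelab-logs.json`.

```bash
# Hypothetical sketch: build LogQL the way the dashboard variables compose.

# Usage: onelab_logql_selector LINE_REGEX [COMPONENT...]
# Several components become one regex alternation (names are assumed to
# contain no regex metacharacters); none means any non-empty component.
# Backtick strings are raw in LogQL, so LINE_REGEX needs no extra escaping
# (it must not itself contain a backtick).
onelab_logql_selector() {
  local line_re=$1 comps
  shift
  if [ "$#" -eq 0 ]; then
    comps='.+'
  else
    local IFS='|'
    comps="$*"
  fi
  printf '{namespace="onelab", component=~"%s"} |~ `%s`\n' "$comps" "$line_re"
}

# Usage: onelab_count_query RANGE LEVEL_REGEX [COMPONENT...]
# Counts matching lines over RANGE, like a stat panel would.
onelab_count_query() {
  local range=$1 level_re=$2
  shift 2
  printf 'sum(count_over_time(%s [%s]))\n' \
    "$(onelab_logql_selector "$level_re" "$@")" "$range"
}
```

For instance, `onelab_count_query 1h '(?i)error' host-logs` yields a query counting error-looking file-log lines over the last hour.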

## Access Grafana

The umbrella chart creates an **Ingress** named **`grafana-onelab`** (`templates/ingress-grafana-onelab.yaml`) for Traefik with cert-manager TLS, following the same pattern as the OneLab web UI in `gitops/values/k3s-example.yaml`:

- Host: **`grafana.k8s.selair.it`**. Change `grafanaOnelabIngress` and `grafana.ini.server` in `gitops/observability/values.yaml` together so the Ingress host and Grafana's own server URL stay in sync.
- TLS Secret: **`grafana-tls-k8s-selair`**, issued by cert-manager through `letsencrypt-prod`.

Point DNS for that host at your ingress controller, sync the Argo CD app, then open `https://<grafana-host>/` and log in as `admin` with the password from `values.yaml` (or from your existing Secret).
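
Because the Ingress host and Grafana's `[server]` settings live in separate values, a quick consistency check can catch a mismatch before you sync. The helpers below are hypothetical: they assume you set `grafana.ini.server.root_url` to a full URL such as `https://grafana.k8s.selair.it/`, and that the Ingress sits in namespace `onelab` alongside the Grafana Service.

```bash
# Hypothetical helpers: compare Grafana's root_url with the Ingress host.

# Print the host part of a URL (scheme, path and port stripped).
url_host() {
  local u=${1#*://}        # drop "https://"
  u=${u%%/*}               # drop path
  printf '%s\n' "${u%%:*}" # drop ":port"
}

# Exit 0 if the host of root_url ($2) equals the Ingress host ($1).
root_url_matches_host() {
  [ "$(url_host "$2")" = "$1" ]
}

# Read the live Ingress host (needs kubectl and cluster access).
ingress_host() {
  kubectl -n onelab get ingress grafana-onelab \
    -o jsonpath='{.spec.rules[0].host}'
}
```

For example, `root_url_matches_host "$(ingress_host)" https://grafana.k8s.selair.it/ && echo consistent` confirms the live Ingress and your intended `root_url` agree.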

For debugging without DNS, port-forward the Grafana Service and browse to `http://localhost:3000/`:

```bash
kubectl -n onelab port-forward svc/onelab-obs-grafana 3000:80
```
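
With the port-forward running, Grafana's `/api/health` endpoint gives a liveness check without logging in; its JSON includes a `database` field that reads `ok` when Grafana can reach its database. The function names are hypothetical, and the JSON check is a deliberately simple `grep` rather than a full parser.

```bash
# Sketch: check Grafana through the port-forward above.
GRAFANA_URL=${GRAFANA_URL:-http://localhost:3000}

# Exit 0 if the /api/health JSON read on stdin reports "database": "ok".
grafana_db_ok() {
  grep -Eq '"database"[[:space:]]*:[[:space:]]*"ok"'
}

# Fetch /api/health (no credentials needed) and check the database field.
grafana_healthy() {
  curl -fsS "$GRAFANA_URL/api/health" | grafana_db_ok
}
```

`grafana_healthy && echo up` is enough for a smoke test; the same endpoint also works through the Ingress by setting `GRAFANA_URL=https://<grafana-host>`.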

## Upgrading chart dependencies

From `gitops/observability/`:

```bash
helm dependency update
```

Commit the updated `Chart.lock` and `charts/*.tgz` if you want Argo CD to render the chart without contacting remote Helm repositories at sync time.
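
If you vendor the dependencies, it is easy to forget a tarball after bumping a version. The function below is a hypothetical check that reads `Chart.lock` and reports each `charts/<name>-<version>.tgz` that is missing; it assumes the usual `Chart.lock` layout (a `- name:` line followed by an indented `version:` line per entry) and the `<name>-<version>.tgz` file names that `helm dependency update` writes. When the lock file is already correct, `helm dependency build` restores exactly the locked versions.

```bash
# Hypothetical check: list dependency tarballs named in Chart.lock that are
# missing from charts/. Usage: missing_vendored_charts [CHART_DIR]
missing_vendored_charts() {
  local dir=${1:-.} tgz
  # Emit "<name>-<version>.tgz" for each dependency entry in Chart.lock.
  awk '
    /^[ \t]*- name:/ { name = $3 }
    /^[ \t]+version:/ && name != "" {
      v = $2; gsub(/"/, "", v)
      print name "-" v ".tgz"
      name = ""
    }
  ' "$dir/Chart.lock" |
  while read -r tgz; do
    # Report only tarballs that are not present under charts/.
    [ -f "$dir/charts/$tgz" ] || printf '%s\n' "$tgz"
  done
}
```

Running `missing_vendored_charts gitops/observability` before committing should print nothing if every locked chart is vendored.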

## OneLab `logs.path`

The OneLab chart now sets `onelab.logs.path: "/logs"` in the generated configuration, so application file logs land on the `/logs` volume mount (see Enterprise guide §7.2), which is backed by the host path Promtail reads.
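
The chain that makes file logs visible is: the app writes under `/logs` inside the pod, that mount is backed by the node's host path (`/opt/onelab/logs` by default), and Promtail reads the same host path on every node. The hypothetical helper below translates an in-pod path to the host path under those default assumptions, which helps when you look for a file on a node.

```bash
# Hypothetical helper: map an in-pod log path to the node's host path.
# Defaults mirror this guide: mount /logs, hostPath /opt/onelab/logs.
LOG_MOUNT=${LOG_MOUNT:-/logs}
LOG_HOST_DIR=${LOG_HOST_DIR:-/opt/onelab/logs}

# Usage: host_path_for /logs/app.log   (prints /opt/onelab/logs/app.log)
host_path_for() {
  local p=$1
  case $p in
    "$LOG_MOUNT"|"$LOG_MOUNT"/*)
      printf '%s%s\n' "$LOG_HOST_DIR" "${p#"$LOG_MOUNT"}" ;;
    *)
      echo "not under $LOG_MOUNT: $p" >&2
      return 1 ;;
  esac
}
```

If you change `persistence.hostPath.logs`, set `LOG_HOST_DIR` to the same value; it is also the directory Promtail's extra volume must point at (step 2 of First-time setup).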