docs(gitops): single Argo CD README, remove redundant docs
Made-with: Cursor
This commit is contained in:
225
gitops/README.md
225
gitops/README.md
@@ -1,78 +1,217 @@
|
||||
# OneLab GitOps (k3s + Argo CD)
|
||||
# OneLab GitOps (Argo CD)
|
||||
|
||||
This directory holds the **Helm chart** that replaces `docker stack deploy` from the legacy Swarm installer (`app/docker-compose.yml`).
|
||||
This directory is the **declarative source** for OneLab on Kubernetes. Argo CD applies two **Helm-based sources** from Git (Argo invokes Helm internally; you do not run a separate Helm install workflow).
|
||||
|
||||
Legacy Swarm install lives under [`app/`](../app/) (`docker-compose.yml`); this tree replaces `docker stack deploy` for k3s/Kubernetes.
|
||||
|
||||
## Layout
|
||||
|
||||
| Path | Purpose |
|
||||
|------|---------|
|
||||
| `charts/onelab` | Helm chart (StatefulSets, Deployments, Services, ConfigMaps, Secrets) |
|
||||
| `values/*.yaml` | Environment-specific overrides (non-secret defaults; use sealed/external secrets for prod) |
|
||||
| `argocd/application.yaml` | `Application` (multi-source): OneLab chart + [`observability/`](observability/) (Loki/Promtail/Grafana) |
|
||||
| `observability/` | Umbrella Helm chart for log aggregation (same Argo app, release `onelab-obs`) |
|
||||
| [`charts/onelab`](charts/onelab) | OneLab chart (StatefulSets, Deployments, Services, ConfigMaps, Secrets) — **Argo source 1** |
|
||||
| [`values/`](values/) | Environment values (e.g. [`values/k3s-example.yaml`](values/k3s-example.yaml)); reference from `helm.valueFiles` |
|
||||
| [`observability/`](observability/) | Loki / Promtail / Grafana umbrella chart — **Argo source 2** (`releaseName: onelab-obs`) |
|
||||
| [`argocd/application.yaml`](argocd/application.yaml) | `Application` manifest (`spec.sources`, namespace `onelab`) |
|
||||
| [`argocd/jsonpatch-multisource.json`](argocd/jsonpatch-multisource.json) | One-time JSON patch if the live `Application` stuck on `spec.source` |
|
||||
|
||||
## Prerequisites
|
||||
|
||||
1. **k3s** (or any Kubernetes) with default storage class for Postgres/Rabbit PVCs (e.g. `local-path`).
|
||||
2. **Image pull access** to `hub.andrewalliance.com` — create a docker-registry secret and reference it in `imagePullSecrets`:
|
||||
```bash
|
||||
kubectl create namespace onelab
|
||||
kubectl create secret docker-registry hub-andrewalliance -n onelab \
|
||||
--docker-server=hub.andrewalliance.com --docker-username=... --docker-password=...
|
||||
```
|
||||
3. **RabbitMQ TLS secret** (name `onelab-rabbit-tls` by default) — see `values/k3s-example.yaml` comments, or set `rabbitmq.tls.embed: true` with PEM strings in a **private** values file.
|
||||
4. **Host paths** (default): ensure `/opt/onelab/data` and `/opt/onelab/logs` exist on nodes that run workloads using `persistence.mode: hostPath`, or switch to RWX storage for multi-node.
|
||||
1. **Kubernetes** (e.g. k3s) with a default **StorageClass** for Postgres/Rabbit PVCs (e.g. `local-path`).
|
||||
2. **Image pull** to `hub.andrewalliance.com` — registry Secret + `imagePullSecrets` (see [`values/k3s-example.yaml`](values/k3s-example.yaml) and [Private registry credentials](#private-registry-credentials)).
|
||||
3. **RabbitMQ TLS** Secret `onelab-rabbit-tls` (or `rabbitmq.tls.embed` in a private values file) — [RabbitMQ TLS](#rabbitmq-tls).
|
||||
4. **Host paths** when using `persistence.mode: hostPath`: `/opt/onelab/data` and `/opt/onelab/logs` on nodes that run those pods, or use RWX storage for multi-node.
|
||||
|
||||
## Helm (without Argo CD)
|
||||
## Bootstrap (registry, Argo repo, TLS)
|
||||
|
||||
### Private registry credentials
|
||||
|
||||
By default, `gitops/values/k3s-example.yaml` matches the Swarm installer (`app/playbooks/tasks/manage-images.yml`): user **`public`**, password **`Andrew01..Release`**, and the chart creates Secret **`hub-andrewalliance`** when `registry.createPullSecret: true`.
|
||||
|
||||
To use other credentials, override `registry.username` / `registry.password` or create the secret manually:
|
||||
|
||||
```bash
|
||||
cd gitops/charts/onelab
|
||||
helm upgrade --install onelab . -n onelab --create-namespace \
|
||||
-f ../../values/k3s-example.yaml
|
||||
kubectl create secret docker-registry hub-andrewalliance -n onelab \
|
||||
--docker-server=hub.andrewalliance.com \
|
||||
--docker-username='YOUR_USER' \
|
||||
--docker-password='YOUR_PASSWORD'
|
||||
```
|
||||
|
||||
## Argo CD
|
||||
…and set `registry.createPullSecret: false` plus `imagePullSecrets: [{ name: hub-andrewalliance }]`.
|
||||
|
||||
1. Push this repository to a Git remote Argo CD can read.
|
||||
2. Edit `argocd/application.yaml`: `repoURL`, `targetRevision`, and values file as needed.
|
||||
3. `kubectl apply -f gitops/argocd/application.yaml` (from a machine with a working kubeconfig).
|
||||
#### StatefulSet pods still get `401 Unauthorized` / `ImagePullBackOff` after enabling registry auth
|
||||
|
||||
The Application uses **`spec.sources`** (Argo CD 2.6+): source 1 is the OneLab chart (`releaseName: onelab`), source 2 is [`observability/`](observability/) (`releaseName: onelab-obs`). Both deploy to namespace **`onelab`**.
|
||||
If `db-0` / `rabbitmq-0` were created **before** `imagePullSecrets` existed, their **Pod** spec can still use anonymous pulls until they are recreated:
|
||||
|
||||
Sync waves order Postgres → Redis/Rabbit/config → application pods.
|
||||
```bash
|
||||
kubectl delete pod -n onelab db-0 rabbitmq-0
|
||||
```
|
||||
|
||||
The chart adds a pod-template checksum so after you change registry settings in Git and **Argo syncs**, workloads normally roll; a one-time delete is enough if pods were created before pull secrets existed.
|
||||
|
||||
### Argo CD private Git repository
|
||||
|
||||
If the Application shows `authentication required: Unauthorized`, register the repo in Argo CD (CLI or UI):
|
||||
|
||||
```bash
|
||||
# Example; use a deploy token or PAT with repo read access
|
||||
argocd repo add https://git.luneski.fr/luneski/onelab-k8s.git \
|
||||
--username git \
|
||||
--password YOUR_TOKEN
|
||||
```
|
||||
|
||||
Then apply the Application:
|
||||
|
||||
```bash
|
||||
kubectl apply -f gitops/argocd/application.yaml
|
||||
```
|
||||
|
||||
**Single controller:** Use **only** this Argo CD `Application` for `onelab` / `onelab-obs`. Do not manage the same namespace with a separate **Helm CLI** release.
|
||||
|
||||
### RabbitMQ TLS
|
||||
|
||||
Secret `onelab-rabbit-tls` must exist before RabbitMQ starts (created once from `app/rabbit/ssl/` or your own PEMs).
|
||||
|
||||
### Argo CD version and observability stack
|
||||
|
||||
[`argocd/application.yaml`](argocd/application.yaml) uses **`spec.sources`** (two Helm charts in one Application). Use **Argo CD 2.6 or newer**.
|
||||
|
||||
If the `onelab` Application was created earlier with **`spec.source` only**, Argo will **not** show the observability resources until you remove `source` and set `sources` — see [Migrating `spec.source` → `spec.sources`](#migrating-specsource--specsources) below.
|
||||
|
||||
The second source installs Loki/Promtail/Grafana from [`observability/`](observability/) (`releaseName: onelab-obs`). Set a strong **`grafana.adminPassword`** in [`observability/values.yaml`](observability/values.yaml) before production — details in [Observability](#observability-loki--promtail--grafana).
|
||||
|
||||
## Deploy with Argo CD
|
||||
|
||||
1. Push this repo to a Git remote Argo CD can read.
|
||||
2. Register the repo in Argo CD (CLI or UI) if it is private — [Argo CD private Git repository](#argo-cd-private-git-repository).
|
||||
3. Edit [`argocd/application.yaml`](argocd/application.yaml): `repoURL`, `targetRevision`, and per-source `helm.valueFiles` if needed.
|
||||
4. Apply the Application:
|
||||
|
||||
```bash
|
||||
kubectl apply -f gitops/argocd/application.yaml
|
||||
```
|
||||
|
||||
**Requirements:** Argo CD **2.6+** (`spec.sources`).
|
||||
|
||||
Each entry under `spec.sources` has its own `helm.releaseName` and `helm.valueFiles` (paths are **relative to that source’s `path`**):
|
||||
|
||||
- Source `gitops/charts/onelab` → e.g. `../../values/k3s-example.yaml`
|
||||
- Source `gitops/observability` → e.g. `values.yaml`
|
||||
|
||||
Both targets deploy into namespace **`onelab`**. Sync waves order: Postgres → Redis/Rabbit/config → application workloads.
|
||||
|
||||
### Migrating `spec.source` → `spec.sources`
|
||||
|
||||
If the `onelab` `Application` was created earlier with **`spec.source` only**, a plain `kubectl apply` of the new file may **not** remove `spec.source`, and Argo will never reconcile the observability chart.
|
||||
|
||||
Check:
|
||||
|
||||
```bash
|
||||
kubectl get application onelab -n argocd -o jsonpath='{.spec.source}{"\n"}{.spec.sources}{"\n"}'
|
||||
```
|
||||
|
||||
If `source` is set and `sources` is empty, patch once (adjust `repoURL` in the patch file if needed):
|
||||
|
||||
```bash
|
||||
kubectl patch application onelab -n argocd --type json --patch-file gitops/argocd/jsonpatch-multisource.json
|
||||
```
|
||||
|
||||
Then sync in Argo (or wait for auto-sync).
|
||||
|
||||
### Single controller
|
||||
|
||||
Manage these workloads **only** through this Argo CD `Application`. Do not drive the same resources with a parallel **Helm CLI** release.
|
||||
|
||||
### Logs / Grafana
|
||||
|
||||
See [docs/OBSERVABILITY.md](docs/OBSERVABILITY.md). Change `grafana.adminPassword` in `observability/values.yaml` before relying on it in production.
|
||||
See [Observability (Loki / Promtail / Grafana)](#observability-loki--promtail--grafana) — set a strong `grafana.adminPassword` in [`observability/values.yaml`](observability/values.yaml) before production.
|
||||
|
||||
## Observability (Loki / Promtail / Grafana)
|
||||
|
||||
The umbrella chart under [`observability/`](observability/) deploys:
|
||||
|
||||
- **Loki** — log storage (SingleBinary, filesystem PVC, 7-day retention by default).
|
||||
- **Promtail** — DaemonSet: Kubernetes pod logs (`/var/log/pods`) plus **OneLab file logs** from the same host path the app chart uses (`/opt/onelab/logs` by default).
|
||||
- **Grafana** — explore logs; datasource points at this release’s Loki gateway.
|
||||
|
||||
It is synced by the **same** Argo CD Application as the OneLab chart ([`argocd/application.yaml`](argocd/application.yaml)): second `sources` entry, Argo **`helm.releaseName`** **`onelab-obs`** (so services are like `onelab-obs-loki-gateway`).
|
||||
|
||||
### First-time setup
|
||||
|
||||
1. **Change the Grafana admin password** in [`observability/values.yaml`](observability/values.yaml) (`grafana.adminPassword`) or switch to `admin.existingSecret` per the upstream Grafana chart.
|
||||
2. **Align host paths** — if you change `persistence.hostPath.logs` for OneLab, update `promtail.extraVolumes` / `extraVolumeMounts` in the same `values.yaml` so Promtail still reads the shared log directory.
|
||||
3. **Multi-node** — with `hostPath` logs, each node only sees its own files; Promtail runs on every node, so you still get coverage when pods move.
|
||||
|
||||
### OneLab-only ingestion
|
||||
|
||||
Promtail adds **`extraRelabelConfigs`** so the **kubernetes-pods** job **keeps only** pods in namespace **`onelab`**. Other namespaces no longer reach Loki (Explore only sees OneLab). Host file logs under `/opt/onelab/logs` are tagged with **`namespace: onelab`** and **`component: host-logs`** so they appear in the same queries.
|
||||
|
||||
Existing Loki data from before this change may still show non-`onelab` streams until **retention** drops them; for a clean index you would need to wipe the Loki PVC (destructive).
|
||||
|
||||
### Dashboard: **OneLab logs**
|
||||
|
||||
Grafana’s **dashboard sidecar** loads ConfigMap **`…-dashboard-onelab-logs`** (JSON: `observability/dashboards/onelab-logs.json`). Open **Dashboards → OneLab logs** (`uid` `onelab-logs`):
|
||||
|
||||
- **Component** — multi-select from `label_values({namespace="onelab"}, component)` (includes **`host-logs`** for file logs).
|
||||
- **Line filter** — regex applied to log line content (`.*` = all).
|
||||
- Stat panels: total lines, heuristic **error** / **warning** counts (tuned for typical text logs, not strict JSON parsing).
|
||||
|
||||
#### Grafana pod: `init-chown-data` CrashLoopBackOff
|
||||
|
||||
The upstream chart runs an init container as **root** to `chown` `/var/lib/grafana`. Clusters with **Pod Security Admission** (often on k3s) commonly block that. This repo sets **`grafana.initChownData.enabled: false`**; the Grafana pod keeps **`fsGroup: 472`** so the PVC is usually group-writable. If Grafana still cannot write to disk, delete the Grafana PVC once after the change or relax PSA for namespace `onelab`.
|
||||
|
||||
### Access Grafana
|
||||
|
||||
An **Ingress** named **`grafana-onelab`** is created by the umbrella chart (`observability/templates/ingress-grafana-onelab.yaml`), Traefik + cert-manager, matching the OneLab web UI pattern in `gitops/values/k3s-example.yaml`:
|
||||
|
||||
- Host: **`grafana.k8s.selair.it`** — edit `grafanaOnelabIngress` and `grafana.ini.server` in `gitops/observability/values.yaml` together.
|
||||
- TLS Secret: **`grafana-tls-k8s-selair`** (cert-manager with `letsencrypt-prod`).
|
||||
|
||||
Point DNS at your ingress, sync the app, then open `https://<grafana-host>/` (user `admin` until you change values).
|
||||
|
||||
For debugging without DNS:
|
||||
|
||||
```bash
|
||||
kubectl -n onelab port-forward svc/onelab-obs-grafana 3000:80
|
||||
```
|
||||
|
||||
### Maintainers: vendored chart dependencies
|
||||
|
||||
The observability umbrella vendors upstream charts under `gitops/observability/charts/*.tgz` so **Argo CD** can render without relying on live Helm repo access at sync time.
|
||||
|
||||
When bumping Loki / Promtail / Grafana versions, from `gitops/observability/` run:
|
||||
|
||||
```bash
|
||||
helm dependency update
|
||||
```
|
||||
|
||||
Commit the updated `Chart.lock` and `charts/*.tgz` with your Git change. This is **repository packaging**, not an alternative install path — deploy still happens only via Argo CD.
|
||||
|
||||
### OneLab `logs.path`
|
||||
|
||||
The OneLab chart sets `onelab.logs.path: "/logs"` in the generated configuration so application file logs match the `/logs` volume mount (see Enterprise guide §7.2).
|
||||
|
||||
## kubectl / credentials
|
||||
|
||||
If `kubectl` reports *You must be logged in*, refresh your kubeconfig (e.g. copy `/etc/rancher/k3s/k3s.yaml` from the server or re-run your auth plugin) before applying manifests.
|
||||
|
||||
## Private Git + registry
|
||||
|
||||
See [docs/BOOTSTRAP.md](docs/BOOTSTRAP.md) for Argo CD access to `git.luneski.fr` and `docker-registry` for `hub.andrewalliance.com`.
|
||||
|
||||
## Helm note (Windows)
|
||||
|
||||
Helm 3.19 may return empty content for `.Files.Get` on Windows; this chart uses `fromYaml (.Files.AsConfig)` as a workaround so packaged files still render correctly.
|
||||
If `kubectl` reports *You must be logged in*, refresh your kubeconfig (e.g. k3s `/etc/rancher/k3s/k3s.yaml` on the server or your auth plugin) before applying manifests.
|
||||
|
||||
## Application configuration (`configurations.yml`)
|
||||
|
||||
Do **not** need to edit `app/configurations.yml` in Git for Kubernetes. The chart builds `configurations.yml` from `charts/onelab/files/configurations.gotmpl` and stores it in Secret **`onelab-configurations`** (mounted by app pods and `ldap-worker`).
|
||||
You do not need to edit [`app/configurations.yml`](../app/configurations.yml) in Git for Kubernetes. The chart renders `configurations.yml` from [`charts/onelab/files/configurations.gotmpl`](charts/onelab/files/configurations.gotmpl) into Secret **`onelab-configurations`**.
|
||||
|
||||
1. **Values (recommended)** — set `onelab.compliance.enabled`, `onelab.ldap.enabled`, and related fields. See `values/instance-overrides.example.yaml`. Point Helm/Argo at an extra values file for your site (Argo: add another path under `spec.source.helm.valueFiles`, relative to the chart directory).
|
||||
2. **Bring your own Secret** — set `configuration.existingSecretName` to a Secret you manage (SealedSecrets, External Secrets, `kubectl create secret ... --from-file=configurations.yml=...`). The chart will **not** create `onelab-configurations` in that case; the Secret must contain key **`configurations.yml`**.
|
||||
1. **Values (recommended)** — set `onelab.compliance`, `onelab.ldap`, etc. See [`values/instance-overrides.example.yaml`](values/instance-overrides.example.yaml). Add extra paths under **`spec.sources[].helm.valueFiles`** for the `gitops/charts/onelab` source (paths relative to `gitops/charts/onelab`).
|
||||
2. **Bring your own Secret** — set `configuration.existingSecretName`; the Secret must contain key **`configurations.yml`**.
|
||||
|
||||
A **ConfigMap** alone is fine if you mount it yourself, but this chart expects a **Secret** for the config file (same as Swarm-style sensitivity). LDAP TLS file paths in values are container paths; mount PEMs with extra volumes on `ldap-worker` if you use them.
|
||||
LDAP TLS paths in values are container paths; mount PEMs on `ldap-worker` if required.
|
||||
|
||||
## Ingress (web UI)
|
||||
|
||||
Enable `ingress.enabled` and set `ingress.host` (and optional TLS). Traffic is sent to Service **`revproxy`** (internal nginx). On k3s, `ingress.className: traefik` matches the default controller.
|
||||
Set `ingress.enabled`, `ingress.host`, and optional TLS in values. Traffic goes to Service **`revproxy`**. On k3s, `ingress.className: traefik` matches the default controller. For cert-manager, set `ingress.tls`, `ingress.tlsSecretName`, and `ingress.certManager.clusterIssuer`; DNS for `ingress.host` must resolve before ACME runs.
|
||||
|
||||
For **cert-manager**, set `ingress.tls: true`, `ingress.tlsSecretName`, and `ingress.certManager.clusterIssuer` (e.g. `letsencrypt-prod`). Ensure a **DNS A/CNAME** for `ingress.host` points to your ingress before the ACME challenge runs.
|
||||
## Developer note (local render)
|
||||
|
||||
Running **`helm template` on Windows** against some paths can return empty `.Files.Get` content; the OneLab chart uses `fromYaml (.Files.AsConfig)` where needed. **Argo CD runs on Linux** and renders the same charts in-cluster — this is a local-tooling caveat, not a second deploy path.
|
||||
|
||||
## Not migrated in this chart
|
||||
|
||||
- **Edge proxy stack** (`app/proxy/docker-compose.yml`, host 80/443 Swarm mode) — replaced for K8s by this **Ingress** + `revproxy`; optional **cert-manager** for TLS at the Ingress.
|
||||
- **Swarm-only secrets** (e.g. `ssl_passphrase`) — handle via Kubernetes Secrets or external operators.
|
||||
- **Edge proxy stack** (`app/proxy/docker-compose.yml`, host 80/443 Swarm) — use **Ingress** + `revproxy` and optional cert-manager.
|
||||
- **Swarm-only secrets** (e.g. `ssl_passphrase`) — use Kubernetes Secrets or external operators.
|
||||
|
||||
Reference in New Issue
Block a user