Automating Etcd Backups for Talos with S3 and ArgoCD


Automating Talos etcd Backups with Scheduled Jobs and ArgoCD

Setting up a reliable and automated etcd backup for Talos is simpler than it seems if you break it down properly. The key? Using a combination of Kubernetes CronJobs, External Secrets, and ArgoCD for seamless deployment and management.

In this guide, I’ll show you how I configured my Talos etcd backups to run on a schedule and store them in S3-compatible storage (in this case, MinIO). The entire setup is managed declaratively using Kustomize and ArgoCD, so everything stays under GitOps control.


Folder Structure

Here’s how I structured my manifests:


talos-backup/
├── cronjob.yaml
├── kustomization.yaml
├── serviceaccount.yaml
└── talosbackup-creds.yaml

Each file serves a specific purpose:

  • cronjob.yaml – Defines the scheduled job to trigger the backup.
  • serviceaccount.yaml – Defines a Talos ServiceAccount that grants the job the permissions it needs for backups.
  • talosbackup-creds.yaml – Pulls the S3 secret key from an external secret manager using External Secrets.
  • kustomization.yaml – Ties everything together for deployment.

Step 1: Create the Service Account

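Before we can schedule backups, the job needs permission to call the Talos API. That’s where the Talos ServiceAccount comes in. Note that this is a Talos custom resource (talos.dev/v1alpha1), not the core Kubernetes ServiceAccount.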

apiVersion: talos.dev/v1alpha1
kind: ServiceAccount
metadata:
  name: talos-backup-secrets
  namespace: talos-backup
spec:
  roles:
  - os:etcd:backup

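This allows the job to interact with Talos and trigger etcd backups: Talos creates a Secret with the same name (talos-backup-secrets) holding API credentials limited to the os:etcd:backup role. Without this, the backup process won’t have the necessary access.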
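Talos only honours this resource when API access from Kubernetes is enabled in the machine configuration. A minimal sketch of that patch, assuming the talos-backup namespace used throughout this guide; the patch is not one of the manifests in the folder above, and you would apply it to your control plane nodes however you normally manage machine config:

```yaml
# Machine config patch (assumption: managed outside the talos-backup/ folder).
# Enables the feature that serves talos.dev ServiceAccount resources.
machine:
  features:
    kubernetesTalosAPIAccess:
      enabled: true
      # Roles that a ServiceAccount is allowed to request.
      allowedRoles:
        - os:etcd:backup
      # Namespaces in which ServiceAccounts are honoured.
      allowedKubernetesNamespaces:
        - talos-backup
```

Keeping allowedRoles limited to os:etcd:backup means a compromised backup pod cannot request broader Talos API access.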


Step 2: Define the Scheduled Backup Job

The actual backup process is handled by a Kubernetes CronJob, which executes every 6 hours. Here’s what it looks like:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: talos-backup
  namespace: talos-backup
spec:
  concurrencyPolicy: Forbid
  schedule: "0 */6 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: talos-backup
            image: ghcr.io/siderolabs/talos-backup:v0.1.0-beta.2-1-g9ccc125
            command: ["/talos-backup"]
            env:
            - name: AWS_ACCESS_KEY_ID
              value: talos-etcd-backup-account
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  name: talos-backup-creds
                  key: bucket-secret
            - name: AWS_REGION
              value: us-west-2
            - name: CUSTOM_S3_ENDPOINT
              value: http://192.168.0.20:9000
            - name: BUCKET
              value: talos-backups
            - name: CLUSTER_NAME
              value: homeOps
            - name: DISABLE_ENCRYPTION
              value: "true"
          restartPolicy: OnFailure

Key Aspects:

  • Job Frequency: Runs every 6 hours (0 */6 * * *).
  • Backup Image: Uses ghcr.io/siderolabs/talos-backup.
  • AWS Credentials: The secret access key is pulled from an External Secret to avoid hardcoding; the access key ID is set inline.
  • S3 Storage: Configured to store backups in MinIO (adjust if using AWS S3).
  • Failure Handling: restartPolicy: OnFailure restarts the container if the backup exits with an error, up to the Job’s backoffLimit.
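One thing the manifest above does not show is how the container receives its Talos API credentials. A sketch, assuming the ServiceAccount from Step 1 results in a Secret named talos-backup-secrets and that the image reads it from /var/run/secrets/talos.dev (the path used in the Talos API access examples); the volume names and the /tmp working directory are illustrative choices:

```yaml
# Fragment to merge into spec.jobTemplate.spec.template.spec of the CronJob.
# Assumption: the Secret name matches the Talos ServiceAccount from Step 1.
containers:
- name: talos-backup
  workingDir: /tmp            # scratch space for the snapshot before upload
  volumeMounts:
  - name: tmp
    mountPath: /tmp
  - name: talos-secrets
    mountPath: /var/run/secrets/talos.dev
volumes:
- name: tmp
  emptyDir: {}
- name: talos-secrets
  secret:
    secretName: talos-backup-secrets
```

If your copy of the CronJob already mounts this Secret, nothing needs to change.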

Step 3: Securely Managing Secrets with External Secrets

Storing credentials directly in Kubernetes manifests is a bad idea. Instead, I use External Secrets to fetch credentials from Bitwarden (you can adapt this to HashiCorp Vault, AWS Secrets Manager, or any other provider).

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
    argocd.argoproj.io/sync-wave: "-1"
  name: talos-backup-creds
  namespace: talos-backup
spec:
  data:
  - remoteRef:
      key: talos-backup-bucket-secret
    secretKey: bucket-secret
  refreshInterval: "0"
  secretStoreRef:
    kind: ClusterSecretStore
    name: bitwarden-cluster-secretsmanager

Why Use External Secrets?

  • No Hardcoded Secrets: Sensitive credentials stay out of your Git repository.
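  • Automated Rotation: With a non-zero refreshInterval, credential changes are synced without manual intervention (the "0" used above fetches the secret once and does not poll again).
  • Ensures Availability: ArgoCD’s PreSync hook and sync-wave: "-1" apply the ExternalSecret before the CronJob, so the Secret should exist by the time the first job runs.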
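To have rotated credentials picked up automatically, refreshInterval needs to be non-zero. A sketch of the same ExternalSecret with an hourly refresh, where 1h is an arbitrary example value and the ArgoCD annotations are left out for brevity:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: talos-backup-creds
  namespace: talos-backup
spec:
  # Example value (assumption): re-read the remote secret every hour.
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: bitwarden-cluster-secretsmanager
  data:
  - secretKey: bucket-secret
    remoteRef:
      key: talos-backup-bucket-secret
```

Each CronJob run starts a fresh pod, so an updated Secret is picked up on the next scheduled backup.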

Step 4: Deploy Everything with Kustomize

With all manifests ready, kustomization.yaml stitches everything together:

resources:
- cronjob.yaml
- serviceaccount.yaml
- talosbackup-creds.yaml

You can apply it manually using:

kubectl apply -k talos-backup/

Or, if using ArgoCD, push it to your Git repository and let ArgoCD handle the deployment.
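For the ArgoCD route, an Application pointing at the talos-backup/ folder could look roughly like this. It is a sketch rather than the exact manifest from my setup: the repoURL is a placeholder, and the argocd namespace, default project, and sync options are assumptions you should adapt.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: talos-backup
  namespace: argocd                # assumption: ArgoCD runs in "argocd"
spec:
  project: default
  source:
    repoURL: https://example.com/your-org/your-gitops-repo.git  # placeholder
    targetRevision: HEAD
    path: talos-backup             # folder containing kustomization.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: talos-backup
  syncPolicy:
    automated:
      prune: true                  # remove resources deleted from Git
      selfHeal: true               # revert manual drift in the cluster
    syncOptions:
    - CreateNamespace=true         # create talos-backup if it does not exist
```

ArgoCD detects the kustomization.yaml in the path and renders it with Kustomize, so no extra tool configuration is needed for a plain resources list like the one above.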


Wrapping Up

This setup ensures that your Talos etcd backups are scheduled, secure, and managed declaratively. A few final tips:

  • Test Restores: A backup is useless if you can’t restore it. Regularly verify your backups by running restores in a test environment.
  • Monitor Jobs: Keep an eye on job executions to catch failures early (kubectl get cronjobs -n talos-backup shows the schedule and last run; kubectl get jobs -n talos-backup shows the result of each run).
  • Review Storage Costs: If using AWS, monitor S3 storage usage to avoid unnecessary expenses.
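To make failed runs easier to spot, the CronJob can be told how many finished Jobs to keep and how often to retry. A sketch of optional fields to merge into cronjob.yaml; the numbers are illustrative, not values from my setup:

```yaml
# Optional additions to the CronJob (all values are examples).
spec:
  successfulJobsHistoryLimit: 3    # keep the last 3 successful Jobs for inspection
  failedJobsHistoryLimit: 3        # keep the last 3 failed Jobs and their pods' logs
  jobTemplate:
    spec:
      backoffLimit: 2              # retries before the Job is marked failed
      activeDeadlineSeconds: 600   # stop a backup that hangs for 10 minutes
```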

With this approach, you get automated etcd backups without manual intervention, ensuring your cluster stays resilient and your data remains safe.