Automating Talos etcd Backups with Scheduled Jobs and ArgoCD
Setting up a reliable and automated etcd backup for Talos is simpler than it seems if you break it down properly. The key? Using a combination of Kubernetes CronJobs, External Secrets, and ArgoCD for seamless deployment and management.
In this guide, I’ll show you how I configured my Talos etcd backups to run on a schedule and store them in S3-compatible object storage (in this case, MinIO). The entire setup is managed declaratively using Kustomize and ArgoCD, ensuring everything stays under GitOps control.
Folder Structure
Here’s how I structured my manifests:
```
talos-backup/
├── cronjob.yaml
├── kustomization.yaml
├── serviceaccount.yaml
└── talosbackup-creds.yaml
```
Each file serves a specific purpose:
- cronjob.yaml – Defines the scheduled job to trigger the backup.
- serviceaccount.yaml – Grants the necessary permissions for backups.
- talosbackup-creds.yaml – Stores AWS credentials securely using External Secrets.
- kustomization.yaml – Ties everything together for deployment.
Step 1: Create the Service Account
Before we can schedule backups, we need to ensure the job has the correct permissions. That’s where the Service Account comes in.
```yaml
apiVersion: talos.dev/v1alpha1
kind: ServiceAccount
metadata:
  name: talos-backup-secrets
  namespace: talos-backup
spec:
  roles:
    - os:etcd:backup
```
This allows the job to interact with Talos and trigger etcd backups. Without this, the backup process won’t have the necessary access.
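As a quick sanity check, here’s a sketch (assuming Talos API access from Kubernetes is enabled on the cluster, which is what provides the `serviceaccounts.talos.dev` CRD) for confirming the resource exists and that Talos has provisioned credentials for it:
```bash
# The resource above is Talos's own CRD, not the core Kubernetes ServiceAccount
kubectl get serviceaccounts.talos.dev -n talos-backup

# Talos should provision a Secret (same name as the resource) holding a talosconfig
kubectl get secret talos-backup-secrets -n talos-backup
```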
Step 2: Define the Scheduled Backup Job
The actual backup process is handled by a Kubernetes CronJob, which executes every 6 hours. Here’s what it looks like:
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: talos-backup
  namespace: talos-backup
spec:
  concurrencyPolicy: Forbid
  schedule: "0 */6 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: talos-backup
              image: ghcr.io/siderolabs/talos-backup:v0.1.0-beta.2-1-g9ccc125
              command: ["/talos-backup"]
              env:
                - name: AWS_ACCESS_KEY_ID
                  value: talos-etcd-backup-account
                - name: AWS_SECRET_ACCESS_KEY
                  valueFrom:
                    secretKeyRef:
                      name: talos-backup-creds
                      key: bucket-secret
                - name: AWS_REGION
                  value: us-west-2
                - name: CUSTOM_S3_ENDPOINT
                  value: http://192.168.0.20:9000
                - name: BUCKET
                  value: talos-backups
                - name: CLUSTER_NAME
                  value: homeOps
                - name: DISABLE_ENCRYPTION
                  value: "true"
          restartPolicy: OnFailure
```
Key Aspects:
- Job Frequency: Runs every 6 hours (`0 */6 * * *`).
- Backup Image: Uses `ghcr.io/siderolabs/talos-backup`.
- AWS Credentials: Pulled from an External Secret to avoid hardcoding.
- S3 Storage: Configured to store backups in MinIO (adjust if using AWS S3).
- Failure Handling: `restartPolicy: OnFailure` ensures a retry if something goes wrong.
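Rather than waiting six hours for the first scheduled run, you can kick off a one-off Job from the CronJob to verify the pipeline end to end. A quick sketch (the `talos-backup-manual` job name is just an arbitrary choice):
```bash
# Run the backup once, right now, using the CronJob's pod template
kubectl create job --from=cronjob/talos-backup talos-backup-manual -n talos-backup

# Follow the backup container's output
kubectl logs -n talos-backup job/talos-backup-manual -f
```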
Step 3: Securely Managing Secrets with External Secrets
Storing credentials directly in Kubernetes manifests is a bad idea. Instead, I use External Secrets to fetch credentials from Bitwarden (you can adapt this to HashiCorp Vault, AWS Secrets Manager, or any other provider).
```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
    argocd.argoproj.io/sync-wave: "-1"
  name: talos-backup-creds
  namespace: talos-backup
spec:
  data:
    - remoteRef:
        key: talos-backup-bucket-secret
      secretKey: bucket-secret
  refreshInterval: "0"
  secretStoreRef:
    kind: ClusterSecretStore
    name: bitwarden-cluster-secretsmanager
```
Why Use External Secrets?
- No Hardcoded Secrets: Sensitive credentials stay out of your Git repository.
- Automated Rotation: If your credentials change, updates are applied without manual intervention.
- Ensures Availability: ArgoCD’s `sync-wave: "-1"` guarantees secrets exist before jobs run.
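For context, here is roughly what External Secrets materializes in the cluster once the ExternalSecret above syncs. Since no explicit `target` is set, the resulting Secret takes the ExternalSecret’s own name; the data value below is just a placeholder:
```yaml
# Approximate shape of the Secret created by External Secrets Operator
apiVersion: v1
kind: Secret
metadata:
  name: talos-backup-creds   # defaults to the ExternalSecret's name
  namespace: talos-backup
type: Opaque
data:
  bucket-secret: PHJlZGFjdGVkPg==   # base64 placeholder for the value pulled from Bitwarden
```
This is the Secret the CronJob’s `secretKeyRef` reads `AWS_SECRET_ACCESS_KEY` from.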
Step 4: Deploy Everything with Kustomize
With all manifests ready, `kustomization.yaml` stitches everything together:
```yaml
resources:
  - cronjob.yaml
  - serviceaccount.yaml
  - talosbackup-creds.yaml
```
You can apply it manually using:
```bash
kubectl apply -k talos-backup/
```
Or, if using ArgoCD, push it to your Git repository and let ArgoCD handle the deployment.
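If you go the ArgoCD route, a minimal Application pointing at the `talos-backup/` folder is enough. Here’s a sketch, assuming the manifests live under a `talos-backup` path in your GitOps repository (the repo URL and revision are placeholders):
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: talos-backup
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/<your-user>/<your-gitops-repo>.git  # placeholder
    targetRevision: main
    path: talos-backup
  destination:
    server: https://kubernetes.default.svc
    namespace: talos-backup
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
```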
Wrapping Up
This setup ensures that your Talos etcd backups are scheduled, secure, and managed declaratively. A few final tips:
- Test Restores: A backup is useless if you can’t restore it. Regularly verify your backups by running restores in a test environment (see the sketch after this list).
- Monitor Jobs: Keep an eye on job executions to catch failures early (`kubectl get cronjobs -n talos-backup`).
- Review Storage Costs: If using AWS, monitor S3 storage usage to avoid unnecessary expenses.
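For the first two tips, a few commands go a long way. The sketch below assumes the AWS CLI (or any S3-compatible client) can reach the MinIO endpoint configured in the CronJob, and that restores follow Talos’s documented etcd recovery flow:
```bash
# List the snapshots that actually landed in the bucket
aws --endpoint-url http://192.168.0.20:9000 s3 ls s3://talos-backups/ --recursive

# Check recent CronJob executions and their outcome
kubectl get cronjobs,jobs -n talos-backup

# Restore flow (practice this against a test cluster): download a snapshot,
# then bootstrap etcd from it on a control-plane node, e.g.
#   talosctl -n <control-plane-ip> bootstrap --recover-from=./db.snapshot
```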
With this approach, you get automated etcd backups without manual intervention, ensuring your cluster stays resilient and your data remains safe.