Backup Kubernetes PV/PVC to Local Disk w/o using CSI? - kubernetes

I'm looking to create a local backup of a PV/PVC in K8s and then restore it (without using any CSI driver).
I have tried VolumeSnapshot in k8s, but it creates an in-cluster backup, and what I need is a local copy that I can archive and move around. I also found some third-party tools like Stash/Velero/Kasten, but I'm not sure whether any of them fits my goal.
Can someone point me to the right documentation, or tell me whether this is possible at all? Thanks!

The third-party tools you mentioned look like the best fit, especially Velero, because, as per this post:
Velero is a backup tool not only focused on volume backups; it also allows you to back up your whole cluster (pods, services, volumes,…) with a sorting system by labels or Kubernetes objects.
Stash is a tool only focused on volume backups.
To learn more about using Velero and its newest features, you can visit the official documentation site and this website.
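As a rough illustration of what a namespace-level Velero backup and restore look like from the CLI, here is a minimal Python sketch that only builds the commands. Assumptions: Velero is already installed with a backup storage location (Velero uploads backups there, not to your workstation), the release is 1.10 or newer, where `--default-volumes-to-fs-backup` backs up pod volumes at the file level without CSI snapshots (older releases used `--default-volumes-to-restic`), and the names `my-app` / `my-app-backup` are hypothetical.

```python
import shlex


def velero_backup_cmd(name: str, namespace: str, fs_backup: bool = True) -> list[str]:
    """Build a `velero backup create` command for a single namespace.

    Assumption: Velero >= 1.10, where --default-volumes-to-fs-backup makes
    Velero copy pod volume contents file by file (restic/kopia), no CSI needed.
    """
    cmd = ["velero", "backup", "create", name, "--include-namespaces", namespace]
    if fs_backup:
        cmd.append("--default-volumes-to-fs-backup")
    return cmd


def velero_restore_cmd(backup_name: str) -> list[str]:
    """Build a `velero restore create` command from an existing backup."""
    return ["velero", "restore", "create", "--from-backup", backup_name]


if __name__ == "__main__":
    # Hypothetical names; print the commands so they can be reviewed before running.
    print(shlex.join(velero_backup_cmd("my-app-backup", "my-app")))
    print(shlex.join(velero_restore_cmd("my-app-backup")))
```

Building the argument list rather than running it keeps the sketch easy to inspect; pass the list to `subprocess.run(..., check=True)` once the flags match your Velero version.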

Related

Kubernetes configMap or persistent volume?

What is the best approach to passing multiple configuration files into a POD?
Assume that we have a legacy application that we have to dockerize and run in a Kubernetes environment. This application requires more than 100 configuration files to be passed. What is the best solution to do that? Create hostPath volume and mount it to some directory containing config files on the host machine? Or maybe config maps allow passing everything as a single compressed file, and then extracting it in the pod volume?
Maybe Helm can somehow iterate over a directory and automatically create one big ConfigMap that acts as a directory?
Any suggestions are welcome.
Create hostPath volume and mount it to some directory containing config files on the host machine
This should be avoided.
Accessing hostPaths may not always be allowed. Kubernetes may use PodSecurityPolicies (soon to be replaced by OPA/Gatekeeper/whatever admission controller you want ...), and OpenShift has similar SecurityContextConstraints objects, which let you define policies for which user can do what. As a general rule, expect access to hostPaths to be forbidden.
Besides, hostPath volumes are local to one of your nodes. You won't be able to schedule your Pod somewhere else if there's an outage. Either you've set a nodeSelector restricting its deployment to a single node, and your application will be down for as long as that node is, or there's no placement rule, and your application may restart on another node without its configuration.
Now you could say: "if I mount my volume from an NFS share of some sort, ...". Which is true. But then, you would probably be better off using a PersistentVolumeClaim.
Create automatically one big configMap that will act as a directory
This could be an option. However, as noted by @larsks in the comments on your post, beware that ConfigMaps are limited in size, and manipulating large objects (frequent edits/updates) can grow your etcd database.
If you really have ~100 files, ConfigMaps may not be the best choice here.
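Before committing to the one-big-ConfigMap approach, it can help to check whether the files even fit. `kubectl create configmap NAME --from-file=DIR` creates one key per regular file in the directory, and Helm can do something similar in a template with `.Files.Glob` and `AsConfig`. The Python sketch below roughly mirrors that for UTF-8 text files only and compares the result against the 1 MiB ConfigMap data limit. The function names are hypothetical, and binary files (which would need `binaryData`) are not handled.

```python
import os

# Kubernetes documents that data stored in a ConfigMap cannot exceed 1 MiB.
CONFIGMAP_LIMIT_BYTES = 1024 * 1024


def configmap_from_dir(name: str, directory: str) -> dict:
    """Build a ConfigMap manifest (as a dict) with one key per regular file.

    Roughly what `kubectl create configmap NAME --from-file=DIR` does,
    restricted to UTF-8 text files; subdirectories are skipped.
    """
    data = {}
    for entry in sorted(os.listdir(directory)):
        path = os.path.join(directory, entry)
        if os.path.isfile(path):
            with open(path, encoding="utf-8") as f:
                data[entry] = f.read()
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        "data": data,
    }


def data_size(configmap: dict) -> int:
    """Total size in bytes of the keys and values under `data`."""
    return sum(len(k.encode()) + len(v.encode()) for k, v in configmap["data"].items())


def fits_in_configmap(configmap: dict) -> bool:
    """True if the `data` payload stays within the 1 MiB limit."""
    return data_size(configmap) <= CONFIGMAP_LIMIT_BYTES
```

For example, `fits_in_configmap(configmap_from_dir("legacy-app-config", "./config"))` tells you up front whether ~100 files will be rejected, before you design around ConfigMaps.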
What next?
There's no single good answer without knowing exactly what we're dealing with.
If you want to allow editing those configurations without restarting containers, it would make sense to use some PersistentVolumeClaim.
If that's not needed, ConfigMaps could be helpful, as long as you can keep their size down and stick to non-critical data. Secrets could then be used to store passwords or any sensitive configuration snippets.
An emptyDir could also be used, assuming you can figure out a way to automate provisioning of those configurations during container startup (e.g. a git clone in an initContainer, and/or a shell script contextualizing your configuration based on environment variables).
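The emptyDir + initContainer idea can be sketched as a Pod manifest built in Python and printed as JSON (which `kubectl apply -f -` accepts). This is a minimal sketch under assumptions: the `alpine/git` image is only one example of an image with `git` on its PATH, and the Pod name, application image, repository URL and `/etc/app` mount path are hypothetical placeholders.

```python
import json


def pod_with_git_config(name: str, app_image: str, repo_url: str,
                        config_path: str = "/etc/app") -> dict:
    """Pod manifest where an initContainer clones config into a shared emptyDir.

    The emptyDir starts empty, so `git clone` into its mount point works; the
    app container then mounts the same volume read-only at `config_path`.
    """
    volume = "config"
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "volumes": [{"name": volume, "emptyDir": {}}],
            "initContainers": [{
                "name": "fetch-config",
                "image": "alpine/git",  # assumption: any image providing git works
                "command": ["git", "clone", "--depth", "1", repo_url, "/config"],
                "volumeMounts": [{"name": volume, "mountPath": "/config"}],
            }],
            "containers": [{
                "name": "app",
                "image": app_image,
                "volumeMounts": [{"name": volume, "mountPath": config_path,
                                  "readOnly": True}],
            }],
        },
    }


if __name__ == "__main__":
    # Hypothetical values; pipe the output into `kubectl apply -f -`.
    pod = pod_with_git_config("legacy-app", "example/legacy-app:1.0",
                              "https://example.com/legacy-app-config.git")
    print(json.dumps(pod, indent=2))
```

Because the initContainer must finish before the app starts, the application always sees a complete configuration tree; a private repository would additionally need credentials, e.g. from a Secret.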
If there are files that are not expected to change over time, or whose lifecycle is closely related to that of the application version shipping in your container image: I would consider adding them to my Dockerfile. Maybe even add some startup script -- something you could easily call from an initContainer, generating whichever configuration you couldn't ship in the image.
Depending on what you're dealing with, you could combine PVC, emptyDirs, ConfigMaps, Secrets, git stored configurations, scripts, ...

How to backup PVC regularly

What can be done to back up Kubernetes PVCs regularly on GCP and AWS?
GCP has VolumeSnapshot but I'm not sure how to schedule it, like every hour or every day.
I also tried Gemini by Fairwinds, but I get the following error on GCP. I installed the charts as described in README.MD, and I can't find anyone else encountering the same error.
error: unable to recognize "backup-test.yml": no matches for kind "SnapshotGroup" in version "gemini.fairwinds.com/v1beta1"
You can implement Velero, which gives you tools to back up and restore your Kubernetes cluster resources and persistent volumes.
Unfortunately, Velero only allows you to back up and restore PVs, not PVCs.
Velero's restic integration backs up data from volumes by accessing the filesystem of the node on which the pod is running. For this reason, the restic integration can only back up volumes that are mounted by a pod, not directly from the PVC.
You might also want to look into stash.run.
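In Velero's opt-in mode, the volumes that restic should copy are listed in the `backup.velero.io/backup-volumes` annotation on the pod that mounts them, which is why the pod (not the PVC) is the entry point. A minimal Python sketch that builds the corresponding `kubectl annotate` command follows; the namespace, pod and volume names in the usage line are hypothetical, and the values must be volume names from the pod's `spec.volumes`, not PVC names.

```python
def velero_fs_backup_annotate_cmd(namespace: str, pod: str, volumes: list[str]) -> list[str]:
    """Build the kubectl command opting pod volumes into Velero's restic backup.

    `volumes` are entries of the pod's spec.volumes[].name, because Velero
    reads the data through the node filesystem of the pod that mounts it.
    """
    if not volumes:
        raise ValueError("at least one volume name is required")
    return [
        "kubectl", "-n", namespace, "annotate", f"pod/{pod}",
        "backup.velero.io/backup-volumes=" + ",".join(volumes),
        "--overwrite",  # replace any earlier list instead of failing
    ]
```

For example, `velero_fs_backup_annotate_cmd("prod", "db-0", ["data"])` yields the command for a hypothetical pod `db-0` whose PVC is mounted through a volume named `data`. Annotating the pod template of a Deployment or StatefulSet, rather than a single pod, keeps the annotation across restarts.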
Agree with @hdhruna: Velero is really the most popular tool for this task.
However, you can also try miracle2k/k8s-snapshots
Automatic Volume Snapshots on Kubernetes
How is it useful? Simply add an annotation to your PersistentVolume or PersistentVolumeClaim resources, and let this tool create and expire snapshots according to your specifications.
Supported Environments:
Google Compute Engine disks,
AWS EBS disks.
I evaluated multiple solutions, including k8s CSI VolumeSnapshots, https://stash.run/, https://github.com/miracle2k/k8s-snapshots and GCP disk snapshots.
In my opinion, the best one is the k8s-native implementation of snapshots via a CSI driver, provided your cluster version is >= 1.17. It allows snapshotting volumes while in use and, unlike Stash, doesn't require a read-many or write-many volume.
I also chose Gemini by Fairwinds to automate backup creation, deletion and restoration, and it works like a charm.
I believe your problem is caused by the Gemini CRD missing from your cluster. Verify that the CRD is installed correctly and that the installed version is indeed the version you are trying to use.
My installation went flawlessly using their install guide with Helm.
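The error `no matches for kind "SnapshotGroup" in version "gemini.fairwinds.com/v1beta1"` means the API server does not serve that group/version, which is what a missing or mismatched CRD looks like. `kubectl api-versions` lists every served group/version, one per line, so a small Python sketch can check it. Only `has_api_version` is pure; `cluster_serves` assumes `kubectl` is on PATH and pointed at the affected cluster.

```python
import subprocess


def has_api_version(api_versions_output: str, group_version: str) -> bool:
    """True if `group_version` appears as a line in `kubectl api-versions` output."""
    return group_version in {line.strip() for line in api_versions_output.splitlines()}


def cluster_serves(group_version: str) -> bool:
    """Run `kubectl api-versions` against the current context and check it."""
    result = subprocess.run(["kubectl", "api-versions"],
                            capture_output=True, text=True, check=True)
    return has_api_version(result.stdout, group_version)
```

If `cluster_serves("gemini.fairwinds.com/v1beta1")` returns False, the chart's CRD is either not installed or registers a different version than the one used in `backup-test.yml`.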

How can a file inside a pod be copied to the outside?

I have an audit pod that contains logic to generate a report file. Currently, this file only exists inside the pod itself. I have a single pod with one replica.
I know I can run kubectl cp to copy those files from my pod. This command has to be executed on the Kubernetes node itself, but the task is to copy the file from the pod itself due to many restrictions.
I cannot use a Persistent Volume due to restrictions. I checked the Kubernetes API, but couldn't find anything by which I can do a copy.
Is there another way to copy that file out of the pod?
This is a community wiki answer posted to sum up the whole scenario and for better visibility. Feel free to edit and expand on it.
Taking into consideration all the mentioned restrictions:
not supposed to use the Kubernetes volumes
no cloud storage
pod names not accessible to your user
no sidecar containers
the only workaround for your use case is the one you currently use:
the dynamic PV with the annotation "helm.sh/resource-policy": keep
use PVCs and explicitly tell the user not to delete the namespace
If anyone has a better idea, feel free to contribute.

Is it possible to undo kubernetes cluster delete command?

Is it possible to undo "gcloud container clusters delete" command?
Unfortunately not: Deleting a Cluster
All the source volumes and data (that are not persistent) are removed, and unless you made a conscious choice to take a backup of the cluster, it would be a permanent operation.
If a backup does exist, it would be a restore from backup rather than a revert on the delete command.
I suggest reading a bit more into the Administration of a cluster on Gcloud for more info: Administration of Clusters Overview
Unfortunately, once you delete a cluster, it is impossible to undo.
In the GCP documentation you can check what will be deleted after gcloud container clusters delete and what will remain after this command.
One of the things that will remain is Persistent Disk volumes. This means that if your reclaim policy was set to Retain and your PV status is Released, you will be able to get the data back from the PersistentVolume. To do that, you will have to create a PersistentVolumeClaim. More info about the reclaim policy here.
Run $ kubectl get pv to check whether it is still bound and what its ReclaimPolicy is. A similar case can be found in this GitHub thread.
In this documentation you can find, step by step, how to connect a pod to a specific PV.
In addition, please note that you can back up your cluster. To do this you can use, for example, Ark.
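As a sketch of connecting a claim to a specific volume: a PV is created by hand for the surviving disk, and a PVC names that PV in `spec.volumeName`. Both set `storageClassName: ""` so no new disk is dynamically provisioned. Assumptions: the GKE Persistent Disk CSI driver (`pd.csi.storage.gke.io`), whose `volumeHandle` takes the form `projects/PROJECT/zones/ZONE/disks/DISK`; all names and sizes are hypothetical.

```python
import json


def pv_for_existing_disk(pv_name: str, volume_handle: str, size: str) -> dict:
    """Static PV pointing at a persistent disk that outlived its cluster."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": pv_name},
        "spec": {
            "capacity": {"storage": size},
            "accessModes": ["ReadWriteOnce"],
            "persistentVolumeReclaimPolicy": "Retain",  # keep the disk if the claim goes away
            "storageClassName": "",  # opt out of dynamic provisioning
            "csi": {
                "driver": "pd.csi.storage.gke.io",  # assumption: GKE PD CSI driver
                "volumeHandle": volume_handle,
                "fsType": "ext4",
            },
        },
    }


def pvc_bound_to(pvc_name: str, namespace: str, pv_name: str, size: str) -> dict:
    """PVC that binds only to the named PV via spec.volumeName."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": pvc_name, "namespace": namespace},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
            "storageClassName": "",
            "volumeName": pv_name,
        },
    }


if __name__ == "__main__":
    items = [
        pv_for_existing_disk("recovered-pv", "projects/my-project/zones/us-central1-a/disks/my-disk", "10Gi"),
        pvc_bound_to("recovered-claim", "default", "recovered-pv", "10Gi"),
    ]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

Pods then mount `recovered-claim` as usual. If a Released PV from the old cluster still exists instead, it carries a stale `claimRef` that has to be cleared before a new claim can bind to it.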

Kubernetes: Configuration snapshoting

Is there any configuration snapshot mechanism on kubernetes?
The goal is to take a snapshot of all deployments/services/config-maps etc and apply them to a kubernetes cluster.
The steps would be:
Take a configuration snapshot
Delete the cluster
Create a new cluster
Apply the configuration snapshot to the new cluster
New cluster works like the old one
These are the 3 that spring to mind, with kubed being, at least according to their readme, the closest to your stated goals:
Ark
kubed
kube-backup
I run Ark in my cluster, but (to my discredit) I have not yet attempted to do a D.R. drill using it; I only checked that it is, in fact, making config backups.
The state of Kubernetes is stored in etcd, so backing up the etcd data and restoring it would restore the cluster. However, this does not back up any data stored in persistent volumes; that needs to be handled separately.
The backup operator provided by CoreOS is a good option:
https://coreos.com/operators/etcd/docs/latest/user/walkthrough/backup-operator.html
Taking backups with etcdctl:
https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/
https://github.com/coreos/etcd/blob/master/etcdctl/README.md
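The etcdctl route boils down to a single `etcdctl snapshot save` call with the etcd v3 API and the cluster's TLS client certificates. The Python sketch below only builds that command; the defaults assume a kubeadm-style control plane (etcd on `https://127.0.0.1:2379`, certificates under `/etc/kubernetes/pki/etcd`) and should be adjusted for other setups. As noted above, the snapshot covers cluster objects only, not persistent volume data.

```python
def etcd_snapshot_cmd(out_file: str,
                      endpoint: str = "https://127.0.0.1:2379",
                      pki_dir: str = "/etc/kubernetes/pki/etcd") -> list[str]:
    """Build an `etcdctl snapshot save` command for a TLS-secured etcd.

    Assumption: kubeadm-style certificate paths; run it on a control-plane
    node with ETCDCTL_API=3 set (etcdctl < 3.4 defaults to the v2 API).
    """
    return [
        "etcdctl",
        f"--endpoints={endpoint}",
        f"--cacert={pki_dir}/ca.crt",
        f"--cert={pki_dir}/server.crt",
        f"--key={pki_dir}/server.key",
        "snapshot", "save", out_file,
    ]
```

To execute it, something like `subprocess.run(etcd_snapshot_cmd("/backup/etcd.db"), env={**os.environ, "ETCDCTL_API": "3"}, check=True)` would work on the control-plane node; the resulting file can then be archived off the cluster.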
Heptio Ark can back up both configuration and volumes:
https://github.com/heptio/ark
If you want a UI-based option, these would be good:
https://github.com/kaptaind/kaptaind
https://github.com/mhausenblas/reshifter