In GKE, the Reclaim Policy of my PersistentVolume is set to Retain, in order to prevent unintentional data removal. However, sometimes, after the deletion of some PersistentVolumes, I'd like to remove the associated Google Persistent Disks manually. Deleting the Google Persistent Disks using the web UI (i.e. Google Cloud Console) is time-consuming, that's why I'd like to use a gcloud command to remove all Google Persistent Disks that are not attached to a GCP VM instance. Could somebody please provide me this command?
This one should work:
gcloud compute disks delete $(gcloud compute disks list --filter="-users:*" --format "value(uri())")
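If you want to review what would be removed first, the same filter can be used with list only; the -users:* filter matches disks with no attached users:
# Preview the disks that are not attached to any VM instance
gcloud compute disks list --filter="-users:*"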
You can use the gcloud compute disks delete command in Cloud Shell to delete all the disks that are not attached to a GCP VM instance.
gcloud compute disks delete DISK_NAME [DISK_NAME …] [--region=REGION | --zone=ZONE] [GCLOUD_WIDE_FLAG …]
You can provide disk names to this command to delete them; see the disk delete reference documentation.
The link that @sandeep-mohanty includes suggests that only non-attached disks are deleted by the command.
Assuming (!) that to be true (check before you delete), you can enumerate a project's disks and then delete the (not attached) disks with:
PROJECT=[[YOUR-PROJECT]]
# Get PAIRS (NAME,ZONE) for all disks in ${PROJECT}
# Using CSV (e.g. my-disk,my-zone) enables IFS parsing (below)
PAIRS=$(\
gcloud compute disks list \
--project=${PROJECT} \
--format="csv[no-heading](name,zone.scope())")
# Iterate over the PAIRS
for PAIR in ${PAIRS}
do
# Extract values of NAME,ZONE from PAIR
IFS=, read NAME ZONE <<< ${PAIR}
# Describe
printf "Attempting to delete disk: %s [%s]\n" ${NAME} ${ZONE}
# Deleting a disk should only succeed if it is not attached
gcloud compute disks delete ${NAME} \
--zone=${ZONE} \
--project=${PROJECT} \
--quiet
done
NOTE In the unlikely event that Google changes the semantics of gcloud compute disks delete to delete attached disks, this script will delete every disk in the project.
Related
I have a home Kubernetes cluster with multiple SSDs attached to one of the nodes.
I currently have one persistent volume per mounted disk. Is there an easy way to create a persistent volume that can access data from multiple disks? I thought about symlinks but they don't seem to work.
You would have to combine them at a lower level. The simplest approach would be Linux LVM but there's a wide range of storage strategies. Kubernetes orchestrates mounting volumes but it's not a storage management solution itself, just the last-mile bits.
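For example, a minimal LVM sketch (the device names /dev/sdb and /dev/sdc are hypothetical) that pools two disks into a single logical volume you could then expose as one local PersistentVolume:
# Register the disks as LVM physical volumes and pool them into one volume group
pvcreate /dev/sdb /dev/sdc
vgcreate vg-local /dev/sdb /dev/sdc
# Create one logical volume spanning all free space, put a filesystem on it, mount it
lvcreate -n lv-data -l 100%FREE vg-local
mkfs.ext4 /dev/vg-local/lv-data
mkdir -p /mnt/disks/lv-data
mount /dev/vg-local/lv-data /mnt/disks/lv-data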
As already mentioned by coderanger, Kubernetes does not manage your storage at a lower level. While with cloud solutions there might be some provisioners that do some of the work for you, with bare metal there aren't.
The closest thing to help you manage local storage is the local-volume-static-provisioner.
The local volume static provisioner manages the PersistentVolume
lifecycle for pre-allocated disks by detecting and creating PVs for
each local disk on the host, and cleaning up the disks when released.
It does not support dynamic provisioning.
Have a look at this article for more examples.
I have a trick that works for me.
You can mount these disks under a directory such as /disk/, then create a loop filesystem, mount it, and create symbolic links to the disks inside the loop filesystem.
For example:
touch ~/disk-bunch1 && truncate -s 32M ~/disk-bunch1 && mke2fs -t ext4 -F ~/disk-bunch1
Mount it and create the symbolic links from the disks into the loop filesystem:
mkdir -p /local-pv/bunch1 && mount ~/disk-bunch1 /local-pv/bunch1
ln -s /disk/disk1 /local-pv/bunch1/disk1
ln -s /disk/disk2 /local-pv/bunch1/disk2
Finally, use sig-storage-local-static-provisioner: set "hostDir" to "/local-pv" in values.yaml and deploy the provisioner. A pod can then use multiple disks.
But this method has a drawback: when you run "kubectl get pv", the CAPACITY is just the size of the loop filesystem instead of the sum of the disk capacities.
By the way, this method is not recommended; you'd be better off with something like RAID 0 or LVM.
I created a schedule configuration inside my Gcloud project to create snapshots of a bunch of virtual disks.
Now I want to add my schedule configuration to my disks, but I don't know how to do it in an automated way, because I have more than 1,200 disks.
I tried to use a pod with a cron inside, but I cannot execute the kubectl command to list all my persistent volumes:
kubectl describe pv | grep "Name" | awk 'NR % 2 == 1' | awk '{print $2}'
I want to use this list with the following command in a loop to automatically add my schedule to my disks:
gcloud compute disks add-resource-policies [DISK_NAME] --resource-policies [SCHEDULE_NAME] --zone [ZONE]
Thanks in advance for your help.
Edit 1: After some comments I changed my code to add a Kubernetes CronJob, but the result is the same: the code doesn't work (the pod is created, but it gives me an error: ImagePullBackOff):
resource "kubernetes_cron_job" "schedulerdemo" {
metadata {
name = "schedulerdemo"
}
spec {
concurrency_policy = "Replace"
failed_jobs_history_limit = 5
schedule = "*/5 * * * *"
starting_deadline_seconds = 10
successful_jobs_history_limit = 10
job_template {
metadata {}
spec {
backoff_limit = 2
ttl_seconds_after_finished = 10
template {
metadata {}
spec {
container {
name = "scheduler"
image = "imgscheduler"
command = ["/bin/sh", "-c", "date; kubectl describe pv | grep 'Name' | awk 'NR % 2 == 1' | awk '{print $2}'"]
}
}
}
}
}
}
}
Answering the comment:
Ok, shame on me, wrong image name. Now I have an error in the Container Log: /bin/sh: kubectl: not found
It means that the image that you are using doesn't have kubectl installed (or it's not in the PATH). You can use image: google/cloud-sdk:latest. This image already has the Cloud SDK installed, which includes:
gcloud
kubectl
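If you want to verify locally that the image ships both tools, a quick check (assuming Docker is available on your machine):
# Both commands should print version information
docker run --rm google/cloud-sdk:latest gcloud version
docker run --rm google/cloud-sdk:latest kubectl version --client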
To run a CronJob that will get information about PVs and change the configuration of GCP storage, you will need the following access:
Kubernetes/GKE API (kubectl): a ServiceAccount with a Role and RoleBinding.
GCP API (gcloud): a Google service account with IAM permissions for storage operations.
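For the kubectl part, note that PersistentVolumes are cluster-scoped, so listing them needs a ClusterRole rather than a namespaced Role. A minimal sketch (the ServiceAccount name disk-scheduler and the default namespace are hypothetical; reference the ServiceAccount from the CronJob's pod spec):
kubectl create serviceaccount disk-scheduler --namespace default
kubectl create clusterrole pv-reader --verb=get,list --resource=persistentvolumes
kubectl create clusterrolebinding pv-reader-binding --clusterrole=pv-reader --serviceaccount=default:disk-scheduler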
I found these links helpful when assigning permissions to list PVs:
Kubernetes.io: RBAC
Success.mirantis.com: Article: User unable to list persistent volumes
The recommended way to assign specific permissions for GCP access:
Workload Identity is the recommended way to access Google Cloud services from applications running within GKE due to its improved security properties and manageability.
-- Cloud.google.com: Kubernetes Engine: Workload Identity: How to
I encourage you to read the documentation I linked above and check other alternatives.
As for the script used inside of the CronJob: you should look for pdName instead of Name, as pdName is the representation of the gce-pd disk in GCP (assuming that we are talking about the in-tree plugin).
You will have multiple options to retrieve the disk name from the API to use it in the gcloud command.
One of the options:
kubectl get pv -o yaml | grep "pdName" | cut -d " " -f 8 | xargs -n 1 gcloud compute disks add-resource-policies --zone=ZONE --resource-policies=POLICY
Disclaimer!
Please treat the above command only as an example.
The command will get the pdName attribute from the PVs and iterate over each of them in the command after xargs.
Some of the things to take into consideration when creating a script/program:
Running this command more than once on a single disk will produce an error that you cannot assign multiple policies. You could keep a list of already configured disks that do not require assigning a policy (a minimal check is sketched after this list).
Consider using .spec.concurrencyPolicy: Forbid instead of Replace: a replaced CronJob starts iterating over all of those disks from the beginning, so if the command does not complete in the desired time the job will keep being replaced.
You will need to check for the correct kubectl version, as official support allows a +1/-1 version difference between the client and the server (cloud-sdk:latest uses v1.19.3).
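Regarding the first point above, a minimal sketch (DISK, ZONE and POLICY are hypothetical placeholders) that only assigns the policy when the disk has no resource policy attached yet:
DISK=my-disk                      # hypothetical disk name
ZONE=europe-west3-b               # hypothetical zone
POLICY=my-snapshot-schedule       # hypothetical schedule policy name
# resourcePolicies is empty for disks without an attached policy
if [ -z "$(gcloud compute disks describe ${DISK} --zone=${ZONE} --format='value(resourcePolicies)')" ]; then
  gcloud compute disks add-resource-policies ${DISK} --zone=${ZONE} --resource-policies=${POLICY}
fi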
I highly encourage you to look at other methods to back up your PVCs (for example, VolumeSnapshots).
Take a look at the links below for more references/ideas:
Stackoverflow.com: Answer: Periodic database backup in kubernetes
Stash.run: Guides: Latest: Volumesnapshot: PVC
Velero.io
It's worth mentioning that:
CSI drivers are the future of storage extension in Kubernetes. Kubernetes has announced that the in-tree volume plugins are expected to be removed from Kubernetes in version 1.21. For details, see Kubernetes In-Tree to CSI Volume Migration Moves to Beta. After this change happens, existing volumes using in-tree volume plugins will communicate through CSI drivers instead.
-- Cloud.google.com: Kubernetes Engine: Persistent Volumes: GCE PD CSI Driver: Benefits of using
Switching to the CSI plugin for your StorageClass will allow you to use volume snapshots inside of GKE:
Volume snapshots let you create a copy of your volume at a specific point in time. You can use this copy to bring a volume back to a prior state or to provision a new volume.
-- Cloud.google.com: Kubernetes Engine: Persistent Volumes: Volume snapshots: How to
Additional resources:
Cloud.google.com: Kubernetes Engine: Persistent Volumes
Cloud.google.com: Kubernetes Engine: Cronjobs: How to
Terraform.io: Kubernetes: CronJob
Cloud.google.com: Compute: Disks: Create snapshot
I have a bucket folder in Google Cloud with about 47GB of data in it. I start a new Kubernetes StatefulSet (in my Google Cloud Kubernetes cluster). The first thing that the container inside the StatefulSet does is to use gsutil -m rsync -r gs://<BUCKET_PATH> <LOCAL_MOUNT_PATH> to sync the bucket folder contents to a locally mounted folder, which corresponds to a Kubernetes Persistent Volume. The Persistent Volume Claim for this StatefulSet requests 125Gi of storage and is only used for this rsync. But the gsutil sync eventually hits a wall where the pod runs out of disk space (space in the Persistent Volume) and gsutil throws an error: [Errno 28] No space left on device. This is weird, because I only need to copy 47GB of data over from the bucket, but the Persistent Volume should have 125Gi of storage available.
I can confirm the Persistent Volume Claim and the Persistent Volume have been provisioned with the appropriate sizes by using kubectl get pvc and kubectl get pv. If I run df -h inside the pod (kubectl exec -it <POD_NAME> -- df -h) I can see that the mounted path exists and that it has the expected size (125Gi). Using df -h during the sync I can see that it does indeed take up all the available space in the Persistent Volume when it finally hits No space left on device.
Further, if I provision a Persistent Volume of 200Gi and retry the sync, it finishes successfully and df -h shows that the used space in the Persistent Volume is 47GB, as expected (this is after gsutil rsync is completed).
So it seems that gsutil rsync uses far more space while syncing than I would expect. Why is this? Is there a way to change how gsutil rsync is done so that it doesn't require a larger Persistent Volume than necessary?
It should be noted that there are a lot of individual files, and that the pod is restarted about 8 times during the sync.
rsync will transfer contents to a temporary file in the target folder first. If it succeeds, it will rename the file to become the target file. If the transfer fails, the temporary file is deleted. You could try adding the --inplace flag to the command; according to the link: “This option changes how rsync transfers a file when its data needs to be updated: instead of the default method of creating a new copy of the file and moving it into place when it is complete, rsync instead writes the updated data directly to the destination file.”
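One thing worth checking, given the pod restarts: gsutil writes partial downloads to temporary files (I assume the usual _.gstmp suffix here; verify it in your environment), and leftovers from interrupted syncs consume space without appearing as finished files. A quick diagnostic sketch:
LOCAL_MOUNT_PATH=/path/to/pv-mount   # placeholder for the PV mount path
# Total space taken by leftover gsutil temp files, if any (the suffix is an assumption)
find "${LOCAL_MOUNT_PATH}" -name "*_.gstmp" -exec du -ch {} + | tail -n 1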
I'm creating Kubernetes clusters programmatically for end-to-end tests in GitLab CI/CD. I'm using gcloud container clusters create. I'm doing this for half a year and created and deleted a few hundred clusters. The cost went up and down. Now, I got an unusually high bill from Google and I checked the cost breakdown. I noticed that the cost is >95% for "Storage PD Capacity". I found out that gcloud container clusters delete never deleted the Google Compute Disks created for Persistent Volume Claims in the Kubernetes cluster.
How can I delete those programmatically? What else could be left running after deleting the Kubernetes cluster and the disks?
Suggestions:
To answer your immediate question: you can programmatically delete your disk resource(s) with the Method: disks.delete API.
To determine what other resources might have been allocated, look here: Listing all Resources in your Hierarchy.
Finally, this link might also help: GKE: Understanding cluster resource usage
Because this part of the answer is lengthy, here's a worked example. First, create a couple of labeled disks:
gcloud compute disks create disk-a \
--size=10GB \
--zone=us-west1-a \
--labels=something=monday \
--project=${PROJECT}
gcloud compute disks create disk-b \
--size=10GB \
--zone=us-west1-b \
--labels=something=else \
--project=${PROJECT}
Then:
ID=$(gcloud compute disks list \
--filter="name~disk zone~us-west1 labels.something=else" \
--format="value(id)" \
--project=${PROJECT}) && echo ${ID}
NB
the filter AND is implicit and omitted
you may remove terms as needed
you should make the filter as specific as possible
And, when you're certain (deletion is irrecoverable):
gcloud compute disks delete ${ID} --project=${PROJECT}
If there are multiple matches, you can iterate:
IDS=$(gcloud compute disks list ...)
for ID in ${IDS}
do
gcloud compute disks delete ${ID}
done
If you prefer the awesome jq, you'll have a general-purpose way (not gcloud-specific):
gcloud compute disks list \
--project=${PROJECT} \
--format=json \
| jq --raw-output '.[] | select(.name | contains("disk")) | select(.zone | contains("us-west1")) | select(.labels.something=="else")'
...
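A possible extension of that jq pipeline (same caveat: review the output before deleting anything) is to emit each matching disk's selfLink and feed it straight into the delete command:
gcloud compute disks list --project=${PROJECT} --format=json \
| jq --raw-output '.[] | select(.name | contains("disk")) | select(.zone | contains("us-west1")) | select(.labels.something=="else") | .selfLink' \
| xargs -r gcloud compute disks delete --quiet --project=${PROJECT}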
When creating various Kubernetes objects in GKE, associated GCP resources are automatically created. I'm specifically referring to:
forwarding-rules
target-http-proxies
url-maps
backend-services
health-checks
These have names such as k8s-fw-service-name-tls-ingress--8473ea5ff858586b.
After deleting a cluster, these resources remain. How can I identify which of these are still in use (by other Kubernetes objects, or another cluster) and which are not?
There is no easy way to identify which added GCP resources (LB, backend, etc.) are linked to which cluster. You need to manually go into these resources to see what they are linked to.
If you delete a cluster with additional resources attached, you also have to delete these resources manually. For now, I would suggest taking note of which added GCP resources are related to which cluster, so that you will know which resources to delete when the time comes to delete the GKE cluster.
I would also suggest creating a feature request here for either a more defined naming convention for the additional GCP resources created for a specific cluster, and/or the ability to automatically delete all additional resources linked to a cluster when deleting said cluster.
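One way to at least shortlist candidates is to list those resource types and filter on the k8s- name prefix seen above; treat the prefix, and any decision to delete, as something to verify manually:
# List load-balancer related resources whose names start with "k8s-"
for RESOURCE in forwarding-rules target-http-proxies url-maps backend-services health-checks
do
  echo "== ${RESOURCE} =="
  gcloud compute ${RESOURCE} list --filter="name~^k8s-" --format="table(name,creationTimestamp)"
done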
I would recommend you to look at https://github.com/kelseyhightower/kubernetes-the-hard-way/blob/master/docs/14-cleanup.md
You can easily delete all the objects by using the Google Cloud SDK in the following manner:
gcloud -q compute firewall-rules delete \
kubernetes-the-hard-way-allow-nginx-service \
kubernetes-the-hard-way-allow-internal \
kubernetes-the-hard-way-allow-external \
kubernetes-the-hard-way-allow-health-check
{
gcloud -q compute routes delete \
kubernetes-route-10-200-0-0-24 \
kubernetes-route-10-200-1-0-24 \
kubernetes-route-10-200-2-0-24
gcloud -q compute networks subnets delete kubernetes
gcloud -q compute networks delete kubernetes-the-hard-way
gcloud -q compute forwarding-rules delete kubernetes-forwarding-rule \
--region $(gcloud config get-value compute/region)
gcloud -q compute target-pools delete kubernetes-target-pool
gcloud -q compute http-health-checks delete kubernetes
gcloud -q compute addresses delete kubernetes-the-hard-way
}
This assumes you named your resources 'kubernetes-the-hard-way'. If you do not know the names, you can use various filter mechanisms to filter resources by name, labels, etc. and remove them.