How to clean up after a GKE cluster created with gcloud container clusters create? - kubernetes

I'm creating Kubernetes clusters programmatically for end-to-end tests in GitLab CI/CD, using gcloud container clusters create. I've been doing this for half a year and have created and deleted a few hundred clusters. The cost went up and down. Now I got an unusually high bill from Google and checked the cost breakdown. I noticed that the cost is >95% for "Storage PD Capacity". I found out that gcloud container clusters delete never deleted the Google Compute Engine disks created for Persistent Volume Claims in the Kubernetes cluster.
How can I delete those programmatically? What else could be left running after deleting the Kubernetes cluster and the disks?

Suggestions:
To answer your immediate question: you can programmatically delete your disk resource(s) with the disks.delete API method.
To determine what other resources might have been allocated, look here: Listing all Resources in your Hierarchy.
Finally, this link might also help: GKE: Understanding cluster resource usage
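If you'd rather script the REST call directly, here's a minimal sketch of invoking disks.delete with curl; PROJECT, ZONE and DISK are placeholder values to substitute with your own:
# Sketch: delete one persistent disk via the Compute Engine disks.delete method.
# PROJECT, ZONE and DISK below are placeholders (assumptions), not values from the question.
PROJECT="my-project"
ZONE="us-west1-a"
DISK="my-leftover-disk"
curl -X DELETE \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://compute.googleapis.com/compute/v1/projects/${PROJECT}/zones/${ZONE}/disks/${DISK}"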

Because this part of the answer is lengthy, here's a worked example. First, create two sample disks:
gcloud compute disks create disk-a \
  --size=10GB \
  --zone=us-west1-a \
  --labels=something=monday \
  --project=${PROJECT}

gcloud compute disks create disk-b \
  --size=10GB \
  --zone=us-west1-b \
  --labels=something=else \
  --project=${PROJECT}
Then:
# uri() yields a fully-qualified disk URI that can be passed straight to "disks delete"
ID=$(gcloud compute disks list \
  --filter="name~disk zone~us-west1 labels.something=else" \
  --format="value(uri())" \
  --project=${PROJECT}) && echo ${ID}
NB:
- the filter AND is implicit and omitted
- you may remove terms as needed
- you should make the filter as specific as possible
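For example, the same filter with the AND spelled out behaves identically to the implicit form above:
--filter="name~disk AND zone~us-west1 AND labels.something=else"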
And, once you're certain (deletion is irrecoverable):
# The URI already encodes the zone, so no --zone/--region flag is needed
gcloud compute disks delete ${ID} --project=${PROJECT}
If there are multiple matches, you can iterate:
IDS=$(gcloud compute disks list ...)
for ID in ${IDS}
do
  gcloud compute disks delete ${ID} --project=${PROJECT}
done
If you prefer, the awesome jq gives you a general-purpose way (not gcloud-specific):
gcloud compute disks list \
  --project=${PROJECT} \
  --format=json \
  | jq --raw-output '.[] | select(.name | contains("disk")) | select(.zone | contains("us-west1")) | select(.labels.something=="else")'
...
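One possible way (a sketch, reusing the illustrative filter terms from above) to feed such a jq selection into a deletion is to emit each matching disk's selfLink and pass it to gcloud:
gcloud compute disks list \
  --project=${PROJECT} \
  --format=json \
  | jq --raw-output '.[] | select(.name | contains("disk")) | select(.zone | contains("us-west1")) | select(.labels.something=="else") | .selfLink' \
  | xargs -r gcloud compute disks delete --quiet --project=${PROJECT}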

Related

Filter GCP IAM policies by description

I'm trying to filter on IAM policies attached to a service account
gcloud iam service-accounts get-iam-policy foo@bar.iam.gserviceaccount.com --project foobar --filter="description:'*migration'"
This throws
WARNING: The following filter keys were not present in any resource : description
Is there a way to filter based on description of IAM policy?
EDIT: My use case is to filter IAM policies by cluster. For SQL instances, I added a label with the cluster name, but for the policies returned by gcloud iam service-accounts we don't have labels, so I decided to add the cluster to the description and filter on the cluster name through the description.
See my comment above; it's unclear what you're trying to filter by.
Here's a specific solution for your stated requirement. Per my comment, it's a curious approach and I don't understand why you'd want to do this, but assuming that you really do:
# List the SOURCE policy's service account emails
SOURCE="[SERVICE-ACCOUNT-EMAIL]"
EMAILS=$(\
  gcloud iam service-accounts get-iam-policy ${SOURCE} \
  --flatten="bindings[].members[]" \
  --filter="bindings.members:serviceAccount" \
  --format="value(bindings.members.split(\":\").slice(1:))" \
  | sort | uniq)
# Search pattern
FILTER="*migration"
# Iterate over them
for EMAIL in ${EMAILS}
do
  # Extract the service account's description
  DESCRIPTION=$(\
    gcloud iam service-accounts describe ${EMAIL} \
    --format="value(description)")
  # Glob match: FILTER must stay unquoted for the wildcard to apply
  if [[ ${DESCRIPTION} == ${FILTER} ]]
  then
    printf "[%s] %s\n" ${EMAIL} "${DESCRIPTION}"
  fi
done
Here's a solution that can be applied generally to filtering gcloud.
gcloud is complex in its use of --format, --filter and --flatten (see below).
--filter is only (!?) useful when you have multiple resources from which to filter (generally from list rather than get or describe commands).
It's difficult to see this with the default output but, if you use e.g. --format=json, you'll immediately see that gcloud ... get returns one resource whereas gcloud ... list returns a list ([]).
The latter can be filtered.
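To see this for yourself, compare the JSON shapes (EMAIL and PROJECT are placeholders):
# A single resource: one JSON object
gcloud iam service-accounts describe ${EMAIL} --format=json
# A list: a JSON array ([]), which --filter can reduce
gcloud iam service-accounts list --project=${PROJECT} --format=json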
There are various solutions:
gcloud
The advantage of using gcloud is one tool to rule them all.
Use --flatten to convert a single resource into a list so that you can --filter:
ROLE="roles/owner"
FILTER="bindings.role=\"${ROLE}\""
# Return all matching members
FORMAT="value(bindings.members[])"
# Return 1st matching member
FORMAT="value(bindings.members[0])"
# Return 1st matching member's email address
FORMAT="value(bindings.members[0].split(\":\").slice(1:))"
gcloud iam service-accounts get-iam-policy ${EMAIL} \
  --project=${PROJECT} \
  --flatten="bindings[]" \
  --filter="${FILTER}" \
  --format="${FORMAT}"
jq
The UNIX philosophy is tools that do one thing and do it well. jq is an excellent JSON (!) processing tool and combines well with gcloud --format=json.
FILTER="
.bindings[]
|select(.role==\"${ROLE}\").members[0]
|split(\":\")[1]
"
gcloud iam service-accounts get-iam-policy ${EMAIL} \
  --project=${PROJECT} \
  --format=json \
  | jq -r "${FILTER}"

Delete all unattached Google Persistent Disks in a project

In GKE, the Reclaim Policy of my PersistentVolume is set to Retain, in order to prevent unintentional data removal. However, sometimes, after the deletion of some PersistentVolumes, I'd like to remove the associated Google Persistent Disks manually. Deleting the Google Persistent Disks using the web UI (i.e. the Google Cloud Console) is time-consuming, which is why I'd like to use a gcloud command to remove all Google Persistent Disks that are not attached to a GCP VM instance. Could somebody please provide me with this command?
This one should work:
gcloud compute disks delete $(gcloud compute disks list --filter="-users:*" --format "value(uri())")
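If you'd like to review what will be removed before running it, a preview of the same filter (a sketch; it deletes nothing) is:
# Lists only the disks with no users (i.e. not attached to any instance)
gcloud compute disks list --filter="-users:*" --format="table(name,zone.scope(),sizeGb)"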
You can use the gcloud compute disks delete command in Cloud Shell to delete all the disks that are not attached to a GCP VM instance.
gcloud compute disks delete DISK_NAME [DISK_NAME …] [--region=REGION | --zone=ZONE] [GCLOUD_WIDE_FLAG …]
You can provide disk names to this command to delete them.
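For example (the disk names and zone here are illustrative):
gcloud compute disks delete my-disk-1 my-disk-2 --zone=us-central1-a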
disk delete
The link that @sandeep-mohanty includes suggests that only non-attached disks are deleted by the command.
Assuming (!) that to be true (check before you delete), you can enumerate a project's disks and then delete the (not attached) disks with:
PROJECT=[[YOUR-PROJECT]]
# Get PAIRS (NAME,ZONE) for all disks in ${PROJECT}
# Using CSV (e.g. my-disk,my-zone) enables IFS parsing (below)
PAIRS=$(\
  gcloud compute disks list \
  --project=${PROJECT} \
  --format="csv[no-heading](name,zone.scope())")
# Iterate over the PAIRS
for PAIR in ${PAIRS}
do
  # Extract values of NAME,ZONE from PAIR
  IFS=, read NAME ZONE <<< ${PAIR}
  # Report what is about to be attempted
  printf "Attempting to delete disk: %s [%s]\n" ${NAME} ${ZONE}
  # Deleting a disk should only succeed if it is not attached
  gcloud compute disks delete ${NAME} \
    --zone=${ZONE} \
    --project=${PROJECT} \
    --quiet
done
NOTE In the unlikely event that Google changes the semantics of gcloud compute disks delete to delete attached disks, this script will delete every disk in the project.

Identify redundant GCP resources created by Kubernetes

When creating various Kubernetes objects in GKE, associated GCP resources are automatically created. I'm specifically referring to:
forwarding-rules
target-http-proxies
url-maps
backend-services
health-checks
These have names such as k8s-fw-service-name-tls-ingress--8473ea5ff858586b.
After deleting a cluster, these resources remain. How can I identify which of these are still in use (by other Kubernetes objects, or another cluster) and which are not?
There is no easy way to identify which added GCP resources (LB, backend, etc.) are linked to which cluster. You need to manually go into these resources to see what they are linked to.
If you delete a cluster with additional resources attached, you also have to delete these resources manually. For now, I would suggest taking note of which added GCP resources are related to which cluster, so that you will know which resources to delete when the time comes to delete the GKE cluster.
I would also suggest creating a feature request here to ask either for a more defined naming convention for the additional GCP resources created for a specific cluster, and/or for the ability to automatically delete all additional resources linked to a cluster when deleting said cluster.
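As a starting point, here is a sketch that lists candidate leftovers by the k8s- name prefix together with their descriptions; resources created by Kubernetes controllers often embed the originating Service or Ingress in the description, but verify each entry before deleting anything:
# PROJECT is a placeholder; the k8s- prefix matches the naming shown in the question
for KIND in forwarding-rules target-http-proxies url-maps backend-services health-checks
do
  echo "--- ${KIND} ---"
  gcloud compute ${KIND} list \
    --project=${PROJECT} \
    --filter="name~^k8s-" \
    --format="table(name,description)"
done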
I would recommend looking at https://github.com/kelseyhightower/kubernetes-the-hard-way/blob/master/docs/14-cleanup.md
You can easily delete all the objects using the Google Cloud SDK in the following manner:
gcloud -q compute firewall-rules delete \
  kubernetes-the-hard-way-allow-nginx-service \
  kubernetes-the-hard-way-allow-internal \
  kubernetes-the-hard-way-allow-external \
  kubernetes-the-hard-way-allow-health-check
{
  gcloud -q compute routes delete \
    kubernetes-route-10-200-0-0-24 \
    kubernetes-route-10-200-1-0-24 \
    kubernetes-route-10-200-2-0-24
  gcloud -q compute networks subnets delete kubernetes
  gcloud -q compute networks delete kubernetes-the-hard-way
  gcloud -q compute forwarding-rules delete kubernetes-forwarding-rule \
    --region $(gcloud config get-value compute/region)
  gcloud -q compute target-pools delete kubernetes-target-pool
  gcloud -q compute http-health-checks delete kubernetes
  gcloud -q compute addresses delete kubernetes-the-hard-way
}
This assumes you named your resources 'kubernetes-the-hard-way'. If you do not know the names, you can also use the various filter mechanisms to find the resources (for example by name pattern or label) and remove them, as sketched below.
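For instance, a sketch of finding routes by a name pattern and deleting them when you don't remember the exact names (the pattern is illustrative):
gcloud compute routes list --filter="name~^kubernetes-route" --format="value(name)" \
  | xargs -r gcloud -q compute routes delete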

How do I use gcloud to find which regions my dataproc clusters are in?

If I issue gcloud dataproc clusters list, 0 clusters are listed:
$ gcloud dataproc clusters list
Listed 0 items.
However if I specify the region gcloud dataproc clusters list --region europe-west1 I get back a list of clusters:
$ gcloud dataproc clusters list --region europe-west1
NAME                WORKER_COUNT  STATUS   ZONE
mydataproccluster1  2             RUNNING  europe-west1-d
mydataproccluster2  2             RUNNING  europe-west1-d
I'm guessing that the inability to get a list of clusters without specifying --region is a consequence of a decision made by my org's administrators; however, I'm hoping there is a way around it. I can visit https://console.cloud.google.com/ and see a list of all the clusters in the project; can I get the same using gcloud? Having to visit https://console.cloud.google.com/ just so I can issue gcloud dataproc clusters list --region europe-west1 seems a bit of a limitation.
The underlying regional services are by design isolated from each other, so there's no single URL that returns the combined list (because that would be a global dependency and failure mode). Unfortunately, at the moment the layout of the gcloud libraries is such that there's no option for specifying a list of regions, or a shorthand for "all regions", when listing Dataproc clusters or jobs.
However, you can work around this by obtaining the list of possible regional stacks from the Compute API:
gcloud compute regions list --format="value(name)" | \
xargs -n 1 gcloud dataproc clusters list --region
The only dataproc region that doesn't match up to one of the Compute regions is the special "global" Dataproc region, which is a separate Dataproc service that spans all compute regions.
For convenience you can also just add global to a for-loop:
for REGION in global $(gcloud compute regions list --format="value(name)"); do gcloud dataproc clusters list --region ${REGION}; done
Having to specify --region is how the Dataproc command group in gcloud works. The Developers Console issues list requests against all regions (you could file a request for gcloud to do the same).
Alternatively, you can use the global multiregion (which is the gcloud default). This will interact well with your organization policies: if your organization has region-restricted VM locations, you will be able to create VMs in Europe but will get an error when doing so elsewhere.
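For example, to list clusters registered in the global multiregion:
gcloud dataproc clusters list --region=global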

Container Engine is temporarily out of capacity

I need to add a new pool to my K8s cluster, but I got the error: (gcloud.container.node-pools.create) ResponseError: code=503, message=Container Engine is temporarily out of capacity in us-central1-c. Please try a different zone or try again later.
If I try to create it in another zone, like us-central1-b, it fails because my K8s cluster is in us-central1-c.
gcloud container node-pools create redis-pool \
  --cluster=my-kube-cluster \
  --image-type=COS \
  --machine-type=n1-highmem-2 \
  --node-labels=pool=redis \
  --zone=us-central1-c \
  --project=my-project-id \
  --num-nodes=1
How to fix it?
This message https://groups.google.com/forum/#!topic/gce-discussion/PAtGqxUiE0o was the only report I found, but it has no answer.
Google Container Engine clusters are zonal resources. This means that they cannot be created, or grow, in a zone that is unavailable (down or out of capacity), as is the case with us-central1-c above.
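Since the error itself suggests trying again later, one workaround is simply to retry until capacity frees up; a sketch using the values from the question:
# Retry the node-pool creation until us-central1-c has capacity again
until gcloud container node-pools create redis-pool \
  --cluster=my-kube-cluster \
  --image-type=COS \
  --machine-type=n1-highmem-2 \
  --node-labels=pool=redis \
  --zone=us-central1-c \
  --project=my-project-id \
  --num-nodes=1
do
  echo "Still out of capacity; retrying in 10 minutes..."
  sleep 600
done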