When creating various Kubernetes objects in GKE, associated GCP resources are automatically created. I'm specifically referring to:
forwarding-rules
target-http-proxies
url-maps
backend-services
health-checks
These have names such as k8s-fw-service-name-tls-ingress--8473ea5ff858586b.
After deleting a cluster, these resources remain. How can I identify which of these are still in use (by other Kubernetes objects, or another cluster) and which are not?
There is no easy way to identify which added GCP resources (LB, backend, etc.) are linked to which cluster. You need to manually go into these resources to see what they are linked to.
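That manual inspection can be partly scripted. The following is a minimal sketch, assuming an authenticated gcloud with a default project set and that the leftovers use the `k8s` name prefix seen above. It prints what each forwarding rule, target proxy and URL map points to, so you can follow the chain by name. The helper name `show_gke_lb_chain` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print what each GKE-created load-balancer resource
# points to (forwarding rule -> target proxy -> URL map -> backend service).
# Assumes an authenticated gcloud with a default project set. The optional
# argument is a gcloud --filter expression (default: names starting with k8s).

show_gke_lb_chain() {
  local filter="${1:-name~^k8s}"

  echo "== forwarding rules (name, IP, target proxy)"
  gcloud compute forwarding-rules list --filter="$filter" \
    --format="table(name,IPAddress,target.basename())"

  echo "== target proxies (name, URL map)"
  gcloud compute target-http-proxies list --filter="$filter" \
    --format="table(name,urlMap.basename())"
  gcloud compute target-https-proxies list --filter="$filter" \
    --format="table(name,urlMap.basename())"

  echo "== URL maps (name, default backend service)"
  gcloud compute url-maps list --filter="$filter" \
    --format="table(name,defaultService.basename())"
}
```

For example, `show_gke_lb_chain 'name~8473ea5ff858586b$'` restricts the output to resources ending in the hash from the example name above. A URL map that never appears in the target-proxy column is no longer fronted by any proxy.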
If you delete a cluster that has additional resources attached, you also have to delete those resources manually. For now, I would suggest keeping track of which GCP resources belong to which cluster, so that you know which ones to delete when the time comes to delete the GKE cluster.
I would also suggest filing a feature request here, asking for either a clearer naming convention for the GCP resources created for a specific cluster, or the ability to automatically delete all additional resources linked to a cluster when that cluster is deleted.
I would recommend looking at https://github.com/kelseyhightower/kubernetes-the-hard-way/blob/master/docs/14-cleanup.md
You can delete all of the objects with the Google Cloud SDK as follows:
{
gcloud -q compute firewall-rules delete \
kubernetes-the-hard-way-allow-nginx-service \
kubernetes-the-hard-way-allow-internal \
kubernetes-the-hard-way-allow-external \
kubernetes-the-hard-way-allow-health-check
gcloud -q compute routes delete \
kubernetes-route-10-200-0-0-24 \
kubernetes-route-10-200-1-0-24 \
kubernetes-route-10-200-2-0-24
gcloud -q compute networks subnets delete kubernetes
gcloud -q compute networks delete kubernetes-the-hard-way
gcloud -q compute forwarding-rules delete kubernetes-forwarding-rule \
--region $(gcloud config get-value compute/region)
gcloud -q compute target-pools delete kubernetes-target-pool
gcloud -q compute http-health-checks delete kubernetes
gcloud -q compute addresses delete kubernetes-the-hard-way
}
This assumes you named your resources after 'kubernetes-the-hard-way'. If you do not know the names, you can use gcloud's filter mechanisms to select the resources to remove, for example by matching on the namespace or object name that GKE embeds in the generated names.
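A hedged sketch of that filter-based approach for GKE-created load balancer resources: select them with a gcloud `--filter` and delete them top-down, because GCP refuses to delete a resource that another resource still references. It assumes an authenticated gcloud and that the leftovers share a recognisable name fragment (such as the `--8473ea5ff858586b` suffix in the question). The helper name is hypothetical, and it only prints what it would delete unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete load-balancer resources whose names match a
# gcloud --filter expression. Types are processed top-down (forwarding rules
# first, health checks last) because each type is referenced by the previous
# one. Prints only, unless DRY_RUN=0 is set in the environment.

LB_TYPES=(
  forwarding-rules
  target-http-proxies
  target-https-proxies
  url-maps
  backend-services
  health-checks
)

delete_matching_lb_resources() {
  local filter="$1"
  local dry_run="${DRY_RUN:-1}"
  local type uri
  for type in "${LB_TYPES[@]}"; do
    # uri() prints each resource's full self-link; the link already encodes
    # whether the resource is global or regional, so no scope flag is needed.
    while read -r uri; do
      [ -z "$uri" ] && continue
      if [ "$dry_run" = "1" ]; then
        echo "would delete ${type}: ${uri}"
      else
        # Detach stdin so gcloud cannot consume the list this loop reads.
        gcloud -q compute "$type" delete "$uri" < /dev/null
      fi
    done < <(gcloud compute "$type" list --filter="$filter" --format="value(uri())")
  done
}
```

Run it once as `delete_matching_lb_resources 'name~8473ea5ff858586b$'` to review the matches, then again with `DRY_RUN=0` in front to actually delete them.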
Related
I am running a regional GKE cluster in us-central1-b, us-central1-c and us-central1-f, on version 1.21.14-gke.700. I am adding a confidential node pool to the cluster with this command:
gcloud container node-pools create card-decrpyt-confidential-pool-1 \
--cluster=svcs-dev-1 \
--disk-size=100GB \
--disk-type=pd-standard \
--enable-autorepair \
--enable-autoupgrade \
--enable-gvnic \
--image-type=COS_CONTAINERD \
--machine-type="n2d-standard-2" \
--max-pods-per-node=8 \
--max-surge-upgrade=1 \
--max-unavailable-upgrade=1 \
--min-nodes=4 \
--node-locations=us-central1-b,us-central1-c,us-central1-f \
--node-taints=dedicatednode=card-decrypt:NoSchedule \
--node-version=1.21.14-gke.700 \
--num-nodes=4 \
--region=us-central1 \
--sandbox="type=gvisor" \
--scopes=https://www.googleapis.com/auth/cloud-platform \
--service-account="card-decrpyt-confidential@corp-dev-project.iam.gserviceaccount.com" \
--shielded-integrity-monitoring \
--shielded-secure-boot \
--tags=testingdonotuse \
--workload-metadata=GKE_METADATA \
--enable-confidential-nodes
This creates a node pool, but there is one problem: I can still SSH into the instances that the node pool creates. This is unacceptable for my use case, as these node pools need to be as secure as possible. So I created a new instance template with SSH turned off, based on the instance template that GKE created for my node pool:
gcloud compute instance-templates create card-decrypt-instance-template \
--project=corp-dev-project \
--machine-type=n2d-standard-2 \
--network-interface=aliases=gke-svcs-dev-1-pods-10a0a3cd:/28,nic-type=GVNIC,subnet=corp-dev-project-private-subnet,no-address \
--metadata=block-project-ssh-keys=true,enable-oslogin=true \
--maintenance-policy=TERMINATE --provisioning-model=STANDARD \
--service-account=card-decrpyt-confidential@corp-dev-project.iam.gserviceaccount.com \
--scopes=https://www.googleapis.com/auth/cloud-platform \
--region=us-central1 --min-cpu-platform=AMD\ Milan \
--tags=testingdonotuse,gke-svcs-dev-1-10a0a3cd-node \
--create-disk=auto-delete=yes,boot=yes,device-name=card-decrpy-instance-template,image=projects/confidential-vm-images/global/images/cos-89-16108-766-5,mode=rw,size=100,type=pd-standard \
--shielded-secure-boot \
--shielded-vtpm \
--shielded-integrity-monitoring \
--labels=component=gke,goog-gke-node=,team=platform --reservation-affinity=any
When I change the instance templates of the nodes in the node pool the new instances come online but they do not attach to the node pool. The cluster is always trying to repair itself and I can't change any settings until I delete all the nodes in the pool. I don't receive any errors.
What do I need to do to disable SSH into the node pool's nodes, either with the original node pool I created or with the new instance template? I have tried many different configurations with a new node pool and with the cluster, without any luck. I've tried different tags, network configs and images, and none of these have worked.
Other info:
The cluster was not originally a confidential cluster. The confidential nodes are the first of their kind added to the cluster.
One option you have here is to enable private IP addresses for the nodes in your cluster. The --enable-private-nodes flag will make it so the nodes in your cluster get private IP addresses (rather than the default public, internet-facing IP addresses).
Note that in this case, you would still be able to SSH into these nodes, but only from within your VPC network.
Also note that this means you would not be able to access NodePort type services from outside of your VPC network. Instead, you would need to use a LoadBalancer type service (or provide some other way to route traffic to your service from outside of the cluster, if required).
If you'd like to prevent SSH access even from within your VPC network, your easiest option would likely be to configure a firewall rule to deny SSH traffic to your nodes (TCP/UDP/SCTP port 22). Use network tags (the --tags flag) to target your GKE nodes.
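Something along the lines of the following. Keep in mind that lower priority numbers take precedence, and that a deny rule wins over an allow rule of equal priority: at 65534 this rule overrides the default network's default-allow-ssh rule (also 65534), but not allow rules with a lower number, such as rules created with the default priority of 1000.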
gcloud compute firewall-rules create fw-d-i-ssh-to-gke-nodes \
--network NETWORK_NAME \
--action deny \
--direction ingress \
--rules tcp:22,udp:22,sctp:22 \
--source-ranges 0.0.0.0/0 \
--priority 65534 \
--target-tags my-gke-node-network-tag
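Because of those priority rules, it helps to check which existing rules could outrank the deny rule. A small sketch, assuming an authenticated gcloud and the same placeholder tag: it lists the ingress rules that apply to the tagged nodes (rules targeting the tag, plus rules with no target tags, which apply to every instance in the network), lowest priority number first, using gcloud's default table with its ALLOW and DENY columns. The helper name is hypothetical, and it may over-report rules that target service accounts rather than tags.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list ingress firewall rules that can apply to instances
# carrying the given network tag, sorted so the rule evaluated first (lowest
# priority number) is on top. Rules without targetTags apply to all instances.

list_rules_for_tag() {
  local tag="$1"
  gcloud compute firewall-rules list \
    --filter="direction=INGRESS AND (targetTags:${tag} OR -targetTags:*)" \
    --sort-by=priority
}
```

Usage: `list_rules_for_tag my-gke-node-network-tag`. Any rule allowing port 22 above the deny rule in this list will still let SSH through.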
Finally, one last option I'll mention for creating a hardened GKE cluster is to use Google's safer-cluster Terraform module. This is an opinionated setup of a GKE cluster that follows many of the principles laid out in Google's cluster hardening guide and the Terraform module takes care of a lot of the nitty-gritty boilerplate here.
I needed to add the metadata flag when creating the node pool:
--metadata=block-project-ssh-keys=TRUE \
This blocked SSH.
However, setting enable-oslogin=false won't work, because that metadata key is reserved for use by Google Kubernetes Engine.
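To confirm that the setting actually reached the nodes, here is a hedged sketch that lists the cluster's node VMs with their `block-project-ssh-keys` value. It assumes GKE's usual `gke-<cluster-name>-` prefix for node VM names (long cluster names can be truncated in that prefix) and relies on gcloud's `items[KEY]` lookup for metadata key/value lists. If that column comes back empty, inspect a single node with `gcloud compute instances describe` instead. The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show the block-project-ssh-keys metadata value of every
# node VM whose name starts with gke-<cluster>-. Assumes an authenticated
# gcloud with the cluster's project as the default project.

check_ssh_key_blocking() {
  local cluster="$1"
  gcloud compute instances list \
    --filter="name~^gke-${cluster}-" \
    --format="table(name,zone.basename(),metadata.items[block-project-ssh-keys]:label=BLOCK_PROJECT_SSH_KEYS)"
}
```

Usage: `check_ssh_key_blocking svcs-dev-1`.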
I'd like to deploy a single app to multiple servers at once.
I'm using Kubernetes and K3s to easily deploy containers.
Basically, I have a master server that I run, and multiple servers that are located in my customers' facilities.
Master server was initialized with the following command:
k3sup install \
--ip $MASTER_IP \
--user ubuntu \
--cluster --k3s-channel latest \
--k3s-extra-args "--node-label ols.role=master"
The customers' servers were joined with:
k3sup join \
--ip $WORKER01_IP \
--user ubuntu \
--server-ip $MASTER_IP \
--server-user ubuntu \
--k3s-channel latest \
--k3s-extra-args "--node-label ols.role=worker"
To deploy a new web service on each customer's server, I tried the following command:
helm install node-red k8s-at-home/node-red --set nodeSelector."ols\.role"=worker
Problem: Only one single pod is deployed.
What I'd like is to deploy a single pod on each server and make it independent.
Is there a way to do that?
There are two different things to consider here.
If the requirement is just to run more replicas of the application, change the Deployment template in the Helm chart, or pass the desired number of replicas through values. Note that replicas alone do not guarantee one pod per node.
Reference documentation for deployments
Second, if the requirement is to run the application on every node in the cluster, a DaemonSet is the workload type for that: it runs one pod on each existing node (or on each node matching its nodeSelector).
Reference documentation for daemonsets
Again, if you are deploying with Helm, the chart needs DaemonSet or Deployment templates added or modified accordingly, depending on what the chart already contains.
Kubernetes supports other workload types too, so pick the one that fits your requirements.
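A minimal sketch of the DaemonSet approach, applied directly with kubectl rather than through the k8s-at-home chart. It assumes Node-RED's upstream image `nodered/node-red` and its default port 1880; these are not taken from the chart, so adjust them to whatever the chart actually deploys. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one Node-RED pod on every node labelled
# ols.role=worker by using a DaemonSet instead of a Deployment.
# Image and port are Node-RED's upstream defaults (assumption).

apply_node_red_daemonset() {
  kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-red
spec:
  selector:
    matchLabels:
      app: node-red
  template:
    metadata:
      labels:
        app: node-red
    spec:
      # Only schedule onto the customer servers joined with ols.role=worker.
      nodeSelector:
        ols.role: worker
      containers:
        - name: node-red
          image: nodered/node-red:latest
          ports:
            - containerPort: 1880
EOF
}
```

Unlike raising the replica count of a Deployment, this places exactly one pod on each matching node, and a worker joined later with the same label gets its own pod automatically.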
In GKE, the Reclaim Policy of my PersistentVolume is set to Retain, in order to prevent unintentional data removal. However, sometimes, after deleting some PersistentVolumes, I'd like to remove the associated Google Persistent Disks manually. Deleting the Google Persistent Disks using the web UI (i.e. Google Cloud Console) is time-consuming, which is why I'd like to use a gcloud command to remove all Google Persistent Disks that are not attached to a GCP VM instance. Could somebody please provide me with this command?
This one should work:
gcloud compute disks delete $(gcloud compute disks list --filter="-users:*" --format "value(uri())")
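Because the deletion cannot be undone, it can be worth previewing what that filter matches first. A minimal sketch, assuming an authenticated gcloud: `-users:*` is the same filter as above (disks with no attached instance), and the `lastDetachTimestamp` column helps spot disks that were detached only moments ago. The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list disks that are not attached to any instance,
# with zone, size and last detach time, before deleting anything.

list_unattached_disks() {
  local project="$1"
  gcloud compute disks list \
    --project="$project" \
    --filter="-users:*" \
    --format="table(name,zone.basename(),sizeGb,lastDetachTimestamp)"
}
```

Usage: `list_unattached_disks my-project`, then run the delete command above once the list looks right.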
You can use the gcloud compute disks delete command in Cloud Shell to delete disks that are not attached to a GCP VM instance:
gcloud compute disks delete DISK_NAME [DISK_NAME …] [--region=REGION | --zone=ZONE] [GCLOUD_WIDE_FLAG …]
Pass the names of the disks you want to delete to this command.
disk delete
The link that @sandeep-mohanty includes suggests that only non-attached disks are deleted by the command.
Assuming (!) that to be true (check before you delete), you can enumerate a project's disks and then delete the (not attached) disks with:
PROJECT=[[YOUR-PROJECT]]
# Get PAIRS (NAME,ZONE) for all disks in ${PROJECT}
# Using CSV (e.g. my-disk,my-zone) enables IFS parsing (below)
PAIRS=$(\
gcloud compute disks list \
--project=${PROJECT} \
--format="csv[no-heading](name,zone.scope())")
# Iterate over the PAIRS
for PAIR in ${PAIRS}
do
# Extract values of NAME,ZONE from PAIR
IFS=, read NAME ZONE <<< ${PAIR}
# Describe
printf "Attempting to delete disk: %s [%s]\n" ${NAME} ${ZONE}
# Deleting a disk should only succeed if it is not attached
gcloud compute disks delete ${NAME} \
--zone=${ZONE} \
--project=${PROJECT} \
--quiet
done
NOTE In the unlikely event that Google changes the semantics of gcloud compute disks delete to delete attached disks, this script will delete every disk in the project.
How to get Kubernetes cluster name from K8s API mentions that
curl http://metadata/computeMetadata/v1/instance/attributes/cluster-name -H "Metadata-Flavor: Google"
(from within the cluster), or
kubectl run curl --rm --restart=Never -it --image=appropriate/curl -- -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/attributes/cluster-name
(from outside the cluster), can be used to retrieve the cluster name. That works.
Is there a way to do the same programmatically using the k8s client-go library, maybe using RESTClient()? I've tried, but I keep getting "the server could not find the requested resource".
UPDATE
What I'm trying to do is get the cluster name from an app that runs either on a local computer or within a k8s cluster. client-go allows initialising the clientset with either in-cluster or out-of-cluster authentication.
With the two commands mentioned at the top that is achievable. I was wondering if there was a way from the client-go library to achieve the same, instead of having to do kubectl or curl depending on where the service is run from.
The data that you're looking for (the name of the cluster) is available at the GCP level. The cluster is a GKE resource, not a Kubernetes object, which means this specific information is not available through client-go.
So in order to get this data, you can use the Google Cloud Client Libraries for Go, designed to interact with GCP.
As a starting point, you can consult this document.
First you have to download the container package:
go get google.golang.org/api/container/v1
Before you launch your code, you have to authenticate in order to fetch the data.
Google has a very good document on how to do that.
Basically, you have to generate a service account key and point the GOOGLE_APPLICATION_CREDENTIALS environment variable at it:
export GOOGLE_APPLICATION_CREDENTIALS=sakey.json
Regarding the information that you want, you can fetch the cluster information (including name) following this example.
Once you have done this, you can launch your application like this:
go run main.go -project <google_project_name> -zone us-central1-a
And the result would be information about your cluster:
Cluster "tom" (RUNNING) master_version: v1.14.10-gke.17 -> Pool "default-pool" (RUNNING) machineType=n1-standard-2 node_version=v1.14.10-gke.17 autoscaling=false
Also it is worth mentioning that if you run this command:
curl http://metadata/computeMetadata/v1/instance/attributes/cluster-name -H "Metadata-Flavor: Google"
you are also talking to a GCP API, namely the instance metadata server. It requires no credentials, but it is only reachable from within a GCE VM or a GKE node.
You can read more about it in Google's Storing and retrieving instance metadata document.
Finally, one great advantage of doing this with the Cloud Client Libraries, is that it can be launched externally (as long as it's authenticated) or internally within pods in a deployment.
Let me know if it helps.
If you're running inside GKE, you can get the cluster name through the instance attributes: https://pkg.go.dev/cloud.google.com/go/compute/metadata#InstanceAttributeValue
More specifically, the following should give you the cluster name:
metadata.InstanceAttributeValue("cluster-name")
The example shared by Thomas lists all the clusters in your project, which may not be very helpful if you just want to query the name of the GKE cluster hosting your pod.
I'm creating Kubernetes clusters programmatically for end-to-end tests in GitLab CI/CD, using gcloud container clusters create. I've been doing this for half a year and have created and deleted a few hundred clusters. The cost went up and down. Now I got an unusually high bill from Google, and when I checked the cost breakdown I noticed that more than 95% of it is for "Storage PD Capacity". I found out that gcloud container clusters delete never deleted the Google Compute Engine disks created for PersistentVolumeClaims in the Kubernetes cluster.
How can I delete those programmatically? What else could be left running after deleting the Kubernetes cluster and the disks?
Suggestions:
To answer your immediate question: you can programmatically delete your disk resource(s) with the disks.delete API method.
To determine what other resources might have been allocated, look here: Listing all Resources in your Hierarchy.
Finally, this link might also help: GKE: Understanding cluster resource usage
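For the CI case, a narrower hedged sketch: delete only unattached disks that look like they were provisioned for PersistentVolumeClaims. It assumes the disks carry the `kubernetes.io/created-for/pvc/name` key in their description, which GKE's disk provisioners normally write (check one of your disks with `gcloud compute disks describe` first), and an authenticated gcloud. It prints the matches unless `DRY_RUN=0` is set. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete unattached disks whose description mentions a
# PersistentVolumeClaim. Prints only, unless DRY_RUN=0 is set.

delete_orphaned_pvc_disks() {
  local project="$1"
  local dry_run="${DRY_RUN:-1}"
  local uri
  while read -r uri; do
    [ -z "$uri" ] && continue
    if [ "$dry_run" = "1" ]; then
      echo "would delete: $uri"
    else
      # --quiet skips the confirmation prompt; stdin is detached so gcloud
      # cannot consume the list being read by this loop.
      gcloud compute disks delete "$uri" --project="$project" --quiet < /dev/null
    fi
  done < <(gcloud compute disks list \
    --project="$project" \
    --filter='-users:* AND description~"kubernetes.io/created-for/pvc/name"' \
    --format="value(uri())")
}
```

In a pipeline this could run right after `gcloud container clusters delete`, first without and then with `DRY_RUN=0`.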
Because this part of the answer is lengthy:
gcloud compute disks create disk-a \
--size=10gb \
--zone=us-west1-a \
--labels=something=monday \
--project=${PROJECT}
gcloud compute disks create disk-b \
--size=10gb \
--zone=us-west1-b \
--labels=something=else \
--project=${PROJECT}
Then:
read -r NAME ZONE <<< "$(gcloud compute disks list \
--filter="name~disk zone~us-west1 labels.something=else" \
--format="value(name,zone.scope())" \
--project=${PROJECT})" && echo ${NAME} ${ZONE}
NB
the filter AND is implicit and omitted
you may remove terms as needed
you should make the filter as specific as possible
And, only when you're certain, because deletion is irreversible (zonal disks need --zone, not --region):
gcloud compute disks delete ${NAME} --zone=${ZONE} --project=${PROJECT}
If there are multiple matches, you can iterate over the name/zone pairs:
gcloud compute disks list ... --format="csv[no-heading](name,zone.scope())" \
| while IFS=, read -r NAME ZONE
do
gcloud compute disks delete ${NAME} --zone=${ZONE} --project=${PROJECT} --quiet < /dev/null
done
If you prefer the awesome jq, you get a general-purpose (not gcloud-specific) way:
gcloud compute disks list \
--project=${PROJECT} \
--format=json \
| jq --raw-output '.[] | select(.name | contains("disk")) | select(.zone | contains("us-west1")) | select(.labels.something=="else")'
...