How to retrieve node-pool size from a k8s cluster? - kubernetes

I couldn't find useful information from:
gcloud container clusters describe CLUSTER_NAME
or from
gcloud container node-pools describe POOL_NAME --cluster CLUSTER_NAME
It is easy to scale up or down using the gcloud tool, though:
gcloud container clusters resize [CLUSTER_NAME] --node-pool [POOL_NAME] \
--size [SIZE]
But how can I find out the current size of my node pool beforehand?

I disagree with the current answer because it only gives the total size of the cluster, whereas the question is about node pools.
I needed the size of a single pool, so here is my best shot after many hours of searching and thinking:
read -p 'Cluster name: ' CLUSTER_NAME
read -p 'Pool name: ' POOL_NAME
gcloud compute instance-groups list \
| grep "^gke-$CLUSTER_NAME-$POOL_NAME" \
| awk '{print $6}';
The gcloud command returns 6 columns: 1-name, 6-group-size.
The instance group name is predictable, which lets me filter for the matching line with grep.
Lastly, select the 6th column.
Hope this helps someone else save some time.
For some reason I overlooked the not-so-obvious alternative from "Migrating workloads to different machine types":
kubectl get nodes -l cloud.google.com/gke-nodepool=$POOL_NAME -o=name \
| wc -l

You should use the following command:
gcloud container clusters describe <cluster name> --zone <zone-cluster>
Check for the field currentNodeCount

Building on @hanzo2001's answer, something like this will probably give you what you need:
kubectl get nodes -L cloud.google.com/gke-nodepool | grep -v GKE-NODEPOOL | awk '{print $6}' | sort | uniq -c | sort -r
16 n2-standard-4-pool
2 preempt-custom-6
2 default-pool
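A minimal sketch of the same per-pool count done in a single awk pass, in case you want it as a reusable function. The function name count_nodes_per_pool is made up for illustration, and it assumes the node-pool label lands in the 6th column, as in the command above. Nodes without the label are skipped.

```shell
# Count nodes per GKE node pool in the current kubectl context.
# Assumption: with -L, the pool label is the 6th column of the output.
count_nodes_per_pool() {
  kubectl get nodes --no-headers -L cloud.google.com/gke-nodepool \
    | awk 'NF >= 6 { count[$6]++ }
           END { for (pool in count) print count[pool], pool }' \
    | sort -k1,1nr -k2,2
}
```

Call it as count_nodes_per_pool; the largest pool is printed first.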

There is one often overlooked (or hidden) feature of gcloud: filters and formatters. These can get you the requested information without any further awk or grep.
Get the current size/node count of a specific node-pool:
gcloud compute instance-groups list \
--filter "name:gke-<cluster>-<nodepool>-*" \
--format 'value(size)'
Example for cluster test and node-pool default-pool:
$ gcloud compute instance-groups list --filter "name:gke-test-default-pool-*" \
--format 'value(size)'
2
To get the current size/node count of every node-pool in the cluster, you can go further and use:
CLUSTER=test
for nodepool in $(gcloud container node-pools list --cluster $CLUSTER --format="value(name)"); do
echo -n "${nodepool}: "
gcloud compute instance-groups list \
--filter "name:gke-${CLUSTER}-${nodepool}-*" \
--format 'value(size)'
done
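One thing to watch for: in a multi-zone or regional cluster a node pool usually has one instance group per zone, so the filter above can print several sizes. A rough sketch that sums them is below. The function name nodepool_total_size is hypothetical; the gcloud flags are the same ones used above.

```shell
# Print the total number of instances in a node pool, summed over all of
# its instance groups (typically one group per zone).
# Usage: nodepool_total_size CLUSTER NODEPOOL
nodepool_total_size() {
  local cluster=$1 nodepool=$2
  gcloud compute instance-groups list \
    --filter "name:gke-${cluster}-${nodepool}-*" \
    --format 'value(size)' \
    | awk '{ total += $1 } END { print total + 0 }'
}
```

For example, nodepool_total_size test default-pool. If nothing matches the filter it prints 0.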
Other relevant resources:
gcloud topic filters - gcloud filters reference
gcloud topic formats - gcloud formats reference
GCP blog post on filtering and formatting

Related

GCR Image Tag Listing using GCloud SDK CLI

I'm trying to get list of all the tags in my private GCR repository. I could do that using "gcloud container images list-tags" command as follows:
gcloud container images list-tags gcr.io/project-id/REPONAME
DIGEST TAGS TIMESTAMP
6b5727be962a 0.0.4,latest 2020-06-25T14:14:48
4b8c3f9c6ab7 0.0.3 2020-06-22T08:56:01
However, I need the list to be flattened so that I get the tags "0.0.4" and "latest" in separate rows. I tried the following command:
gcloud container images list-tags gcr.io/project-id/REPONAME --flatten='[].tags'
To my surprise, this gave me output that repeats the "latest" tag but omits "0.0.4":
DIGEST TAGS TIMESTAMP
6b5727be962a latest 2020-06-25T14:14:48
6b5727be962a latest 2020-06-25T14:14:48
4b8c3f9c6ab7 0.0.3 2020-06-22T08:56:01
What am I doing wrong, and how can I fix this?
I am able to repro your observation and think it's a bug.
The --flatten appears to correctly enumerate tags but incorrectly returns the last value in the list as each entry's value.
In my case, if the tags are v1,v2,v3, I get:
gcloud container images list-tags gcr.io/${PROJECT}/${IMAGE} \
--flatten="[].tags[]" \
--format="value(tags)" \
--filter="digest=${DIGEST}"
v3
v3
v3
I recommend you file a bug on Google's Issue Tracker for Cloud SDK
jq
If you have jq, perhaps:
gcloud container images list-tags gcr.io/${PROJECT}/${IMAGE} \
--format=json |\
jq -r '.[] | .digest as $D | .timestamp.datetime as $T | .tags[]| {"digest":$D,"tag":.,"timestamp":$T}'
Or:
gcloud container images list-tags gcr.io/${PROJECT}/${IMAGE} \
--format=json |\
jq -r '.[] | .digest as $D | .timestamp.datetime as $T | .tags[]| [$D,.,$T] | @csv'
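If jq is not available, a rough alternative is to split the TAGS column of the default table output with awk. This is only a sketch: flatten_tags is a made-up name, and it assumes the three-column DIGEST TAGS TIMESTAMP layout shown in the question, with no spaces inside any column. Rows with no tags at all are skipped.

```shell
# Split the comma-separated TAGS column of the table output into one row
# per tag. Reads the table on stdin.
flatten_tags() {
  awk 'NR == 1 { print; next }   # keep the header row as-is
       NF == 3 { n = split($2, tags, ",")
                 for (i = 1; i <= n; i++) print $1, tags[i], $3 }'
}
```

Usage would be along the lines of: gcloud container images list-tags gcr.io/project-id/REPONAME | flatten_tags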

Get count of Kubernetes pods that aren't running

I have this command to list the Kubernetes pods that are not running:
sudo kubectl get pods -n my-name-space | grep -v Running
Is there a command that will return a count of pods that are not running?
If you add ... | wc -l to the end of that command, it will print the number of lines that the grep command outputs. This will probably include the title line, but you can suppress that.
kubectl get pods -n my-name-space --no-headers \
| grep -v Running \
| wc -l
If you have a JSON-processing tool like jq available, you can get more reliable output (the grep invocation will get an incorrect answer if an Evicted pod happens to have the string Running in its name). You should be able to do something like (untested)
kubectl get pods -n my-name-space -o json \
| jq '.items | map(select(.status.phase != "Running")) | length'
If you'll be doing a lot of this, writing a non-shell program using the Kubernetes API will be more robust; you will generally be able to do an operation like "get pods" using an SDK call and get back a list of pod objects that you can filter.
You can do it without any external tool:
kubectl get po \
--field-selector=status.phase!=Running \
-o go-template='{{len .items}}'
The filtering is done with field selectors.
The counting is done with the go-template {{ len .items }}.
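Note that the field selector filters on status.phase, which is not always what the STATUS column of kubectl get pods shows (as far as I know, a pod in CrashLoopBackOff is usually still in phase Running). If you want to count by the STATUS column instead, here is a small sketch that compares that column exactly rather than grepping the whole line. The function name count_not_running is hypothetical, and it assumes STATUS is the 3rd column of the default output.

```shell
# Count pods whose STATUS column is anything other than "Running".
# Usage: count_not_running NAMESPACE
count_not_running() {
  kubectl get pods -n "$1" --no-headers \
    | awk '$3 != "Running" { n++ } END { print n + 0 }'
}
```

For example: count_not_running my-name-space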

Google Cloud Endpoint Error when creating service config

I am trying to configure Google Cloud Endpoints using Cloud Functions. For the same I am following instructions from: https://cloud.google.com/endpoints/docs/openapi/get-started-cloud-functions
I have followed the steps given and have come to the point of building the service config into a new ESPv2 Beta docker image. When I give the command:
chmod +x gcloud_build_image
./gcloud_build_image -s CLOUD_RUN_HOSTNAME \
-c CONFIG_ID -p ESP_PROJECT_ID
After replacing the hostname, config ID, and project ID, I get the following error:
> -c service-host-name-xxx -p project-id
Using base image: gcr.io/endpoints-release/endpoints-runtime-serverless:2
++ mktemp -d /tmp/docker.XXXX
+ cd /tmp/docker.5l3t
+ gcloud endpoints configs describe service-host-name-xxx.run.app --project=project-id --service=service-host-name-xxx.app --format=json
ERROR: (gcloud.endpoints.configs.describe) NOT_FOUND: Service configuration 'services/service-host-name-xxx.run.app/configs/service-host-name-xxx' not found.
+ error_exit 'Failed to download service config'
+ echo './gcloud_build_image: line 46: Failed to download service config (exit 1)'
./gcloud_build_image: line 46: Failed to download service config (exit 1)
+ exit 1
Any idea what I am doing wrong? Thanks
My bad. I repeated the steps and got it working, so I must have made a mistake while trying it out. The document works as it states.
I had the same error. Running the script a second time works. This suggests that a service endpoint must already be configured, and it does not exist yet when the script first tries to fetch the endpoint information with:
gcloud endpoints configs describe service-host-name-xxx.run.app
What I would do (in cloudbuild) is to supply some sort of an "empty" container first. I used the following example on top of my cloudbuild.yaml:
gcloud run services list \
--platform managed \
--project ${PROJECT_ID} \
--region europe-west1 \
--filter=${PROJECT_ID}-esp-svc \
--format yaml | grep . ||
gcloud run deploy ${PROJECT_ID}-esp-svc \
--image="gcr.io/endpoints-release/endpoints-runtime-serverless:2" \
--allow-unauthenticated \
--platform managed \
--project=${PROJECT_ID} \
--region=europe-west1 \
--timeout=120
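The same "deploy a placeholder only if the service is missing" idea can be sketched with gcloud run services describe instead of list plus grep. Treat this as an assumption-laden sketch: ensure_esp_service is a made-up name, and any failure of describe (including a permissions problem) is treated as "service missing".

```shell
# Deploy the placeholder ESPv2 image only when the Cloud Run service is absent.
# Usage: ensure_esp_service SERVICE PROJECT REGION
ensure_esp_service() {
  local service=$1 project=$2 region=$3
  # Assumption: a non-zero exit from describe means the service does not exist.
  if gcloud run services describe "$service" \
       --platform managed --project "$project" --region "$region" \
       >/dev/null 2>&1; then
    echo "exists"
    return 0
  fi
  gcloud run deploy "$service" \
    --image "gcr.io/endpoints-release/endpoints-runtime-serverless:2" \
    --allow-unauthenticated \
    --platform managed --project "$project" --region "$region" \
    >/dev/null && echo "deployed"
}
```

For example: ensure_esp_service "${PROJECT_ID}-esp-svc" "${PROJECT_ID}" europe-west1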

Google Cloud's gcloud compute instances create gives error "The resource projects/{ourID}/global/images/family/debian-8 was not found"

We are using a server I created on Google Cloud Platform to create and manage the other servers over there. But when trying to create a new server from the Linux command line with the GCloud compute instances create function we receive the following error:
marco@ans-mgmt-01:~/gcloud$ ./create_gcloud_instance.sh app-tst-04 tst,backend-server,bootstrap home-tst 10.20.22.104
ERROR: (gcloud.compute.instances.create) Could not fetch resource:
- The resource 'projects/REMOVED_OUR_PROJECTID/global/images/family/debian-8' was not found
Our script looks like this:
#!/bin/bash
if [ "$#" -ne 4 ]; then
echo "Usage: create_gcloud_instance <instance_name> <tags> <subnet_name> <server_ip>"
exit 1
fi
set -e
INSTANCE_NAME=$1
TAGS=$2
SERVER_SUBNET=$3
SERVER_IP=$4
gcloud compute --project "REMOVED OUR PROJECT ID" instances create "$INSTANCE_NAME" \
--zone "europe-west1-c" \
--machine-type "f1-micro" \
--network "cloudnet" \
--subnet "$SERVER_SUBNET" \
--no-address \
--private-network-ip="$SERVER_IP" \
--maintenance-policy "MIGRATE" \
--scopes "https://www.googleapis.com/auth/devstorage.read_only","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/monitoring.write","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management.readonly","https://www.googleapis.com/auth/trace.append" \
--service-account "default" \
--tags "$TAGS" \
--image-family "debian-8" \
--boot-disk-size "10" \
--boot-disk-type "pd-ssd" \
--boot-disk-device-name "bootdisk-$INSTANCE_NAME"
./clean_known_hosts.sh "$INSTANCE_NAME"
On the google cloud console (console.cloud.google.com) I enabled the cloud api access scope for the ans-mgmt-01 server and also tried to create a server from there. That's working without problems.
The problem is that gcloud is looking for the image family in your project and not the debian-cloud project where it really exists.
This can be fixed by simply using --image-project debian-cloud.
This way instead of looking for projects/{yourID}/global/images/family/debian-8, it will look for projects/debian-cloud/global/images/family/debian-8.
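A stripped-down sketch of what that looks like in a script is below. The function name create_debian_instance is made up, and all the network, scope, and disk flags from the question are left out for brevity.

```shell
# Create an instance from a public Debian image family.
# The image family is looked up in the debian-cloud project, not your own.
# Usage: create_debian_instance NAME FAMILY ZONE
create_debian_instance() {
  gcloud compute instances create "$1" \
    --zone "$3" \
    --image-family "$2" \
    --image-project "debian-cloud"
}
```

For example: create_debian_instance app-tst-04 debian-11 europe-west1-c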
For me, the problem was that debian-8 (and now debian-9) had reached end of life and was no longer supported. Updating to debian-10 or debian-11 fixed the issue.
For me, the problem was that debian-9 had reached end of life; updating to debian-10 fixed the issue.
You can run the command below to see which images are available:
gcloud compute images list | grep debian
Below is the result from the command
NAME: debian-10-buster-v20221206
PROJECT: debian-cloud
FAMILY: debian-10
NAME: debian-11-bullseye-arm64-v20221102
PROJECT: debian-cloud
FAMILY: debian-11-arm64
NAME: debian-11-bullseye-v20221206
PROJECT: debian-cloud
FAMILY: debian-11
Compare the family you are requesting against the FAMILY values in your own output.
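If you would rather check a single family from a script than read through the list, something like the following might work. It is a sketch that relies on gcloud compute images describe-from-family failing when the family has no usable image; the function name image_family_exists and the debian-cloud default are my own choices.

```shell
# Succeed only if FAMILY currently resolves to an image in PROJECT.
# Usage: image_family_exists FAMILY [PROJECT]
image_family_exists() {
  gcloud compute images describe-from-family "$1" \
    --project "${2:-debian-cloud}" >/dev/null 2>&1
}
```

For example: image_family_exists debian-11 || echo "debian-11 is not available"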

ECS Service - Automating deploy with new Docker image

I want to automate the deployment of my application by having my ECS service launch with the latest Docker image. From what I've read, the way to deploy a new image version is as follows:
1. Create a new task revision (after updating the image on your Docker repository).
2. Update the service and specify the new revision.
This seems to work, but I want to do this all through CLI so I can script it. #2 seems easy enough to do through the AWS CLI with update-service, but I don't see a way to do #1 without specifying the entire Task JSON all over again as with register-task-definition (my JSON will include credentials in environment variables, so I want to have that in as few places as possible).
Is this how I should be automating deployment of my ECS Service updates? And if so, is there a "good" way to have the Task Definition launch a new revision (i.e. without duplicating everything)?
Yes, that is the correct approach.
And no, with the current API, you can't register a new revision of an existing task definition without duplicating it.
If you didn't use the CLI to generate the original task definition (or don't want to reuse the original commands that generated it), you could try something like the following through the CLI:
OLD_TASK_DEF=$(aws ecs describe-task-definition --task-definition <task_family_name>)
NEW_CONTAINER_DEFS=$(echo $OLD_TASK_DEF | jq '.taskDefinition.containerDefinitions' | jq '.[0].image="<new_image_name>"')
aws ecs register-task-definition --family <task_family_name> --container-definitions "'$(echo $NEW_CONTAINER_DEFS)'"
This is not 100% secure, as the last command's --container-definitions argument (which includes the "environment" entries) will still be visible through process listings such as ps. One of the AWS SDKs would give better peace of mind.
The answer provided by Matt Callanan did not work for me: I received an error on this part:
--container-definitions "'$(echo $NEW_CONTAINER_DEFS)'"
Resulted in: Error parsing parameter '--container-definitions': Expected: '=', received: ''' for input:
'{ environment: [ { etc etc....
What I did to resolve it was:
TASK_FAMILY=<task_family_name>
DOCKER_IMAGE=<new_image_name>
LATEST_TASK_DEFINITION=$(aws ecs describe-task-definition --task-definition ${TASK_FAMILY})
echo $LATEST_TASK_DEFINITION \
| jq '{containerDefinitions: .taskDefinition.containerDefinitions, volumes: .taskDefinition.volumes}' \
| jq '.containerDefinitions[0].image='\"${DOCKER_IMAGE}\" \
> /tmp/tmp.json
aws ecs register-task-definition --family ${TASK_FAMILY} --cli-input-json file:///tmp/tmp.json
I take both the containerDefinitions and volumes elements from the original json document, because my containerDefinition uses these volumes (so it's not needed if you don't use volumes).
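Since the JSON can contain credentials, a fixed path like /tmp/tmp.json may be readable by other users and is left behind afterwards. A possible variation, sketched below, writes the document to a file created by mktemp (which creates it with owner-only permissions) and removes it once the registration call returns. The function name register_from_stdin is hypothetical; the aws flags are the ones used above.

```shell
# Register a task definition from a JSON document read on stdin, using a
# private temporary file instead of a fixed path.
# Usage: some_command_producing_json | register_from_stdin FAMILY
register_from_stdin() {
  local family=$1 tmp status
  tmp=$(mktemp) || return 1      # created with mode 0600
  cat > "$tmp"                   # store the JSON document from stdin
  aws ecs register-task-definition --family "$family" \
    --cli-input-json "file://$tmp"
  status=$?
  rm -f "$tmp"                   # do not leave credentials on disk
  return "$status"
}
```

In the snippet above, you would pipe the output of the two jq commands into register_from_stdin "${TASK_FAMILY}" instead of redirecting to /tmp/tmp.json.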
#!/bin/bash
SERVICE_NAME="your service name"
IMAGE_VERSION="v_"${BUILD_NUMBER}
TASK_FAMILY="your task definition name"
CLUSTER="your cluster name"
REGION="your region"
echo "=====================Create a new task definition for this build==========================="
sed -e "s;%BUILD_NUMBER%;${BUILD_NUMBER};g" taskdef.json > ${TASK_FAMILY}-${IMAGE_VERSION}.json
echo "=================Registering the task definition==========================================="
aws ecs register-task-definition --family ${TASK_FAMILY} --cli-input-json file://${TASK_FAMILY}-${IMAGE_VERSION}.json --region ${REGION}
echo "================Update the service with the new task definition and desired count================"
TASK_REVISION=`aws ecs describe-task-definition --task-definition ${TASK_FAMILY} --region ${REGION} | egrep "revision" | tr "/" " " | awk '{print $2}' | sed 's/"$//'`
DESIRED_COUNT=`aws ecs describe-services --cluster ${CLUSTER} --services ${SERVICE_NAME} --region ${REGION} | jq '.services[].desiredCount'`
if [ ${DESIRED_COUNT} = "0" ]; then
DESIRED_COUNT="1"
fi
echo "===============Updating the service=============================================================="
aws ecs update-service --cluster ${CLUSTER} --service ${SERVICE_NAME} --task-definition ${TASK_FAMILY}:${TASK_REVISION} --desired-count ${DESIRED_COUNT} --region ${REGION}
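Parsing the revision out of the JSON with egrep, tr, and awk is fragile. A sketch of the same two lookups using the AWS CLI's built-in --query and --output text options is below, so no text munging or jq is needed. The function names are made up, and the second one assumes a single service is queried.

```shell
# Print the latest revision number of a task definition family.
# Usage: latest_revision FAMILY REGION
latest_revision() {
  aws ecs describe-task-definition \
    --task-definition "$1" --region "$2" \
    --query 'taskDefinition.revision' --output text
}

# Print the desired count of a service, falling back to 1 when it is 0.
# Usage: desired_count_or_one CLUSTER SERVICE REGION
desired_count_or_one() {
  local count
  count=$(aws ecs describe-services \
    --cluster "$1" --services "$2" --region "$3" \
    --query 'services[0].desiredCount' --output text)
  if [ "$count" = "0" ]; then count=1; fi
  echo "$count"
}
```

In the script above these would replace the TASK_REVISION and DESIRED_COUNT assignments, for example TASK_REVISION=$(latest_revision "${TASK_FAMILY}" "${REGION}").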