Kubernetes metrics-server doesn't provide all metrics or scale HPA

Kubernetes metrics-server doesn't provide all metrics or scale HPA - kubernetes

Following the example here https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/#autoscaling-on-metrics-not-related-to-kubernetes-objects, I have created installed metrics-server and modified it as follows:
spec:
containers:
- command:
- metrics-server
- --secure-port=8443
- --kubelet-insecure-tls=true
- --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
image: docker.io/bitnami/metrics-server:0.3.7-debian-10-r89
imagePullPolicy: IfNotPresent
name: metrics-server
ports:
- containerPort: 8443
name: https
protocol: TCP
resources: {}
My nodes are listed when queried:
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
{"kind":"NodeMetricsList","apiVersion":"metrics.k8s.io/v1beta1","metadata":{"selfLink":"/apis/metrics.k8s.io/v1beta1/nodes"},"items":[{"metadata":{"name":"eo-test-metrics-35lks","selfLink":"/apis/metrics.k8s.io/v1beta1/nodes/eo-test-metrics-35lks","creationTimestamp":"2020-11-04T04:05:58Z"},"timestamp":"2020-11-04T04:05:28Z","window":"30s","usage":{"cpu":"770120208n","memory":"934476Ki"}},{"metadata":{"name":"eo-test-metrics-35lkp","selfLink":"/apis/metrics.k8s.io/v1beta1/nodes/eo-test-metrics-35lkp","creationTimestamp":"2020-11-04T04:05:58Z"},"timestamp":"2020-11-04T04:05:25Z","window":"30s","usage":{"cpu":"483763591n","memory":"850756Ki"}}]}
But, the HPA targets remain 'unknown':
kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
php-apache Deployment/php-apache <unknown>/50% 1 10 1 31m
Running top nodes works but top pods does not
kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
eo-test-metrics-35lkp 93m 4% 934Mi 30%
eo-test-metrics-35lks 166m 8% 1053Mi 33%
Top pods return error error: Metrics not available:
kubectl top pods
W1104 11:19:27.854485 62798 top_pod.go:266] Metrics not available for pod default/php-apache-d4cf67d68-blt2c, age: 13h1m51.854463s
error: Metrics not available for pod default/php-apache-d4cf67d68-blt2c, age: 13h1m51.854463s
This is on Kubernetes server version v1.19.3 and metrics server version 0.3.6
The logs from metrics-server
E1104 21:21:56.496129 1 reststorage.go:160] unable to fetch pod metrics for pod default/php-apache-d4cf67d68-blt2c: no metrics known for pod
E1104 21:22:10.945091 1 reststorage.go:160] unable to fetch pod metrics for pod default/php-apache-d4cf67d68-blt2c: no metrics known for pod
E1104 21:22:26.496814 1 reststorage.go:160] unable to fetch pod metrics for pod default/php-apache-d4cf67d68-blt2c: no metrics known for pod

The issue is resolved when Docker 19.03 is used on Kubernetes version 1.19 in relation to this upstream issue: https://github.com/kubernetes/kubernetes/issues/94281

Use this version of metrics server
git clone https://github.com/kodekloudhub/kubernetes-metrics-server
kubectl apply -f kubernetes-metrics-server
Then follow the same instructions as mentioned in this link
It will increase the number of pods as the load goes up
BUT I noticed the autoscaler does not scale down the deployment upon the load stopping. It might take some time until it scales down again
Note: This version of metrics server could only be used in a dev or learning environment.
I used Katakoda environment for testing the answer.

Related

Kubernetes resource quota, have non schedulable pod staying in pending state

So I wish to limit resources used by pod running for each of my namespace, and therefor want to use resource quota.
I am following this tutorial.
It works well, but I wish something a little different.
When trying to schedule a pod which will go over the limit of my quota, I am getting a 403 error.
What I wish is the request to be scheduled, but waiting in a pending state until one of the other pod end and free some resources.
Any advice?

Instead of using straight pod definitions (kind: Pod) use deployment.
Why?
Pods in Kubernetes are designed as relatively ephemeral, disposable entities:
You'll rarely create individual Pods directly in Kubernetes—even singleton Pods. This is because Pods are designed as relatively ephemeral, disposable entities. When a Pod gets created (directly by you, or indirectly by a controller), the new Pod is scheduled to run on a Node in your cluster. The Pod remains on that node until the Pod finishes execution, the Pod object is deleted, the Pod is evicted for lack of resources, or the node fails.
Kubernetes assumes that for managing pods you should a workload resources instead of creating pods directly:
Pods are generally not created directly and are created using workload resources. See Working with Pods for more information on how Pods are used with workload resources.
Here are some examples of workload resources that manage one or more Pods:
Deployment
StatefulSet
DaemonSet
By using deployment you will get very similar behaviour to the one you want.
Example below:
Let's suppose that I created pod quota for a custom namespace, set to "2" as in this example and I have two pods running in this namespace:
kubectl get pods -n quota-demo
NAME READY STATUS RESTARTS AGE
quota-demo-1 1/1 Running 0 75s
quota-demo-2 1/1 Running 0 6s
Third pod definition:
apiVersion: v1
kind: Pod
metadata:
name: quota-demo-3
spec:
containers:
- name: quota-demo-3
image: nginx
ports:
- containerPort: 80
Now I will try to apply this third pod in this namespace:
kubectl apply -f pod.yaml -n quota-demo
Error from server (Forbidden): error when creating "pod.yaml": pods "quota-demo-3" is forbidden: exceeded quota: pod-demo, requested: pods=1, used: pods=2, limited: pods=2
Not working as expected.
Now I will change pod definition into deployment definition:
apiVersion: apps/v1
kind: Deployment
metadata:
name: quota-demo-3-deployment
labels:
app: quota-demo-3
spec:
selector:
matchLabels:
app: quota-demo-3
template:
metadata:
labels:
app: quota-demo-3
spec:
containers:
- name: quota-demo-3
image: nginx
ports:
- containerPort: 80
I will apply this deployment:
kubectl apply -f deployment-v3.yaml -n quota-demo
deployment.apps/quota-demo-3-deployment created
Deployment is created successfully, but there is no new pod, Let's check this deployment:
kubectl get deploy -n quota-demo
NAME READY UP-TO-DATE AVAILABLE AGE
quota-demo-3-deployment 0/1 0 0 12s
We can see that a pod quota is working, deployment is monitoring resources and waiting for the possibility to create a new pod.
Let's now delete one of the pod and check deployment again:
kubectl delete pod quota-demo-2 -n quota-demo
pod "quota-demo-2" deleted
kubectl get deploy -n quota-demo
NAME READY UP-TO-DATE AVAILABLE AGE
quota-demo-3-deployment 1/1 1 1 2m50s
The pod from the deployment is created automatically after deletion of the pod:
kubectl get pods -n quota-demo
NAME READY STATUS RESTARTS AGE
quota-demo-1 1/1 Running 0 5m51s
quota-demo-3-deployment-7fd6ddcb69-nfmdj 1/1 Running 0 29s
It works the same way for memory and CPU quotas for namespace - when the resources are free, deployment will automatically create new pods.

kubectl delete deployment not removing pods and replicasets

We run the following command in k8s
kubectl delete deployment ${our-deployment-name}
And this seems to delete the deployment called our-deployment-name fine. However we also want to delete the replicasets and pods that below to 'our-deployment-name'.
Reading the documents it is not clear if the default behaviour should cascade delete replicasets and pods. Does anybody know how do delete the deployment and all related replicasets and pods? Or do I have to manually delete all of those resources as well?
When I delete a deployment I have an orphaned replicaset like this...
dev#jenkins:~$ kubectl describe replicaset.apps/wc-892-74697d58d9
Name: wc-892-74697d58d9
Namespace: default
Selector: app=wc-892,pod-template-hash=74697d58d9
Labels: app=wc-892
pod-template-hash=74697d58d9
Annotations: deployment.kubernetes.io/desired-replicas: 1
deployment.kubernetes.io/max-replicas: 2
deployment.kubernetes.io/revision: 1
Controlled By: Deployment/wc-892
Replicas: 1 current / 1 desired
Pods Status: 1 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
Labels: app=wc-892
pod-template-hash=74697d58d9
Containers:
wc-892:
Image: registry.digitalocean.com/galatea/wastecoordinator-wc-892:1
Port: 8080/TCP
Host Port: 0/TCP
Limits:
memory: 800Mi
Environment: <none>
Mounts: <none>
Volumes: <none>
Priority Class Name: dev-lower-priority
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate 11m replicaset-controller Created pod: wc-892-74697d58d9-jtj9t
dev#jenkins:~$

As you can see in the replicaset Controlled By: Deployment/wc-892 which means deleting the deployment wc-892 should delete the replicaset which would in turn delete the pods with label app=wc-892

First get the deployments which you want to delete
kubectl get deployments
and delete the deployment which wou want
kubectl delete deployment yourdeploymentname
This will delete the replicaset and pods associted with it.

kubectl delete deployment <deployment> will delete all ReplicaSets associated with the deployment AND the active pods associated with those ReplicaSets.
The controller-manager or API Server might be having issue handling the delete request. So I'd advise looking at those logs to verify.
Note, it's possible the older replicasets are attached to something else in the namespace? Try listing and look at the metadata. Using kubectl describe rs <rs> or kubectl get rs -o yaml

Subnetting within Kubernetes Cluster

I have couple of deployments - say Deployment A and Deployment B. The K8s Subnet is 10.0.0.0/20.
My requirement : Is it possible to get all pods in Deployment A to get IP from 10.0.1.0/24 and pods in Deployment B from 10.0.2.0/24.
This helps the networking clean and with help of IP itself a particular deployment can be identified.

Deployment in Kubernetes is a high-level abstraction that rely on controllers to build basic objects. That is different than object itself such as pod or service.
If you take a look into deployments spec in Kubernetes API Overview, you will notice that there is no such a thing as defining subnets, neither IP addresses that would be specific for deployment so you cannot specify subnets for deployments.
Kubernetes idea is that pod is ephemeral. You should not try to identify resources by IP addresses as IPs are randomly assigned. If the pod dies it will have another IP address. You could try to look on something like statefulsets if you are after unique stable network identifiers.
While Kubernetes does not support this feature I found workaround for this using Calico: Migrate pools feature.
First you need to have calicoctl installed. There are several ways to do that mentioned in the install calicoctl docs.
I choose to install calicoctl as a Kubernetes pod:
kubectl apply -f https://docs.projectcalico.org/manifests/calicoctl.yaml
To make work faster you can setup an alias :
alias calicoctl="kubectl exec -i -n kube-system calicoctl /calicoctl -- "
I have created two yaml files to setup ip pools:
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
name: pool1
spec:
cidr: 10.0.0.0/24
ipipMode: Always
natOutgoing: true
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
name: pool2
spec:
cidr: 10.0.1.0/24
ipipMode: Always
natOutgoing: true
Then you you have apply the following configuration but since my yaml were being placed in my host filesystem and not in calico pod itself I placed the yaml as an input to the command:
➜ cat ippool1.yaml | calicoctl apply -f-
Successfully applied 1 'IPPool' resource(s)
➜ cat ippool2.yaml | calicoctl apply -f-
Successfully applied 1 'IPPool' resource(s)
Listing the ippools you will notice the new added ones:
➜ calicoctl get ippool -o wide
NAME CIDR NAT IPIPMODE VXLANMODE DISABLED SELECTOR
default-ipv4-ippool 192.168.0.0/16 true Always Never false all()
pool1 10.0.0.0/24 true Always Never false all()
pool2 10.0.1.0/24 true Always Never false all()
Then you can specify what pool you want to choose for you deployment:
---
metadata:
labels:
app: nginx
name: deployment1-pool1
spec:
replicas: 3
selector:
matchLabels:
app: nginx
template:
metadata:
annotations:
cni.projectcalico.org/ipv4pools: "[\"pool1\"]"
---
I have created similar one called deployment2 that used ippool2 with the results below:
deployment1-pool1-6d9ddcb64f-7tkzs 1/1 Running 0 71m 10.0.0.198 acid-fuji
deployment1-pool1-6d9ddcb64f-vkmht 1/1 Running 0 71m 10.0.0.199 acid-fuji
deployment2-pool2-79566c4566-ck8lb 1/1 Running 0 69m 10.0.1.195 acid-fuji
deployment2-pool2-79566c4566-jjbsd 1/1 Running 0 69m 10.0.1.196 acid-fuji
Also its worth mentioning that while testing this I found out that if your default deployment will have many replicas and will ran out of ips Calico will then use different pool.

What is the difference between kubectl autoscale vs kubectl scale

I am new to kubernetes and trying to understand when to use kubectl autoscale and kubectl scale commands

Scale in deployment tells how many pods should be always running to ensure proper working of the application. You have to specify it manually.
In YAMLs you have to define it in spec.replicas like in example below:
apiVersion: apps/v1
kind: Deployment
metadata:
name: nginx-deployment
labels:
app: nginx
spec:
replicas: 3
selector:
matchLabels:
app: nginx
template:
metadata:
labels:
app: nginx
spec:
containers:
- name: nginx
image: nginx:1.7.9
ports:
- containerPort: 80
Second way to specify scale (replicas) of deployment is use command.
$ kubectl run nginx --image=nginx --replicas=3
deployment.apps/nginx created
$ kubectl get deployment
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
nginx 3 3 3 3 11s
It means that deployment will have 3 pods running and Kubernetes will always try to maintain this number of pods (If any of the pods will crush, K8s will recreate it). You can always change it with in spec.replicas and use kubectl apply -f <name-of-deployment> or via command
$ kubectl scale deployment nginx --replicas=10
deployment.extensions/nginx scaled
$ kubectl get deployment
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
nginx 10 10 10 10 4m48s
Please read in documentation about scaling and replicasets.
Horizontal Pod Autoscaling (HPA) was invented to scale deployment based on metrics produced by pods. For example, if your application have about 300 HTTP request per minute and each your pod allows to 100 HTTP requests for minute it will be ok. However if you will receive a huge amount of HTTP request ~ 1000, 3 pods will not be enough and 70% of request will fail. When you will use HPA, deployment will autoscale to run 10 pods to handle all requests. After some time, when number of request will drop to 500/minute it will scale down to 5 pods. Later depends on request number it might go up or down depends on your HPA configuration.
Easiest way to apply autoscale is:
$ kubectl autoscale deployment <your-deployment> --<metrics>=value --min=3 --max=10
It means that autoscale will automatically scale based on metrics to maximum 10 pods and later it will downscale minimum to 3.
Very good example is shown at HPA documentation with CPU usage.
Please keep in mind that Kubernetes can use many types of metrics based on API (HTTP/HTTP request, CPU/Memory load, number of threads, etc.)
Hope it help you to understand difference between Scale and Autoscaling.

Does updating a Deployment (rolling update) keep both new and old coexisting replicas receiving traffic at the same time?

I just want to find out if I understood the documentation right:
Suppose I have an nginx server configured with a Deployment, version 1.7.9 with 4 replicas.
apiVersion: apps/v1beta1 # for versions before 1.6.0 use extensions/v1beta1
kind: Deployment
metadata:
name: nginx-deployment
spec:
replicas: 4
template:
metadata:
labels:
app: nginx
spec:
containers:
- name: nginx
image: nginx:1.7.9
ports:
- containerPort: 80
Now I update the image to version 1.9.1:
kubectl set image deployment/nginx-deployment nginx=nginx:1.9.1
Withkubectl get pods I see the following:
> kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-2100875782-c4fwg 1/1 Running 0 3s
nginx-2100875782-vp23q 1/1 Running 0 3s
nginx-390780338-bl97b 1/1 Terminating 0 17s
nginx-390780338-kq4fl 1/1 Running 0 17s
nginx-390780338-rx7sz 1/1 Running 0 17s
nginx-390780338-wx0sf 1/1 Running 0 17s
2 new instances (c4fwg, vp23q) of 1.9.1 have been started coexising for a while with 3 instances of the 1.7.9 version.
What happens to the request made to the service at this moment? Do all request go to the old pods until all the new ones are available? Or are the requests load balanced between the new and the old pods?
In the last case, is there a way to modify this behaviour and ensure that all traffic goes to the old versions until all new pods are started?

The answer to "what happens to the request" is that they will be round-robin-ed across all Pods that match the selector within the Service, so yes, they will all receive traffic. I believe kubernetes considers this to be a feature, not a bug.
The answer about the traffic going to the old Pods can be answered in two ways: perhaps Deployments are not suitable for your style of rolling out new Pods, since that is the way they operate. The other answer is that you can update the Pod selector inside the Service to more accurately describe "this Service is for Pods 1.7.9", which will pin that Service to the "old" pods, and then after even just one of the 1.9.1 Pods has been started and is Ready, you can update the selector to say "this Service is for Pods 1.9.1"
If you find all this to be too much manual labor, there are a whole bunch of intermediary traffic managers that have more fine-grained control than just using pod selectors, or you can consider a formal rollout product such as Spinnaker that will automate what I just described (presuming, of course, you can get Spinnaker to work; I wish you luck with it)