Rancher, Prometheus reporting double memory usage - kubernetes

We have a cluster built via Rancher. I have monitoring enabled, and it reports exactly double the memory usage shown by the "kubectl top pod" command. This makes the reported memory usage higher than the limit and makes it impossible to deploy a HorizontalPodAutoscaler based on memory usage...
Has anyone ever run into this problem, or do you know where I should start looking?
# kubectl top pod xxxxxxx-api-66f8446df9-drccw
NAME CPU(cores) MEMORY(bytes)
xxxxxxx-api-66f8446df9-drccw 113m 310Mi
And Rancher UI:
Rancher v2.3.5
User Interface v2.3.36
Helm v2.14.3-rancher1
Machine v0.15.0-rancher29
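One thing worth checking (a guess based on a common cause, not confirmed from the question): kubectl top reports container_memory_working_set_bytes per container, while some dashboards sum every cAdvisor series for the pod, including the pod-level series whose container label is empty, which doubles the number. Queries along these lines against the cluster's Prometheus can show whether that is happening (pod name taken from the output above; on older Kubernetes versions the labels may be pod_name/container_name instead):
# Per-series breakdown; the series with an empty container label is the pod-level
# cgroup and duplicates the per-container values if everything is summed blindly
container_memory_working_set_bytes{pod="xxxxxxx-api-66f8446df9-drccw"}
# Sum that should roughly match "kubectl top pod" (pod-level and pause series excluded)
sum(container_memory_working_set_bytes{pod="xxxxxxx-api-66f8446df9-drccw", container!="", container!="POD"})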

Related

filebeat pod restarting multiple times and not getting logs in kibana

We are using ELK for logging and monitoring of our AKS cluster, but sometimes the Filebeat pod restarts and fails to ship logs to Elasticsearch.
Here is the pod log as well:
2021-08-09T12:10:04.191Z INFO [monitoring] log/log.go:145 Non-zero metrics in the last 30s {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":5128640,"time":{"ms":722}},"total":{"ticks":10563900,"time":{"ms":1266},"value":10563900},"user":{"ticks":5435260,"time":{"ms":544}}},"handles":{"limit":{"hard":1048576,"soft":1048576},"open":12},"info":{"ephemeral_id":"12737862-0ffc-4805-8e49-d06e61ae95ad","uptime":{"ms":228300048}},"memstats":{"gc_next":75621616,"memory_alloc":43143552,"memory_total":11734470608},"runtime":{"goroutines":30}},"filebeat":{"harvester":{"open_files":0,"running":0}},"libbeat":{"config":{"module":{"running":0}},"pipeline":{"clients":1,"events":{"active":1}}},"registrar":{"states":{"current":15061}},"system":{"load":{"1":2.41,"15":2.22,"5":2.36,"norm":{"1":0.6025,"15":0.555,"5":0.59}}}}}}
Could anybody suggest what might be causing the pod to restart multiple times, and what the ways to resolve this are?
Sometimes this happens because of the CPU and memory limits that have been set. Describe your pod to find the reason for the restarts; if the reason is OOMKilled, then use the command below to verify the current utilization of the Filebeat pod.
kubectl top pods -n namespace
Based on the output of the top command, adjust (increase or decrease) the CPU and memory limits in your pod manifest.
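For example, a minimal sketch of such a resources block for the Filebeat container (the values are illustrative placeholders rather than recommendations, and the exact location depends on how your Filebeat DaemonSet or Deployment is defined):
resources:
  requests:
    cpu: 100m
    memory: 100Mi
  limits:
    cpu: 500m
    memory: 300Mi   # raise this if describe shows OOMKilled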

Is there a known method to decide auto scaling threshold value?

Is there a known method / keyword / topic for deciding an auto-scale threshold value?
Take the K8s HPA example below: I only know that I can install some monitoring tools and then eyeball the memory usage on a graph to decide on a proper threshold value such as 100Mi. But why not set it to 99Mi, or 101Mi? This method seems too manual.
- type: Resource
  resource:
    name: memory
    target:
      type: AverageValue
      averageValue: 100Mi
As my background is not in computer science, I want to ask:
Is there a known method for solving this kind of problem?
What kind of course would cover this problem?
What keywords should I search for in academic articles?
In order to display this information without any graph you can use the metrics server. Running it in your cluster makes it possible to get usage figures for nodes and individual pods through the kubectl top command.
Here's an example where I'm checking the node resources:
➜ ~ kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
minikube 580m 28% 1391Mi 75%
And for a pod:
➜ ~ kubectl top pod
NAME CPU(cores) MEMORY(bytes)
front-end 0m 28Mi
You can also see resource usages across individual containers instead of pods using the --containers option.
I assume that if you use HPA you already have this installed, but it's worth knowing that if you use minikube you can easily enable the metrics server with minikube addons enable metrics-server. If you bootstrapped your cluster with kubeadm, then you have to install it and configure it with all of its requirements in order for it to run correctly.
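On a kubeadm-style cluster, one common way to install it is to apply the upstream manifest (the URL below is the standard metrics-server release manifest; pin a specific version if you need reproducibility):
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml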
Lastly, you can always check your pod's usage manually by exec'ing into it:
kubectl exec -it <name_of_the_pod> -- top
You can check the Kubernetes documentation for more information about pod autoscalers.
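To tie this back to the question, a minimal HPA manifest using the memory AverageValue target could look like the sketch below (my-app is a placeholder name, and the autoscaling/v2 API is assumed to be available; older clusters use autoscaling/v2beta2):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app                # placeholder
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app              # placeholder target deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 100Mi   # the threshold from the question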

Profiling Kubernetes Deployment Process

I'm new to Kubernetes and currently I'm researching profiling in Kubernetes. I want to log the deployment process in Kubernetes (creating a pod, restarting a pod, etc.) and want to know the time and resources (RAM, CPU) needed in each step (for example when downloading the image, building the deployment, the pod, etc.).
Is there a way or tool for me to log this process? Thank you!
I am not really sure you can achieve the outcome you want without extensive knowledge of certain components and some deep-dive coding.
What can be retrieved from Kubernetes:
Information about events
Like pod creation, termination, allocation with timestamps:
$ kubectl get events --all-namespaces
Even in JSON format there is nothing about CPU/RAM usage in these events.
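If the goal is timing the steps, the events can at least be ordered by timestamp, for example (standard kubectl flags):
# Events sorted oldest to newest, useful for reading pod lifecycle timings
kubectl get events --all-namespaces --sort-by=.metadata.creationTimestamp
# JSON output if you want to post-process the timestamps yourself
kubectl get events --all-namespaces -o json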
Information about pods
$ kubectl get pods POD_NAME -o json
No information about CPU/RAM usage.
$ kubectl describe pods POD_NAME
No information about CPU/RAM usage either.
Information about resource usage
There are some tools to monitor and report basic resource usage:
$ kubectl top node
With output:
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
MASTER 90m 9% 882Mi 33%
WORKER1 47m 5% 841Mi 31%
WORKER2 37m 3% 656Mi 24%
$ kubectl top pods --all-namespaces
With output:
NAMESPACE NAME CPU(cores) MEMORY(bytes)
default nginx-local-84ddb99b55-2nzdb 0m 1Mi
default nginx-local-84ddb99b55-nxfh5 0m 1Mi
default nginx-local-84ddb99b55-xllw2 0m 1Mi
This does show CPU/RAM usage, but only in a basic form.
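The same numbers can be pulled programmatically from the metrics API that kubectl top uses (the standard metrics.k8s.io endpoints; they require metrics-server or an equivalent to be running):
# Raw node metrics as JSON
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
# Raw pod metrics for one namespace
kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/default/pods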
Information about deployments
$ kubectl describe deployment deployment_name
Provided output gives no information about CPU/RAM usage.
Getting information about resources
Getting resource usage like CPU/RAM specific to particular actions, such as pulling an image or scaling a deployment, can be problematic. Not all processes are managed by Kubernetes, and additional tools at the OS level might be needed to fetch that information.
For example, pulling an image for a deployment involves the kubelet agent as well as the CRI talking to Docker or whatever other container runtime your cluster is using. On top of that, the container runtime not only downloads the image, it performs other actions that are not directly monitored by Kubernetes.
As another example, the HPA (Horizontal Pod Autoscaler) is a Kubernetes abstraction, and getting its metrics would depend heavily on how metrics are collected in the cluster in order to determine the best way to fetch them.
I would highly encourage you to share what exactly (case by case) you want to monitor.
You can find these in the events feed for the pod; check kubectl describe pod.

HPA could not get CPU metric during GKE node auto-scaling

Cluster information:
Kubernetes version: 1.12.8-gke.10
Cloud being used: GKE
Installation method: gcloud
Host OS: (machine type) n1-standard-1
CNI and version: default
CRI and version: default
During node scaling, the HPA couldn't get the CPU metric.
At the same time, the output of kubectl top pod and kubectl top node is:
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get pods.metrics.k8s.io)
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
For more detail, here is the flow in which my problem occurs:
1. Suddenly many requests arrive at the GKE server (using a testing tool).
2. HPA detects current CPU usage above the target CPU usage (50%), so it tries to scale pods up incrementally.
3. An Insufficient CPU warning occurs when creating pods, so GKE tries to scale nodes up incrementally.
4. Soon the HPA fails to get the metric, and kubectl top node or kubectl top pod doesn't get a response.
- At this time one or more OutOfcpu pods are found, and several pods are in ContainerCreating (coming from the Pending state).
5. After node scale-up is complete and some time has elapsed (about a few minutes), HPA starts to fetch the CPU metric successfully and tries to scale up/down based on it.
The same situation happens when nodes scale down.
This causes pod scaling to stall and leads to some failures in responding to clients' requests. Is this normal?
I think HPA should get the CPU metric (or other metrics) of the running pods even during node scaling, to keep track of the optimal number of pods at that moment, so that when node scaling is done, HPA can create the necessary pods at once (rather than incrementally).
Can I make my cluster work like this?
Maybe your node runs out of one resource, either memory or CPU. There are config maps that describe how add-ons are scaled depending on the cluster size. You need to edit the metrics-server-config config map in the kube-system namespace:
kubectl edit cm/metrics-server-config -n kube-system
You should add
baseCPU
cpuPerNode
baseMemory
memoryPerNode
to the NannyConfiguration; an extensive manual for these settings can be found in the addon-resizer documentation.
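For reference, a sketch of what that config map can look like (the values are illustrative examples to be scaled to your cluster size, not tuned recommendations):
apiVersion: v1
kind: ConfigMap
metadata:
  name: metrics-server-config
  namespace: kube-system
data:
  NannyConfiguration: |-
    apiVersion: nannyconfig/v1alpha1
    kind: NannyConfiguration
    baseCPU: 100m
    cpuPerNode: 5m
    baseMemory: 100Mi
    memoryPerNode: 20Mi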
Heapster also suffers from the same OOM issue (too many pods to handle all the metrics within the assigned resources), so please modify Heapster's config map accordingly:
kubectl edit cm/heapster-config -n kube-system

How to find out the minimum and maximum usable CPU and memory space left on a kubernetes node

I'm trying to deploy Magento on a GCE n1-standard-1 machine, but I keep getting the following error message.
pod (magento-magento-1486272877-zd34d) failed to fit in any node fit failure summary on nodes : Insufficient cpu (1)
I'm using the official Magento helm chart, and I've configured the values.yml file to contain very low CPU requests: cpu: 25m
When I look at the node details on the Kubernetes dashboard, I see that my CPU is already at 0.728 (72.80%) while it's not even doing anything besides the system containers. Also see the image below:
Does this mean I have 1 - 0.728 = 0.272 CPU left for container requests? Then why is Kubernetes still telling me that there is insufficient CPU when I specify 0.25m?
Thanks for your help.
I hadn't seen that the CPU limit was 0.248 according to the picture in my post, so I set cpu: 20m and it worked.
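In practice that just means lowering the request in values.yml; a sketch, assuming the chart exposes a standard resources block (check the chart's values.yaml for the exact key path):
resources:
  requests:
    cpu: 20m        # kept below the node's remaining allocatable CPU
    memory: 256Mi   # illustrative value, not taken from the question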
There is a nifty kubectl command to get information about your nodes' resources...
kubectl top nodes
And pods...
kubectl top pods
Pods with containers
kubectl top pods --containers=true
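To see how much room is actually left for new requests on a node, which is what the scheduler checks, you can also inspect the node's Allocatable capacity and its currently allocated requests (<node_name> is a placeholder):
# Total capacity the scheduler can hand out on this node
kubectl describe node <node_name> | grep -A 5 "Allocatable"
# Sum of requests/limits already claimed by pods on this node
kubectl describe node <node_name> | grep -A 10 "Allocated resources"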