Monitoring Rancher containers by hosts through Prometheus cAdvisor NodeExporter - grafana

I have a setup that monitors every container in my Rancher 1.6 environment with a Prometheus (2.4.3)/Grafana stack (with cAdvisor v0.27.4 and NodeExporter v0.16.0).
Here is my issue: I can monitor every container's resource consumption, but I can't relate a container's consumption to the host it runs on.
For example, to show CPU usage I use container_cpu_user_seconds_total from cAdvisor, which I chart as the container's CPU usage as a percentage of its host. But I can't tell which host each container belongs to (I have 4 hosts in this environment), so the cumulative CPU consumption tends to go over 100%.
I would like to show charts by host (I saw that I could create dynamic charts in Grafana, but that seems a bit hard, so I don't mind creating them manually).
Should I try to create my own metrics in the prom-conf file? That seems a bit overkill for something like this.
I find it very strange that I seem to be the only one interested in this information. That's why I'm asking here.
I'm new to all of these tools.
Thank you in advance.
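
One possible direction, sketched here rather than taken from the question: Prometheus attaches an instance label to every sample it scrapes, so if each host's cAdvisor is scraped as its own target (an assumption about this setup), grouping by that label separates the hosts. The Python below builds such a query and reads the result through the Prometheus HTTP API. The image!="" filter, which is meant to skip cgroup entries that are not containers, is also an assumption, and the function names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request


def cpu_by_host_query(window="5m"):
    """Build a PromQL query: user CPU summed per scrape target.

    The result is in percent of ONE core (100 means one full core),
    so a host with several cores can legitimately exceed 100.
    """
    return (
        'sum by (instance) '
        '(rate(container_cpu_user_seconds_total{image!=""}[' + window + '])) * 100'
    )


def query_prometheus(base_url, promql):
    """Run an instant query against the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": promql})
    url = base_url.rstrip("/") + "/api/v1/query?" + params
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def cpu_percent_by_host(api_response):
    """Map each instance label to its value from an instant-query response."""
    return {
        series["metric"].get("instance", "unknown"): float(series["value"][1])
        for series in api_response["data"]["result"]
    }
```

Calling cpu_percent_by_host(query_prometheus("http://prometheus:9090", cpu_by_host_query())) would return one number per host, where the address is a placeholder. The same query string can be pasted into a Grafana panel with {{instance}} as the legend. To express the value as a share of the whole host, it would still need to be divided by the host's core count.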

Related

Kubernetes - How to calculate resources we need for each container?

How do I figure out the minimum and maximum resources to allocate for each application deployment? I'm setting up a cluster, and I haven't set any resources yet; I'm letting it run freely.
I guess I could use the top command to figure out the load during peak time and work from that, but top reports something like 6% or 10%, and I'm not sure how to convert that into something like 0.5 CPU or 100 MB. Is there a method or formula to determine the max and min based on top command usage?
I'm running two t3.medium nodes, and I have the following pods: httpd and tomcat in namespace1, mysql in namespace2, jenkins and gitlab in namespace3. Is there any guide to the minimum resources these need? Or do I have to figure it out based on top or some other method?
There are a few things to discuss here:
Unix top and kubectl top are different:
Unix top uses the proc virtual filesystem and reads the /proc/meminfo file to get actual information about current memory usage.
kubectl top shows metrics based on reports from cAdvisor, which collects resource usage. For example, kubectl top pod POD_NAME --containers shows metrics for a given pod and its containers, and kubectl top node NODE_NAME shows metrics for a given node.
You can use the metrics-server to get the CPU and memory usage of the pods. With it you will be able to Assign CPU Resources to Containers and Pods.
Ideally, your pods should be using exactly the amount of resources you requested, but that's almost impossible to achieve. If the usage is lower than your request, you are wasting resources. If it's higher, you are risking performance issues. Consider a 25% margin above and below the request value as a good starting point. Regarding limits, finding a good setting depends on trying and adjusting. There is no optimal value that fits everyone, as it depends on many factors related to the application itself, the demand model, the tolerance to errors, etc.
As a supplement I recommend going through the Managing Resources for Containers docs.
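
To make the 25% rule of thumb concrete, here is a minimal Python sketch. It reads the margin as "usage should stay within 25% of the request on either side", which is one interpretation of the advice above, and the helper names are made up for illustration. The parsers cover only the quantity forms kubectl top normally prints (such as 250m and 100Mi).

```python
# Binary suffixes used by Kubernetes memory quantities.
_MEMORY_SUFFIXES = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}


def parse_cpu(quantity):
    """Convert a CPU quantity such as '250m' or '2' into cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def parse_memory(quantity):
    """Convert a memory quantity such as '100Mi' into bytes."""
    for suffix, factor in _MEMORY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def request_verdict(usage, request, margin=0.25):
    """Compare observed usage with the request, allowing +/- margin."""
    if usage > request * (1 + margin):
        return "raise request"
    if usage < request * (1 - margin):
        return "lower request"
    return "ok"
```

For instance, a pod that requests 500m of CPU but peaks at 250m falls below the lower bound of 375m, so request_verdict(parse_cpu("250m"), parse_cpu("500m")) suggests lowering the request.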

Kubernetes - Monitoring pod IO

I would like to monitor the IO that my pod is doing. Using commands like 'kubectl top pods/nodes', I can monitor CPU and memory. But I am not sure how to monitor the IO my pod is doing, especially disk IO.
Any suggestions?
Since you already used the kubectl top command, I assume you have the metrics server. For a more advanced monitoring solution, I would suggest using cAdvisor, Prometheus or Elasticsearch.
To get started with Prometheus, you can check this article.
Elasticsearch has System diskio and Docker diskio metricsets. You can easily deploy it using a Helm chart.
Part 3 of the series about Kubernetes monitoring focuses on monitoring container metrics using cAdvisor, although the whole series is worth checking.
Let me know if this helps.
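
As a rough illustration of the cAdvisor route: cAdvisor exposes cumulative disk counters named container_fs_reads_bytes_total and container_fs_writes_bytes_total on its /metrics endpoint. The Python sketch below sums them per container from the text exposition format. The function name is made up, and grouping by the name label is an assumption, since the label that identifies a pod's container differs between setups.

```python
import re
from collections import defaultdict

# One sample line looks like: metric_name{label="value",...} 123 [timestamp]
SAMPLE_RE = re.compile(
    r'^(?P<metric>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

WANTED = {
    "container_fs_reads_bytes_total": "read",
    "container_fs_writes_bytes_total": "written",
}


def disk_bytes_by_container(exposition_text, label="name"):
    """Sum cumulative read/written bytes per container across all devices."""
    totals = defaultdict(lambda: {"read": 0.0, "written": 0.0})
    for line in exposition_text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP and TYPE comment lines
        match = SAMPLE_RE.match(line)
        if not match or match.group("metric") not in WANTED:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels")))
        container = labels.get(label)
        if not container:
            continue  # no usable container label on this series
        totals[container][WANTED[match.group("metric")]] += float(match.group("value"))
    return dict(totals)
```

These values are counters that only grow, so a throughput figure needs two scrapes and the difference divided by the elapsed time, which is what Prometheus's rate() does once the metrics are scraped.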

Realtime monitoring of CPU usage/CPU limit in k8s container

I am trying to measure CPU usage of a container in Kubernetes, represented as a ratio between actual usage and usage limit in a short time window. This should be ideally close to real-time (up to 5s delay).
I have full control of the container code and I can also extend the pod with a sidecar container to do reporting for me.
I have looked at Prometheus deployed using Prometheus operator, but I am seeing the data landing with large delays or even not showing up at all for some pods.
I was hoping somebody could shed some light on how to implement any of those:
sidecar container that can query cpu usage/cpu limit and send the data to another service (I am worried that this is impossible, because containers run in isolated file systems).
another process within the main container that can do the reporting. Maybe dividing $(cat /sys/fs/cgroup/cpu/cpuacct.usage) / $(cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us) would do the trick?
use some existing software tool/service to achieve this. Any recommendation would be appreciated.
Thank you very much!
Deploy a sidecar container along with the container that you want to monitor. The sidecar container should monitor the CPU of the main container and push the status to Prometheus or some other monitoring service. With alerting you can set thresholds, and if the CPU goes over the threshold, Prometheus will trigger an alert action through the Alertmanager service.
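
The usage/limit ratio from the question can be sketched from the cgroup v1 files it mentions, with two adjustments. cpuacct.usage is a cumulative counter in nanoseconds, so it has to be sampled twice. cpu.cfs_quota_us only becomes a core count once divided by cpu.cfs_period_us. The Python below is a minimal sketch that assumes cgroup v1 and assumes it runs inside the container being measured. The function names are illustrative, and pushing the value to a monitoring service is left out.

```python
import time

# cgroup v1 location as written in the question; cgroup v2 uses other files.
CGROUP_CPU_DIR = "/sys/fs/cgroup/cpu"


def read_int(path):
    """Read a single integer from a cgroup file."""
    with open(path) as handle:
        return int(handle.read().strip())


def cpu_limit_cores(quota_us, period_us):
    """Return the CPU limit in cores, or None when no quota is set (-1)."""
    if quota_us <= 0 or period_us <= 0:
        return None
    return quota_us / period_us


def usage_ratio(usage_start_ns, usage_end_ns, elapsed_s, limit_cores):
    """Cores used during the window, divided by the core limit."""
    cores_used = (usage_end_ns - usage_start_ns) / 1e9 / elapsed_s
    return cores_used / limit_cores


def sample_ratio(interval_s=1.0, cgroup_dir=CGROUP_CPU_DIR):
    """Sample cpuacct.usage twice and return usage/limit, or None if unlimited."""
    limit = cpu_limit_cores(
        read_int(cgroup_dir + "/cpu.cfs_quota_us"),
        read_int(cgroup_dir + "/cpu.cfs_period_us"),
    )
    if limit is None:
        return None
    start_ns, start_t = read_int(cgroup_dir + "/cpuacct.usage"), time.monotonic()
    time.sleep(interval_s)
    end_ns, end_t = read_int(cgroup_dir + "/cpuacct.usage"), time.monotonic()
    return usage_ratio(start_ns, end_ns, end_t - start_t, limit)
```

A value near 1.0 means the container is running at its limit. Calling sample_ratio(interval_s=1.0) in a loop stays well inside the 5 s freshness target, at the cost of one short sleep per reading.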

How do you monitor kubernetes nodes deployed using kops?

We have some Kubernetes clusters that have been deployed using kops in AWS.
We really like using the upstream/official images.
We have been wondering whether or not there was a good way to monitor the systems without installing software directly on the hosts? Are there docker containers that can extract the information from the host? I think that we are likely concerned with:
Disk space (this seems to be passed through to Docker via df)
Host CPU utilization
Host memory utilization
Is this host/node level information already available through heapster?
Not really a question about kops, but a question about operating Kubernetes. kops stops at the point of having a functional k8s cluster. You have networking, DNS, and nodes have joined the cluster. From there your world is your oyster.
There are many different options for monitoring with k8s. If you are a small team I usually recommend offloading monitoring and logging to a provider.
If you are a larger team or have more specific needs then you can look at such options as Prometheus and others. Poke around in the https://github.com/kubernetes/charts repository, as I know there is a Prometheus chart there.
As with any deployment of any form of infrastructure you are going to need Logging, Monitoring, and Metrics. Also, do not forget to monitor the monitoring ;)
I am using https://prometheus.io/, which goes naturally with Kubernetes.
The Kubernetes API already exposes a bunch of metrics in Prometheus format, https://github.com/kubernetes/ingress-nginx also exposes Prometheus metrics (enable-vts-status: "true"), and you can also install https://github.com/prometheus/node_exporter as a DaemonSet to monitor CPU, disk, etc.
I install one Prometheus inside the cluster to monitor internal metrics and one outside the cluster to monitor LBs and URLs.
Both send alerts to the same https://github.com/prometheus/alertmanager, which MUST be outside the cluster.
It took me about a week to configure everything properly.
It was worth it.

Kubernetes Heapster not working correctly

I set up kubernetes-dashboard and heapster according to the docs on GitHub. My Kubernetes client and server versions are both 1.5.4, and the whole cluster is deployed on 10 physical servers running Ubuntu 14.04. I used heapster 1.3.
I can access the dashboard and see some figures about cpu and memory usage, but I do not know why some pods do not have such figures, i.e., the cpu and memory usage of some pods are not displayed in figure format. The two pictures below are examples.
dashboard display 1
dashboard display 2
Also, I find that sometimes for a pod, it only displays memory, without cpu usage.
I checked cAdvisor on each node by accessing its web UI, and it works quite well on each node. This problem has been troubling me for several days. Can someone help me figure out this issue or give some hints? I'd be very grateful.