Kubernetes monitoring and self-healing

I am new to Kubernetes monitoring and self-healing. What kinds of self-healing can Kubernetes provide, such as restarting a failed pod when necessary? Is there anything else, and what can Kubernetes not provide?
As for Kubernetes monitoring, which metrics do we need to watch so that we can act on the cluster ourselves where Kubernetes self-healing does not cover it?
Any ideas are welcome. Thanks.

I'm afraid your question goes beyond what is possible to answer here on Stack Overflow.
Yes, k8s is able to restart/reschedule pods. If you are already a bit familiar with the key concepts, the pod lifecycle documentation is a good place to start.
If you have little knowledge of k8s basics, I suggest you study Deployments, DaemonSets, Services etc., because monitoring in k8s relies heavily on them!
You did not say what kind of metrics you are interested in.
For resource metrics like CPU and memory usage you can start with, e.g., the Kubernetes Metrics Server.
If you want insight into k8s object state (how many services, uptime, etc.), have a look at kube-state-metrics, a simple service that listens to the Kubernetes API server and generates metrics about the state of the objects.
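To give an idea of what kube-state-metrics output looks like and how it could be consumed, here is a minimal sketch in Python. It assumes the metrics arrive in the Prometheus text format and uses the `kube_pod_status_phase` metric; the sample text, pod names, and the helper names `parse_samples` and `pods_per_phase` are made up for illustration, and a real setup would normally let Prometheus do the scraping and parsing.

```python
import re
from collections import Counter

# One sample line of the Prometheus text format looks like:
#   metric_name{label="value",...} 1
# An optional trailing timestamp is accepted and ignored.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)(?:\s+-?\d+)?$'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comments
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def pods_per_phase(samples):
    """Count pods by phase; a value of 1 marks the phase a pod is in."""
    counts = Counter()
    for name, labels, value in samples:
        if name == "kube_pod_status_phase" and value == 1.0:
            counts[labels["phase"]] += 1
    return dict(counts)


# Hypothetical scrape output, shortened for the example.
SAMPLE_OUTPUT = """\
# TYPE kube_pod_status_phase gauge
kube_pod_status_phase{namespace="default",pod="web-0",phase="Running"} 1
kube_pod_status_phase{namespace="default",pod="web-0",phase="Pending"} 0
kube_pod_status_phase{namespace="default",pod="web-1",phase="Running"} 0
kube_pod_status_phase{namespace="default",pod="web-1",phase="Pending"} 1
kube_pod_status_phase{namespace="kube-system",pod="dns-0",phase="Running"} 1
"""

print(pods_per_phase(parse_samples(SAMPLE_OUTPUT)))
# {'Running': 2, 'Pending': 1}
```

A pod stuck in Pending for a long time is the kind of thing self-healing will not fix on its own, so it is a reasonable candidate for an alert.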
Have fun with k8s
Cheers

Configure liveness and readiness probes for pod health, along with the restart policy. You can do more with Services and ReplicaSets.
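As a rough illustration of the probes and restart policy mentioned above, here is a small Python sketch that builds a Pod manifest as a plain dict and prints it as JSON. The field names (`livenessProbe`, `readinessProbe`, `httpGet`, `restartPolicy`, and so on) are the ones from the Pod spec; the pod name, image, port, health path, and timing values are placeholders I picked for the example, not recommendations.

```python
import json


def pod_with_probes(name, image, port, health_path="/healthz"):
    """Build a Pod manifest with HTTP liveness and readiness probes."""
    http_check = {"httpGet": {"path": health_path, "port": port}}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            # The kubelet restarts containers that exit or fail liveness.
            "restartPolicy": "Always",
            "containers": [{
                "name": name,
                "image": image,
                "ports": [{"containerPort": port}],
                # Failing liveness: the container is restarted.
                "livenessProbe": {
                    **http_check,
                    "initialDelaySeconds": 10,
                    "periodSeconds": 10,
                    "failureThreshold": 3,
                },
                # Failing readiness: the pod is taken out of Service
                # endpoints but is not restarted.
                "readinessProbe": {**http_check, "periodSeconds": 5},
            }],
        },
    }


# Placeholder values for illustration only.
manifest = pod_with_probes("web", "nginx:1.25", 80, health_path="/")
print(json.dumps(manifest, indent=2))
```

Since kubectl accepts JSON as well as YAML, the printed output could be saved to a file and applied with `kubectl apply -f`.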

Related

Does Kubernetes (K8s) use multiple servers for load balancing?

Does Kubernetes use a single server, or can we use multiple servers with K8s? If so, how does it work?
If one instance is full, would it create a new instance and route everything to the new server?
If anyone can show a real example of K8s, that would be great!
For this I suggest starting with the Kubernetes docs, but briefly:
Kubernetes manages resources and networking from the master nodes (the control plane).
Worker nodes simply run kube-proxy and the basic control mechanisms provided by the kubelet service. You still cannot control your cluster from worker nodes.
And yes, K8s can use multiple servers for load balancing.
With K8s you do not have to work in a single zone, so you do not have to have all the pods on the same server.
So, in a single zone with one master and multiple worker nodes, you will be using the master's scheduler and a load balancer to manage the resources or the traffic as necessary. If you have multiple master nodes, you will be using the masters' schedulers, and so on.
For a real example of K8s, just search for Highly-Available Kubernetes Clusters and switch to the Images section. That way you can get a visual idea of them.
I hope I was of some help, but the docs will probably be more helpful.

Prometheus HA in Kubernetes (AKS)

I'm running the following Helm chart (https://github.com/helm/charts/tree/master/stable/prometheus) with server.replicaCount = 2 and server.statefulSet.enabled = true.
For storage I use two Managed Disks (not Azure Files, which is not POSIX-compliant); 2 PVs and 2 PVCs are created during the deployment of the chart.
My question is:
Is this an HA solution? Are the metrics written to both Prometheus instances (a service with a public IP and a headless service are created) and replicated to both disks?
How do these replicas really work?
Thanks,
Sadly, as Piotr noted, this is not a true HA offering, and Thanos is generally the preferred way to go for this kind of setup, though not without its own gotchas. The number of clusters you have is a factor, and you might need some sort of tooling account to be able to follow changes all the way through.
What I can offer you is this excellent talk, which includes a live demo and shows how this works in practice.
No, this is not an HA solution. This only scales the deployment to have 2 replicas at all times, both of which run in a StatefulSet.
In order to achieve HA monitoring on Kubernetes, dynamic failure detection and routing tools need to be involved.
There are a couple of articles about getting Prometheus to work with HA:
Deploying an HA Prometheus in Kubernetes on AWS — Multiple Availability Zone Gotchas
HA Kubernetes Monitoring using Prometheus and Thanos
The replica count only instructs the controller to always keep 2 instances of the pod running. You can find more information about replicas in the Kubernetes documentation.
In the Helm chart documentation there seem to be other options, like server.service.statefulsetReplica.enabled and server.service.statefulsetReplica.replica, but I think those are just tools that can help to create an HA Prometheus, not a solution that is ready from the get-go.
Hope it helps.
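To make the point about replicas more concrete, here is a toy Python sketch. It assumes two Prometheus replicas that scrape the same target independently and store samples only on their own disk, so each one can have its own gaps. The `merge_replicas` function is a deliberately simplified stand-in for the deduplicating query layer that a tool like Thanos adds on top; it is not how Thanos is actually implemented, and the timestamps are aligned artificially for readability.

```python
def merge_replicas(series_a, series_b):
    """Merge two {timestamp: value} series into one.

    When both replicas have a sample for a timestamp, replica A wins.
    """
    merged = dict(series_b)
    merged.update(series_a)
    return dict(sorted(merged.items()))


# Hypothetical samples of one metric, keyed by scrape time in seconds.
# Nothing copies data between the two replicas.
replica_0 = {0: 1.0, 15: 1.0, 30: 1.0, 60: 1.0}  # missed t=45 (pod restart)
replica_1 = {0: 1.0, 15: 1.0, 45: 1.0, 60: 1.0}  # missed t=30

# Querying either replica directly shows a gap.
print(sorted(replica_0))  # [0, 15, 30, 60]
print(sorted(replica_1))  # [0, 15, 45, 60]

# Only a layer that reads from both and deduplicates gives a full view.
merged = merge_replicas(replica_0, replica_1)
print(sorted(merged))     # [0, 15, 30, 45, 60]
```

With only the chart's two replicas and a Service in front of them, a query lands on one replica or the other, so dashboards can show different gaps depending on which pod answers.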

Live monitoring of container, nodes and cluster

We are using a k8s cluster for one of our applications. The cluster is owned by another team and we don't have full control over it. We are trying to find metrics on resource utilization (CPU and memory) and details about running containers/pods/nodes etc.; in particular, we need to find out how many containers are running in parallel. The problem is that they have exposed cluster monitoring via Prometheus, but with Prometheus we are not getting live data, and it has no info about running containers.
My question is: which API is available by default in a k8s cluster and can give us all we need? We don't want to read data from another client like Prometheus or anything else; we want to read metrics directly from the cluster so that the data is not stale. Any suggestions?
As you mentioned, you will need metrics-server (or heapster) to get that information.
You can confirm that your metrics server is running with kubectl top nodes/pods, or just by checking whether there is a heapster or metrics-server pod in the kube-system namespace.
The same command will also show you the information you are looking for. I won't go into details, as here you can find a lot of clues and ways of looking at cluster resource usage. You should probably take a look at cAdvisor too, which should already be present in the cluster. It exposes a web UI which exports live information about all the containers on the machine.
Other than that, there are probably commercial ways of achieving what you are looking for, for example SignalFx and other similar products, but this will probably require the cluster administrator's involvement.
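If you would rather read the numbers behind kubectl top programmatically, metrics-server serves them through the `metrics.k8s.io` API; for example, `kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes` returns a JSON list of node usage. Below is a minimal Python sketch of turning such a response into cores and bytes. The sample response, node names, and helper names are invented for the example, and the quantity parsing only covers the common suffixes, so treat it as a starting point rather than a complete parser.

```python
import json

# Divisors for CPU suffixes (nano-, micro-, millicores).
CPU_DIVISORS = {"n": 10**9, "u": 10**6, "m": 10**3}
# Multipliers for binary memory suffixes.
MEMORY_FACTORS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_cpu(quantity):
    """Convert a CPU quantity such as '250m' or '156340n' to cores."""
    suffix = quantity[-1]
    if suffix in CPU_DIVISORS:
        return float(quantity[:-1]) / CPU_DIVISORS[suffix]
    return float(quantity)


def parse_memory(quantity):
    """Convert a memory quantity such as '2048Ki' to bytes."""
    for suffix, factor in MEMORY_FACTORS.items():
        if quantity.endswith(suffix):
            return int(quantity[:-len(suffix)]) * factor
    return int(quantity)


def summarize(response_text):
    """Map each node name to (cpu_cores, memory_bytes)."""
    items = json.loads(response_text)["items"]
    return {
        item["metadata"]["name"]: (
            parse_cpu(item["usage"]["cpu"]),
            parse_memory(item["usage"]["memory"]),
        )
        for item in items
    }


# Hypothetical, shortened response for illustration.
SAMPLE_RESPONSE = json.dumps({
    "kind": "NodeMetricsList",
    "apiVersion": "metrics.k8s.io/v1beta1",
    "items": [
        {"metadata": {"name": "node-a"},
         "usage": {"cpu": "250m", "memory": "2048Ki"}},
        {"metadata": {"name": "node-b"},
         "usage": {"cpu": "500000000n", "memory": "1Gi"}},
    ],
})

for node, (cores, mem_bytes) in summarize(SAMPLE_RESPONSE).items():
    print(f"{node}: {cores} cores, {mem_bytes} bytes")
```

Note that whether you can call this API at all depends on the RBAC permissions the cluster owners give you, so you will probably still need to talk to them.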

Get request count from Kubernetes service

Is there any way to get statistics such as service / endpoint access for services defined in Kubernetes cluster?
I've read about Heapster, but it doesn't seem to provide these statistics. Plus, the whole setup is tremendously complicated and relies on a ton of third-party components. I'd really like something much, much simpler than that.
I've been looking into what may be available in the kube-system namespace, and there are a bunch of containers and services there, Heapster included, but they are effectively inaccessible because they require authentication I cannot provide, and kubectl doesn't seem to have any API to access them (or does it?).
Heapster is the agent that collects data, but then you need a monitoring agent to interpret that data. On GCP, for example, it is fluentd that gets these metrics and sends them to Stackdriver.
Prometheus is an excellent monitoring tool. I would recommend it if you are not on GCP.
If you are on GCP, then as mentioned above you have Stackdriver Monitoring, which is configured by default for K8s clusters. All you have to do is create a Stackdriver account (this is done with one click from the GCP Console), and you are good to go.

How do you monitor kubernetes nodes deployed using kops?

We have some Kubernetes clusters that have been deployed using kops in AWS.
We really like using the upstream/official images.
We have been wondering whether there is a good way to monitor the systems without installing software directly on the hosts. Are there Docker containers that can extract the information from the host? I think that we are mainly concerned with:
Disk space (this seems to be passed through to docker via df)
Host CPU utilization
Host memory utilization
Is this host/node level information already available through heapster?
Not really a question about kops, but a question about operating Kubernetes. kops stops at the point of having a functional k8s cluster. You have networking, DNS, and nodes have joined the cluster. From there your world is your oyster.
There are many different options for monitoring with k8s. If you are a small team I usually recommend offloading monitoring and logging to a provider.
If you are a larger team or have more specific needs then you can look at such options as Prometheus and others. Poke around in the https://github.com/kubernetes/charts repository, as I know there is a Prometheus chart there.
As with any deployment of any form of infrastructure you are going to need Logging, Monitoring, and Metrics. Also, do not forget to monitor the monitoring ;)
I am using https://prometheus.io/; it fits naturally with Kubernetes.
The Kubernetes API already exposes a bunch of metrics in Prometheus format,
https://github.com/kubernetes/ingress-nginx also exposes Prometheus metrics (enable-vts-status: "true"), and you can also install https://github.com/prometheus/node_exporter as a DaemonSet to monitor CPU, disk, etc.
I install one prometheus inside the cluster to monitor internal metrics and one outside the cluster to monitor LBs and URLs.
Both send alerts to the same https://github.com/prometheus/alertmanager that MUST be outside the cluster.
It took me about a week to configure everything properly.
It was worth it.
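For the node_exporter DaemonSet mentioned above, a manifest could look roughly like the one this Python sketch generates. The DaemonSet fields, the `--path.rootfs` flag, and port 9100 (the node_exporter default) are real; the `monitoring` namespace, the labels, and the unpinned image reference are assumptions for the example, and a production manifest would likely add tolerations, resource limits, and a pinned image version.

```python
import json


def node_exporter_daemonset(namespace="monitoring",
                            image="quay.io/prometheus/node-exporter"):
    """Build a DaemonSet manifest that runs node_exporter on every node."""
    labels = {"app": "node-exporter"}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": "node-exporter", "namespace": namespace,
                     "labels": labels},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Share the host network and PID namespaces so the
                    # exporter reports on the node, not on its own pod.
                    "hostNetwork": True,
                    "hostPID": True,
                    "containers": [{
                        "name": "node-exporter",
                        "image": image,
                        # Read filesystem stats from the mounted host root.
                        "args": ["--path.rootfs=/host"],
                        "ports": [{"name": "metrics",
                                   "containerPort": 9100}],
                        "volumeMounts": [{"name": "root",
                                          "mountPath": "/host",
                                          "readOnly": True}],
                    }],
                    "volumes": [{"name": "root",
                                 "hostPath": {"path": "/"}}],
                },
            },
        },
    }


daemonset = node_exporter_daemonset()
print(json.dumps(daemonset, indent=2))
```

Nothing is installed on the hosts themselves; the exporter runs as one pod per node, which matches the requirement of keeping the upstream images untouched.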