How to get prometheus to monitor kubernetes service? - kubernetes

I'd like to monitor my Kubernetes Service objects to ensure that they have > 0 Pods behind them in "Running" state.
However, to do this I would have to first group the Pods by service and then group them further by status.
I would also like to do this programatically (e.g. for each service in namespace ...)
There's already some code that does this in the Sensu kubernetes plugin: https://github.com/sensu-plugins/sensu-plugins-kubernetes/blob/master/bin/check-kube-service-available.rb but I haven't seen anything that shows how to do it with Prometheus.
Has anyone setup kubernetes service level health checks with Prometheus? If so, how did you group by service and then group by Pod status?

The examples I have seen for Prometheus service checks relied on the blackbox exporter:
The blackbox exporter will try a given URL on the service. If that succeeds, at least one pod is up and running.
See here for an example: https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml in job kubernetes-service-endpoints
The URL to probe might be your liveness probe or something else. If your services don't talk HTTP, you can make the blackbox exporter test other protocols as well.

Related

Is there a metric in kubernetes cAdvisor that can tell how many endpoints are there for a NodePort/ClusterIP?

I'm looking to monitor the number of active Endpoints in NodePort and ClusterIP service. There are several cases when my pods restart or get destroyed. So its important to know if atleast one Endpoint is there to serve the incoming request.
Is there some metric for it in cAdvisor that I can expose via Prometheus? If not is there some way to track this?
I have no idea about cadvicer for monitoring endpoints but by using prometheus blackbox we can achieve this.
The Prometheus Blackbox exporter allows endpoints exploration over several protocols, such as HTTP(S), DNS, TCP, and ICMP. This exporter generates multiple metrics on your configured targets, like general endpoint status, response time, redirect information,Detecting endpoint failures, or certificate expiration dates.
Refer to this link for installation and to configure the setup.

How to solve "Ingress Error: Some backend services are in UNHEALTHY state"?

I am working on deploying a certain pod to GKE but I am having an unhealthy state for my backend services.
The deployment went through via helm install process but the ingress reports a certain warning error that says Some backend services are in UNHEALTHY state. I have tried to access the logs but do not know exactly what to look out for. Also, I already have liveness and readiness probes running.
What could I do to make the ingress come back to a healthy state? Thanks
Picture of warning error on GKE UI
Without more details it is hard to determine the exact cause.
As first point I want to mention, that your error message is Some backend services are in UNHEALTHY state, not All backend services are in UNHEALTHY state. It indicates that only a few of your backends are affected.
There might be tons of reasons, if you are using GCP Ingress or Nginx Ingress, your configuration of externalTrafficPolicy, if you are using preemptive nodes, your livenessProbe and readinessProbe, health checks, etc.
In your scenario, only a few backends are affected, the only thing with current information I can suggest you some debug options.
Using $ kubectl get po -n <namespace> check if all your pods are working correctly, that all containers within pods are Ready and pod status is Running. Eventually check logs of suspicious pod $ kubectl logs <podname> -c <containerName>. In general you should check all pods the load balancer is pointing to,
Confirm if livenessProbe and readinessProbe are configured properly and response is 200,
Describe your ingress $ kubectl describe ingress <yourIngressName> and check backends,
Check if you've configured your health checks properly according to GKE Ingress for HTTP(S) Load Balancing - Health Checks guide.
If you still won't be able to solve this issue with above debug options, please provide more details about your env with logs (without private information).
Useful links:
kubernetes unhealthy ingress backend
GKE Ingress shows unhealthy backend services
In GKE you can define BackendConfig. To define custom health checks. you can configure this using the below link to make the ingress backend to be in a HEALTHY state.
https://cloud.google.com/kubernetes-engine/docs/how-to/ingress-features#direct_health
If you have kubectl access to your pods, you can run kubectl get pod, and then kubctl logs -f <pod-name>.
Review the logs and find the error(s).

Load Balancing between PODS

Is there a way to do active and passive load balancing between 2 PODs of a micro-service. Say I have 2 instance(PODs) running of Micro-service, which is exposed using a K8s service object. Is there a way to configure the load balancing such a way that one pod will always get the request and when that pod is down , the other pod will start receiving the request?
I have ingress object also on top of that service.
This is what the Kubernetes Service object does, which you already mentioned you are using. Make sure you set up a readiness probe in your pod template so that the system can tell when your app is healthy.

How does the failover mechanism work in kubernetes service?

According to some of the tech blogs (e.g. Understanding kubernetes networking: services), k8s service dispatch all the requests through iptable rules.
What if one of the upstream pods crashed when a request happened to be routed on that pods.
Is there a failover mechanism in kubernetes service?
Will the request will be forwarded to next pod automatically?
How does kubernetes solve this through iptable?
Kubernetes offers a simple Endpoints API that is updated whenever the set of Pods in a Service changes. For non-native applications, Kubernetes offers a virtual-IP-based bridge to Services which redirects to the backend Pods
Here is the detail k8s service & endpoints
So your answer is endpoint Object
kubectl get endpoints,services,pods
There are liveness and readiness checks which decides if the pod is able to process the request or not. Kubelet with docker has mechanism to control the life cycle of pods. If the pod is healthy then its the part of the endpoint object.

Traefik health checks via kubernetes annotation

I want setup Traefik backend health check via Kubernetes annotation, but looks like Kubernetes Ingress does not support that functionality according to official documentation.
Is any particular reason why Traefik does not support that functionality for Kubernetes Ingress? I'm wondering because Mesos support health checks for a backend.
I know that in Kubernetes you can configure readiness/liveness probe for the pods, but I have leader/follower fashion service, so Traefik should route the traffic only to the leader.
UPD:
The only leader can accept the connection from Traefik; a follower will refuse the connection.
I have two readiness checks in my mind:
Service is up and running, and ready to be elected as a leader (kubernetes readiness probe)
Service is up and running and promoted as a leader (traefik health check)
Traefik relies on Kubernetes to provide an indication of the health of the underlying pods to ascertain whether they are ready to provide service. Kubernetes exposes two mechanisms in a pod to communicate information to the orchestration layer:
Liveness checks to provide an indication to Kubernetes when the process(es) running in the pod have transitioned to a broken state. A failing liveness check will cause Kubernetes to destroy the pod and recreate it.
Readiness checks to determine when a pod is ready to provide service. A failing readiness check will cause the Endpoint Controller to remove the pod from the list of endpoints of any services it provides. However, it will remain running.
In this instance, you would expose information to Traefik via a readiness check. Configure your pods with a readiness check which fails if they are in a state in which they should not receive any traffic. When the readiness state changes, Kubernetes will update the list of endpoints against any services which route traffic to the pod to add or remove the pod. Traefik will accordingly update its view of the world to add or remove the pod from the list of endpoints backing the Ingress.
There is no reason for this model to be incompatible with your master/follower architecture, provided each pod can ascertain whether it is the master or follower and provide an appropriate indication in its readiness check. However, without taking special care, there will be races between the master/follower state changing and Kubernetes updating its endpoints, as readiness probes are only made periodically. I recommend assuming this will be the case and building-in logic to reject requests received by non-master pods.
As a future consideration to increase robustness, you might split the ingress layer of your service from the business logic implementing the master/follower system, allowing all instances to communicate with Traefik and enqueue work for consideration by whatever is the "master" node at this point.