Istio has several default metrics, such as istio_requests_total, istio_request_bytes, istio_tcp_connections_opened_total. Istio envoy proxy computes and exposes these metrics. On the Istio website, it shows that istio_requests_total is a COUNTER incremented for every request handled by an Istio proxy.
We made some experiments where we let a lot of requests go through the Istio envoy to reach a microservice behind Istio envoy, and at the same time we monitored the metric from Istio envoy. However, we found that istio_requests_total does not include the requests that have got through Istio envoy to the backend microservice but their responses have not arrived at Istio envoy from the backend microservice. In other words, istio_requests_total only includes the number of served requests, and does not include the requests in flight.
My question is: is our observation right? Why does istio_requests_total not include the requests in flight?
As mentioned here
The default metrics are standard information about HTTP, gRPC and TCP requests and responses. Every request is reported by the source proxy and the destination proxy as well and these can provide a different view on the traffic. Some requests may not be reported by the destination (if the request didn't reach the destination at all), but some labels (like connection_security_policy) are only available on the destination side. Here are some of the most important HTTP metrics:
istio_requests_total is a COUNTER that aggregates request totals between Kubernetes workloads, and groups them by response codes, response flags and security policy.
As mentioned here
When Mixer collects metrics from Envoy, it assigns dimensions that downstream backends can use for grouping and filtering. In Istio’s default configuration, dimensions include attributes that indicate where in your cluster a request is traveling, such as the name of the origin and destination service. This gives you visibility into traffic anywhere in your cluster.
Metric to watch: requests_total
The request count metric indicates the overall throughput of requests between services in your mesh, and increments whenever an Envoy sidecar receives an HTTP or gRPC request. You can track this metric by both origin and destination service. If the count of requests between one service and another has plummeted, either the origin has stopped sending requests or the destination has failed to handle them. In this case, you should check for a misconfiguration in Pilot, the Istio component that routes traffic between services. If there’s a rise in demand, you can correlate this metric with increases in resource metrics like CPU utilization, and ensure that your system resources are scaled correctly.
Maybe it's worth to check envoy docs about that, because of what's written here
The queries above use the istio_requests_total metric, which is a standard Istio metric. You can observe other metrics, in particular, the ones of Envoy (Envoy is the sidecar proxy of Istio). You can see the collected metrics in the insert metric at cursor drop-down menu.
Based on above docs I agree with that what #Joel mentioned in comments
I think you're correct, and I imagine the "why" is because of response flags that are expected to be found on the metric labels. This can be written only when a response is received. If they wanted to do differently, I guess it would mean having 2 different counters, one for request sent and one for response received.
Related
I have read this question which is very similar to what I am asking, but still wanted to write a new question since the accepted answer there seems very incomplete and also potentially wrong.
Basically, it seems like there is some missing or contradictory information regarding built in load-balancing for regular Kubernetes Services (I am not talking about LoadBalancer services). For example, the official Cilium documentation states that "Kubernetes doesn't come with an implementation of Load Balancing". In addition, I couldn't find any information in the official Kubernetes documentation about load balancing for internal services (there was only a section discussing this under ingresses).
So my question is - how does load balancing or distribution of requests work when we make a request from within a Kubernetes cluster to the internal address of a Kubernetes service?
I know there's a Kubernetes proxy on each node that creates the DNS records for such services, but what about services that span multiple pods and nodes? There's got to be some form of request distribution or load-balancing, or else this just wouldn't work at all, no?
A standard Kubernetes Service provides basic load-balancing. Even for a ClusterIP-type Service, the Service has its own cluster-internal IP address and DNS name, and forwards requests to the collection of Pods specified by its selector:.
In normal use, it is enough to create a multiple-replica Deployment, set a Service to point at its Pods, and send requests only to the Service. All of the replicas will receive requests.
The documentation discusses the implementation of internal load balancing in more detail than an application developer normally needs. Unless your cluster administrator has done extra setup, you'll probably get round-robin request routing – the first Pod will receive the first request, the second Pod the second, and so on.
... the official Cilium documentation states ...
This is almost certainly a statement about external load balancing. As a cluster administrator (not a programmer) a "plain" Kubernetes installation doesn't include an external load-balancer implementation, and a LoadBalancer-type Service behaves identically to a NodePort-type Service.
There are obvious deficiencies to round-robin scheduling, most notably if you do wind up having individual network requests that take a long time and a lot of resource to service. As an application developer the best way to address this is to make these very-long-running requests run asynchronously; return something like an HTTP 201 Created status with a unique per-job URL, and do the actual work in a separate queue-backed worker.
I've seen scenarios where requests from one workload, sent to a ClusterIP service for another workload with no affinities set, only get routed to a subset of the associated pods. The Endpoints object for this service does show all of the pod IPs.
I did a little experiment to figure out what is happening.
Experiment
I set up minikube to have a "router" workload with 3 replicas sending requests to a "backend" workload also with 3 pods. The router just sends a request to the service name like http://backend.
I sent 100 requests to the router service via http://$MINIKUBE_IP:$NODE_PORT, since it's exposed as a NodePort service. Then I observed which backend pods actually handled requests. I repeated this test multiple times.
In most cases, only 2 backend pods handled any requests, with the occasional case where all 3 did. I didn't see any where all requests went to one in these experiments, though I have seen it happen before running other tests in AKS.
This led me to the theory that the router is keeping a persistent connection to the backend pod it connects to. Given there are 3 routers and 3 backends, there's an 11% chance all 3 routers "stick" to a single backend, a 67% chance that between the 3 routers, they stick to 2 of the backends, and a 22% chance that each router sticks to a different backend pod (1-to-1).
Here's one possible combination of router-to-backend connections (out of 27 possible):
Disabling HTTP Keep-Alive
If I use a Transport disabling HTTP Keep-Alives in the router's http client, then any requests I make to the router are uniformly distributed between the different backends on every test run as desired.
client := http.Client{
Transport: &http.Transport{
DisableKeepAlives: true,
},
}
resp, err := client.Get("http://backend")
So the theory seems accurate. But here's my question:
How does the router using HTTP KeepAlive / persistent connections actually result in a single connection between one router pod and one backend pod?
There is a kube-proxy in the middle, so I'd expect any persistent connections to be between the router pod and kube-proxy as well as between kube-proxy and the backend pods.
Also, when the router does a DNS lookup, it's going to find the Cluster IP of the backend service every time, so how can it "stick" to a Pod if it doesn't know the Pod IP?
Using Kubernetes 1.17.7.
This excellent article covers your question in detail.
Kubernetes Services indeed do not load balance long-lived TCP connections.
Under the hood Services (in most cases) use iptables to distribute connections between pods. But iptables wasn't designed as a balancer, it's a firewall. It isn't capable to do high-level load balancing.
As a weak substitution iptables can create (or not create) a connection to a certain target with some probability - and thus can be used as L3/L4 balancer. This mechanism is what kube-proxy employs to somewhat imitate load balancing.
Does iptables use round-robin?
No, iptables is primarily used for firewalls, and it is not designed to do load balancing.
However, you could craft a smart set of rules that could make iptables behave like a load balancer.
And this is precisely what happens in Kubernetes.
If you have three Pods, kube-proxy writes the following rules:
select Pod 1 as the destination with a likelihood of 33%. Otherwise, move to the next rule
choose Pod 2 as the destination with a probability of 50%. Otherwise, move to the following rule
select Pod 3 as the destination (no probability)
What happens when you use keep-alive with a Kubernetes Service?
Let's imagine that front-end and backend support keep-alive.
You have a single instance of the front-end and three replicas for the backend.
The front-end makes the first request to the backend and opens the TCP connection.
The request reaches the Service, and one of the Pod is selected as the destination.
The backend Pod replies and the front-end receives the response.
But instead of closing the TCP connection, it is kept open for subsequent HTTP requests.
What happens when the front-end issues more requests?
They are sent to the same Pod.
Isn't iptables supposed to distribute the traffic?
It is.
There is a single TCP connection open, and iptables rule were invocated the first time.
One of the three Pods was selected as the destination.
Since all subsequent requests are channelled through the same TCP connection, iptables isn't invoked anymore.
Also it's not quite correct to say that kube-proxy is in the middle.
It isn't - kube-proxy by itself doesn't manage any traffic.
All that it does - it creates iptables rules.
It's iptables who actually listens, distributes, does DNAT etc.
Similar question here.
We have a gRPC application deployed in a cluster (v 1.17.6) with Istio (v 1.6.2) setup. The cluster has istio-ingressgateway setup as the edge LB, with SSL termination. The istio-ingressgateway is fronted by an AWS ELB (classic LB) in passthrough mode. This setup is fully functional and the traffic flows as intended, in general. So the setup looks like:
ELB => istio-ingressgateway => virtual service => app service => [(envoy)pods]
We are running load tests on this setup using GHZ (ghz.sh), running external to the application cluster. From the tests we’ve run, we have observed that each of the app container seems to get about 300 RPS routed to it, no matter the configuration of the GHZ test. For reference, we have tried various combos of --concurrency and --connection settings for the tests. This ~300 RPS is lower than what we expect from the app and, hence, requires a lot more PODs to provide the required throughput.
We are really interested in understanding the details of the physical connection (gRPC/HTTP2) setup in this case, all the way from the ELB to the app/envoy and the details of the load balancing being done. Of particular interest is the the case when the same client, GHZ e.g., opens up multiple connections (specified via the --connection option). We have looked at Kiali and it doesn’t give us the appropriate visibility.
Questions:
How can we get visibility into the physical connections being setup from the ingress gateway to the pod/proxy?
How is the “per request gRPC” load balancing happening?
What options might exist to optimize the various components involved in this setup?
Thanks.
1.How can we get visibility into the physical connections being setup from the ingress gateway to the pod/proxy?
If Kiali doesn't show what exactly you need, maybe you could try with Jaeger?
Jaeger is an open source end to end distributed tracing system, allowing users to monitor and troubleshoot transactions in complex distributed systems.
There is istio documentation about Jaeger.
Additionally Prometheus and Grafana might be helpful here, take a look here.
2.How is the “per request gRPC” load balancing happening?
As mentioned here
By default, the Envoy proxies distribute traffic across each service’s load balancing pool using a round-robin model, where requests are sent to each pool member in turn, returning to the top of the pool once each service instance has received a request.
If you wan't to change the default round-robin model you can use Destination Rule for that. Destination rules let you customize Envoy’s traffic policies when calling the entire destination service or a particular service subset, such as your preferred load balancing model, TLS security mode, or circuit breaker settings.
There is istio documentation about that.
More about load balancing in envoy here.
3.What options might exist to optimize the various components involved in this setup?
I'm not sure if there is anything to optimize in istio components, maybe some custom configuration in Destination Rule?
Additional Resources:
itnext.io
medium.com
programmaticponderings.com
My application has a /health Http endpoint, configured for Kubernetes liveness check probe. The API returns a json, containing the health indicators.
Kubernetes only cares about the returned http status, but I would like to store the json responses in Prometheus for monitoring purposes.
Is it possible to catch the response once Kubernetes calls the API? I do not want to add the feature to the application itself but use an external component.
What is the recommended way of doing it?
Answering to what you've asked:
Make a sidecar that calls localhost:port/health every N seconds and stores the most recent reply. N should be equal to the prometheus scraping interval for accurate results.
A sidecar then exposes the most recent reply in the form of metric in /metrics endpoint, on a separate port of a pod. You could use https://github.com/prometheus/client_python to implement the sidecar. Prometheus exporter sidecar is actually a widely used pattern, try searching for it.
Point Prometheus to service /metrics endpoint, which is now served by a sidecar, on a separate port. You will need a separate port in a Service object, to point to your sidecar port. The scraping interval can be adjusted at this stage, to be in sync with your N, or otherwise, just adjust N. For scrape_config details refer to: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#service
If you need an automatic Prometheus target discovery, say you have a bunch of deployments like this and their number varies - refer to my recent answer: https://stackoverflow.com/a/64269434/923620
Proposing a simpler solution:
In your app ensure you log everything you return in /health
Implement centralized logging (log aggregation): https://kubernetes.io/docs/concepts/cluster-administration/logging/
Use a log processor, e.g. ELK, to query/analyze the results.
I want to control/intercept the load balancer traffic using Istio. Istio gives you the ability to add a mixer on a service level but I want to add some code on a higher level just before the request traffic rules get executed.
Thus instead of adding actions per service I want to have some actions executed just after the request was received from the load balancer.
As per official Istio Documentation istio-ingressgateway is the main entry point for exposing nested services outside the cluster. Therefore, Istio Gateway collects information about incoming or outgoing HTTP/TCP connections and also specifies the set of ports that should be exposed, the type of protocol to use, etc. Gateway can be applied on the corresponded Envoy sidecar in the Pod through the labels.
Keep in mind that Istio Gateway operates within L4-L6 layers of load balancing and it's not aware of network source provider.
More information about Istio load balancing architecture you can find here.