How to get the latency of an application deployed in Kubernetes?

I have a simple Java-based application deployed in Kubernetes. I want to get the average latency of requests sent to the application (GET and POST).
The Stackdriver Monitoring API has latency details for the load balancer, but those are only available after 210 seconds, which is not sufficient in my case. How can I configure Kubernetes to collect latency details every 30 seconds (or 1 minute)?
I would like the solution to be independent of Java so that I can use it for any application I deploy.

On GKE, you can use Stackdriver Trace, which is GCP-specific. I am currently fighting with the Python client library; hopefully the Java one is more mature.
Or you can use Jaeger, which is a CNCF project.
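If you want to experiment with Jaeger quickly, a minimal sketch (using the all-in-one image, which is intended for testing rather than production; the image tag, ports, and names below are assumptions) could look like this:

    # Minimal Jaeger all-in-one Deployment for experimentation, not production.
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: jaeger
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: jaeger
      template:
        metadata:
          labels:
            app: jaeger
        spec:
          containers:
            - name: jaeger
              image: jaegertracing/all-in-one:1.50   # assumed tag
              ports:
                - containerPort: 16686   # Jaeger UI
                - containerPort: 4317    # OTLP gRPC ingest

Your services would then send spans to the collector endpoint, and per-request latency can be read from the traces in the Jaeger UI.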

Use a Service Mesh
A service mesh will let you observe things like latency between your services without adding extra code to each application. Istio is one such implementation and is available on Google Kubernetes Engine.
Get uniform metrics and traces from any running applications without requiring developers to manually instrument their applications.
Istio’s monitoring capabilities let you understand how service performance impacts things upstream and downstream.
See Istio on GCP
As the "Welcome to the service mesh era" blog post puts it: use a service mesh, software that helps you orchestrate, secure, and collect telemetry across distributed applications. A service mesh transparently oversees and monitors all traffic for your application, typically through a set of network proxies that sit alongside each microservice.
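As a concrete starting point, a minimal sketch with Istio (assuming Istio is already installed in the cluster; the namespace name is a placeholder) is to enable sidecar injection for the namespace that runs your Java service:

    # Label the namespace so Istio injects an Envoy sidecar into new pods.
    apiVersion: v1
    kind: Namespace
    metadata:
      name: my-app              # placeholder namespace
      labels:
        istio-injection: enabled

Once the pods are restarted with sidecars, request latency is exported as Istio's standard Prometheus metrics, and an average over the last minute can be computed with a query along the lines of rate(istio_request_duration_milliseconds_sum[1m]) / rate(istio_request_duration_milliseconds_count[1m]), independent of the application language.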

Related

Gateway vs Agent OpenTelemetry collector deployment on Kubernetes

There are two ways to deploy the OpenTelemetry Collector on Kubernetes:
https://opentelemetry.io/docs/collector/deployment/
Agent and Gateway.
My question is: when deploying the OpenTelemetry Collector as a DaemonSet, why do we still need the Agent?
https://www.aspecto.io/blog/opentelemetry-collector-guide/
(agent mode)
Also, is it a good approach to deploy the OpenTelemetry Collector as a DaemonSet without the Agent?
Before I get to the core of your question, note that the agent/gateway modes are not Kubernetes specific. This pattern is equally valid and applicable in case you're running your microservices on, say, virtual machines or ECS.
Now, let's dive deeper into why there are cases where it makes sense to use the OpenTelemetry collector in agent/gateway mode. At its core, it's about separation of concerns, enabling different teams to focus on things they care about.
The agent mode means running an OpenTelemetry collector "close" to the workload. In the context of Kubernetes, this could be a sidecar (another container running the collector in the app pod), you could run a deployment per namespace, and indeed you could run the collector as a DaemonSet. No matter how you run these collectors, the expectation would be that the dev or product team owns the collector (config) and with it decides what receivers are needed for workloads. For example, one team needs a Prometheus receiver, another team needs a Statsd receiver for their app. We get back to the outbound (exporter) side of the collector in a moment.
The gateway mode means a central (standalone) operation of the collector, typically owned by the platform team, enabling them to:
Centrally enforce policies such as filtering sensitive log items, making sampling decisions for traces, dropping certain metrics, and so forth.
Manage permissions and credentials in a central place. For example, in order to ingest metrics using the Prometheus Remote Write exporter into Amazon Managed Service for Prometheus, the collector needs to use an IAM role with a specific IAM policy attached. The same applies for OTLP exporters, requiring you to provide an API key.
Scale the collector: either by running a fleet of OpenTelemetry collectors behind a load balancer and scaling them horizontally, or by vertically scaling a single collector instance. By managing the collector scaling, the platform team can ensure that all signals (traces, metrics, logs) are reliably delivered to the backend destinations.
Now we get back to the communication between agents and gateway: this is done via the OpenTelemetry Protocol (OTLP). That is, using the OTLP exporter on the agent side and the OTLP receiver on the gateway side, which further simplifies both the task on the side of the dev/product teams and the platform team, providing for a secure and performant telemetry data transfer.
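To make this concrete, here is a hedged sketch of the two collector configurations (the gateway Service name, backend endpoint, and pipeline contents are assumptions, not something prescribed by OpenTelemetry):

    # Agent-side collector (e.g. run as a DaemonSet or sidecar): receives OTLP
    # from local workloads and forwards everything to the gateway.
    receivers:
      otlp:
        protocols:
          grpc:
    exporters:
      otlp:
        endpoint: otel-gateway.observability.svc.cluster.local:4317   # hypothetical gateway Service
        tls:
          insecure: true   # assumes in-cluster traffic; use TLS in production
    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [otlp]

    # Gateway-side collector: receives OTLP from the agents, applies central
    # processing (batching here as an example), and exports to the backend.
    receivers:
      otlp:
        protocols:
          grpc:
    processors:
      batch:
    exporters:
      otlphttp:
        endpoint: https://telemetry-backend.example.com   # placeholder backend
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [otlphttp]

The agent config stays simple and team-owned (teams can add Prometheus, StatsD, or other receivers as needed), while credentials, sampling, and filtering live only in the gateway config.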

Kubernetes - is Service Mesh a must?

Recently I have built several microservices within a k8s cluster with an Nginx ingress controller, and they are working normally.
For communication among microservices, I tried gRPC and it worked. Then I discovered that when microservice A calls microservice B over gRPC, all requests went to only one pod of microservice B (e.g. out of 10 available pods). To load-balance the requests across all pods of microservice B, I tried Linkerd and it worked. However, I realized gRPC sometimes produced internal errors (e.g. 1 error out of 100 requests), so I switched to the plain k8s DNS approach (e.g. my-svc.my-namespace.svc.cluster-domain.example). After that, the requests never failed, and I put gRPC and Linkerd on hold.
Later, I became interested in Istio and successfully deployed it to the cluster. However, I observed that it always creates its own load balancer, which does not fit well with the existing Nginx ingress controller.
Furthermore, I tried Prometheus and Grafana, as well as k9s. These tools give me a better understanding of the CPU and memory usage of the pods.
Here are several questions I would like to understand:
If I need to monitor cluster resources, we have Prometheus, Grafana and k9s. Do they play the same monitoring role as a service mesh (e.g. Linkerd, Istio)?
If k8s DNS can already achieve load balancing, do we still need a service mesh?
If we use k8s without a service mesh, are we lagging behind normal practice?
Actually, I would also like to use a service mesh every day.
The simple answer is
A service mesh is not necessary for a Kubernetes cluster.
Now to answer your questions
If I need to monitor cluster resources, we have Prometheus, Grafana and k9s. Do they play the same monitoring role as a service mesh (e.g. Linkerd, Istio)?
K9s is a CLI tool that is essentially a replacement front end for the kubectl CLI; it is not a monitoring tool. Prometheus and Grafana are monitoring tools that use the data provided by applications (pods) and build time-series data that can be visualized as charts, graphs, etc. However, the applications have to expose monitoring data to Prometheus. Service meshes may use a sidecar and provide some default metrics useful for monitoring, such as the number of requests handled per second, so your application doesn't need any knowledge or implementation of the metrics. Thus service meshes are optional; they offload common concerns such as monitoring or authorization.
If k8s DNS can already achieve load balancing, do we still need a service mesh?
Service meshes are not needed for load balancing. When you have multiple services running in the cluster and want a single entry point for all of them, to simplify maintenance and save cost, ingress controllers such as Nginx, Traefik, or HAProxy are used. Also, service meshes such as Istio come with their own ingress controllers.
If we use k8s without a service mesh, are we lagging behind normal practice?
No, there can be clusters that don't have service meshes today and still use Kubernetes.
In the future, Kubernetes may bring some functionalities from service meshes.
A service mesh is not a silver bullet and doesn't fit every use case. It will not do everything for you; it also has bugs and limited features.
You can use Prometheus without Istio and have very good application monitoring. A service mesh can simplify some monitoring tasks for you, but that doesn't mean you cannot do them yourself.
Please don't think of DNS as a load-balancing solution. Kubernetes has Services and Ingresses to do load balancing. The Nginx Ingress today is very powerful and has many advanced features.
It heavily depends on your use case.
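To illustrate the point about Services and Ingresses, here is a minimal sketch (all names and hosts below are placeholders): a ClusterIP Service spreads traffic across the pods matched by its selector, and an Ingress routed through the Nginx controller exposes it externally.

    # Placeholder Service: load-balances across the pods labeled app=microservice-b.
    apiVersion: v1
    kind: Service
    metadata:
      name: microservice-b
    spec:
      selector:
        app: microservice-b
      ports:
        - port: 80
          targetPort: 8080
    ---
    # Placeholder Ingress handled by the Nginx ingress controller.
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: microservice-b
    spec:
      ingressClassName: nginx
      rules:
        - host: b.example.com
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: microservice-b
                    port:
                      number: 80

One caveat worth noting for the gRPC case you hit: a ClusterIP Service balances connections, not requests, so a single long-lived gRPC (HTTP/2) connection keeps hitting the same pod, which is why all traffic ended up on one pod; a mesh sidecar or client-side balancing against a headless Service are the usual ways around that.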

Can Kubernetes work like a compute farm and route one request per pod

I've dockerized a legacy desktop app. This app does resource-intensive graphical rendering from a command line interface.
I'd like to offer this rendering as a service in a "compute farm", and I wondered if Kubernetes could be used for this purpose.
If so, how in Kubernetes would I ensure that each pod only serves one request at a time (this app is resource-intensive and likely not thread-safe)? Should I write a single-threaded wrapper/invoker app in the container and thus serialize requests? Would K8s then be smart enough to route subsequent requests to idle pods rather than letting them pile up on an overloaded pod?
Interesting question.
The built-in Service object, together with kube-proxy, does route requests to different pods, but only in a round-robin fashion, which does not fit your use case.
Your use case would require changes to the kube-proxy setup during cluster setup. This approach is tedious and requires you to run your own cluster (it is not supported by managed cloud services), as described here.
Your best bet would be to set up a service mesh like Istio, which provides this with little configuration, along with a lot of other useful functionality.
See if this helps.
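As a rough illustration of the Istio route (the host, name, and exact settings below are assumptions, and this approximates rather than strictly enforces one request per pod), a DestinationRule can tell the sidecars to prefer the pod with the fewest outstanding requests and to keep connections from piling up:

    # Hypothetical DestinationRule: prefer the least-busy pod instead of round-robin.
    apiVersion: networking.istio.io/v1beta1
    kind: DestinationRule
    metadata:
      name: renderer
    spec:
      host: renderer.default.svc.cluster.local   # placeholder service
      trafficPolicy:
        loadBalancer:
          simple: LEAST_REQUEST
        connectionPool:
          http:
            maxRequestsPerConnection: 1           # no request pipelining on a connection
            http1MaxPendingRequests: 1            # keep the per-pod queue short

For strict one-request-per-pod semantics, a queue in front of the renderers (or one Kubernetes Job per request) is usually a more reliable design than tuning the load balancer.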

Triggering a Kubernetes-based application from AppEngine

I'm currently looking into triggering some 3D rendering from an AppEngine-based service.
The idea is that input data is submitted by an API client to this web service, which then invokes an internal Kubernetes GPU enabled application ("rendering backend") to do the hard work.
GPU-enabled clusters are relatively expensive ($$$), so I really want the cluster to be up and running on demand. I am trying to achieve that by setting the autoscaling minimum to 0 for the rendering backend.
The only pretty way of "triggering" a rendering task on such a cluster I could think of is via Pub/Sub Push. Basically, I need something like Cloud Tasks, but those seem to be aimed at long running tasks executed in AppEngine, not Kubernetes. Plus I like the way Pub/Sub decouples the web service from the rendering backend.
Google's Pub/Sub only allows pushing via HTTPS and only to a validated domain. It appears that Google is forcing me to completely "expose" my internal rendering backend by assigning a domain name to it, which feels ridiculous. I cannot just tell Pub/Sub to invoke http://loadbalancer.IP.address/handle_push.
This is making me doubt my architecture.
How would you go about building something like this on GCP?
From the GKE perspective:
You can have a cluster with a dedicated GPU-based node pool and schedule your pods there using taints and tolerations. Additionally, you can control the number of nodes in that node pool with autoscaling, so that the GPU nodes are only used when your pods need to be scheduled and run.
Keep in mind that this requires an additional default, non-GPU node pool where the system pods run.
For triggering, as long as your default pool is running, you'd be able to deploy your application and the autoscaling should start automatically. For deploying from an App Engine application, you might want to consider talking to the Kubernetes API directly through a library.
Finally, considering the nature of your current goal (3D rendering), it might be best to use Kubernetes Jobs. With these, you can run a sporadic computational workload and let the node pool scale back down once it is finished.
Wrapping up, you can have a minimum cluster with a zero-sized GPU-based nodepool that will autoscale when a tainted job is requested to be run there, and once the workload is finished, it should automatically downscale. These actions can be triggered from GAE, using one of the client libraries.
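A hedged sketch of such a Job (the accelerator label, taint, and image are assumptions based on typical GKE GPU node pools, not details from your setup):

    # Placeholder rendering Job targeting a tainted, autoscaled GPU node pool.
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: render-job
    spec:
      backoffLimit: 1
      template:
        spec:
          restartPolicy: Never
          nodeSelector:
            cloud.google.com/gke-accelerator: nvidia-tesla-t4   # assumed GPU node label
          tolerations:
            - key: nvidia.com/gpu          # assumed taint on the GPU node pool
              operator: Exists
              effect: NoSchedule
          containers:
            - name: renderer
              image: gcr.io/my-project/renderer:latest          # placeholder image
              resources:
                limits:
                  nvidia.com/gpu: 1

Creating one such Job per incoming rendering request (for example directly from GAE via the Kubernetes API, as mentioned above) lets the zero-sized GPU node pool scale up only while work is pending.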

Spring Boot and Prometheus

I am trying to figure out how best to collect metrics from a set of Spring Boot based services running within a Kubernetes cluster. Looking at the various docs, it seems that the choice for internal monitoring is between Actuator and Spectator, with metrics either pushed to an external collection store such as Redis or StatsD, or pulled, in the case of Prometheus.
Since the number of instances of a given service is going to vary, I don't see how Prometheus can be configured to poll those running services, since it will lack knowledge of them. I am also building around a Eureka service registry, so I am not sure whether that is polled first in this configuration.
Any real world insight into this kind of approach would be welcome.
You should use the Prometheus Java client (https://www.robustperception.io/instrumenting-java-with-prometheus/) for instrumenting. Approaches like Redis and StatsD are to be avoided, as they mean hitting the network on every single event, greatly limiting what you can monitor.
Use file_sd service discovery in Prometheus to provide it with a list of targets from Eureka (https://www.robustperception.io/using-json-file-service-discovery-with-prometheus/), though if you're using Kubernetes, as your tag hints, Prometheus has a direct integration there.
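For the Kubernetes integration, here is a minimal sketch of a scrape config using the built-in pod service discovery (the prometheus.io/* annotations are a common convention rather than a Prometheus default, so treat the relabeling below as an assumption):

    # Prometheus scrape config: discover pods via the Kubernetes API and keep
    # only those annotated with prometheus.io/scrape: "true".
    scrape_configs:
      - job_name: kubernetes-pods
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: "true"
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)

Because discovery is dynamic, Prometheus picks up new Spring Boot pods as they scale up and drops them when they terminate, without any per-instance configuration.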