Kubernetes network utilization metrics

I'm running a Kubernetes cluster (GKE) on Google Compute Engine (GCE).
Through Heapster I am able to get different network metrics such as sent or received bytes, or error rate.
However, in order to better understand the bottlenecks of my application (Pods), it would be essential to understand how utilized the Node's network is. Is it possible to query network utilization? If not, which metrics indicate my network health?

There is no direct way for you to monitor the VPC network itself, but some tools can help you check that it is behaving as expected.
The only documented limit on VPC networks is the egress throughput cap, which depends on the number of cores the nodes have.
You can see the “Network Bytes” and “Network Packets” graphs in your Google Cloud Console. To find them, go to:
Cloud Console -> Instance Groups -> Managed_Instance_Group_Name
or
Cloud Console -> VM Instances -> Node_Name
Network graphs for the pools and the nodes can also be found in your Stackdriver account:
(https://app.google.stackdriver.com) -> Resources -> Container Engine -> Cluster Name
Analyzing those graphs might help you determine whether your traffic is being throttled.
To obtain additional visibility, you could also use cAdvisor or other tools mentioned here.
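As a rough illustration of how sent-bytes counters relate to the egress cap, here is a minimal Python sketch that turns two readings of a cumulative byte counter into a utilization percentage. The function name and the 2 Gbit/s cap are hypothetical values chosen for the example; the real cap depends on the number of cores of your nodes, so check the documentation for your machine type.

```python
def utilization_percent(bytes_start, bytes_end, interval_seconds, cap_gbps):
    """Return throughput as a percentage of an assumed egress cap.

    bytes_start / bytes_end are two readings of a cumulative sent-bytes
    counter taken interval_seconds apart. cap_gbps is an assumption the
    caller must supply (it is not read from any API here).
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    if cap_gbps <= 0:
        raise ValueError("cap_gbps must be positive")
    delta_bytes = bytes_end - bytes_start
    if delta_bytes < 0:
        # A cumulative counter should never decrease; treat this as a reset.
        raise ValueError("counter went backwards (possible counter reset)")
    bits_per_second = delta_bytes * 8 / interval_seconds
    cap_bits_per_second = cap_gbps * 1e9
    return 100.0 * bits_per_second / cap_bits_per_second


# Hypothetical sample: 7.5 GB sent in 60 s against an assumed 2 Gbit/s cap.
print(utilization_percent(0, 7_500_000_000, 60, 2.0))
```

A value that stays close to 100 for long periods would suggest the node is hitting the cap and traffic may be throttled.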

Related

How to get network information from metrics server

I know how to get information (limit, request, usage) about CPU and memory from metrics-server.
But I don't know how to get network information (Memory Distribution, Network Traffic (KBps), Network Utilization) from metrics-server.
Does metrics-server simply not provide network information?
Is there another way?
I need to develop this in Python.

Metrics server does not export these kinds of metrics, but Prometheus Node Exporter does. Node Exporter runs as a systemd service that will periodically (every 1 second) gather all the metrics of your system.
The Prometheus Node Exporter exposes a wide variety of hardware- and kernel-related metrics.
You can check the full list of collectors here. You may also want to check this article about monitoring networks with Prometheus.
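Since the question asks for Python, here is a minimal sketch of reading the network counters that Node Exporter exposes. It assumes Node Exporter is reachable on its default port 9100 and that the counters are named `node_network_receive_bytes_total` and `node_network_transmit_bytes_total` with a single `device` label; the helper names and the sample text are made up for illustration.

```python
import re
import urllib.request

# Hypothetical excerpt of what a /metrics response may look like.
SAMPLE = """\
# HELP node_network_receive_bytes_total Network device statistic receive_bytes.
# TYPE node_network_receive_bytes_total counter
node_network_receive_bytes_total{device="eth0"} 1.5e+06
node_network_receive_bytes_total{device="lo"} 2048
node_network_transmit_bytes_total{device="eth0"} 750000
"""

# Matches lines such as: metric_name{device="eth0"} 123
LINE_RE = re.compile(r'^(\w+)\{device="([^"]+)"\}\s+(\S+)$')


def fetch_metrics(url):
    """Download the raw metrics text, e.g. from http://<node>:9100/metrics."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


def parse_network_counters(text):
    """Return {(metric_name, device): value} for node_network_* samples."""
    counters = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # not a single-label "device" sample
        name, device, value = match.groups()
        if name.startswith("node_network_"):
            counters[(name, device)] = float(value)
    return counters


def rate_kilobytes_per_second(before, after, key, interval_seconds):
    """Traffic in KB/s between two parsed samples of a cumulative counter."""
    return (after[key] - before[key]) / interval_seconds / 1000.0


counters = parse_network_counters(SAMPLE)
print(counters[("node_network_receive_bytes_total", "eth0")])
```

To get a traffic rate, call `fetch_metrics` twice a few seconds apart, parse both results, and pass them to `rate_kilobytes_per_second`.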

Is GKE built by default into the Anthos solution? Getting Anthos metrics

I have a cluster with 7 nodes and a lot of services, nodes, etc. in the Google Cloud Platform. I'm trying to get some metrics with Stackdriver Legacy. In Google Cloud Console -> Stackdriver -> Metrics Explorer I see the full set of Anthos metrics listed, but when I try to create a chart based on those metrics it shows no data. The only response I get in the panel is "no data is available for the selected time frame", even after changing the time frame and other settings.
Is it right to think that with Anthos metrics I can retrieve information about my cronjobs, pods, and services, such as failed initializations and job failures? And if so, can I do it with Stackdriver Legacy, or do I need to update to Stackdriver Kubernetes Engine Monitoring?

The Anthos solution includes what’s called GKE On-Prem. I’d take a look at the instructions for using logging and monitoring on GKE On-Prem. Stackdriver monitors GKE On-Prem clusters in a similar way to cloud-based GKE clusters.
However, there’s a note saying that Stackdriver currently only collects cluster logs and system component metrics. The full Kubernetes Monitoring experience will be available in a future release.
You can also check that you’ve met all the configuration requirements.

Kubernetes sizing consideration

I need some help understanding sizing considerations for the master components of a k8s cluster. To handle a maximum of 1000 pods, how many masters are needed, especially in multi-master mode with a load balancer in front to route requests to the API server?
Will 3 master nodes (etcd, apiserver, controller, scheduler) be enough to handle the load, or are more required?

There is no strict solution for this. As per the Kubernetes v1.15 documentation, you can create your cluster in many ways, but you must stay within the following limits:
No more than 5000 nodes
No more than 150000 total pods
No more than 300000 total containers
No more than 100 pods per node
You did not provide any information about your infrastructure, such as whether you want to deploy locally or in the cloud.
One advantage of the cloud is that kube-up automatically configures the proper VM size for your master depending on the number of nodes in your cluster.
Do not forget to provide sufficient quota for CPU, memory, etc.
Please check this documentation for more detailed information.
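To make the limits above concrete, here is a small Python sketch that checks a planned cluster against them and derives the minimum number of worker nodes for a given pod count. The function names and the container count in the example are hypothetical, and the sketch says nothing about how many masters are needed; it only checks the documented limits.

```python
import math

# Limits quoted above from the Kubernetes v1.15 documentation.
MAX_NODES = 5000
MAX_PODS = 150000
MAX_CONTAINERS = 300000
MAX_PODS_PER_NODE = 100


def min_worker_nodes(total_pods):
    """Smallest node count that keeps every node within the per-node pod limit."""
    return math.ceil(total_pods / MAX_PODS_PER_NODE)


def check_cluster_plan(nodes, pods, containers):
    """Return a list of limit violations (empty if the plan fits)."""
    violations = []
    if nodes > MAX_NODES:
        violations.append(f"{nodes} nodes exceeds {MAX_NODES}")
    if pods > MAX_PODS:
        violations.append(f"{pods} pods exceeds {MAX_PODS}")
    if containers > MAX_CONTAINERS:
        violations.append(f"{containers} containers exceeds {MAX_CONTAINERS}")
    if nodes < min_worker_nodes(pods):
        # With fewer nodes, at least one node must hold more than the limit.
        violations.append(
            f"{pods} pods need at least {min_worker_nodes(pods)} nodes"
        )
    return violations


# The 1000 pods from the question, assuming 2 containers per pod.
print(min_worker_nodes(1000))
print(check_cluster_plan(nodes=10, pods=1000, containers=2000))
```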

Getting custom metrics from Kubernetes pods

I was looking into Kubernetes Heapster and Metrics-server for getting metrics from the running pods. But the issue is that I need some custom metrics which might vary from pod to pod, and apparently Heapster only provides CPU- and memory-related metrics. Is there any tool already out there which would provide the functionality I want, or do I need to build one from scratch?

What you're looking for is application- and infrastructure-specific metrics. For this, the TICK stack could be helpful! Specifically, Telegraf can be set up to gather detailed infrastructure metrics like memory and CPU pressure, or even the resources used by individual Docker containers, network and IO metrics, etc. But it can also scrape Prometheus metrics from pods. These metrics are then shipped to InfluxDB and visualized using either Chronograf or Grafana.
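For a pod to be scrapeable this way, it has to expose its custom metrics in the Prometheus text format over HTTP. Below is a minimal standard-library sketch of that idea; the metric name, the `serve` helper, and port 8000 are hypothetical, and in practice an official Prometheus client library is usually the better choice.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical custom metrics: name -> (type, help text, current value).
METRICS = {
    "jobs_processed_total": ("counter", "Jobs processed by this pod.", 3),
}


def render_prometheus_metrics(metrics):
    """Render metrics without labels in the Prometheus text exposition format."""
    lines = []
    for name, (metric_type, help_text, value) in sorted(metrics.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {metric_type}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_prometheus_metrics(METRICS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Blocking call: serve /metrics until interrupted (port is an assumption)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()


print(render_prometheus_metrics(METRICS), end="")
```

Calling `serve()` inside the application container would make `/metrics` available for a scraper such as Telegraf or Prometheus.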

Not sure if this is still open.
I would classify metrics into 3 types.
Events or Logs - System and application events which are sent to logs. These are non-deterministic.
Metrics - CPU and memory utilization on the node where the app is hosted. These are deterministic and are collected periodically.
APM - Application Performance Monitoring metrics - these are application-level metrics like requests received vs. failed vs. responded, etc.
Not all platforms do everything. ELK, for instance, does both metrics and log monitoring but does not do APM. Some of these tools have plugins for collection daemons which collect perfmon metrics of the node.
APM is a completely different area, as it requires developer tooling to provide the metrics, as Spring Boot does with Actuator, Node.js with AppMetrics, etc. This carries the request-level data. StatsD is an open source library which applications can use to send APM metrics to StatsD agents installed on the node.
AWS offers CloudWatch agents for log shipping and sinking, and X-Ray for distributed tracing, which can be used for APM.
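As a sketch of what sending an application-level metric to a StatsD agent can look like, the following Python uses only the standard library and the plain StatsD line format (`name:value|type`). It assumes an agent listening on UDP port 8125 on the node, which is the usual default; the metric name and helper functions are hypothetical.

```python
import socket

# Metric types in the basic StatsD line protocol.
VALID_TYPES = {"c", "g", "ms"}  # counter, gauge, timer (milliseconds)


def format_statsd(name, value, metric_type):
    """Build one StatsD line, e.g. 'requests.failed:1|c'."""
    if metric_type not in VALID_TYPES:
        raise ValueError(f"unsupported metric type: {metric_type}")
    return f"{name}:{value}|{metric_type}"


def send_statsd(line, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send; host and port are assumptions for the agent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


# Hypothetical request-level metrics an application might emit.
print(format_statsd("requests.failed", 1, "c"))
print(format_statsd("request.duration", 320, "ms"))
```

Because UDP is connectionless, `send_statsd` does not block the request path if the agent is down, which is one reason this pattern is common for APM-style counters.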

How do you monitor kubernetes nodes deployed using kops?

We have some Kubernetes clusters that have been deployed using kops in AWS.
We really like using the upstream/official images.
We have been wondering whether or not there was a good way to monitor the systems without installing software directly on the hosts? Are there docker containers that can extract the information from the host? I think that we are likely concerned with:
Disk space (this seems to be passed through to docker via df)
Host CPU utilization
Host memory utilization
Is this host/node level information already available through heapster?

Not really a question about kops, but a question about operating Kubernetes. kops stops at the point of having a functional k8s cluster. You have networking, DNS, and nodes have joined the cluster. From there your world is your oyster.
There are many different options for monitoring with k8s. If you are a small team I usually recommend offloading monitoring and logging to a provider.
If you are a larger team or have more specific needs then you can look at such options as Prometheus and others. Poke around in the https://github.com/kubernetes/charts repository, as I know there is a Prometheus chart there.
As with any deployment of any form of infrastructure you are going to need Logging, Monitoring, and Metrics. Also, do not forget to monitor the monitoring ;)

I am using https://prometheus.io/; it fits naturally with Kubernetes.
The Kubernetes API already exposes a bunch of metrics in Prometheus format,
https://github.com/kubernetes/ingress-nginx also exposes Prometheus metrics (enable-vts-status: "true"), and you can also install https://github.com/prometheus/node_exporter as a DaemonSet to monitor CPU, disk, etc.
I install one Prometheus inside the cluster to monitor internal metrics and one outside the cluster to monitor LBs and URLs.
Both send alerts to the same https://github.com/prometheus/alertmanager that MUST be outside the cluster.
It took me about a week to configure everything properly.
It was worth it.
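Once node_exporter metrics are in Prometheus, host CPU utilization can be read back through the Prometheus HTTP API. The sketch below builds an instant-query URL and parses a response of the documented shape. The base URL and sample response are hypothetical, and the PromQL expression assumes the `node_cpu_seconds_total` metric name used by recent node_exporter versions.

```python
import json
import urllib.parse

# PromQL: percentage of non-idle CPU time per instance over 5 minutes.
CPU_UTILIZATION_QUERY = (
    '100 - (avg by (instance) '
    '(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)'
)

# Hypothetical response body in the shape of an instant-vector result.
SAMPLE_RESPONSE = (
    '{"status": "success", "data": {"resultType": "vector", "result": '
    '[{"metric": {"instance": "10.0.0.1:9100"}, '
    '"value": [1700000000, "12.5"]}]}}'
)


def build_query_url(base_url, promql):
    """Return the instant-query URL for the given Prometheus server."""
    query_string = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query_string


def parse_instant_vector(response_body):
    """Map each instance label to its float sample value."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError("Prometheus query failed")
    return {
        sample["metric"].get("instance", ""): float(sample["value"][1])
        for sample in payload["data"]["result"]
    }


# prometheus.example is a placeholder host, not a real endpoint.
print(build_query_url("http://prometheus.example:9090", CPU_UTILIZATION_QUERY))
print(parse_instant_vector(SAMPLE_RESPONSE))
```

Fetching the URL with any HTTP client and passing the body to `parse_instant_vector` gives per-node CPU utilization without installing anything on the hosts beyond the DaemonSet.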