Is there any way to get statistics, such as service / endpoint access, for services defined in a Kubernetes cluster?
I've read about Heapster, but it doesn't seem to provide these statistics. Plus, the whole setup is tremendously complicated and relies on a ton of third-party components. I'd really like something much, much simpler than that.
I've been looking into what may be available in the kube-system namespace. There's a bunch of containers and services there, including Heapster, but they are effectively inaccessible because they require authentication I cannot provide, and kubectl doesn't seem to have any API to access them (or does it?).
Heapster is the agent that collects the data, but you then need a monitoring agent to interpret that data. On GCP, for example, fluentd picks up these metrics and sends them to Stackdriver.
Prometheus is an excellent monitoring tool. I would recommend it if you are not on GCP.
If you are on GCP, then as mentioned above you have Stackdriver Monitoring, which is configured by default for K8s clusters. All you have to do is create a Stackdriver account (one click from the GCP Console), and you are good to go.
In one project, scaling and orchestration are implemented using the technologies of a local cloud provider, with no Docker or Kubernetes. But the project has poor logging and monitoring, so I'd like to install Prometheus, Loki, and Grafana for metrics, logs, and visualisation respectively. Unfortunately, I've found no articles with instructions on using Prometheus without K8s.
But is it possible? If so, is it a good approach, and how would I do it? I also know that Prometheus and Loki can automatically discover services in K8s to extract metrics and logs, but will the same work for a custom orchestration system?
Can't comment about Loki, but Prometheus is definitely doable.
Prometheus supports a number of service discovery mechanisms, k8s being just one of them. If you look at the list of options (the ones ending with _sd_config) you can see if your provider is there.
If it is not, then a generic service discovery mechanism can be used. Maybe DNS-based discovery will work with your custom system? If not, then with some glue code a file-based service discovery will almost certainly work.
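To make the "glue code" idea concrete, here is a minimal sketch of what such a script might look like. It assumes you can get a list of running instances out of your orchestrator somehow; the inventory records, the "service" label, and the function names are all hypothetical. The output follows the JSON shape that Prometheus file-based discovery reads: a list of groups, each with "targets" and optional "labels".

```python
import json
import os
import tempfile


def build_file_sd(instances):
    """Group (service, host, port) records into file_sd target groups."""
    groups = {}
    for inst in instances:
        address = f"{inst['host']}:{inst['port']}"
        groups.setdefault(inst["service"], []).append(address)
    return [
        {"targets": sorted(addresses), "labels": {"service": service}}
        for service, addresses in sorted(groups.items())
    ]


def write_atomically(path, target_groups):
    """Write via a temp file plus rename so a reader never sees a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(target_groups, handle, indent=2)
    os.replace(tmp_path, path)


if __name__ == "__main__":
    # Hypothetical inventory; in practice this would come from the
    # orchestrator's own API or instance list.
    inventory = [
        {"service": "api", "host": "10.0.0.5", "port": 9100},
        {"service": "api", "host": "10.0.0.6", "port": 9100},
        {"service": "worker", "host": "10.0.0.7", "port": 9100},
    ]
    print(json.dumps(build_file_sd(inventory), indent=2))
```

You would run something like this on a timer, call write_atomically with a path such as targets.json (an assumed name), and list that path under files in a file_sd_configs entry of the scrape job. Prometheus re-reads the file when it changes, so no restart should be needed.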
Yes, I'm running Prometheus, Loki etc. just fine in an AWS ECS cluster. It just requires a bit more configuration, especially regarding service discovery (if you are not already using something like ECS Service Discovery or HashiCorp Consul).
I have set up container insights as described in the Documentation
Is there a way to remove some of the metrics sent over to CloudWatch ?
Details :
I have a small cluster ( 3 client facing namespaces, ~ 8 services per namespace ) with some custom monitoring, logging, etc in their own separate namespaces, and I just want to use CloudWatch for critical client facing metrics.
The problem I am having is that the Agent sends over 500 metrics to CloudWatch, where I am really only interested in a few of the important ones, especially as AWS bills per metric.
Is there any way to limit which metrics get sent to CloudWatch?
It would be especially helpful if I could send metrics only from certain namespaces, for example by excluding the kube-system namespace.
My configmap is:
cwagentconfig.json: |
  {
    "logs": {
      "metrics_collected": {
        "kubernetes": {
          "cluster_name": "*****",
          "metrics_collection_interval": 60
        }
      },
      "force_flush_interval": 5
    }
  }
I have searched for a while now, but couldn't really find anything on:
"metrics_collected": {
"kubernetes": {
I've looked as best I can and you're right, there's little or nothing to find on this topic. Before I make the obvious-but-unhelpful suggestions of either using Prometheus or asking on the AWS forums, a quick look at what the CloudWatch agent actually does.
The CloudWatch agent gets container metrics either from cAdvisor, which runs as part of kubelet on each node, or from the Kubernetes metrics-server API (which also gets its metrics from kubelet and cAdvisor). cAdvisor is well documented, and it's likely that the CloudWatch agent uses the Prometheus-format metrics cAdvisor produces to construct its own list of metrics.
Unfortunately that's just a guess, since the CloudWatch agent doesn't seem to be open source. That also means it may be possible to just set a 'measurement' option within the kubernetes section and select metrics based on Prometheus metric names, but that's probably not supported. (If you do ask AWS, the Premium Support team should keep an eye on the forums, so you might get lucky and get an answer without paying for support.)
So, if you can't cut down the metrics created by Container Insights, what are your other options? Prometheus is easy to deploy, and you can set up recording rules to cut down on the number of metrics it actually saves. It doesn't push to CloudWatch by default, but you can keep the metrics locally if you have some space on your node for it, or use a remote storage service like MetricFire (the company I work for, to be clear!), which provides Grafana to go along with it. You can also export metrics from CloudWatch and use Prometheus as your single source of truth, but that means more storage on your cluster.
If you prefer to view your metrics in CloudWatch, there are tools like Prometheus-to-cloudwatch which scrape Prometheus endpoints and send the data to CloudWatch, much like (I'm guessing) the CloudWatch agent does. This tool has include and exclude settings for deciding which metrics are sent to CloudWatch.
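To illustrate what include/exclude filtering amounts to, here is a rough sketch in Python. This is not how Prometheus-to-cloudwatch is implemented or configured; the function names and the glob-style patterns are my own assumptions, and it only looks at metric names in the Prometheus text exposition format.

```python
from fnmatch import fnmatchcase


def metric_name(sample_line):
    """Return the metric name from one sample line of text exposition format."""
    # The name ends at the first '{' (start of labels) or space (before value).
    for index, char in enumerate(sample_line):
        if char in "{ ":
            return sample_line[:index]
    return sample_line


def filter_samples(exposition_text, include=("*",), exclude=()):
    """Keep samples whose name matches an include pattern and no exclude pattern."""
    kept = []
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name = metric_name(line)
        included = any(fnmatchcase(name, pattern) for pattern in include)
        excluded = any(fnmatchcase(name, pattern) for pattern in exclude)
        if included and not excluded:
            kept.append(line)
    return kept
```

For example, filter_samples(text, include=("container_*",), exclude=("*_rss",)) would forward only container metrics other than the RSS ones. Filtering by namespace would work the same way, but on the namespace label rather than the metric name.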
I've written a blog post on EKS Architecture and Monitoring in case that's of any help to you. Good luck, and let us know which option you go for!
We are using a k8s cluster for one of our applications. The cluster is owned by another team and we don't have full control over it. We are trying to find metrics on resource utilization (CPU and memory) and details about running containers/pods/nodes, and we need to find out how many containers are running in parallel. The problem is that the other team has exposed cluster monitoring via Prometheus, but with Prometheus we are not getting live data, and it does not have information about running containers.
My question is: which API is available by default in a k8s cluster and can give us everything we need? We don't want to read data from another client like Prometheus or anything else; we want to read metrics directly from the cluster so that the data is not stale. Any suggestions?
As you mentioned, you will need metrics-server (or heapster) to get that information.
You can confirm whether your metrics server is running by running kubectl top nodes or kubectl top pods, or just by checking whether there is a heapster or metrics-server pod present in the kube-system namespace.
The same command will also show you the information you are looking for. I won't go into details, as here you can find a lot of clues and ways of looking at cluster resource usage. You should probably take a look at cAdvisor too, which should already be present in the cluster. It exposes a web UI that shows live information about all the containers on the machine.
Other than that, there are probably commercial ways of achieving what you are looking for, for example SignalFx and other similar products, but this will probably require the cluster administrator's involvement.
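If you want to read these numbers programmatically rather than through kubectl top, and metrics-server is installed, the same data is served by the metrics.k8s.io API. The following is a small sketch, assuming you have saved the output of kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes and that your account is allowed to read it. The function names are hypothetical, and the quantity parser only covers the common suffixes.

```python
import json
from decimal import Decimal

# Multipliers for the quantity suffixes handled by this sketch.
_SUFFIXES = {
    "n": Decimal("1e-9"), "u": Decimal("1e-6"), "m": Decimal("1e-3"),
    "k": Decimal(10) ** 3, "M": Decimal(10) ** 6, "G": Decimal(10) ** 9,
    "Ki": Decimal(2) ** 10, "Mi": Decimal(2) ** 20, "Gi": Decimal(2) ** 30,
}


def parse_quantity(text):
    """Convert a quantity such as '250m' or '128Mi' to a Decimal in base units."""
    # Try two-letter suffixes before one-letter ones.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(text)


def summarize_nodes(node_metrics_json):
    """Return {node_name: (cpu_cores, memory_bytes)} from a node metrics list."""
    document = json.loads(node_metrics_json)
    return {
        item["metadata"]["name"]: (
            parse_quantity(item["usage"]["cpu"]),
            parse_quantity(item["usage"]["memory"]),
        )
        for item in document["items"]
    }
```

The pods endpoint under the same API path reports usage per container in a similar shape, so a variant of summarize_nodes could also count running containers.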
We are unable to grab logs from the containers running in our GKE cluster if Stackdriver is disabled on GCP. I understand that it is proxying stderr/stdout, but it seems rather heavy-handed to block these outputs when Stackdriver is disabled.
How does one get an ELF stack going on GKE without being billed for Stackdriver, i.e. with it disabled entirely? Or is it so much a part of GKE that this is not doable?
From the article linked on a similar question regarding GCP:
"Kubernetes doesn’t specify a logging agent, but two optional logging agents are packaged with the Kubernetes release: Stackdriver Logging for use with Google Cloud Platform, and Elasticsearch. You can find more information and instructions in the dedicated documents. Both use fluentd with custom configuration as an agent on the node." (https://kubernetes.io/docs/concepts/cluster-administration/logging/#exposing-logs-directly-from-the-application)
Perhaps our understanding of Stackdriver billing is wrong?
But we don't want to be billed for Stackdriver as the 150MB of logs outside of the GCP metrics is not going to be enough and we have some expertise in setting up ELF for logging that we'd like to use.
You can disable Stackdriver logging/monitoring on Kubernetes by editing your cluster and setting "Stackdriver Logging" and "Stackdriver Monitoring" to disabled.
I would still suggest sticking with GCP over AWS, as you get the whole Kubernetes-as-a-service experience. Amazon's solution is still a little way off, and they are planning to charge for the service in addition to the EC2 node prices (last I heard).
We have some Kubernetes clusters that have been deployed using kops in AWS.
We really like using the upstream/official images.
We have been wondering whether or not there was a good way to monitor the systems without installing software directly on the hosts? Are there docker containers that can extract the information from the host? I think that we are likely concerned with:
Disk space (this seems to be passed through to docker via df)
Host CPU utilization
Host memory utilization
Is this host/node level information already available through heapster?
This is not really a question about kops, but a question about operating Kubernetes. kops stops at the point of having a functional k8s cluster: you have networking and DNS, and the nodes have joined the cluster. From there the world is your oyster.
There are many different options for monitoring with k8s. If you are a small team I usually recommend offloading monitoring and logging to a provider.
If you are a larger team or have more specific needs then you can look at such options as Prometheus and others. Poke around in the https://github.com/kubernetes/charts repository, as I know there is a Prometheus chart there.
As with any deployment of any form of infrastructure you are going to need Logging, Monitoring, and Metrics. Also, do not forget to monitor the monitoring ;)
I am using https://prometheus.io/, it goes naturally with Kubernetes.
The Kubernetes API already exposes a bunch of metrics in Prometheus format.
https://github.com/kubernetes/ingress-nginx also exposes Prometheus metrics (enable-vts-status: "true"), and you can also install https://github.com/prometheus/node_exporter as a DaemonSet to monitor CPU, disk, etc.
I install one Prometheus inside the cluster to monitor internal metrics and one outside the cluster to monitor LBs and URLs.
Both send alerts to the same https://github.com/prometheus/alertmanager, which MUST be outside the cluster.
It took me about a week to configure everything properly.
It was worth it.
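To give an idea of what the node_exporter data looks like for the disk space case, here is a small sketch that computes the used fraction per mountpoint from a scrape of its metrics endpoint. It assumes a node_exporter version that exposes node_filesystem_size_bytes and node_filesystem_avail_bytes; the function name is my own, and the label parsing is simplified (it will not cope with a closing brace inside a label value). In a real setup you would normally let Prometheus do this with a query rather than parse the text yourself.

```python
import re

# One sample line with labels: name{labels} value
_SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)'
)
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def filesystem_usage(exposition_text):
    """Return {mountpoint: fraction_used} from node_exporter filesystem metrics."""
    size, avail = {}, {}
    for line in exposition_text.splitlines():
        match = _SAMPLE.match(line)
        if not match:
            continue  # comments, blank lines, and samples without labels
        labels = dict(_LABEL.findall(match.group("labels")))
        mountpoint = labels.get("mountpoint")
        if mountpoint is None:
            continue
        if match.group("name") == "node_filesystem_size_bytes":
            size[mountpoint] = float(match.group("value"))
        elif match.group("name") == "node_filesystem_avail_bytes":
            avail[mountpoint] = float(match.group("value"))
    # Only report mountpoints for which both size and available space were seen.
    return {
        mount: 1.0 - avail[mount] / total
        for mount, total in size.items()
        if mount in avail and total > 0
    }
```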