I need to monitor a lot of legacy containers in my EKS cluster that have an NFS mount path. To map the NFS directory into the containers I am using the nfs-client Helm chart.
I need to detect when the mount path is lost for some reason, and the only way I have found to do that is to exec a command in the container:
#!/bin/bash
df -h | grep ip_of_my_nfs_server | wc -l
If the command above outputs 1, I know that my NFS mount path is OK.
Does anybody know a way to monitor the output of a script executed in a container with Prometheus?
Thanks!
As Matt has pointed out in the comments: first order of business should be to see if you can simply facilitate your monitoring requirement from node_exporter.
Below is a more generic answer on collecting metrics from arbitrary shell commands.
Prometheus is a pull-based monitoring system. You configure it with "scrape targets": these are effectively just HTTP endpoints that expose metrics in a specific format. Some target needs to be alive for long enough to allow it to be scraped.
The two most obvious options you have are:
Wrap your logic in a long-running process that exposes this metric on an HTTP endpoint, and configure it as a scrape target
Spin up an instance of Pushgateway, configure it as a scrape target, and have your command push its metrics there
Based on the little information you provided, the latter option seems like the most sane one. Important and relevant note from the README:
The Prometheus Pushgateway exists to allow ephemeral and batch jobs to expose their metrics to Prometheus. Since these kinds of jobs may not exist long enough to be scraped, they can instead push their metrics to a Pushgateway. The Pushgateway then exposes these metrics to Prometheus.
Your command would look something like:
#!/bin/bash
printf "mount_path_up %d\n" "$(df -h | grep -c ip_of_my_nfs_server)" | curl --data-binary @- http://pushgateway.example.org:9091/metrics/job/some_job_name
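If it helps, the same idea can be split into two small functions so that the metric formatting can be checked without a Pushgateway. This is only a sketch: the function names nfs_mount_metric and push_nfs_metric are made up for illustration, and it reports 1 or 0 rather than a line count. The body ends with a newline because the Pushgateway expects the pushed text to end with a line feed.

```shell
#!/bin/bash
# Hypothetical helper: reads `df`-style output on stdin and prints one
# Prometheus sample: 1 if any line mentions the NFS server, otherwise 0.
nfs_mount_metric() {
  local server="$1"
  if grep -qF -- "$server"; then
    printf 'mount_path_up 1\n'
  else
    printf 'mount_path_up 0\n'
  fi
}

# Hypothetical wrapper: pipes the sample to a Pushgateway URL.
# --fail makes curl return non-zero if the gateway rejects the push.
push_nfs_metric() {
  local server="$1" url="$2"
  df -h | nfs_mount_metric "$server" |
    curl --silent --show-error --fail --data-binary @- "$url"
}
```

It would be called from cron or a loop as, for example, push_nfs_metric ip_of_my_nfs_server http://pushgateway.example.org:9091/metrics/job/some_job_name. One caveat: df can block on a stale hard NFS mount, so running the check under timeout may be worth considering.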
Related
We run cAdvisor as a DaemonSet to collect pod data. When I fetch the metrics at node level with the command below, I can find the data for a certain pod, but the same data is not available in Prometheus. Kindly help in getting the data of all pods into Prometheus.
Thanks in advance.
command used:
curl <node-ip>:<port>/metrics | grep container_cpu_user_seconds_total
I have tried the command below at node level. As the data is visible in the terminal, I am expecting the same data in Prometheus.
curl <node-ip>:<port>/metrics | grep container_cpu_user_seconds_total
What would be the best way to set up a GCP monitoring alert policy for a Kubernetes CronJob failing? I haven't been able to find any good examples out there.
Right now, I have an OK solution based on monitoring logs in the Pod with ERROR severity. I've found this to be quite flaky, however. Sometimes a job will fail for some ephemeral reason outside my control (e.g., an external server returning a temporary 500) and on the next retry, the job runs successfully.
What I really need is an alert that is only triggered when a CronJob is in a persistent failed state. That is, Kubernetes has tried rerunning the whole thing, multiple times, and it's still failing. Ideally, it could also handle situations where the Pod wasn't able to come up either (e.g., downloading the image failed).
Any ideas here?
Thanks.
First of all, confirm the GKE version that you are running. The following commands will help you identify the default GKE version and the available versions:
Default version.
gcloud container get-server-config --flatten="channels" --filter="channels.channel=RAPID" \
--format="yaml(channels.channel,channels.defaultVersion)"
Available versions.
gcloud container get-server-config --flatten="channels" --filter="channels.channel=RAPID" \
--format="yaml(channels.channel,channels.validVersions)"
Now that you know your GKE version, and given that you want an alert that only fires when a CronJob is in a persistent failed state: GKE Workload Metrics used to be GCP's fully managed and highly configurable solution for sending all Prometheus-compatible metrics emitted by GKE workloads (such as a CronJob or a Deployment for an application) to Cloud Monitoring. It is deprecated as of GKE 1.24 and has been replaced by Google Cloud Managed Service for Prometheus, so the latter is the best option you have inside GCP: it lets you monitor and alert on your workloads using Prometheus, without having to manually manage and operate Prometheus at scale.
You also have two options outside of GCP: Prometheus itself and Ranch’s Prometheus Push Gateway.
Finally, and just FYI, it can be done manually by querying for the job, reading its start time, and comparing that to the current time. With bash:
START_TIME=$(kubectl -n=your-namespace get job your-job-name -o json | jq -r '.status.startTime')
echo "$START_TIME"
Or, you are able to get the job’s current status as a JSON blob, as follows:
kubectl -n=your-namespace get job your-job-name -o json | jq '.status'
You can see the following thread for more reference too.
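The start-time comparison mentioned above could look roughly like this. It is a sketch under assumptions: seconds_between and job_older_than are hypothetical names, and it relies on GNU date (the -d option), which is not available in that form on BSD/macOS.

```shell
#!/bin/bash
# Hypothetical helper: seconds elapsed between two RFC 3339 timestamps,
# such as the value Kubernetes reports in .status.startTime.
# Assumes GNU date; BSD/macOS date parses dates differently.
seconds_between() {
  local start end
  start=$(date -u -d "$1" +%s) || return 1
  end=$(date -u -d "$2" +%s) || return 1
  echo $(( end - start ))
}

# Hypothetical check: succeeds when the job started more than
# max_age seconds before the current time.
job_older_than() {
  local start_time="$1" max_age="$2" now
  now=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  [ "$(seconds_between "$start_time" "$now")" -gt "$max_age" ]
}
```

For example, job_older_than "$START_TIME" 3600 && echo "job has been running for over an hour". START_TIME needs to be the raw timestamp without surrounding quotes, which is what jq -r prints.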
Taking the “Failed” state as the central point of your requirement, a bash script that uses kubectl and sends an email when it sees a job in the “Failed” state can be useful. Here are some examples:
while true; do if kubectl get jobs myjob -o jsonpath='{.status.conditions[?(@.type=="Failed")].status}' | grep -q True; then mail -s jobfailed email@address < /dev/null; else sleep 1; fi; done
For newer K8s:
while true; do kubectl wait --for=condition=failed job/myjob && mail -s jobfailed email@address < /dev/null; done
How to force all Kubernetes services (proxy, kubelet, apiserver..., containers) to write logs to /var/logs?
For example:
/var/logs/apiServer.log
or:
/var/logs/proxy.log
Can I use syslog config to do that? What would be an example of that config?
I have already tried setting ForwardToSyslog=yes in the journald configuration.
The first thing that comes to my mind: create a sidecar container that gathers all the logs in one place.
The Complete Guide to Kubernetes Logging.
That's a pretty broad question that should be divided into a few parts. Kubernetes stores different types of logs in different places.
Kubernetes Container Logs (outside the scope of this question, but simply kubectl logs <podname>, plus -n for the namespace if it's not default, plus -c to specify the container inside the pod)
Kubernetes Node Logs
Kubernetes Cluster Logs
Kubernetes Node Logs
Depending on your operating system and services, there are various
node-level logs you can collect, such as kernel logs or systemd logs.
On nodes with systemd both the kubelet and container runtime write to
journald. If systemd is not present, they write to .log files in the
/var/log directory.
You can access systemd logs with the journalctl command.
Tutorial: Logging with journald has an extensive explanation of how you can configure journald to gather logs, both with log aggregation tools like ELK and without them. journald log filtering can simplify your life.
There are two ways of centralizing journal entries via syslog:
syslog daemon acts as a journald client (like journalctl or Logstash or Journalbeat)
journald forwards messages to syslog (via socket)
Option 1) is slower – reading from the journal is slower than reading from the socket – but captures all the fields from the journal.
Option 2) is safer (e.g. no issues with journal corruption), but the journal will only forward traditional syslog fields (like severity, hostname, message..)
As for ForwardToSyslog=yes in /etc/systemd/journald.conf: it makes journald forward messages, in syslog format, to the socket /run/systemd/journal/syslog. A syslog daemon such as rsyslog can then read from that socket, and you can either process the logs there or have them written to the desired place.
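Once messages reach rsyslog, a per-program rule can write them to a file of their own. Below is a sketch of a small generator for such rules. The function name rsyslog_rule is made up, and it assumes the component logs under the program name you pass in (for example kubelet), which you should verify on your nodes.

```shell
#!/bin/bash
# Hypothetical helper: prints an rsyslog (RainerScript) rule that writes all
# messages from one program to its own file and stops further processing.
rsyslog_rule() {
  local program="$1" logfile="$2"
  cat <<EOF
if \$programname == '$program' then {
    action(type="omfile" file="$logfile")
    stop
}
EOF
}
```

For example, rsyslog_rule kubelet /var/log/kubelet.log prints a rule that you could save under /etc/rsyslog.d/ (the exact file name is up to you) before restarting rsyslog.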
Kubernetes Cluster Logs
By default, system components outside a container write files to journald, while components running in containers write to /var/log directory. However, there is the option to configure the container engine to stream logs to a preferred location.
Kubernetes doesn’t provide a native solution for logging at cluster level. However, there are other approaches available to you:
Use a node-level logging agent that runs on every node
Add a sidecar container for logging within the application pod
Expose logs directly from the application.
P.S. I have NOT tried the approach below, but it looks promising. Check it out; maybe it will help you with this not-so-easy task.
The easiest way of setting up a node-level logging agent is to
configure a DaemonSet to run the agent on each node
helm install --name st-agent \
--set infraToken=xxxx-xxxx \
--set containerToken=xxxx-xxxx \
--set logsToken=xxxx-xxxx \
--set region=US \
stable/sematext-agent
This setup will, by default, send all cluster and container logs to a
central location for easy management and troubleshooting. With a tiny
bit of added configuration, you can configure it to collect node-level
logs and audit logs as well.
I've set up Prometheus to monitor Kubernetes. However, when I look at the Prometheus dashboard I see that kubernetes-cadvisor is DOWN.
I would like to know whether we need it to monitor Kubernetes, because in Grafana I already get various information such as memory usage, disk space ...
Would it be used to monitor containers, in order to make precise queries such as the memory used by a pod in a specific namespace?
The error you have provided means that the cAdvisor's content does not comply with the Prometheus exposition format.[1] But to be honest, it is one of the possibilities and as you did not provide more information we will have to leave it for now (I mean the information asked by Oliver + versions of Prometheus and Grafana and environment in which you are running the cluster).
Answering your question: although you don't need cAdvisor for monitoring in general, it does provide some important metrics and is pretty well integrated with Kubernetes. So if you need container-level metrics, you should use cAdvisor.
As specified in this article (you can find a configuration tutorial there):
you can’t access cAdvisor directly (through 4194). You can (!) access
cAdvisor by duplicating the job_name (called “k8s”) in the
prometheus.yml file, calling the copy “cAdvisor” (perhaps) and
inserting an additional line to define “metrics_path”. Prometheus
assumes exporters are on “/metrics” but, for cAdvisor, our metrics are
on “/metrics/cadvisor”.
I think that could be the reason, but if this does not solve your issue I will try to recreate it in my cluster.
Update:
Judging from your yaml file, you did not configure Prometheus to scrape metrics from the cAdvisor. Add this to your yaml file:
scrape_configs:
  - job_name: cadvisor
    scrape_interval: 5s
    static_configs:
      - targets:
          - cadvisor:8080
As specified here.
To get container metrics we need cAdvisor!
To set it up I just followed the procedure at
https://github.com/google/cadvisor
I installed it on each of my nodes by running the following on each one:
sudo docker run \
--volume=/:/rootfs:ro \
--volume=/var/run:/var/run:ro \
--volume=/sys:/sys:ro \
--volume=/var/lib/docker/:/var/lib/docker:ro \
--volume=/dev/disk/:/dev/disk:ro \
--publish=8080:8080 \
--detach=true \
--name=cadvisor \
google/cadvisor:latest
i hope this will help you guys ;)
Without using Heapster, is there any way to collect metrics such as CPU or disk usage for a node within a Kubernetes cluster?
How does Heapster even collect those metrics in the first place?
Kubernetes monitoring is detailed in the documentation here, but that mostly covers tools using heapster.
Node-specific information is exposed through the cAdvisor UI which can be accessed on port 4194 (see the commands below to access this through the proxy API).
Heapster queries the kubelet for stats served at <kubelet address>:10255/stats/ (other endpoints can be found in the code here).
Try this:
$ kubectl proxy &
Starting to serve on 127.0.0.1:8001
$ NODE=$(kubectl get nodes -o=jsonpath="{.items[0].metadata.name}")
$ curl -X "POST" -d '{"containerName":"/","subcontainers":true,"num_stats":1}' localhost:8001/api/v1/proxy/nodes/${NODE}:10255/stats/container
...
Note that these endpoints are not documented as they are intended for internal use (and debugging), and may change in the future (we eventually want to offer a more stable versioned endpoint).
Update:
As of Kubernetes version 1.2, the Kubelet exports a "summary" API that aggregates stats from all Pods:
$ kubectl proxy &
Starting to serve on 127.0.0.1:8001
$ NODE=$(kubectl get nodes -o=jsonpath="{.items[0].metadata.name}")
$ curl localhost:8001/api/v1/proxy/nodes/${NODE}:10255/stats/summary
...
I would recommend using Heapster to collect metrics. It's pretty straightforward. However, in order to access those metrics, you need to add "type: NodePort" in the heapster.yml file. I modified the original Heapster files and you can find them here. See my readme file for how to access the metrics. More metrics are available here.
Metrics can be accessed via a web browser at http://heapster-pod-ip:heapster-service-port/api/v1/model/metrics/cpu/usage_rate. The same result can be seen by executing the following command.
$ curl -L http://heapster-pod-ip:heapster-service-port/api/v1/model/metrics/cpu/usage_rate