I'm trying to create a bar chart that shows the utilization of a "service" item in AnyLogic. I've created a resource pool and linked it to the service beforehand.
The usual "propertyname.statsUtilization.mean()" doesn't work.
Can anyone help me?
If you want a bar chart to show the utilization of a resourcePool, note that a service doesn't have a utilization of its own, but its resources do.
So add a data item to the bar chart and use this as its value:
resourcePool.utilization()
Related
I would like to create a Prometheus alert that notifies me via Alertmanager when my Kubernetes volumes are for example 80% or 90% occupied. Currently I am using the following expression:
100 / kubelet_volume_stats_capacity_bytes{persistentvolumeclaim="pvc-name"} * kubelet_volume_stats_available_bytes{persistentvolumeclaim="pvc-name"} > 80
The problem, however, is that I have to create the same alert again in a slightly modified form for each claim. If, in addition, the name of a claim changes, I have to adjust the rules as well.
Question: Is there a clever way to create an alert that takes into account all available claims? So that there is no need to change the alert if a claim changes.
Kind regards!
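One possible approach, sketched under assumptions rather than taken from this thread: drop the `persistentvolumeclaim` matcher so the expression is evaluated once per claim that the kubelet reports, and let the alert template name the affected claim. Note that the expression in the question yields the percentage still available, whereas the sketch below computes the percentage used. The group name, alert name, `for` duration and `severity` label are hypothetical, and the `namespace` label is assumed to be present on these series.

```yaml
groups:
  - name: pvc-usage                     # hypothetical group name
    rules:
      - alert: PersistentVolumeClaimAlmostFull   # hypothetical alert name
        # No persistentvolumeclaim matcher: the expression returns
        # one series per claim, so new or renamed claims are covered.
        expr: |
          100 * (1 - kubelet_volume_stats_available_bytes
                     / kubelet_volume_stats_capacity_bytes) > 80
        for: 5m                         # assumed duration; tune as needed
        labels:
          severity: warning             # assumed label for Alertmanager routing
        annotations:
          summary: 'PVC {{ $labels.persistentvolumeclaim }} in {{ $labels.namespace }} is {{ $value | humanize }}% full'
```

A second rule with a threshold of 90 and a different `severity` value could cover the higher level in the same way.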
Problem:
I have a dashboard in Grafana which monitors the health of my monitoring services: Prometheis, Alertmanagers, Pushgateways and Grafana itself. It shows the simple Up/Down status of these services in Singlestat panels.
When one of my Prometheus instances (I have one in each datacenter) is down, the Singlestat panel that uses this Prometheus as its datasource loads for 30s before it shows "Request error".
It gets even worse when I want only one panel per Prometheus instance that combines results from all the Prometheis that monitor it (the Prometheis in my setup monitor each other). For this I use the --mixed-- data source, and in that case, when one of the datasources in use is down, the Singlestat panel loads forever. Because the down datasource is added to all my Singlestat panels for Prometheis, all of these panels load forever.
Also, when one of the Prometheis stops working, some Grafana pages take a very long time to load:
Configuration -> Datasources
and
Dashboards -> Home.
This does not always happen, though; sometimes they load normally.
Investigations:
I looked into the Query timeout setting of the Grafana datasource (I set it to 1s), but it had no effect on this problem.
I have also tried adding a datasource variable. It solves the problem only partially and I am not satisfied with it:
I have a combo box with datasources in the dashboard and a Singlestat panel for each Prometheus backed by this datasource variable. The problem is that I have to step through all the Prometheis in the combo box to see the whole picture for the Prometheus services.
Similarly, it is possible to create Singlestat panels for all combinations of datasources and Prometheus instances (in my case 3 x 3 panels), but it is not intuitive and gets worse with each Prometheus server I add in the future.
Question:
Is there any way to handle unreachable datasources so that dashboards continue to work?
Maybe I have to add some component to my setup, but I think it should be done in Grafana (although it seems it is not possible).
Is there any sample code available to scale pods up/down in Kubernetes dynamically through the Go client?
Maybe check out this sample GitHub project with the kube-start-stop custom controller, which can schedule your resources to scale down/up automatically based on a time period.
I am trying to implement alerting using Grafana and Prometheus.
As Grafana does not allow template variables in metrics to be used in alerting, I am currently forced to hardcode the IPs if I want to collect the memory metrics.
But that's not a solution that can last long, as the nodes in my setup can terminate and get recreated because auto-scaling is enabled.
Is there any better alternative than hardcoding each instance IP in the metric and still enable alerting on memory usage of each node?
Any help will be really appreciated.
Yeah, that's why we've given up on using alerts in Grafana and decided to use Alertmanager. For that you'll need to create alert rules, add them to a PrometheusRule resource on the cluster, and configure Alertmanager itself.
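As a rough illustration of what such a PrometheusRule resource might look like, here is a minimal sketch. It assumes the Prometheus Operator is installed and that node_exporter memory metrics are being scraped; the resource name, namespace, group name, alert name, threshold, duration and `severity` label are all hypothetical. Depending on how the operator's rule selector is configured, the resource may also need matching metadata labels.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: node-memory-alerts          # hypothetical name
  namespace: monitoring             # assumed namespace
spec:
  groups:
    - name: node-memory             # hypothetical group name
      rules:
        - alert: NodeMemoryUsageHigh   # hypothetical alert name
          # Evaluated for every scraped instance, so no node IP is
          # hardcoded and recreated nodes are picked up automatically.
          expr: |
            100 * (1 - node_memory_MemAvailable_bytes
                       / node_memory_MemTotal_bytes) > 90
          for: 10m                  # assumed duration; tune as needed
          labels:
            severity: warning       # assumed label for Alertmanager routing
          annotations:
            summary: 'Memory usage on {{ $labels.instance }} is above 90%'
```

Because the expression carries no instance matcher, each node that crosses the threshold produces its own alert, and the `instance` label identifies it in the notification.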
If you can figure out how to add your required info into labels, you can reference labels in your alert message using a template like so:
{{$labels.instance}}
Anything that's reported in the instance as a label should be available. However, it's only available if the alert ends in a math expression; it isn't available for alerts that use a classic expression.
I'm using Kubernetes on Google Compute Engine and Stackdriver. The Kubernetes metrics show up in Stackdriver as custom metrics. I successfully set up a dashboard with charts that show a few custom metrics such as "node cpu reservation". I can even set up an aggregate mean of all node cpu reservations to see if my total Kubernetes cluster CPU reservation is getting too high. See screenshot.
My problem is that I can't seem to set up an alert on the mean of a custom metric. I can set up an alert on each node, but that isn't what I want. I can also set up a "Group Aggregate Threshold Condition", but custom metrics don't seem to work for that. Notice how "Custom Metric" is not in the dropdown.
Is there a way to set an alert for an aggregate of a custom metric? If not, is there some way I can alert when my Kubernetes cluster is getting too high on CPU reservation?
Alerting on an aggregation of custom metrics is currently not available in Stackdriver. We are considering various solutions to the problem you're facing.
Note that sometimes it's possible to alert directly on symptoms of the problem rather than monitoring the underlying resources. For example, if you're concerned about CPU because X happens and users notice, and X is bad, you could consider alerting on symptoms of X instead of alerting on CPU.